Question

I have a problem with German Umlaute breaking Wiki Words.

I have read GermanUmlauteBreakWikiWords and I changed the file twiki/trunk/lib/TWiki/Render.pm according to the fix.

I have also read CyrillicTopicNameError and I changed the file twiki/trunk/lib/TWiki.pm accordingly.

Still, German Umlaute break the Wiki Word. My assumption is that this is due to cygwin under windows installation. My cygwin does not know the command 'locale' and my testenv ends with "User Authentication".

When I log in at http://lenz.uni-koblenz.de/twiki/bin/view using WikiGast/gast and change to testenv (greetings to ChristianKohl), the last entry is "Internationalisation and Locale Setup". I don't get this section on my machine. Is this because the command 'locale' is missing?

If this is the case, I would like to know which Cygwin package I need to download. (I checked http://www.cygwin.com/packages/ but could not find an entry for 'locale'.)

If you need more information or files, please let me know. I will attach at once.

Environment

TWiki version: TWikiRelease02Sep2004
TWiki plugins: default
Server OS: Windows 2000 server
Web server: Apache 1.3.33
Perl version: 5.8.5-3
Client OS: Windows XP or NT
Web Browser: Internet Explorer
Categories: Installation, Documentation, Internationalisation

-- JuditMays - 23 Feb 2005

Answer

Locales don't work on Windows from within Perl, so you have to do the Windows workaround mentioned in the installation guide under Trouble with I18N - basically, set $localeRegexes to 0 (i.e. don't use locales for regex support) and set the $upperNational and $lowerNational settings to the upper and lower case accented characters that you require.

No new Cygwin modules are needed (though in any case it's Perl that does the locale operations - locale is an admin command to find out what locales exist). Unfortunately, TWiki I18N for Windows is a bit messy because of the need to work around Perl/Windows issues, but the installation docs should be improved.
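
For illustration, here is a minimal sketch of why the explicit character lists matter once locales are out of the picture. The helper make_wikiword_regex and the simplified pattern are assumptions made for this example, not TWiki's actual code; the variable names only mirror the TWiki.cfg settings mentioned above.

```perl
use strict;
use warnings;

# Stand-ins for the TWiki.cfg settings, written as ISO-8859-1 byte escapes
# so the sketch does not depend on the encoding of this file.
my $upperNational = "\xC4\xD6\xDC";        # A-, O-, U-umlaut
my $lowerNational = "\xE4\xF6\xFC\xDF";    # a-, o-, u-umlaut and sharp s

# Hypothetical helper: build a simplified WikiWord pattern from explicit
# character classes. This is an illustration, not the pattern TWiki uses.
sub make_wikiword_regex {
    my ($upper_extra, $lower_extra) = @_;
    my $upper = "A-Z$upper_extra";
    my $lower = "a-z$lower_extra";
    return qr/^[$upper]+[$lower]+[$upper]+[${upper}${lower}0-9]*$/;
}

my $ascii_only    = make_wikiword_regex('', '');
my $with_national = make_wikiword_regex($upperNational, $lowerNational);

my $topic = "BadM\xFCnsterEifel";          # the test topic name in ISO-8859-1
print "ASCII classes only:  ", ($topic =~ $ascii_only    ? "match" : "no match"), "\n";
print "With national chars: ", ($topic =~ $with_national ? "match" : "no match"), "\n";
```

With ASCII-only classes the umlaut ends the match, which is the "umlaut breaks the WikiWord" symptom; with the national characters added, the whole name matches.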

-- RichardDonkin - 25 Feb 2005

Thank you, Richard. It works perfectly :-)

-- JuditMays - 28 Feb 2005

Oh, no. It does not work after all :-( Stupid me, I didn't test it properly the first time around. The only thing that works is the automatic linking of the WikiWithUmlautWord. But if I try to create the new page by clicking the question mark at the end of the WikiWord, the umlaut messes everything up. I get to the edit page (with a different spelling of the original WikiWord), but as soon as I save, I end up at a page saying: "NOTE: This Wiki topic does not exist yet". If I click on create, TWiki again offers me a different spelling of the WikiWord, and then saving works all right. But this page does not have the right name, so it has no parent, and there is no way to ever get there again.

By the way, I also checked my testenv again. The last heading is still on "User Authentication" and not "Internationalisation and Locale Setup".

I attached a new copy of my locale setup, changed according to Richard's advice.

Is there anyone with an idea, what else I could check out or try to fix this, please? pretty please??

-- JuditMays - 07 Mar 2005

Can you be a bit more specific about what goes wrong? A test case would help: say exactly what the topic name was originally, and what you end up with on page creation.

Also, check that the browser is set to ISO-8859-15 on viewing a page generated by your TWiki - if not you may need to use the siteCharsetOverride config parameter.

Not sure what else is going wrong, as it works OK for me using the following setup on Cygwin with Cygwin Perl:

$siteLocale = 'en_US.ISO8859-1';
$siteCharsetOverride = "";
$localeRegexes = 0;
$upperNational = 'ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝ';
$lowerNational = 'àáâãäåæçèéêëìíîïðñòóôõöøùúûüý';

One thing to try is de_DE.ISO8859-1 - while this loses the Euro character support, it does work without requiring any extra modules.

However, I have re-tested from IE6 and Firefox 1.0.1 with locale de_DE.ISO-8859-15 as in your setup, and it works fine.

Maybe if you could try my exact setup above? And check that IE is sending UTF-8 URLs, though that should make no difference.

-- RichardDonkin - 19 Mar 2005

I attached two files describing exactly what spelling my umlaute get turned into (one file for IE and one for Firefox). I checked the character encoding in the browser and changed it to 8859-15, but that made no difference. The IE setting for sending URLs as UTF-8 was checked all along, so that should not make a difference.

Next, I will try out your whole setting and let you know about the success.

-- JuditMays - 22 Mar 2005

After changing all variables to the values given above, my new test wiki word BadMünsterEifel is turned into BadMAnsterEifel.
With Firefox, this happens straight away. There is no delay as described in the two attached files. So I'm getting closer to what I want.

But with IE it is still the same strange behaviour. The topic name is changed to BadMA¬nsterEifel though the URL is what it should be: http://somewhere/bin/edit/Sandbox/BadMünsterEifel?topicparent=Sandbox.JuditMaysSandbox. After hitting the save button the URL is changed to http://somewhere/bin/view/Sandbox/BadMAAªnsterEifel. And after this, I get to create a topic called BadMAAAÝnsterEifel with the final URL of http://somewhere/bin/view/Sandbox/BadMAAAAnsterEifel.

I don't understand at all why it still doesn't work for me, even after copying Richard's parameter settings.
Next I will try what happens if I change to de_DE.ISO8859-1. I will let you know.

-- JuditMays - 22 Mar 2005

This doesn't help either.

-- JuditMays - 22 Mar 2005

The UTF-8 URL code isn't working for some reason, hence the ever-lengthening URLs. Just wanted to check this is a clean install, and the TWiki.cfg is new as well - is that right?
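
To make the "ever-lengthening" effect concrete, here is a small sketch of what happens when UTF-8 bytes are repeatedly misread as ISO-8859-1 and then encoded as UTF-8 again. The helper misread_round_trip is hypothetical and only simulates the suspected failure; it is not taken from TWiki.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Topic name as a Perl character string (u-umlaut is U+00FC).
my $name = "BadM\x{FC}nsterEifel";

# Hypothetical helper simulating one faulty edit/save round trip:
# the browser sends UTF-8, the server wrongly reads it as ISO-8859-1.
sub misread_round_trip {
    my ($chars) = @_;
    my $utf8_bytes = encode('UTF-8', $chars);     # what a UTF-8 URL carries
    return decode('ISO-8859-1', $utf8_bytes);     # misinterpreted as Latin-1
}

my @lengths = (length $name);
for (1 .. 3) {
    $name = misread_round_trip($name);
    push @lengths, length $name;
}
print join(', ', @lengths), "\n";    # prints "15, 16, 18, 22"
```

Every non-ASCII character doubles on each pass, so one umlaut becomes two, then four, then eight characters. That would fit the growing topic names reported above, assuming this is indeed the mechanism at work.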

I suspect that testenv is malfunctioning, since it's not showing the locale information. In testenv, please add the final print line shown here to this part of the file:

# Do locale settings if TWiki.pm was found
my $showLocales = 0;
if ($twikiFound) {
    if( eval 'TWiki::setupLocale()' ) {   # Not in older TWiki.pm versions
        # Ignore errors silently
        $showLocales = 1;
    }
}
print "\$showLocales is set to $showLocales<br>";

Then post the updated testenv output HTML here (the output of the above line may be somewhere unexpected).

One other thing to try: turn off UTF-8 URLs in IE, which will tell me whether the UTF-8 code is the root of the problem.

Also, is it possible that TWiki is having permission problems at run-time? Unlikely since the Encode.pm module is being loaded OK by testenv.

You might also want to look at the lib/TWiki.pm module, specifically the convertUtf8URLtoSiteCharset routine, and uncomment the writeDebug calls (see TWikiDebugging), and maybe put in some extra calls around where require Encode is done (should be called given your setup when using ISO-8859-1 - the code for ISO-8859-1 is just above that).

-- RichardDonkin - 24 Mar 2005

My installation is a first-time installation of the 02 Sep 2004 release, and the TWiki.cfg is the one I got with the download. I only made the necessary changes as described in the Windows install cookbook. Whether this qualifies as a clean install, I don't know. I did have several problems that needed fixing in various places. Most have been solved with the help of the Support web (mainly Richard Donkin and Matt Wilkie, thank you!) and the installation guide.
I'll attach the TWiki.cfg anyway.

The result of my change in the testenv script is $showLocales is set to 0 (as can be seen in the new attached testenv output).

Turning off UTF-8 URLs in IE does not solve the saving problem. BadMünsterEifel is turned into BadMnsterEifel, so the umlaut is simply dropped from the WikiWord.

For the permission problems at run time:
How do I know? What I have checked seems to be working all right. (edit, save, lock, unlock, ...) (Except for mailnotify in case of WebChanges, but I haven't tried to solve this yet.)

I haven't checked the lib/TWiki.pm yet. I will comment as soon as I have done so.

-- JuditMays - 29 Mar 2005

I tried some debugging in /lib/TWiki.pm. This is what I found out:
Within the convertUtf8URLtoSiteCharset routine, the code around require Encode never gets executed, because the condition of the first elsif (that is: elsif( $fullTopicName =~ $regex{validUtf8StringRegex} )) never becomes true. Therefore, require Encode is never reached. This happens with Firefox as well as IE (with or without UTF-8 enabled). See the debugging.txt file for details.
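
For reference, here is a rough sketch of the kind of conversion that branch is meant to perform: check whether the URL-decoded topic name is valid UTF-8 and, if so, convert it to the site charset. The helper to_site_charset is hypothetical and uses the Encode module's own strict decoding instead of TWiki's validUtf8StringRegex, so treat it as an illustration of the idea rather than the real routine.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper, not the TWiki routine: convert a URL-decoded topic
# name to ISO-8859-1 if it is valid UTF-8, otherwise return it unchanged.
sub to_site_charset {
    my ($bytes) = @_;
    return $bytes if $bytes !~ /[\x80-\xFF]/;    # pure ASCII, nothing to do

    my $copy  = $bytes;                           # decode may modify its input
    my $chars = eval { decode('UTF-8', $copy, Encode::FB_CROAK()) };
    return $bytes unless defined $chars;          # not UTF-8: assume site charset

    return encode('ISO-8859-1', $chars);          # characters outside Latin-1 become '?'
}

my $from_utf8_url   = to_site_charset("BadM\xC3\xBCnsterEifel");   # UTF-8 bytes
my $from_latin1_url = to_site_charset("BadM\xFCnsterEifel");       # Latin-1 bytes
print(($from_utf8_url eq $from_latin1_url) ? "same topic name\n" : "different names\n");
```

If the validity check never succeeds, as in the debugging output described above, the UTF-8 bytes pass through untouched and are later treated as site-charset characters.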

-- JuditMays - 29 Mar 2005

Actually, I have to admit, I don't understand this. My testenv says that $showLocales is set to 0. But my debugging says that $siteLocale is de_DE.ISO-8859-1 and $useLocale is 1. Why?

-- JuditMays - 29 Mar 2005

The only place $useLocale gets set is in TWiki.cfg; so you must be setting it. Or your debugging is wrong.

-- CrawfordCurrie - 06 Apr 2005

Sorry Crawford, but what do you mean?
In 20050329_TWiki.cfg the variable is set: $useLocale = 1 . So that would mean my debugging is wrong. But in which way? Did I do the wrong things? What I did is documented in debugging.txt. If I need to do more or different things, could you please be a bit more specific? Thank you.

-- JuditMays - 08 Apr 2005

Ok, here are two other things:

  • in the cygwin bash I cannot use the command locale. So I don't know what locale settings are relevant for and used by cygwin. So, maybe, loading the command from cygwin.org would help?
  • file names that should contain umlaute or other special characters get displayed in a funny way in the Cygwin shell. BadMAAªnsterEifel is displayed as BadMAA?nsterEifel. It seems that the Cygwin shell can't handle the character set I would like to use.

-- JuditMays - 11 Apr 2005

Hi - the $showLocales debug line shows that the call to TWiki::setupLocale() is failing, which needs to be debugged - some suitable writeDebug() calls within that routine should help figure out what's happening. This problem is also likely to be at the root of the I18N problems, as some variables may not be set at all.

As a first step, just set $showLocales to 1 where it is set to 0 (keep a backup copy of testenv, of course), so we can see some of the I18N settings and maybe get some error output. Also, try doing the TWiki::setupLocale() call outside the eval (e.g. just after setting $showLocales) to see the error message (this will break testenv, but it is useful to know what happens). The code should look like this (the first line ensures error messages are shown by CGI::Carp):

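# Do locale settings if TWiki.pm was found
# (assigned twice to avoid Perl's "used only once" warning)
$CGI::Carp::WRAP = $CGI::Carp::WRAP = 1;
my $showLocales = 1; # Temporary hack
TWiki::setupLocale();   # Called outside eval so any error is visible
if (...)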
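
A less disruptive variant, sketched here under the assumption that you only want to see the message without breaking testenv, is to keep the eval but print $@ afterwards. The names setup_locale_stub and try_call are made up for this example; in testenv you would pass a reference to the real routine instead of the stub.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a setup routine that fails at run time.
sub setup_locale_stub {
    die "could not set locale\n";
}

# Run a code reference inside eval and return (success flag, error text),
# so the error is reported instead of being silently swallowed.
sub try_call {
    my ($code) = @_;
    my $ok    = eval { $code->(); 1 };
    my $error = $ok ? '' : ($@ || 'unknown error');
    return ($ok ? 1 : 0, $error);
}

my ($ok, $error) = try_call(\&setup_locale_stub);
print "ok=$ok error=$error";
```

The trailing "; 1" inside the eval block makes the success flag independent of whatever the called routine returns, which matters when a routine returns a false value without actually failing.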

Probably the associated setupRegexes routine in TWiki.pm is failing to be called as well, which is why the 'valid UTF8' check fails and hence UTF-8 URL processing doesn't work. Try putting a writeDebug call within this routine too (see TWikiDebugging).

Re your last 2 questions:

  • you don't need the locale command at all - see my explanation above. Locales are useless for TWiki on Windows since they don't work with Perl - the only reason (on Windows) to set the $siteLocale variable is to get a character set that can be sent in the HTTP headers to the browser.
  • Cygwin's support for 8-bit characters is quite poor, I've never managed to get them working in bash... Best to use Putty if you need to log in to other systems using I18N from Cygwin, by the way.
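
As an aside on the first point, here is a rough sketch of how a charset for the HTTP headers can be derived from a locale name. The helper charset_for_headers, its fallback value and its normalisation rule are assumptions for illustration; TWiki's own derivation may differ in detail.

```perl
use strict;
use warnings;

# Hypothetical helper: pick the charset to advertise to browsers.
sub charset_for_headers {
    my ($site_locale, $override) = @_;
    return lc $override if defined $override && length $override;

    # Take the part after the dot, dropping any @modifier (e.g. "@euro").
    my ($charset) = $site_locale =~ /\.([^@]+)/;
    return 'iso-8859-1' unless defined $charset;    # assumed fallback

    $charset = lc $charset;
    $charset =~ s/^iso8859/iso-8859/;               # "iso8859-1" -> "iso-8859-1"
    return $charset;
}

for my $locale ('de_DE.ISO-8859-1', 'en_US.ISO8859-1', 'de_DE.ISO-8859-15@euro', 'C') {
    printf "%-24s -> %s\n", $locale, charset_for_headers($locale, '');
}
```

The override argument plays the role of $siteCharsetOverride: when it is non-empty it wins, which is useful when the locale's spelling of the charset is not one that browsers accept.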

Crawford: setupLocale has vanished in DevelopBranch so testenv really needs some other way of testing that the installed TWiki is I18N capable (Feb 2003 or later) so that it can show I18N settings if they're relevant. Worth fixing this in latest testenv (ImproveTestenv).

-- RichardDonkin - 12 Apr 2005

Thanks, Richard. I added the code fragment to the testenv script and finally the I18N settings are displayed (see the new attached file).
What does this comment on siteLocale mean: locale is set to 'C' ?

The writeDebug I put into TWiki.pm state something different (in /data/debug.txt):

13 Apr 2005 - 12:14 sub setupLocale firstDebug: $siteLocale is de_DE.ISO-8859-1
13 Apr 2005 - 12:14 sub setupLocale secondDebug: $useLocale is 1
13 Apr 2005 - 12:14 sub setupLocale firstDebug: $siteLocale is de_DE.ISO-8859-1
13 Apr 2005 - 12:14 sub setupLocale secondDebug: $useLocale is 1

I also added a writeDebug line into the sub setupRegexes of TWiki.pm

# 20050413:  added following line; Judit Mays
writeDebug "sub setupRegexes: this routine is called.";
Since there is no corresponding output in debug.txt, the routine apparently is not called, just as you assumed.

Where do I go from here?

BTW: My TWiki is going to be used by a larger community starting from May, and I would happily avoid prohibiting the use of Umlaute in wikiwords. (Especially as some of the users' names contain umlaute)

-- JuditMays - 13 Apr 2005

I seem to have a problem uploading my testenv output. (The error message says something like: "file contains no data", but this is definitely not true.) I will try the upload again later.

-- JuditMays - 13 Apr 2005

Ach, never mind the upload. Here's the relevant output:

Internationalisation and Locale Setup

$useLocale: 1
Note: This TWiki.cfg setting controls whether locales are used by Perl and 'grep'.
Warning: Using Perl on Windows, which may have missing or incorrect locales (in Cygwin or ActiveState Perl, respectively) - use of $useLocale = 0 is recommended unless you know your version of Perl has working locale support.
$siteLocale: de_DE.ISO-8859-1
Note: This TWiki.cfg parameter sets the site-wide locale - for example, de_AT.ISO-8859-1 where 'de' is the language code, 'AT' the country code and 'ISO-8859-1' is the character set. Use the locale -a command on your system to determine available locales.
Warning: Unable to set locale to 'de_DE.ISO-8859-1'. The actual locale is 'C' - please test your locale settings. This warning can be ignored if you are not planning to use locales (e.g. your site uses English only) - or you can set $siteLocale to C, which should always work.
$siteCharset: iso-8859-1
Note: This value is derived from the site-wide locale setting. It may have been overridden by $siteCharsetOverride (currently ''). It is used in TWiki's HTML pages and HTTP headers, so it must be acceptable to web browsers even if it is different to the locale-derived setting (e.g. 'euc-jp' instead of 'eucjp')
$upperNational: ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝ
Note: This TWiki.cfg parameter is used when $useLocale is 0, to work around missing or non-working locales. It is also used with Perl 5.005 for efficiency reasons - upgrading to Perl 5.6.1 with working locales is recommended, and removes the need for this. If required, this parameter should be set to the upper case accented characters you require in your locale.
$lowerNational: àáâãäåæçèéêëìíîïðñòóôõöøùúûüýß
Note: This TWiki.cfg parameter is used whenever $upperNational is used. This parameter should be set to the lower case accented characters you require in your locale.
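
The "actual locale is 'C'" warning can be reproduced outside TWiki with a few lines of Perl. This is only a sketch using the standard POSIX module; try_locale is a hypothetical helper, and which locale names are accepted depends entirely on the platform.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);

# Try to switch LC_CTYPE and report what the C library actually accepted.
sub try_locale {
    my ($wanted) = @_;
    my $result = setlocale(LC_CTYPE, $wanted);   # undef if the locale is rejected
    my $actual = setlocale(LC_CTYPE);            # query the current setting
    return ($result ? 1 : 0, $actual);
}

my ($accepted, $actual) = try_locale('de_DE.ISO-8859-1');
print $accepted
    ? "Locale accepted; LC_CTYPE is now '$actual'\n"
    : "Locale rejected; LC_CTYPE is still '$actual'\n";
```

If the locale is rejected here as well, the problem lies with Perl's locale support on that platform rather than with the TWiki configuration.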

-- JuditMays - 13 Apr 2005

I've recently installed TWiki on a new Windows XP Home Edition machine, and reproduced the minor bug about testenv not showing the I18N settings. The attached patch fixes this, although of course this is not the main problem... I have not been able to reproduce your problem on this machine, since the call to setupRegexes works fine.

Just to be clear - setupRegexes is not called by testenv normally. So your test needs to include a view transaction, then checking to see if the debug.txt line appears.

-- RichardDonkin - 17 Apr 2005

Closing this request, please reopen if needed.

-- PeterThoeny - 06 Jun 2005

Hi,

I have the same issue here on my Windows Server 2003 installation, using the latest production release with all I18n-patches applied. Has this been fixed in any of the betas? P.S.: I've changed the status back to AskedQuestions.

-- JoachimBlum - 17 Oct 2005

Joachim - please submit a new request as per SupportGuidelines, including your testenv output as HTML attachment, and most importantly your TWiki.cfg file. Also, have you tried the troubleshooting steps at TWiki.TWikiInstallationGuide#Trouble_with_I18N? There is some specific config required for Windows.

FWIW, I have TWiki I18N working fine on Windows XP using the WindowsInstallCookbook...

-- RichardDonkin - 18 Oct 2005

Richard - thanks for the fast answer. I'm currently trying to figure out what's going wrong. Somehow the topic names get messed up during the processing of the edit query. I don't know yet whether the error comes from the browser, the server, Perl or the script. I've been fiddling around with the scripts almost all day and got the first step to work, so that clicking on TestUmläute brings up the edit screen with the correct spelling of the topic. However, saving is not possible, because then the URLs get messed up again. I will try some more ideas tomorrow (I can do all that at work because I'm setting up the TWiki there - that's the good part of the story ;-) ). If I fail to find a solution, I'll post a new request. FWIW: I've installed TWiki on both Linux and Windows 2k before and never ran into such a problem. I'm sure there's a solution somewhere...

-- JoachimBlum - 19 Oct 2005

Seems like EncodeURLsWithUTF8 is not working - if you can log a separate Support question with browser details, TWiki version, etc, it will be a lot easier than trying to guess! You could try turning UTF-8 URL encoding on or off to see if that helps, also see TWikiDebugging to check log files for any errors. Not sure what the error message is that you are talking about. Please do read SupportGuidelines and have a go at submitting as much info as you can.

I assume you are not using UserInterfaceInternationalisation (DakarRelease only).

-- RichardDonkin - 19 Oct 2005

Moved my question to GermanUmlauteOnWindowsServer2003AndWindowsXP.

-- JoachimBlum - 23 Oct 2005

Topic attachments
20050329_TWiki.cfg (r1, 23.9 K, 2005-03-29 10:12): TWiki config file
20050329_testenv_ie.htm (r1, 13.6 K, 2005-03-29 09:40): my newest testenv output with IE (UTF-8 turned off)
NEW_locale_setup.txt (r1, 0.2 K, 2005-03-07 09:28): fixed locale setup according to "Trouble with I18N?"
debugging.txt (r1, 7.1 K, 2005-03-29 16:40): some parts of debug.txt and TWiki.pm
firefox_behaviour.txt (r1, 1.1 K, 2005-03-22 11:07): what happens with Firefox
ie_behaviour.txt (r1, 1.4 K, 2005-03-22 11:07): what happens with Internet Explorer
my_locale_setup.txt (r1, 0.2 K, 2005-02-23 10:55): the settings of my locale variables
testenv-call-setupLocale.patch (r1, 1.3 K, 2005-04-17 09:11): fix for testenv to always show I18N settings even if setlocale fails
testenv.htm (r1, 13.8 K, 2005-02-23 10:52): my testenv output
Topic revision: r23 - 2005-10-23 - JoachimBlum
 