Question
I have a problem with German Umlaute breaking Wiki Words.
I have read
GermanUmlauteBreakWikiWords and I changed the file twiki/trunk/lib/TWiki/Render.pm according to the fix.
I have also read
CyrillicTopicNameError and I changed the file twiki/trunk/lib/TWiki.pm accordingly.
Still, German Umlaute break the Wiki Word. My assumption is that this is due to my Cygwin installation under Windows. My Cygwin does not know the command 'locale', and my testenv output ends with "User Authentication".
When I log in at
http://lenz.uni-koblenz.de/twiki/bin/view
using WikiGast/gast and change to testenv (greetings to
ChristianKohl), the last entry is "Internationalisation and Locale Setup". I don't get this on my machine. Is this because the command 'locale' is missing?
If this is the case, I would like to know which cygwin module I need to download.
(I checked the
http://www.cygwin.com/packages/
but could not find an entry for 'locale')
If you need more information or files, please let me know. I will attach them at once.
Environment
--
JuditMays - 23 Feb 2005
Answer
Locales don't work on Windows from within Perl, so you have to do the Windows workaround mentioned in the installation guide under
Trouble with I18N - basically, set $localeRegexes to 0 (i.e. don't use locales for regex support) and set the $upperNational and $lowerNational settings to the upper and lower case accented characters that you require.
No new Cygwin modules are needed (though in any case it's Perl that does the locale operations -
locale is an admin command to find out what locales exist). Unfortunately, TWiki
I18N for Windows is a bit messy because of the need to work around Perl/Windows issues, but the installation docs should be improved.
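A minimal sketch of the idea behind this workaround, not TWiki's actual code: with locale-based regexes switched off, the accented letters have to be listed explicitly and spliced into the character classes that recognise a WikiWord. The variable names mirror the TWiki.cfg settings, but the pattern and the `make_wiki_word_regex` and `is_wiki_word` helpers are simplified, hypothetical stand-ins. The strings are written as ISO-8859-1 byte escapes so the example does not depend on the encoding of the file.

```perl
use strict;
use warnings;

# Stand-ins for the TWiki.cfg settings, as ISO-8859-1 bytes (illustrative subset)
my $upperNational = "\xC4\xD6\xDC";        # A-umlaut, O-umlaut, U-umlaut
my $lowerNational = "\xE4\xF6\xFC\xDF";    # a-umlaut, o-umlaut, u-umlaut, sharp s

# Build a simplified WikiWord pattern from explicit character lists
# (hypothetical helper, not the pattern TWiki itself uses).
sub make_wiki_word_regex {
    my ($upperExtra, $lowerExtra) = @_;
    my $upper = "A-Z$upperExtra";
    my $lower = "a-z$lowerExtra";
    return qr/^[$upper]+[$lower]+[$upper]+[${upper}${lower}0-9]*$/;
}

# Return 1 if the name matches the given pattern, 0 otherwise
sub is_wiki_word {
    my ($name, $regex) = @_;
    return $name =~ $regex ? 1 : 0;
}

my $plain    = make_wiki_word_regex('', '');
my $national = make_wiki_word_regex($upperNational, $lowerNational);
my $topic    = "BadM\xFCnsterEifel";       # BadMuensterEifel with a real u-umlaut

print "ASCII-only classes:  ", is_wiki_word($topic, $plain), "\n";
print "with national chars: ", is_wiki_word($topic, $national), "\n";
```

With ASCII-only classes the umlaut ends the match, which is the "broken WikiWord" symptom; listing the character explicitly lets the whole name match.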
--
RichardDonkin - 25 Feb 2005
Thank you, Richard. It works perfectly
--
JuditMays - 28 Feb 2005
Oh, no. It does not work after all

Stupid me, I didn't test it properly the first time around. The only thing that works is the automatic linking of the WikiWithUmlautWord. But if I try to create the new page by clicking the question mark at the end of the WikiWord, the Umlaut messes everything up. I get to the edit page (with a different spelling of the original WikiWord), but as soon as I save, I end up at a page saying: "NOTE: This Wiki topic does not exist yet". If I then click on create, TWiki offers me yet another spelling of the WikiWord, and this time saving works. But this page does not have the right name, so it has no parent and there is no way to ever get there again.
By the way, I also checked my testenv again. The last heading is still on "User Authentication" and not "Internationalisation and Locale Setup".
I attached a new copy of my locale setup, changed according to Richard's advice.
Is there anyone with an idea, what else I could check out or try to fix this, please? pretty please??
--
JuditMays - 07 Mar 2005
Can you be a bit more specific about what goes wrong? A test case would help, saying exactly what the topic name was originally and what you end up with on page creation.
Also, check that the browser is set to ISO-8859-15 on viewing a page generated by your TWiki - if not you may need to use the siteCharsetOverride config parameter.
Not sure what else is going wrong as it works OK for me using following setup on Cygwin with Cygwin Perl:
$siteLocale = 'en_US.ISO8859-1';
$siteCharsetOverride = "" ;
$localeRegexes = 0 ;
$upperNational = 'ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝ';
$lowerNational = 'àáâãäåæçèéêëìíîïðñòóôõöøùúûüý';
One thing to try is
de_DE.ISO8859-1 - while this loses the Euro character support, it does work without requiring any extra modules.
However, I have re-tested from IE6 and Firefox 1.0.1 with locale
de_DE.ISO-8859-15 as in your setup, and it works fine.
Could you try my exact setup above? Also check that IE is sending UTF-8 URLs, though that should make no difference.
--
RichardDonkin - 19 Mar 2005
I attached two files describing exactly what spelling my Umlaute get turned into (one file for IE and one for Firefox). I checked the language encoding in the browser and changed it to 8859-15, but that made no difference. IE is set to send URLs as UTF-8 (the box was checked all along), so that should not make a difference.
Next, I will try out your whole setting and let you know about the success.
--
JuditMays - 22 Mar 2005
After changing all variables to the values given above, my new test wiki word
BadMünsterEifel is turned into
BadMAnsterEifel.
With Firefox, this happens straight away. There is no delay as described in the two attached files. So I'm getting closer to what I want.
But with IE it is still the same strange behaviour. The topic name is changed to
BadMA¬nsterEifel though the URL is what it should be:
http://somewhere/bin/edit/Sandbox/BadMünsterEifel?topicparent=Sandbox.JuditMaysSandbox. After hitting the save button the URL is changed to
http://somewhere/bin/view/Sandbox/BadMAAªnsterEifel.
And after this, I get to create a topic called
BadMAAAÝnsterEifel with the final URL of
http://somewhere/bin/view/Sandbox/BadMAAAAnsterEifel.
I don't understand at all why it still doesn't work for me, even after copying Richard's parameter settings.
Next I will try what happens if I change to
de_DE.ISO8859-1. I will let you know.
--
JuditMays - 22 Mar 2005
This doesn't help either.
--
JuditMays - 22 Mar 2005
The UTF-8 URL code isn't working for some reason, hence the ever-lengthening URLs. Just wanted to check this is a clean install, and the TWiki.cfg is new as well - is that right?
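The ever-lengthening names can be reproduced in isolation. The following is a sketch of the general double-encoding effect, not of TWiki's code path: if bytes that are already UTF-8 are treated as ISO-8859-1 and encoded to UTF-8 again, every non-ASCII byte doubles on each round. The helper name `latin1_to_utf8` is hypothetical; the `Encode` module is the standard one shipped with Perl 5.8 and later.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Treat the incoming bytes as ISO-8859-1 and encode them to UTF-8.
# Correct once, but wrong if the bytes are already UTF-8.
sub latin1_to_utf8 {
    my ($bytes) = @_;
    return encode('UTF-8', decode('ISO-8859-1', $bytes));
}

my $name = "BadM\xFCnsterEifel";      # ISO-8859-1 bytes, 15 bytes long
my @lengths;
for my $round (1 .. 3) {
    $name = latin1_to_utf8($name);    # rounds 2 and 3 re-encode UTF-8 by mistake
    push @lengths, length $name;
    printf "round %d: %d bytes\n", $round, length $name;
}
```

Each save and redirect that repeats the conversion adds more bytes in place of the single umlaut, which matches the pattern of topic names growing by a character or two per step.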
I suspect that
testenv is malfunctioning since it's not showing the locale information. In
testenv, please add the final print line to this part of the file:
# Do locale settings if TWiki.pm was found
my $showLocales = 0;
if ($twikiFound) {
    if( eval 'TWiki::setupLocale()' ){ # Not in older TWiki.pm versions
        # Ignore errors silently
        $showLocales = 1;
    }
}
print "\$showLocales is set to $showLocales<br>";
Then post the updated testenv output HTML here (the output of the above line may be somewhere unexpected).
One other thing to try: turn off UTF-8 URLs in IE, which will tell me whether the UTF-8 code is the root of the problem.
Also, is it possible that TWiki is having permission problems at run-time? Unlikely since the Encode.pm module is being loaded OK by
testenv.
You might also want to look at the
lib/TWiki.pm module, specifically the
convertUtf8URLtoSiteCharset routine, and uncomment the
writeDebug calls (see
TWikiDebugging), and maybe put in some extra calls around where
require Encode is done (should be called given your setup when using ISO-8859-1 - the code for ISO-8859-1 is just above that).
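For orientation while debugging, here is a rough sketch of what a UTF-8-URL-to-site-charset conversion has to do. It is not the code in lib/TWiki.pm: the function name `utf8_url_to_site_charset` is hypothetical, and it uses a strict `Encode::decode` as the validity test where TWiki uses a regex.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

sub utf8_url_to_site_charset {
    my ($name, $siteCharset) = @_;

    # Pure ASCII is identical in UTF-8 and ISO-8859-x: nothing to do
    return $name if $name =~ /^[\x00-\x7F]*$/;

    # Strict decode; FB_CROAK makes invalid UTF-8 die instead of being patched up
    my $copy  = $name;    # decode() may modify its argument when a check is set
    my $chars = eval { decode('UTF-8', $copy, Encode::FB_CROAK()) };

    # Not valid UTF-8: assume the name is already in the site charset
    return $name unless defined $chars;

    # Valid UTF-8: convert to the site charset
    return encode($siteCharset, $chars);
}

my $fromUtf8   = utf8_url_to_site_charset("BadM\xC3\xBCnsterEifel", 'ISO-8859-1');
my $fromLatin1 = utf8_url_to_site_charset("BadM\xFCnsterEifel",     'ISO-8859-1');
print $fromUtf8 eq $fromLatin1 ? "same topic name\n" : "names differ\n";
```

If the validity test never succeeds, UTF-8 input falls through unconverted, so debug output placed just before and after that test is the most informative.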
--
RichardDonkin - 24 Mar 2005
My installation is a first installation of the Sept-02-2004 release, and the TWiki.cfg is the one that came with the download. I only made the necessary changes as described in the Windows install cookbook. Whether this qualifies as a clean install, I don't know. I did have several problems that needed fixing in various places. Most have been solved with the help of the support web (mainly Richard Donkin and Matt Wilkie, thank you!) and the installation guide.
I'll attach the
TWiki.cfg anyway.
The result of my change in the
testenv script is
$showLocales is set to 0 (as can be seen in the new attached testenv output).
Turning off UTF-8 URLs in IE does not solve the saving problem. BadMünsterEifel is turned into BadMnsterEifel, so the Umlaut is simply dropped from the WikiWord.
For the permission problems at run time:
How do I know? What I have checked seems to be working all right. (edit, save, lock, unlock, ...)
(Except for
mailnotify in case of WebChanges, but I haven't tried to solve this yet.)
I haven't checked the
lib/TWiki.pm yet. I will comment as soon as I have done so.
--
JuditMays - 29 Mar 2005
I tried some debugging in
/lib/TWiki.pm. This is what I found out:
Within the
sub convertUtf8URLtoSiteCharset routine the code around
require Encode never gets executed because the code in the first
elsif (that is:
elsif( $fullTopicName =~ $regex{validUtf8StringRegex} )) never becomes true. Therefore,
require Encode is never used. This happens with Firefox as well as IE (with/without UTF8 enabled). See the
debugging.txt file for details.
--
JuditMays - 29 Mar 2005
Actually, I have to admit, I don't understand this. My testenv says that
locale is set to 0. But my debugging says that
$siteLocale is de_DE.ISO-8859-1 and
$useLocale is 1. Why?
--
JuditMays - 29 Mar 2005
The
only place $useLocale gets set is in TWiki.cfg; so you must be setting it. Or your debugging is wrong.
--
CrawfordCurrie - 06 Apr 2005
Sorry Crawford, but what do you mean?
In
20050329_TWiki.cfg the variable is set:
$useLocale = 1 . So that would mean my debugging is wrong. But in which way? Did I do the wrong things? What I did is documented in
debugging.txt. If I need to do more or different things, could you please be a bit more specific? Thank you.
--
JuditMays - 08 Apr 2005
Ok, here are two other things:
- in the cygwin bash I cannot use the command
locale. So I don't know what locale settings are relevant for and used by cygwin. So, maybe, loading the command from cygwin.org would help?
- files that should contain Umlaute or other special characters get displayed in a funny way in the cygwin shell.
BadMAAªnsterEifel is displayed as BadMAA?nsterEifel. It seems that the Cygwin shell can't handle the character set which I would like to use.
--
JuditMays - 11 Apr 2005
Hi - the
$showLocales debug line shows that the call to
TWiki::setupLocale() is failing, which needs to be debugged - some suitable
writeDebug() calls within that routine should help figure out what's happening. This problem is also likely to be at the root of the
I18N problems, as some variables may not be set at all.
As a first step, just set $showLocales to 1 where it is set to 0 (keep a backup copy of testenv of course), so we can see some of the
I18N settings and maybe get some error output. Also, try doing the TWiki::setupLocale() call outside the eval (e.g. just after setting $showLocales) to see the error message (will break testenv but useful to know what happens). The code should look like this (the first statement ensures error messages are shown by CPAN:Carp):
# Do locale settings if TWiki.pm was found
$CGI::Carp::WRAP = $CGI::Carp::WRAP = 1;
my $showLocales = 1; # Temporary hack
TWiki::setupLocale();
if (...)
Probably the associated
setupRegexes routine in TWiki.pm is failing to be called as well, which is why the 'valid UTF8' check fails and hence UTF-8 URL processing doesn't work. Try putting a writeDebug call within this routine too (see
TWikiDebugging).
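A less disruptive way to see why an eval'd call fails is to keep the eval but capture `$@`. This is a generic Perl sketch, with `failing_setup` and `try_call` as hypothetical stand-ins, not testenv code. Note the trailing `1` inside the eval block: it makes success independent of the routine's own return value, whereas `eval 'TWiki::setupLocale()'` is also false when the routine simply returns a false value.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a setup routine that fails
sub failing_setup { die "setup failed: regexes not initialised\n" }

# Run a code reference inside eval and report (success flag, error text)
sub try_call {
    my ($code) = @_;
    my $ok = eval { $code->(); 1 };
    return $ok ? (1, '') : (0, $@ || 'unknown error');
}

my ($showLocales, $error) = try_call(\&failing_setup);
print "\$showLocales is set to $showLocales\n";
print "error was: $error" if $error;
```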
Re your last 2 questions:
- you don't need the
locale command at all - see my explanation above. Locales are useless for TWiki on Windows since they don't work with Perl - the only reason (on Windows) to set the $siteLocale variable is to get a character set that can be sent in the HTTP headers to the browser.
- Cygwin's support for 8-bit characters is quite poor, I've never managed to get them working in bash... Best to use Putty if you need to log in to other systems using I18N from Cygwin, by the way.
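To illustrate the first point, the character set sent to the browser can be read straight off the locale string without any working locale support. This is a sketch under assumptions: `charset_from_locale` is a hypothetical helper, and the fallback value and the 'iso8859' normalisation are illustrative choices, not necessarily what TWiki does.

```perl
use strict;
use warnings;

sub charset_from_locale {
    my ($siteLocale, $override) = @_;

    # An explicit override wins (cf. $siteCharsetOverride)
    return lc $override if defined $override && length $override;

    # Take the part after the last dot, e.g. 'ISO-8859-1' from 'de_DE.ISO-8859-1'
    my ($charset) = $siteLocale =~ /\.([A-Za-z0-9_-]+)$/;
    $charset = 'iso-8859-1' unless defined $charset;   # assumed fallback

    $charset = lc $charset;
    $charset =~ s/^iso(\d)/iso-$1/;   # 'iso8859-1' -> 'iso-8859-1' (assumed normalisation)
    return $charset;
}

print charset_from_locale('de_DE.ISO-8859-1', ''), "\n";
print charset_from_locale('en_US.ISO8859-1', ''), "\n";
```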
Crawford: setupLocale has vanished in
DevelopBranch so testenv really needs some other way of testing that the installed TWiki is
I18N capable (Feb 2003 or later) so that it can show
I18N settings if they're relevant. Worth fixing this in latest testenv (
ImproveTestenv).
--
RichardDonkin - 12 Apr 2005
Thanks, Richard. I added the code fragment into the testenv script and finally the
I18N settings are displayed. (see new attached file)
What does this comment on
siteLocale mean:
locale is set to 'C' ?
The writeDebug I put into
TWiki.pm state something different (in
/data/debug.txt):
13 Apr 2005 - 12:14 sub setupLocale firstDebug: $siteLocale is de_DE.ISO-8859-1
13 Apr 2005 - 12:14 sub setupLocale secondDebug: $useLocale is 1
13 Apr 2005 - 12:14 sub setupLocale firstDebug: $siteLocale is de_DE.ISO-8859-1
13 Apr 2005 - 12:14 sub setupLocale secondDebug: $useLocale is 1
I also added a writeDebug line into the
sub setupRegexes of
TWiki.pm
# 20050413: added following line; Judit Mays
writeDebug "sub setupRegexes: this routine is called.";
Since there is no corresponding output in debug.txt, the routine is indeed not called, just as you assumed.
Where do I go from here?
BTW: My TWiki is going to be used by a larger community starting in May, and I would very much like to avoid prohibiting Umlaute in WikiWords (especially as some of the users' names contain Umlaute).
--
JuditMays - 13 Apr 2005
I seem to have a problem uploading my testenv output. (The error message says something like: "file contains no data", but this is definitely not true.) I will try the upload again later.
--
JuditMays - 13 Apr 2005
Ach, never mind the upload. Here's the relevant output:
Internationalisation and Locale Setup
| $useLocale: | 1 |
| | Note: This TWiki.cfg setting controls whether locales are used by Perl and 'grep'. |
| | Warning: Using Perl on Windows, which may have missing or incorrect locales (in Cygwin or ActiveState Perl, respectively) - use of $useLocale = 0 is recommended unless you know your version of Perl has working locale support. |
| $siteLocale: | de_DE.ISO-8859-1 |
| | Note: This TWiki.cfg parameter sets the site-wide locale - for example, de_AT.ISO-8859-1 where 'de' is the language code, 'AT' the country code and 'ISO-8859-1' is the character set. Use the locale -a command on your system to determine available locales. |
| | Warning: Unable to set locale to 'de_DE.ISO-8859-1'. The actual locale is 'C' - please test your locale settings. This warning can be ignored if you are not planning to use locales (e.g. your site uses English only) - or you can set $siteLocale to C, which should always work. |
| $siteCharset: | iso-8859-1 |
| | Note: This value is derived from the site-wide locale setting. It may have been overridden by $siteCharsetOverride (currently ''). It is used in TWiki's HTML pages and HTTP headers, so it must be acceptable to web browsers even if it is different to the locale-derived setting (e.g. 'euc-jp' instead of 'eucjp') |
| $upperNational: | ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝ |
| | Note: This TWiki.cfg parameter is used when $useLocale is 0, to work around missing or non-working locales. It is also used with Perl 5.005 for efficiency reasons - upgrading to Perl 5.6.1 with working locales is recommended, and removes the need for this. If required, this parameter should be set to the upper case accented characters you require in your locale. |
| $lowerNational: | àáâãäåæçèéêëìíîïðñòóôõöøùúûüýß |
| | Note: This TWiki.cfg parameter is used whenever $upperNational is used. This parameter should be set to the lower case accented characters you require in your locale. |
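The "Unable to set locale" warning can be checked outside TWiki with a few lines of Perl. This is a sketch using the standard POSIX module; `try_locale` is a hypothetical helper, and the result depends entirely on which locales the local Perl and C library provide.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);

# Try to switch LC_CTYPE to the requested locale; report what is in effect
sub try_locale {
    my ($wanted) = @_;
    my $result = setlocale(LC_CTYPE, $wanted);   # false if the locale cannot be set
    my $actual = setlocale(LC_CTYPE);            # query only, changes nothing
    return ($result ? 1 : 0, defined $actual ? $actual : 'unknown');
}

for my $name ('C', 'de_DE.ISO-8859-1') {
    my ($ok, $actual) = try_locale($name);
    print "$name: ", ($ok ? "available" : "not available"), " (actual: $actual)\n";
}
```

If the second line reports the locale as not available, the warning in the table above is expected, and on Windows the $useLocale = 0 route is the one to follow.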