GermanUmlauteOnWindows < Support

Question

Summary: Some versions of CPAN:CGI (CGI.pm) cause problems on Windows with Cygwin Perl, though ActiveState Perl seems to be OK. Not a TWiki bug but we need to document which versions of CGI.pm cause this issue. -- RD

As requested in GermanUmlauteAndWindows, I moved this into a separate topic.

The following error shows up on both Windows Server 2003 and Windows XP. Each installation was done by following the instructions in the WindowsInstallCookbook. The latest Cygwin Perl and the latest Apache from the 1.3.x branch are being used. All necessary Perl modules listed in the testenv output have been installed. UTF-8 encoding of URLs has been enabled in all browsers. The settings in TWiki.cfg and the testenv output can be checked in these attached files:

20051023_testenv_1.htm:

TWiki.cfg:

Error: When clicking on a non-existent WikiWord that contains umlauts, the edit screen that follows shows other characters in place of the umlauts. Saving the topic results in a "NOTE: This Wiki topic does not exist yet" page, again with altered characters. See the following screenshots:

If you click on the highlighted WikiWord...

...you get this on the edit-screen.

The URL is displayed correctly, though.

After saving, you get this:

I've tried every possible (and impossible) combination of TWiki configuration variables, to no avail. I also tried to debug the code in TWiki.pm, edit and view, mainly the UTF-8 implementation. It all seemed to work OK. Finally I found something in the bug section of CPAN. It looks like CGI.pm has trouble with UTF-8. Read more about that here. Could this be the cause of all the trouble? And what could be a workaround?

Environment

TWiki version:	TWikiRelease04Sep2004
TWiki plugins:	DefaultPlugin, EmptyPlugin, InterwikiPlugin
Server OS:	Windows Server 2003; Windows XP Professional
Web server:	Apache 1.333
Perl version:	5.8.7
Client OS:	Windows XP
Web Browser:	Opera 8.50, Firefox 1.07
Categories:	Internationalisation

-- JoachimBlum - 23 Oct 2005

Answer

Hello? Suggestions? Anyone?

Heeeelp! ;-)

-- JoachimBlum - 02 Nov 2005

Thanks for providing a complete set of relevant info. First of all, have you applied the two patches linked from InternationalisationIssues (last bullet in the fixed issues for 01Sep2004 list)? Without that, TWiki I18N is quite broken.

To other CoreTeam members: any chance of a maintenance release to Sep2004 code so that at least basic I18N has a chance of working without patches?

TWiki does its own UTF-8 URL encoding so I'd be surprised if a CGI.pm bug is an issue here. However, recent versions of CGI.pm may have broken things - I use CGI.pm v3.04 OK on Linux, so perhaps you could downgrade to that.

You might want to investigate the Apache request log, and check whether you have any proxies between TWiki and browser (probably not on the XP box!). Also, the suggestions and patches in GermanUmlauteOnWindows will help in debugging what's going on here since you're OK with looking at the Perl code. If the UTF-8 URL decoding support (EncodeURLsWithUTF8) is not working for some reason, it should be evident from suitable writeDebug statements.

One other idea: since you already have the Unicode::MapUTF8 modules installed, you could try tweaking the version check code in the UTF-8 URL encoding routine so that you use this module - just using the Perl 5.6 part of the conversion should be OK.

You could also try using ISO-8859-1 as the character set. You would lose the Euro sign, but it doesn't require any conversion modules and is a useful debugging step.
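To illustrate what giving up the Euro means here, the following sketch uses the core Encode module to test whether a character can be represented in a given site charset. The sub name encodable is made up for this example and is not part of TWiki.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: returns 1 if $text can be encoded in $charset, else 0.
sub encodable {
    my ($charset, $text) = @_;
    my $copy = $text;    # encode() with a CHECK value may modify its argument
    return eval { encode($charset, $copy, Encode::FB_CROAK); 1 } ? 1 : 0;
}

print "a-umlaut in ISO-8859-1:  ", encodable('iso-8859-1',  "\x{E4}"),   "\n";
print "Euro in ISO-8859-1:      ", encodable('iso-8859-1',  "\x{20AC}"), "\n";
print "Euro in ISO-8859-15:     ", encodable('iso-8859-15', "\x{20AC}"), "\n";
```

The umlaut is available in ISO-8859-1, so the charset should be good enough for testing German topic names; only the Euro sign needs ISO-8859-15 or UTF-8.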

-- RichardDonkin - 05 Nov 2005

Good news: Finally I got it to work. Two things did the trick:

Changed the Perl-interpreter from Cygwin 5.8.7 to ActivePerl 5.8.7
~~Patched the $regex{validUtf8StringRegex} in TWiki.pm, line 661 as described below~~

Here's the patch that I applied to $regex{validUtf8StringRegex} in line 661 of TWiki.pm:

Old: qr/^ (?: $regex{validUtf8CharRegex} ) $/x

New: qr/^ (?: $regex{validUtf8CharRegex} )+ $/x (Note the added '+' between ')' and ' $').

The regex wouldn't work in its original form. No UTF-8-compliant URL would be recognized, so convertUtf8URLtoSiteCharset() would always fail. I don't know regexes well enough to say for sure why this happens. Just a guess: without a quantifier, the group '(?: ... )' matches exactly one character, so any string longer than one character would not be matched. The '+' quantifier, on the other hand, says "match 1 or more times", so it extends the scope of the regex to the whole string. This was one killer.

The other one was Cygwin Perl, which doesn't fully support locales. UTF-8-encoded URLs would arrive wrongly encoded in the scripts. For instance, the test case TestUmläute would appear encoded as TestUml\xC7\xB3ute, which is not the UTF-8 encoding of 'ä'. The correct encoding is TestUml\xC3\xA4ute, which is exactly what ActivePerl delivers.
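The effect of a missing '+' can be reproduced in isolation. This is only a sketch: $utf8_char is a simplified stand-in for $regex{validUtf8CharRegex} (ASCII plus 2- and 3-byte sequences, loosely checked), not the definition from TWiki.pm, and the other names are made up for illustration.

```perl
use strict;
use warnings;

# Assumption: simplified stand-in for TWiki's $regex{validUtf8CharRegex}.
my $utf8_char = qr/[\x00-\x7F]|[\xC2-\xDF][\x80-\xBF]|[\xE0-\xEF][\x80-\xBF]{2}/;

# Same pattern with and without the '+' quantifier on the group.
my $without_plus = qr/^ (?: $utf8_char ) $/x;
my $with_plus    = qr/^ (?: $utf8_char )+ $/x;

# UTF-8 bytes of the test topic name from this thread.
my $topic = "TestUml\xC3\xA4ute";

# Hypothetical helper: 1 if the string matches the regex, else 0.
sub check {
    my ($string, $re) = @_;
    return $string =~ $re ? 1 : 0;
}

print "whole name, without +: ", check($topic, $without_plus), "\n";       # 0
print "whole name, with +:    ", check($topic, $with_plus), "\n";          # 1
print "single char, without +: ", check("\xC3\xA4", $without_plus), "\n";  # 1
```

Note that '(?:' only opens a non-capturing group; the '?' there is not a quantifier. Without the '+', the anchored pattern accepts exactly one character, which is why every real topic name is rejected.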

Could someone please verify my findings?

Status of this question changed to answered.

-- JoachimBlum - 08 Nov 2005

Not sure why changing from Cygwin would have an effect - perhaps because of the associated CPAN:Encode version, which is what would be used with Perl 5.8.

Please submit a Codev.BugReport that points to this topic - your patches are important and I'd like to get them into DakarRelease.

-- RichardDonkin - 09 Nov 2005

I don't think it's CPAN:Encode that has a flaw, I think it's CPAN:CGI, because the wrongly encoded TestUml\xC7\xB3ute comes from the $thePathInfo = $cgi->path_info(); call in view/edit/.... As soon as I changed to ActivePerl, the URL was encoded correctly.
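For anyone who wants to compare the bytes on their own installation, here is a small sketch for making them visible. The sub name hex_escape and the sample path are hypothetical; in a TWiki script the value would come from $cgi->path_info() rather than a literal.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: show every byte outside printable ASCII as \xNN.
sub hex_escape {
    my ($bytes) = @_;
    (my $out = $bytes) =~ s/([^\x20-\x7E])/sprintf('\\x%02X', ord($1))/ge;
    return $out;
}

# Stand-in for the value returned by $cgi->path_info().
my $path_info = "/Main/TestUml\xC3\xA4ute";
print hex_escape($path_info), "\n";    # /Main/TestUml\xC3\xA4ute

# Decode the two byte pairs mentioned in this thread to their code points.
printf "C3 A4 -> U+%04X\n", ord(decode('UTF-8', "\xC3\xA4"));    # U+00E4
printf "C7 B3 -> U+%04X\n", ord(decode('UTF-8', "\xC7\xB3"));    # U+01F3
```

The pair C3 A4 decodes to U+00E4, the 'ä' that was requested, whereas C7 B3 decodes to U+01F3, a different character that has no place in a Western European site charset. Writing the escaped string to the debug log from both Perl variants should show where the bytes diverge.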

BugReport has been sent.

-- JoachimBlum - 09 Nov 2005

Did you try upgrading or downgrading the CPAN:CGI module? CGI.pm is pure Perl so you should be able to just copy the ActivePerl CGI.pm into the Cygwin Perl library path (twiki lib directory should work for testing purposes).

Could you provide the CGI.pm versions you use on the two Perl variants? Doing the following should work in both Cygwin bash and Windows cmd.exe (for ActivePerl):

   perl -MCGI -e "print CGI->VERSION"

By the way, the regex you showed as 'new' above is the same as the version I have in the 02Sep2004 code - perhaps the '+' was deleted by mistake, but it is included in the TWiki release, and if omitted would have broken all UTF-8 URLs of course.

-- RichardDonkin - 11 Nov 2005

OK, I have to make an apology here. I was a victim of my own desperation. Of course, the regex in the current 04Sep2004 code (which I'm using) is correct. I deleted the '+' myself when I temporarily changed the regex to always match in order to test convertUtf8URLtoSiteCharset(). When I changed it back, somehow the '+' didn't make it back. Sorry for that.

That leaves us with the error of wrongly encoded URLs. The CGI.pm versions of both Cygwin and ActiveState are 3.10. I haven't tried upgrading or downgrading yet. Maybe that's an option, although I guess I'll stay with ActiveState anyway because its performance is better.

Again, sorry for my dumbness and all the fuss it created.

-- JoachimBlum - 14 Nov 2005

Hi Joachim - easy mistake to make. I try to always copy and comment out lines like that, especially when not using a version control system (you can of course just use RCS for simple version control, just type ci -l to checkin and lock a Perl file).

It would be good to know which CGI.pm version broke things, particularly on the Cygwin side, so if you get time to try an older version or two on Cygwin that would help other people avoid this problem. If you are short of time, perhaps you could just try CGI.pm version 3.04, which I know works on Linux.

For now, I'll link to this from InternationalisationIssues.

-- RichardDonkin - 15 Nov 2005

WebForm
SupportStatus	AnsweredQuestions

Attachments

Topic attachments
I	Attachment	History	Action	Size	Date	Who
htm	20051023_testenv_1.htm	r1	manage	11.1 K	2005-10-23 - 19:15	UnknownUser
jpg	20051023_umlaute_error_1.jpg	r1	manage	74.8 K	2005-10-23 - 19:04	UnknownUser
jpg	20051023_umlaute_error_2.jpg	r1	manage	56.8 K	2005-10-23 - 19:04	UnknownUser
jpg	20051023_umlaute_error_3.jpg	r1	manage	21.9 K	2005-10-23 - 19:05	UnknownUser
jpg	20051023_umlaute_error_4.jpg	r1	manage	113.6 K	2005-10-23 - 19:06	UnknownUser
cfg	TWiki.cfg	r1	manage	23.1 K	2005-10-23 - 19:16	UnknownUser

Topic revision: r10 - 2005-11-15 - RichardDonkin