Tags: internationalization

Bug: UTF-8 encoded anchor breaks page rendering

The anchor for a UTF-8-encoded header can be truncated in the middle of a multi-byte UTF-8 character. This makes InternetExplorer screw up the whole page.

Test case

Site charset = utf-8, almost any UTF-8 encoded header in page text.
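
A minimal sketch of how the test case can be reproduced outside TWiki. The header text is made up, and the `is_valid_utf8` helper is hypothetical; only the truncation regex is taken from Render.pm. It assumes the anchor is held as raw bytes, so `.` matches one byte rather than one character.

```perl
use strict;
use warnings;

# Hypothetical header: one ASCII byte plus 20 two-byte UTF-8 characters
# (41 bytes), built as raw bytes rather than as a decoded character string.
my $anchorName = "a" . ( "\xD0\x96" x 20 );

# Truncation as found in Render.pm: keep the first 32 bytes.
( my $truncated = $anchorName ) =~ s/^(.{32})(.*)$/$1/;

# Hypothetical helper: true if the byte string is well-formed UTF-8.
# utf8::decode converts in place, so it is applied to a copy.
sub is_valid_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;
    return utf8::decode($copy) ? 1 : 0;
}

printf "before: %d bytes, well-formed: %s\n",
    length($anchorName), is_valid_utf8($anchorName) ? 'yes' : 'no';
printf "after:  %d bytes, well-formed: %s\n",
    length($truncated), is_valid_utf8($truncated) ? 'yes' : 'no';
```

The single leading ASCII byte shifts the character boundaries so that byte 32 is the first half of a two-byte character, which is the situation described above.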

Environment

TWiki version: TWikiRelease04Sep2004
TWiki plugins: DefaultPlugin, EmptyPlugin, InterwikiPlugin
Server OS: linux
Web server: apache2
Perl version: 5.8.4
Client OS: win2k
Web Browser: IE5

-- VasilyRedkin - 26 Jul 2006

Impact and Available Solutions

I've developed the following patch. It is not very pretty, but it works for me.

--- lib/TWiki/orig/Render.pm    2006-06-25 20:19:11.000000000 +0400
+++ lib/TWiki/Render.pm 2006-07-26 15:14:46.881104037 +0400
@@ -399,7 +399,7 @@
     if ( !$compatibilityMode ) {
         $anchorName =~ s/^[\s\#\_]*//;  # no leading space nor '#', '_'
     }
-    $anchorName =~ s/^(.{32})(.*)$/$1/; # limit to 32 chars - FIXME: Use Unicode chars before truncate
+    $anchorName =~ s/^(.{32,}?)([\x00-\x7F\xC0-\xFF].*)$/$1/; # limit to 32..37 chars, cut on utf-8 char boundary
     if ( !$compatibilityMode ) {
         $anchorName =~ s/[\s\_]*$//;    # no trailing space, nor '_'
     }

-- VasilyRedkin - 26 Jul 2006
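
To see what the patched regex does, here is a small sketch that runs both the old and the new substitution on the same input. The input string is a made-up example; the two regexes are copied from the diff above. The patched version lazily extends the match past 32 bytes until the next byte is not a UTF-8 continuation byte (0x80-0xBF).

```perl
use strict;
use warnings;

# Hypothetical input: one ASCII byte plus 20 two-byte UTF-8 characters
# (41 bytes), held as raw bytes so that '.' matches a single byte.
my $anchorName = "a" . ( "\xD0\x96" x 20 );

# Original rule: cut after exactly 32 bytes.
( my $old = $anchorName ) =~ s/^(.{32})(.*)$/$1/;

# Patched rule: take at least 32 bytes, extending until the following
# byte starts a new character (ASCII or a UTF-8 lead byte).
( my $new = $anchorName ) =~ s/^(.{32,}?)([\x00-\x7F\xC0-\xFF].*)$/$1/;

for my $pair ( [ old => $old ], [ new => $new ] ) {
    my ( $label, $bytes ) = @$pair;
    my $copy = $bytes;    # utf8::decode works in place, so check a copy
    printf "%s: %d bytes, well-formed UTF-8: %s\n",
        $label, length($bytes), utf8::decode($copy) ? 'yes' : 'no';
}
```

Strings shorter than 33 bytes do not match either pattern and are left unchanged.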

Follow up

Thanks, Vasily, for the report and fix; some people might find this useful. Nevertheless, TWikiRelease04Sep2004 is no longer actively maintained.

-- PeterThoeny - 29 Jul 2006

This bug also applies to TWiki 4.x, since the code is the same up to 4.0.4 at least.

I've not yet deciphered the regex to check that it's correct, and it's unlikely to work when we turn on Unicode character mode, or with other 8-bit character sets (e.g. those that consist almost entirely of high-bit characters, such as KOI-8). Presumably any European 2-byte UTF-8 character would be enough as a test case.
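
One way the 8-bit concern could show up, as a sketch rather than a confirmed failure report: in a single-byte charset, bytes in 0x80-0xBF are ordinary characters, but the patched regex treats them as UTF-8 continuation bytes and never cuts in front of them. The input below is an artificial worst case (0xA8 is a letter in windows-1251, for instance), not a header taken from a real site.

```perl
use strict;
use warnings;

# Artificial header in a single-byte charset: 40 bytes, all in 0x80-0xBF.
my $anchorName = "\xA8" x 40;

# Patched rule from the diff: needs a byte outside 0x80-0xBF to cut before.
( my $patched = $anchorName ) =~ s/^(.{32,}?)([\x00-\x7F\xC0-\xFF].*)$/$1/;

# No such byte exists here, so the substitution does not match and the
# anchor keeps its full length instead of being limited to 32..37 bytes.
printf "length after patched truncation: %d bytes\n", length($patched);
```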

This code should not go in as it is, since it will break with non-UTF-8 character sets. However, it may be useful for people using UTF-8 as their site character set.
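
A charset-independent alternative would be to truncate by characters instead of bytes, along the lines of the FIXME in the original code. The sketch below is only an illustration: `truncate_anchor` is a hypothetical helper, not TWiki code, and it assumes the site charset name is one that the core Encode module recognises.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper, not part of TWiki: limit a byte string to at most
# $max characters, interpreting the bytes in the given site charset.
sub truncate_anchor {
    my ( $bytes, $charset, $max ) = @_;
    my $chars = Encode::decode( $charset, $bytes );    # bytes -> characters
    return $bytes if length($chars) <= $max;
    # Cut on a character boundary, then convert back to bytes.
    return Encode::encode( $charset, substr( $chars, 0, $max ) );
}

# Made-up inputs: a UTF-8 header and a single-byte KOI8-R header.
my $utf8_anchor = "a" . ( "\xD0\x96" x 40 );    # 41 characters, 81 bytes
my $koi8_anchor = "\xF6" x 40;                  # 40 characters, 40 bytes

printf "UTF-8:  %d bytes\n", length truncate_anchor( $utf8_anchor, 'UTF-8',  32 );
printf "KOI8-R: %d bytes\n", length truncate_anchor( $koi8_anchor, 'koi8-r', 32 );
```

Both results contain exactly 32 characters, so the byte length varies with the charset; whether that is acceptable for anchor names is a separate design question.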

This is somewhat like other TOC issues listed at InternationalisationIssues, which should really be resolved at the same time.

-- RichardDonkin - 31 Jul 2006

I filed Bugs:Item2711 for TWiki 4.

-- PeterThoeny - 01 Aug 2006

This bug is still not fixed in TWiki 4.1.1!

-- AndreyTkachenko - 11 Feb 2007

Tracked now in Bugs:Item4074.

-- PeterThoeny - 17 May 2007

Topic revision: r6 - 2007-05-17 - PeterThoeny
Copyright © 1999-2026 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.