Bug: UTF-8 encoded anchor breaks page rendering
The anchor for a UTF-8-encoded header can be truncated in the middle of a multi-byte UTF-8 character. The resulting
invalid byte sequence makes Internet Explorer render the whole page incorrectly.
Test case
Site charset = utf-8; almost any UTF-8 encoded header in the page text.
Environment
--
VasilyRedkin - 26 Jul 2006
Impact and Available Solutions
I've developed the following patch. It is not very pretty, but it works for me.
--- lib/TWiki/orig/Render.pm 2006-06-25 20:19:11.000000000 +0400
+++ lib/TWiki/Render.pm 2006-07-26 15:14:46.881104037 +0400
@@ -399,7 +399,7 @@
if ( !$compatibilityMode ) {
$anchorName =~ s/^[\s\#\_]*//; # no leading space nor '#', '_'
}
- $anchorName =~ s/^(.{32})(.*)$/$1/; # limit to 32 chars - FIXME: Use Unicode chars before truncate
+ $anchorName =~ s/^(.{32,}?)([\x00-\x7F\xC0-\xFF].*)$/$1/; # limit to 32..37 chars, cut on utf-8 char boundary
if ( !$compatibilityMode ) {
$anchorName =~ s/[\s\_]*$//; # no trailing space, nor '_'
}
--
VasilyRedkin - 26 Jul 2006
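For readers who want to reproduce the failure outside TWiki, here is a minimal Python sketch of the byte-level behaviour. It is an illustration only, not TWiki code: the function names `truncate_naive` and `truncate_on_boundary` and the sample heading are made up for this example, and it assumes the anchor name is handled as raw UTF-8 bytes, as the bug report implies.

```python
def truncate_naive(anchor: bytes, limit: int = 32) -> bytes:
    """Cut at a fixed byte count, like the original s/^(.{32})(.*)$/$1/."""
    return anchor[:limit]


def truncate_on_boundary(anchor: bytes, limit: int = 32) -> bytes:
    """Keep at least `limit` bytes, then extend the cut past any UTF-8
    continuation bytes (0x80-0xBF) so it lands on a character boundary."""
    if len(anchor) <= limit:
        return anchor
    cut = limit
    while cut < len(anchor) and 0x80 <= anchor[cut] <= 0xBF:
        cut += 1
    return anchor[:cut]


def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical heading: one ASCII byte followed by twenty 2-byte Cyrillic
# characters, so that byte offset 32 falls inside a character.
heading = ("a" + "Я" * 20).encode("utf-8")

print(is_valid_utf8(truncate_naive(heading)))        # False: cut mid-character
print(is_valid_utf8(truncate_on_boundary(heading)))  # True
print(len(truncate_on_boundary(heading)))            # 33
```

The boundary-aware version can return slightly more than 32 bytes, which matches the "32..37 chars" note in the patch comment.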
Follow up
Thanks, Vasily, for the report and fix; some people might find this useful. Note, however, that
TWikiRelease04Sep2004 is no longer actively maintained.
--
PeterThoeny - 29 Jul 2006
This bug also applies to TWiki 4.x, since the code is the same up to 4.0.4 at least.
I've not yet deciphered the regex to determine whether it's correct. It is likely not to work when we turn on Unicode character mode, or with other 8-bit character sets (e.g. those that use almost entirely 8-bit-high characters, such as KOI-8). Presumably any European 2-byte UTF-8 character would be enough as a test case.
This code should not go in as it is, since it will break with non-UTF-8 character sets. However, it may be useful for people using UTF-8 as their site character set.
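A rough illustration of one way the patch can misbehave with a single-byte character set, written as a Python port of the patched regex. This is an approximation for discussion, not the Perl code itself, and `truncate_patched` is a hypothetical name. In such a charset, bytes 0x80-0xBF are ordinary characters rather than UTF-8 continuation bytes, so the pattern may never find a place to cut.

```python
import re

# Port of: s/^(.{32,}?)([\x00-\x7F\xC0-\xFF].*)$/$1/
PATCHED = re.compile(rb"^(.{32,}?)([\x00-\x7F\xC0-\xFF].*)$")


def truncate_patched(anchor: bytes) -> bytes:
    """Apply the patched truncation rule to a raw byte string."""
    return PATCHED.sub(rb"\1", anchor)


ascii_heading = b"x" * 40
# 0xA3 is the letter "ё" in KOI8-R; to the regex it looks like a
# UTF-8 continuation byte, so no cut point is ever found.
high_heading = b"\xa3" * 40

print(len(truncate_patched(ascii_heading)))  # 32
print(len(truncate_patched(high_heading)))   # 40 (not truncated at all)
```

Most KOI8-R letters sit in 0xC0-0xFF and would still be cut at 32 bytes, so how often this bites depends on the charset and the text.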
This is somewhat like other TOC issues listed at
InternationalisationIssues, which should really be resolved at the same time.
--
RichardDonkin - 31 Jul 2006
I filed Bugs:Item2711 for TWiki 4.
--
PeterThoeny - 01 Aug 2006
This bug is not fixed in TWiki 4.1.1!!!
--
AndreyTkachenko - 11 Feb 2007
Tracked now in Bugs:Item4074.
--
PeterThoeny - 17 May 2007