Question

Our site works fine with mixed-language characters in topic content. However, if a user creates a topic with, for example, a Korean name, the Korean content of that topic is corrupted.

You can see this in our sandbox: https://gopedia.gopetslive.com/twiki/bin/view/Sandbox/WebHome

Relevant settings from TWiki.cfg:

$useLocale = 1;
$siteLocale = "ko_KR.utf8";
$siteCharsetOverride = "";
$localeRegexes = 1;

Environment

TWiki version: TWikiRelease04Sep2004
TWiki plugins: DefaultPlugin, EmptyPlugin, InterwikiPlugin
Server OS: Linux Fedora Core 4, kernel 2.6.11
Web server: Apache 2.0.54
Perl version: 5.8.6
Client OS: Windows XP, OS X 10.4
Web Browser: Firefox, IE
Categories: Internationalisation

-- TWikiGuest - 06 Jan 2006

Answer

It looks like the page contents have the problem, while the page name is OK in Firefox 1.5 with its built-in fonts (i.e. on this page). This is odd, as usually it's the URLs that have the problem and the page contents that are fine. Also, the https://gopedia.gopetslive.com/twiki/bin/view/Sandbox/WebHome page handled those characters embedded as UTF-8 in the URL without trouble, so it seems that only that one page has the problem.

You don't seem to have set the CHARSET parameter in TWikiPreferences at all, but see the TWikiInstallationGuide section on I18N troubleshooting.
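
As a quick check to run alongside that guide, it may be worth confirming that the server actually has the locale named in $siteLocale installed. The following is only a minimal sketch: locale_available is a hypothetical helper, not part of TWiki, and it only tests whether the C library accepts the locale name.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);

# Hypothetical helper (not a TWiki function): returns 1 if the C library
# accepts the given locale name for LC_CTYPE, 0 otherwise.
sub locale_available {
    my ($locale) = @_;
    my $previous = setlocale(LC_CTYPE);           # remember the current setting
    my $result   = setlocale(LC_CTYPE, $locale);  # undef if the locale is missing
    setlocale(LC_CTYPE, $previous) if defined $previous;  # restore it
    return defined $result ? 1 : 0;
}

my $site_locale = "ko_KR.utf8";   # value of $siteLocale quoted in the question
print "$site_locale is ",
      (locale_available($site_locale) ? "" : "NOT "),
      "available on this server\n";
```

If the locale is reported as missing, `locale -a` on the server lists the names that are installed.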

It's interesting that this URL using XML entity codes, i.e. https://gopedia.gopetslive.com/twiki/bin/view/Sandbox/고피디아, seems to work - I wouldn't have thought Apache would accept that sort of URL, but somehow it is working. Do you have any additional Apache modules for I18N, e.g. mod_fileiri as mentioned in EncodeURLsWithUTF8?
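
To make the two URL forms concrete, here is a small sketch that prints the topic name both as numeric entity codes and as percent-encoded UTF-8 bytes. The helper names to_numeric_entities and url_encode_utf8 are made up for this illustration and are not TWiki functions.

```perl
use strict;
use warnings;
use Encode ();

# The topic name from the URL above, written as Unicode code points.
my $topic = "\x{ACE0}\x{D53C}\x{B514}\x{C544}";

# Hypothetical helper: numeric character references for non-ASCII characters.
sub to_numeric_entities {
    my ($chars) = @_;
    return join '', map { ord($_) > 127 ? sprintf('&#%d;', ord($_)) : $_ }
                    split //, $chars;
}

# Hypothetical helper: percent-encode the UTF-8 bytes of a URL path.
sub url_encode_utf8 {
    my ($chars) = @_;
    my $bytes = Encode::encode("utf8", $chars);
    # Leave unreserved characters and the path separator untouched.
    $bytes =~ s/([^A-Za-z0-9_.~\/-])/sprintf('%%%02X', ord($1))/ge;
    return $bytes;
}

print to_numeric_entities("Sandbox/$topic"), "\n";
print url_encode_utf8("Sandbox/$topic"), "\n";
```

The second form is what a browser normally sends when it encodes URLs as UTF-8; the first is only meaningful inside HTML, which is why it is surprising to see it accepted in a URL.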

You might also want to try commenting out the following line in TWiki.pm since it doesn't really help matters.

      $fullTopicName = Encode::decode("utf8", $fullTopicName);   # 'decode' into UTF-8

I'm on holiday from tomorrow until 16th Jan, and busy thereafter, but ping me by email if this doesn't work.
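
For anyone wondering how a decode of the topic name could damage the page text, here is one plausible mechanism, sketched outside TWiki. It assumes the topic text is handled as raw UTF-8 bytes and is later joined with the decoded name; page_bytes is a hypothetical stand-in for whatever assembles the output, and the thread does not confirm that this is exactly what TWiki.pm does.

```perl
use strict;
use warnings;
use Encode ();

# UTF-8 bytes of the Korean topic name, as they would arrive in the URL.
my $name_bytes    = "\xEA\xB3\xA0\xED\x94\xBC\xEB\x94\x94\xEC\x95\x84";
# Assumption: the topic text is read from disk as raw UTF-8 bytes.
my $content_bytes = $name_bytes;

# Hypothetical stand-in for page assembly: join name and content, then
# produce the bytes that would be sent to the browser.
sub page_bytes {
    my ($topic_name, $content) = @_;
    my $page = "$topic_name\n$content";
    # A character string is encoded on output; a byte string passes through.
    return utf8::is_utf8($page) ? Encode::encode("utf8", $page) : $page;
}

# Without the decode: bytes stay bytes and the page is intact.
my $ok  = page_bytes($name_bytes, $content_bytes);

# With the decode: the name becomes a character string, so Perl upgrades the
# content bytes as if they were Latin-1, and they are encoded a second time.
my $bad = page_bytes(Encode::decode("utf8", $name_bytes), $content_bytes);

printf "without decode: %d bytes\n", length $ok;
printf "with decode:    %d bytes\n", length $bad;
printf "name intact, content changed: %s\n",
    (substr($bad, 0, 12) eq $name_bytes && $bad ne $ok) ? "yes" : "no";
```

In this sketch the name survives while the content is double-encoded, which matches the symptom of a correct page name over corrupted Korean text.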

There are some Chinese sites using UTF-8 successfully, e.g. http://www.pgsqldb.org/, so it may help to check their testenv settings and other setup, or email their administrators.

-- RichardDonkin - 06 Jan 2006

Commenting out that line in TWiki.pm worked like a charm! Many, many thanks!

-- TWikiGuest - 09 Jan 2006

Interesting - this sounds like a bug in using TWiki with UTF-8 as the $siteCharset. Could you log this as a bug in Codev? This should really be fixed in Dakar since it's a low-impact fix.

-- RichardDonkin - 16 Jan 2006

Posted Bugs:Item1421.

-- PeterThoeny - 17 Jan 2006

Topic revision: r7 - 2006-01-17 - PeterThoeny