Tags: internationalization, localization

Japanese and Chinese Support

Safe character sets - use EUC-* or UTF-8 only

You can use TWiki with Japanese and Chinese characters, at least in the limited testing I've done, as long as you use EUC-JP, EUC-CN or EUC-TW (or EUC-KR for Korean). Some pages linked from http://donkin.org/bin/view/Test/TestTopic5 demonstrate this (not all of them have been updated to use safe character sets), or just see the screenshot below.

WARNING: All other East Asian character encodings that I know of, including ISO-2022-*, HZ, Shift-JIS, Big5, GB2312, GBK, UHC, Johab and others, will not work because they are not 'ASCII safe': some East Asian characters contain bytes in the ASCII range, usually as the second byte, which can be confused with special TWiki characters. This has occurred in real TWiki usage for Chinese (see SomeChineseCharactersBreakWikiLinks). The only safe East Asian character encodings for TWiki are (1) EUC variants and (2) UTF-8.
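To see what 'ASCII safe' means in practice, the Perl sketch below (assuming Perl 5.8 or later with the core Encode module) encodes a few sample characters and reports whether any resulting byte falls below 0x80. The helper has_ascii_bytes and the sample characters are illustrative choices, not TWiki code, and the byte values in the comments are the assumed standard mappings for these characters.

```perl
use strict;
use warnings;
use Encode ();

# True if $char, encoded in $enc, contains any byte below 0x80, i.e. a byte
# that byte-oriented markup code could mistake for an ASCII character.
# FB_CROAK makes encode() die if $char is not representable in $enc
# (instead of silently substituting '?', which is itself ASCII).
sub has_ascii_bytes {
    my ($char, $enc) = @_;
    my $bytes = Encode::encode($enc, $char, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    return $bytes =~ /[\x00-\x7F]/ ? 1 : 0;
}

# Sample characters (assumed mappings noted per line).
my @cases = (
    [ "\x{8868}", 'shiftjis'  ],    # U+8868: 0x95 0x5C ('\') in Shift-JIS
    [ "\x{8868}", 'euc-jp'    ],
    [ "\x{529F}", 'big5-eten' ],    # U+529F: 0xA5 0x5C ('\') in Big5
    [ "\x{4E02}", 'cp936'     ],    # U+4E02: 0x81 0x40 ('@') in GBK
    [ "\x{4E02}", 'UTF-8'     ],
);

for my $case (@cases) {
    my ($char, $enc) = @$case;
    my $hex = join ' ', map { sprintf '%02X', ord } split //, Encode::encode($enc, $char);
    printf "U+%04X in %-9s => %-8s %s\n", ord($char), $enc, $hex,
        has_ascii_bytes($char, $enc) ? 'NOT ASCII safe' : 'ASCII safe';
}
```

In EUC-JP and UTF-8 every byte of a non-ASCII character is 0x80 or above, so TWiki's ASCII markup characters can never appear inside one.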

  • JapaneseText (EUC-JP charset) - now with an update from a Japanese speaker using Mozilla and Japanese input method
    • NEW - At first sight, TWiki appears to work with ISO-2022-JP (a 7 bit encoding using escape sequences, in which Japanese characters are rendered as ASCII sequences), with only some fixes necessary to support such embedded ASCII characters, which interfere with HTML parsing by the browser (i.e. &, < and >). However, in reality supporting ISO-2022-JP natively would be very, very difficult, due to the need for TWiki formatting and other code to understand these escape sequences. So please don't use ISO-2022-JP as a character set with current TWiki code - ProposedUTF8SupportForI18N should enable this as a browser character set, with storage in UTF-8, but it's best to just use UTF-8 in the browser at that point.
    • NEW - Shift-JIS (a Microsoft charset) will not work in general. You would need to use EUC-JP instead.
  • Chinese sites should use EUC-CN or EUC-TW, or see ProposedUTF8SupportForI18N - UTF-8 is basically usable as a site character set as long as you don't need non-English WikiWords to work (which is typically fine for all-Chinese sites).
    • NEW - TWiki Quick Start - short TWiki guide on the STL (Standard Template Library) China site (in Chinese) [Note that this site uses the GB2312 character encoding, which is not recommended. Don't use GBK, GB2312 or Big5 for Chinese TWiki sites! --RD]
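The ISO-2022-JP problem can be demonstrated directly. This sketch (assuming Perl's core Encode module; the sample word 日本, 'Japan', is an arbitrary choice) encodes two kanji and prints the bytes: every byte is 7-bit, and the kanji come out as pairs of printable ASCII characters between escape sequences, including a '|' that byte-oriented TWiki table markup could act on.

```perl
use strict;
use warnings;
use Encode ();

my $text  = "\x{65E5}\x{672C}";                  # U+65E5 U+672C: "Japan"
my $bytes = Encode::encode('iso-2022-jp', $text);

# Print each byte in hex: ESC $ B switches to JIS X 0208, ESC ( B back to ASCII.
print join(' ', map { sprintf '%02X', ord } split //, $bytes), "\n";

# Every byte is 7-bit, so code scanning for ASCII markup sees the kanji
# as ordinary ASCII text, including a '|' (TWiki's table cell delimiter).
printf "all 7-bit: %s\n",    $bytes =~ /[^\x00-\x7F]/ ? 'no' : 'yes';
printf "contains '|': %s\n", index($bytes, '|') >= 0 ? 'yes' : 'no';
```

Supporting this natively would mean teaching every piece of TWiki formatting code to track the escape-sequence state, which is why UTF-8 storage is the practical route.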

Test site etc

Not currently available.

You can edit and save the pages if you register on the donkin.org site - please put any test pages in the Test web. You may need to reset your browser character set on each page, because the page declares ISO-8859-1 and overrides your manual browser setting. If you are running your own site in Japanese or Chinese, this is easy to avoid: just set the $siteLocale setting in TWiki.cfg to something like ja_JP.EUC-JP, which specifies the character set.
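For a Japanese EUC-JP site, the relevant TWiki.cfg lines might look like the fragment below. This assumes the Feb2003-era variable names used elsewhere on this page ($useLocale, $siteLocale); the exact locale name accepted varies by operating system, so check what `locale -a` lists on your server.

```perl
# TWiki.cfg fragment (Feb2003-era variable names, assumed from this page).
# The locale must exist on the server; spellings such as "ja_JP.EUC-JP"
# or "ja_JP.eucJP" vary between operating systems.
$useLocale  = 1;                  # turn on locale-based processing
$siteLocale = "ja_JP.EUC-JP";     # language_TERRITORY.charset
```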

Now that TWiki (in TWikiRelease01Feb2003) can set browser character sets based on a variable, it would even be possible to embed a TWiki variable defining the required character set within a page, which could cause WikiWord behaviour to change (for Latin/Cyrillic type alphabetic languages) and define the character set in the HTTP headers and the HTML. This might be a pain to manage since cutting and pasting between different charset topics would not work, but some people might find this useful. Not a short term feature but worth thinking about...

This isn't really a new feature, but it's now easier to experiment with various languages given the InternationalisationEnhancements that are being developed. Other languages such as Korean can easily be supported in the same way.

See also CyrillicSupport for Russian, Ukrainian and other Cyrillic languages.

Open issues for Japanese and Chinese TWiki sites

See InternationalisationIssues for general I18N issues (and Google for similar issues to yours). Here are some specific issues found by Japanese and Chinese TWiki sites:

Screenshot

  • Screenshot of TWiki using Japanese:
    japanese-demo.gif

Comments and feedback

NathanOllerenshaw is using TWiki Feb2003 code for Japanese, using UTF-8 as the character set even though this code base isn't meant to support UTF-8 yet - see discussion on InternationalisationUTF8.

-- RichardDonkin - 08 Sep 2003

Some updates above - if you are using TWiki with multi-byte character sets for Chinese, Japanese or Korean, please comment here! I would like to find a few people who are willing to test this support, as I don't read or write these languages.

-- RichardDonkin - 24 Sep 2003

I have just set up TWiki at wiki.oreno.org, and I'm interested in how Japanese support will work with it. I haven't changed any settings, but it looks like the default settings work with UTF-8.

-- ChristopherKobayashi - 15 Feb 2004

With the current release of TWiki (including TWikiAlphaRelease), the best character set to use is EUC-JP, which seems to work well. UTF-8 is not the best option since further changes are really required (and underway in ProposedUTF8SupportForI18N) to make it work properly - since you are using Perl 5.6.1 based on your testenv, it would be better to switch to EUC-JP to get all TWiki features working.

For demo sites in both character sets, see:

Let me know how you get on - it's very useful to have any feedback as I don't work with Japanese myself.

-- RichardDonkin - 15 Feb 2004

I set the $siteLocale setting in my TWiki.cfg to ja_JA.EUC-JP. Then I created a page using Japanese forced links. The title and URL seem to work fine without any mojibake (garbled, unreadable characters). Mojibake often occurs when you don't have the correct encoding, or when the double-byte string becomes corrupted.

-- ChristopherKobayashi - 16 Feb 2004
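Mojibake of this kind is easy to reproduce. The Perl sketch below (assuming the core Encode module; the sample word 日本語, 'Japanese', is an arbitrary choice) encodes text as EUC-JP, then decodes the same bytes once with the correct charset and once as ISO-8859-1, which is what a browser effectively does when the page declares the wrong charset.

```perl
use strict;
use warnings;
use Encode ();

my $text  = "\x{65E5}\x{672C}\x{8A9E}";           # U+65E5 U+672C U+8A9E: "Japanese"
my $bytes = Encode::encode('euc-jp', $text);       # what an EUC-JP site stores and sends

my $right = Encode::decode('euc-jp', $bytes);      # correct charset: round-trips
my $wrong = Encode::decode('iso-8859-1', $bytes);  # wrong charset: mojibake

binmode STDOUT, ':encoding(UTF-8)';
print "right: $right\n";
print "wrong: $wrong\n";   # one Latin-1 character per byte, not three kanji
```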

I live and work in Japan, and have been looking for a good wiki that supports Japanese. There are several Japanese ones, including a couple built in Ruby, which interests me. Regarding TWiki, I wanted to share a few comments from a developer using TWiki for a Japanese site at a university:

* in twiki/lib/TWiki.cfg, switched the store implementation from RcsWrap to RcsLite by commenting out the first line and uncommenting the second:

    # $storeTopicImpl = "RcsWrap";
    $storeTopicImpl = "RcsLite";

* to fix a bug in twiki/lib/TWiki/Store/RcsFile.pm, subroutine _epochToRcsDateTime, where the month in RCS dates was off by one (Perl's gmtime and localtime return months as 0-11, so January produced an invalid month of zero), changed:

    my $rcsDateTime = sprintf "%d.%02d.%02d.%02d.%02d.%02d", ( $year, $mon, $mday, $hour, $min, $sec );

to:

    my $rcsDateTime = sprintf "%d.%02d.%02d.%02d.%02d.%02d", ( $year, $mon+1, $mday, $hour, $min, $sec );

* to fix date updating, in twiki/lib/TWiki/Store.pm, subroutine saveNew, changed:

    my $dataError = $topicHandler->replaceRevision( $text, $theComment, $user, $date );

to:

    my $dataError = $topicHandler->replaceRevision( $text, $theComment, $user, $epochSec );

* to make TWiki "speak Japanese", in twiki/lib/TWiki.cfg, changed the locale settings to:

    $useLocale = 1;
    $siteLocale = "ja_JP.EUC-JP";

-- RickCogley - 24 Feb 2004
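The month fix above follows from how Perl's gmtime works: it returns the month as 0-11 and the year as years since 1900. The standalone sketch below is hypothetical (it is not TWiki's actual _epochToRcsDateTime) and, for simplicity, always writes a four-digit year.

```perl
use strict;
use warnings;

# Hypothetical standalone epoch -> "YYYY.MM.DD.hh.mm.ss" formatter (UTC).
# gmtime() gives $mon as 0-11 and $year as years since 1900.
sub epoch_to_rcs_date {
    my ($epoch) = @_;
    my ($sec, $min, $hour, $mday, $mon, $year) = gmtime($epoch);
    return sprintf "%d.%02d.%02d.%02d.%02d.%02d",
        $year + 1900, $mon + 1, $mday, $hour, $min, $sec;
}

print epoch_to_rcs_date(1077580800), "\n";   # 2004.02.24.00.00.00
```

Without the + 1, any January date would be formatted with month 00, which is not a valid RCS date.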

中文测试页面 (Chinese test page) -- ChunhuaLiao - 02 Nov 2004

Can I change the form and mark this as MergedToCore against BeijingRelease, since it is mentioned in its release topic, TWikiRelease01Feb2003?

-- SamHasler - 15 Feb 2005

This isn't really a feature, it's an application note that encourages people to use TWiki for Chinese and Japanese. So it's best not to mark as MergedToCore.

-- RichardDonkin - 14 Sep 2005

Chinese text input after setting the browser encoding to UTF-8: 中

OK, the UTF-8 characters are displayed correctly (rather than in &#...; form) when viewing and editing, as long as UTF-8 is selected manually in the browser, since the Content-Type here is always forced to ISO-8859-1. Set $siteCharsetOverride in TWiki.cfg to "UTF-8".

-- GeorgesKo - 15 Nov 2005

Georges, please post a question in Support providing your TWiki.cfg file, TWiki version and testenv output (as described in SupportGuidelines). It's not clear what your setup is here, but there are people using UTF-8 successfully for Chinese sites with the Sep2004 releases.

EUC-JP should definitely work as it was tested when I18N support was first coded. UTF-8 may require a bit of code tweaking to disable the experimental UTF-8 support - comment out this line in TWiki.pm if you have problems:

    $fullTopicName = Encode::decode("utf8", $fullTopicName);   # 'decode' into UTF-8

InternationalisationEnhancements has a recent link to a Chinese site in the 'I18N sites' section, maybe you could check their setup against yours as well?

-- RichardDonkin - 16 Nov 2005
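What that TWiki.pm line changes can be seen in isolation: Encode::decode turns a string of UTF-8 bytes into a Perl character string, so code written for single-byte character sets (regexes, length checks) suddenly sees characters instead of bytes. In this hypothetical sketch the topic name is 中文 ('Chinese'), supplied as raw UTF-8 bytes.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical topic name: U+4E2D U+6587 as raw UTF-8 bytes,
# the way it might arrive from a URL.
my $fullTopicName = "\xE4\xB8\xAD\xE6\x96\x87";
print "before decode: ", length($fullTopicName), " bytes\n";

$fullTopicName = Encode::decode("utf8", $fullTopicName);
print "after decode:  ", length($fullTopicName), " characters\n";
```

Commenting out the decode keeps the topic name as bytes, which is what the rest of the Feb2003/Sep2004 code base expects.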

Please note that Chinese sites should not use GB2312, as mentioned above - this causes problems because it is a two-byte character encoding in which some bytes can also be interpreted as ASCII. So the TWiki code can potentially think it has found some TWiki markup (TWikiML) within a two-byte GB2312 character - it may then insert some HTML at that point, breaking the character. This will only occur quite infrequently, and I don't have any data on which characters would cause this, but reading the GB2312 parts of the CJKV book would help in determining the size of this issue.

EUC-CN or UTF-8 are recommended instead, as mentioned in the above warning.

-- RichardDonkin - 16 Dec 2005
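The failure mode described above can be sketched in a few lines of Perl. As a stand-in for a GB2312 example, the sketch uses the Shift-JIS katakana ポ (U+30DD, assumed bytes 0x83 0x7C), whose second byte is the ASCII '|', and applies a toy, hypothetical byte-oriented rule that turns '|' into an HTML table cell in place of TWiki's real rendering code. The rule fires inside the character and breaks it.

```perl
use strict;
use warnings;
use Encode ();

# U+30DD (katakana "po") is 0x83 0x7C in Shift-JIS; 0x7C is '|'.
my $sjis = Encode::encode('shiftjis', "\x{30DD}");

# Toy renderer rule (hypothetical): treat every '|' byte as table markup.
(my $rendered = $sjis) =~ s/\|/<td>/g;

printf "before: %s\n", join ' ', map { sprintf '%02X', ord } split //, $sjis;
printf "after:  %s\n", join ' ', map { sprintf '%02X', ord } split //, $rendered;
```

The lead byte 0x83 is left stranded in front of the inserted HTML, so the browser can no longer display the original character.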

Good TWiki introduction in Japanese, several pages long at http://app.blog.livedoor.jp/sourcewalker/tb.cgi/50610284

I posted a note stating that a Japanese translation could make TWiki popular in Japan.

-- PeterThoeny - 21 Apr 2006

Good idea, something might come from that! Btw: the Babel Fish version makes it more accessible to us ignoramuses who perhaps understand Engrish a bit better than Japanese :)

(Links to the next pages of the review are translated automatically).

-- SteffenPoulsen - 21 Apr 2006

WebForm has a problem handling Chinese in UTF-8. Chinese UTF-8 text is stored correctly in a WebForm text field, but it is broken when you try to edit it again.

Is this a solved bug (see EditChineseFormTextInUTF8)? I'm having the same problem in TWiki 4.1.1.

-- ThYang - 06 Feb 2007

This looks like a bug. Could you please file one in Bugs:WebHome?

-- PeterThoeny - 06 Feb 2007

The bug has been reported for TWiki 4.0.x.

Please see Bugs:Item2032; the patch in it works well in TWiki 4.1.1.

-- ThYang - 07 Feb 2007

There is now a JapaneseTranslation of TWiki - thanks are due to Peter for getting this done!

-- RichardDonkin - 06 May 2007

I'm not sure whether I can upload files with Korean names as attachments. When I tried, TWiki automatically replaced the Korean characters with _ (underscore). Apart from this, I don't have any problems using Korean. I use TWiki for an internal project, and my colleagues and I have many Korean-named files to upload. Do you have any ideas about this?

-- SunahLim - 2011-08-10

TWiki can be configured to work pretty well with Chinese/Japanese/Korean content in pages. At this time there is no support for attachments with Chinese/Japanese/Korean characters in file names. I invite you and your programmer friends to get involved with the community.

-- PeterThoeny - 2011-08-10

This configuration works fine with Chinese, based on TWiki 5.1.4:

    $TWiki::cfg{UseLocale} = 1;
    $TWiki::cfg{Site}{Locale} = 'zh_CN.UTF-8';
    $TWiki::cfg{Site}{CharSet} = 'UTF-8';
    $TWiki::cfg{Site}{Lang} = 'zh-CN';
    $TWiki::cfg{Site}{FullLang} = 'zh-CN';

-- Zhibiao Pan - 2013-08-20

Topic attachments
  • 中文表格.xls (Microsoft Excel spreadsheet, r1, 13.5 K, 2008-01-29 06:21, ConanHsia): 中文表格 ("Chinese spreadsheet")
  • japanese-demo.gif (GIF image, r1, 12.1 K, 2002-12-11 16:14, RichardDonkin): Screenshot of TWiki using Japanese
Topic revision: r37 - 2013-08-20 - ZhibiaoPan
Copyright © 1999-2017 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.