This started in WikiNamesWithUmlauts.

Kanji have no upper and lowercase forms; there is only one case, and all the words in a sentence run together without spaces. Unfortunately, that means Wiki-style automatic linking of topic names is not possible.

To make things more complicated, there are 4 character sets in the Japanese language, and all are used in a typical technical article:

  • Kanji: The imported characters from China, about 2000 commonly used characters.
  • Hiragana: Syllable character set "made in Japan", for Japanese words, about 50 characters.
  • Katakana: Syllable character set "made in Japan", for foreign words, about 50 characters.
  • Alphabet and special characters: The familiar 7-bit ASCII code.

Now it gets even more complicated: the character encoding depends on the platform. Windows and Mac use Shift-JIS; some Unix systems use EUC, others use Shift-JIS. Both are mixed single-byte / double-byte encodings: the first byte determines whether a second byte follows (forming a double-byte character) or not (a single-byte character). Unfortunately there is no web standard for character encoding. Most browsers nowadays support automatic detection of the encoding (Shift-JIS or EUC).
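The lead-byte rule described above can be sketched in Python for Shift-JIS. This is only an illustration, not TWiki code: the function name split_shift_jis is made up, and the lead-byte ranges used (0x81-0x9F and 0xE0-0xFC) are the commonly documented ones, which include vendor extensions.

```python
def split_shift_jis(data):
    """Split Shift-JIS bytes into one byte string per character.

    Assumption: lead bytes of double-byte characters lie in
    0x81-0x9F or 0xE0-0xFC; every other byte is a single-byte character.
    """
    chars = []
    i = 0
    while i < len(data):
        lead = data[i]
        is_double = 0x81 <= lead <= 0x9F or 0xE0 <= lead <= 0xFC
        width = 2 if is_double else 1
        chars.append(data[i:i + width])
        i += width
    return chars


# Katakana "so" followed by an ASCII letter.
sample = "ソA".encode("shift_jis")
print(split_shift_jis(sample))
```

Note that the second byte of the katakana character here is 0x5C, the ASCII backslash, which is why a scanner has to track lead bytes instead of looking at each byte in isolation.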

BTW, string parsing must be done with the proper character encoding. For example if you do a byte scan for the '<' character without the proper encoding, you might find a match in the second byte of a double byte character - ouch.
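A small Python sketch of that pitfall, using ISO-2022-JP as the example encoding because the byte 0x3C ('<') can appear there as half of a double-byte character. The helper names are invented for this illustration.

```python
def has_lt_bytewise(raw):
    """Naive scan: looks for the byte 0x3C anywhere in the raw data."""
    return b"<" in raw


def has_lt_decoded(raw, encoding):
    """Encoding-aware scan: decode first, then search for the character."""
    return "<" in raw.decode(encoding)


# Hiragana "ze"; in ISO-2022-JP its second byte is 0x3C.
raw = "ぜ".encode("iso2022_jp")

print(has_lt_bytewise(raw))               # false positive from the byte scan
print(has_lt_decoded(raw, "iso2022_jp"))  # no '<' in the actual text
```

Decoding the whole string before parsing is the simplest fix; a byte-level parser would instead have to track the encoding's shift state or lead bytes itself.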

(In case you wonder, I do speak the language and I used to live and write programs in Japan for many years.)

-- PeterThoeny - 29 Apr 2000


I don't think it's that important to write WikiNames in Japanese. Japanese computer users are quite used to the injustice of having to choose English names for things, such as when an OS doesn't support Japanese filenames or, more commonly, when a computer language doesn't support Japanese variables.

On the other hand, there are more serious problems besides WikiName detection. The most popular encoding (ISO-2022-JP, or "JIS") uses the <, > and & characters, which TWiki substitutes or strips (in the case of search result display), corrupting the text.

Other common encodings are available (Shift-JIS and EUC) which do not use the special HTML characters; however, TWiki seems to be corrupting these too, for some other reason.

I've had success using Shift-JIS and EUC encodings with the PyWiki clone (very similar to Ward's Wiki), so I recommend it to those wanting to use Japanese text. See http://voght.com/cgi-bin/pywiki-demo?JapaneseText. PikiPiki is another Python clone that has more compact code and possibly fewer bugs than PyWiki, but I haven't tried it yet. For Wiki clone info see http://www.c2.com/cgi/wiki?WikiWikiClones.

An excellent site describing Japanese character encodings and use with HTML is http://www.lfw.org/text/jp.html.

(The file I attached to this page is an example of HTML generated by TWiki that causes IE 5.0 set to Japanese encoding to go into la-la-land for a few minutes.)

-- JohnBelmonte - 30 May 2000


TWiki is just using the standard Perl string manipulations (with one exception). That means TWiki should handle double-byte characters correctly as long as the underlying Perl is multibyte aware. The PyWiki clone is based on Python, and Python seems to handle multibyte characters better than Perl. Does anybody know if there is a multibyte-aware Perl?

The only exception in TWiki is when displaying the start of a topic (in lists of changes and search), as JohnBelmonte pointed out. The current code can break a multibyte character apart:
  $head =~ s/(.{162})([a-zA-Z0-9]*)(.*?)$/$1$2 \.\.\./go;
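For comparison, here is a rough Python sketch of a truncation that counts characters instead of bytes, so a double-byte character is never split. It only approximates the intent of the Perl line (cut at 162, run on to the end of an ASCII word, append " ..."); the function name topic_head and its parameters are made up, and the caller is assumed to know the encoding.

```python
import string

ASCII_WORD = set(string.ascii_letters + string.digits)


def topic_head(raw, encoding, limit=162):
    """Return the start of a topic, cut on a character boundary."""
    text = raw.decode(encoding)
    if len(text) <= limit:
        return raw
    end = limit
    # Like the ([a-zA-Z0-9]*) group: do not cut an ASCII word in half.
    while end < len(text) and text[end] in ASCII_WORD:
        end += 1
    return (text[:end] + " ...").encode(encoding)


raw = ("あ" * 200).encode("euc_jp")
print(topic_head(raw, "euc_jp", limit=5).decode("euc_jp"))
```

A plain byte slice such as raw[:5] would end in the middle of the third character here, which is the kind of breakage described above.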

Regarding topic names, automatic linking is not an option when writing Japanese text, because there are no spaces. A simple link mechanism could be defined for that. Example (spaces are removed to simulate Japanese text):
  thistexthasan%LINK{"explicitlink"}toanotherpage.
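One way such a tag could be expanded is sketched below in Python. Everything here is hypothetical: the %LINK{"..."} syntax is only the proposal above, and the function name expand_links, the /view/ URL prefix, and the choice to use the quoted string as both topic name and link text are assumptions made for the illustration.

```python
import re
from html import escape
from urllib.parse import quote

# Hypothetical syntax from the proposal: %LINK{"topicname"}
LINK_RE = re.compile(r'%LINK\{"([^"]+)"\}')


def expand_links(text, base_url="/view/"):
    """Replace each %LINK{"..."} tag with an HTML anchor."""
    def repl(match):
        topic = match.group(1)
        return '<a href="%s%s">%s</a>' % (base_url, quote(topic), escape(topic))
    return LINK_RE.sub(repl, text)


print(expand_links('thistexthasan%LINK{"explicitlink"}toanotherpage.'))
```

Because the tag carries its own delimiters, it does not depend on spaces or on upper and lowercase letters to mark where the link starts and ends.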

Having text with no spaces has another implication: text formatting rules like *bold* or _italic_ do not work either. Some other formatting rules need to be defined.

-- PeterThoeny - 30 May 2000


Regarding the text formatting, I don't think per-word italics or bold are a big deal. Written Japanese uses katakana for emphasis. However I've not quite figured out why TWiki requires the surrounding spaces, as Ward's implementation gets along fine without this constraint.

-- JohnBelmonte - 01 Jun 2000


As you know, jperl.pl is in the default library of Perl 5.
A successor is also available; see http://openlab.ring.gr.jp/Jcode/
I don't know whether the Japanization patch is required or not.

-- MuneSaka - 20 Aug 2000


I found this at http://www.c2.com/cgi/wiki?YukiWiki

> YukiWiki is one of WikiClones.
>
> It is written in Perl5 by HiroshiYuki.
> jcode.pl is used to handle Japanese coding systems (EUC and SJIS).
> More simpler text formatting rules than original Wiki.
> YukiWiki is still under development in http://www.hyuki.com/yukiwiki/
> (Sorry, Japanese only).

Some ideas / code samples of YukiWiki could be used to make TWiki aware of the Japanese locale.

-- PeterThoeny - 29 Aug 2000


>Some ideas / code samples of YukiWiki could be used to make
>TWiki aware of the Japanese locale.

I'm glad to hear that. If you need anything checked, I can help.

Moreover, some Japanese-language wiki clones can handle Japanese WikiNames by using brackets or other marker characters. Many Japanese users want to use Japanese characters in WikiNames.

I found another wiki clone, written in Ruby, at http://todo.org/. They have set up an English Wiki world. Their rules are also interesting.

-- MuneSaka - 11 Sep 2000

TWiki I18N now supports Japanese characters without any problems that I can see, but I have done only very basic tests. See JapaneseAndChineseSupport for details, including a demo page where you can try it out.

-- RichardDonkin - 10 Dec 2002

TWiki for Japanese sites should only be used with EUC-JP, since ISO-2022-JP and Shift-JIS are not 'ASCII-safe'; EUC-JP is fine based on the specific research I've done. TWiki also works fine for Japanese characters within 'forced links' (e.g. [[Japanese chars here]]). See JapaneseAndChineseSupport for details and a test site.
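As a rough illustration of what 'ASCII-safe' means here, the Python sketch below checks whether any non-ASCII character in a sample string encodes to a byte below 0x80. The function name is_ascii_safe is made up, and the check only says something about the sample text given, not about the encoding as a whole.

```python
def is_ascii_safe(text, encoding):
    """True if no non-ASCII character in text produces a byte below 0x80."""
    for ch in text:
        if ord(ch) < 0x80:
            continue  # plain ASCII characters are expected to use ASCII bytes
        if any(b < 0x80 for b in ch.encode(encoding)):
            return False
    return True


sample = "ソぜ"  # katakana "so", hiragana "ze"
for enc in ("euc_jp", "shift_jis", "iso2022_jp"):
    print(enc, is_ascii_safe(sample, enc))
```

For this sample only EUC-JP passes: the Shift-JIS form of the first character ends in byte 0x5C, and ISO-2022-JP is built entirely from 7-bit bytes.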

-- RichardDonkin - 15 Nov 2004

Topic attachments
Attachment | Size | Date | Who | Comment
bad_page.html | 28.5 K | 2000-05-22 - 12:43 | JohnBelmonte | will crash IE 5.0 set to Japanese encoding
Topic revision: r17 - 2004-11-15 - RichardDonkin
 
Copyright © 1999-2017 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.