In my project WebTeach I would like to offer users the choice of the language in which pages are displayed. This affects the selection of the right template and - maybe - text formatting rules: in all languages but English accented letters are present, and I would like to convert (for instance) e' to è.
Why would you want to do that?
If you're in a non-English community,
you can simply type the accented letter on your keyboard,
and it will be stored as it is.
If you're not in such a community,
you don't need any way to express accented letters.
[Main.JoachimDurchholz - 25 Nov 2001]
Any suggestion for the implementation?
I proceeded in this way: in WebPreferences, or in the preferences of some web, one can set the variable DEFAULTLANGUAGE to "it" or "es" or "en" etc. Moreover, each user can set the variable LANGUAGE in his own page to some different extension. Finally, this variable can even be set inside a topic.
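For instance, assuming the usual TWiki preference syntax, WebPreferences could contain something like
   * Set DEFAULTLANGUAGE = it
while a user's home page (or a single topic) could override it with
   * Set LANGUAGE = es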
In wiki.pm, inside the function initialize, after reading all variables I call getPrefsList on the topic itself:
getPrefsList( $wikiUserName ); # user-level
#FB multi-language
#language can be defined inside the document
getPrefsList( "$webName.$topicName" );
#/FB
# some remaining init
$TranslationToken= "\263";
$code="";
At the beginning of the function
internalLink I put:
#FB multi-language
my $defaultLanguage = getPrefsValue("DEFAULTLANGUAGE");
my $language = getPrefsValue("LANGUAGE"); # it should be it, es, etc.
$language = $defaultLanguage unless $language;
$language = "en" unless $language;
$language = lc($language);
if ($language eq "en") {
    if( $doPluralToSingular && $page =~ /s$/ && ! topicExists( $web, $page ) ) {
        # page is a non-existing plural
        my $tmp = $page;
        $tmp =~ s/ies$/y/;             # plurals like policy / policies
        $tmp =~ s/sses$/ss/;           # plurals like address / addresses
        $tmp =~ s/xes$/x/;             # plurals like box / boxes
        $tmp =~ s/([A-Za-rt-z])s$/$1/; # others, excluding endings in ss like address
        if( topicExists( $web, $tmp ) ) {
            $page = $tmp;
        }
    }
}
#/FB
And I replaced the body of
readTemplate with
#FB multi-language
# first try with language ext, then with defaultLanguage ext,
# finally resort to no ext (english)
# to really exploit per-user prefs you need to authenticate
# all users even to view a page
my $defaultLanguage = lc(getPrefsValue("DEFAULTLANGUAGE"));
my $language = lc(getPrefsValue("LANGUAGE")); # it should be it, es, etc.
# templates are view.tmpl (default), view-it.tmpl (italian) etc.
$language = "-$language" if $language;
$defaultLanguage = "-$defaultLanguage" if $defaultLanguage;
my $lang;
foreach $lang ($language, $defaultLanguage, "") {
my $webtmpl = "$templateDir/$webName/$name.$topic$lang.tmpl";
if( -e $webtmpl ) {
return &readFile( $webtmpl );
}
$webtmpl = "$templateDir/$webName/$name$lang.tmpl";
if( -e $webtmpl ) {
return &readFile( $webtmpl );
}
#AS
# look for the template in all paths from the web up to the templates root dir
my $webPath = "$webName";
foreach (split(/\//, $webPath))
{
$webPath =~ s#(\.*)\/[^\/]*#$1#;
$webtmpl = "$templateDir/$webPath/$name.$topic$lang.tmpl";
if( -e $webtmpl )
{
return &readFile( $webtmpl);
}
$webtmpl = "$templateDir/$webPath/$name$lang.tmpl";
if( -e $webtmpl )
{
return &readFile( $webtmpl);
}
};
#/AS
$webtmpl = "$templateDir/$name.$topic$lang.tmpl";
if( -e $webtmpl ) {
return &readFile( $webtmpl );
}
}
#/FB
(Here I also included AndreaSterbini's MultiLevelWikiWebs.)
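To illustrate the lookup order (ignoring the sub-web walk and the topic-specific variants): assuming LANGUAGE = it, DEFAULTLANGUAGE = es and a web named, say, Know, a request for the view template would try
templates/Know/view-it.tmpl
templates/view-it.tmpl
templates/Know/view-es.tmpl
templates/view-es.tmpl
templates/Know/view.tmpl
templates/view.tmpl
so a missing translated template silently falls back to the plain English one.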
Finally, in wikicfg.pm, in extendGetRenderedVersionOutsidePRE, I added:
#FB multi-language
my $defaultLanguage = getPrefsValue("DEFAULTLANGUAGE");
my $language = getPrefsValue("LANGUAGE"); # it should be it, es, etc.
$language = $defaultLanguage unless $language;
$language = "en" unless $language;
$language = lc($language);
if ($language eq "it") {
# accented letters in italian
# avoid accents with \'
# first protect common apostrophes
s/(\s)po\'/$1po\\\'/go;
#now regular accents
s/a\'/à/go;
s/i\'/ì/go;
s/o\'/ò/go;
s/u\'/ù/go;
s/he\'/hé/go;
s/e\'/è/go;
s/A\'/À/go;
s/I\'/Ì/go;
s/O\'/Ò/go;
s/U\'/Ù/go;
s/HE\'/HÉ/go;
s/E\'/È/go;
#and finally the escaped apostrophes
s/\\\'/\'/go;
}
#/FB
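For example, with the rules above a line like
La citta' e' bella e il caffe' e' un po' caro, perche'?
is rendered as
La città è bella e il caffè è un po' caro, perché?
(note that the apostrophe in po' survives thanks to the escape/unescape pair).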
I attach wiki.pm and wikicfg.pm (which contains several other modifications, marked with #FB).
--
FrancoBagnoli - 26 Sep 2000
Very interesting idea for localization. Some thoughts to consider:
- Is it necessary to offer two variables, DEFAULTLANGUAGE and LANGUAGE? How about just LANGUAGE that is defined on site-level and that can be overridden on web-level and user-level?
- What is the reasoning to offer preferences per topic? E.g. your
getPrefsList( "$webName.$topicName" ) addition in initialize. Is it necessary to set the language per topic, as opposed to per web?
- Doing the $language variable query in function internalLink is expensive because this function is called very frequently. Probably better to do it once at the end of initialize, store it as a global variable, and use it in other places (see the sketch after this list).
- Above code includes AndreaSterbini's MultiLevelWikiWebs extensions. FWIW, I am still not convinced that the added benefit of sub-webs justifies the added complexity (for users and for TWiki programmers).
- Once TWiki is modularized it will be easier to add language dependent custom rendering rules.
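A minimal sketch of what that could look like (assuming a new global, say $language, in wiki.pm):
# at the end of initialize: compute the language once and keep it in a global
$language = lc( getPrefsValue( "LANGUAGE" )
             || getPrefsValue( "DEFAULTLANGUAGE" )
             || "en" );
internalLink, readTemplate and extendGetRenderedVersionOutsidePRE would then simply read $language instead of calling getPrefsValue again and again.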
--
PeterThoeny - 01 Oct 2000
I was just experimenting a bit with the options, probably most of my experiments will be thrown away.
- You are right, probably just one LANGUAGE variable is sufficient.
- The preference per topic was just an experiment; I'm not sure it is worthwhile for the language setting. However, I am considering an extension of it, see NewTemplateScheme. Anyhow, you are right that it should be moved to the initialize function.
- I found MultiLevelWikiWebs useful to organize the logical structure of the web, but I'll try to remove it.
--
FrancoBagnoli - 01 Oct 2000
(Refactored from
SeveralIdeas by
MartinCleaver)
It would be good if all English texts were removed from the Perl code. It's trivial, but I don't like changing the code to localize the display (this concerns e.g. the edit link at the bottom). This is peanuts though.
--
MichaelUtech - 14 Nov 2001
I have noticed that WikiWords with accented letters don't work, not, it seems, by policy but through explicit character ranges A-Za-z etc. in TWiki.pm. This means that words like TechniquesdIngénierie don't work, since the hyperlink breaks off at the first accented letter.
Not using accents is ugly but serviceable, but it completely breaks when using double-bracketed phrases like [[adéquation de la solution proposée]], since the word generated by TWiki, AdéquationDeLaSolutionProposée, doesn't work.
(See it here: adéquation de la solution proposée.)
Note: bracketed phrases are a Good Thing for readability in languages that naturally use lots of prepositions.
Writing all words with
HTML entities would work, but is really painful and (it seems to me) goes against the Wiki spirit that text should be natural to type.
I made a quick hack to use locales in TWiki.pm: all ranges A-Z and a-z are replaced by [:upper:] and [:lower:], respectively, and I added a "use locale; setlocale($wikiLocale);" to TWiki.pm. The variable $wikiLocale is site-wide; it should be web-wide and defined by a variable in
WebPreferences.
How does this fit in the larger issue of localization?
--
DavidSherman - 24 Nov 2001
Searching is also a case where lack of i18n impacts functionality. I filed
CaseInsensitiveSearchInternational when I noticed that a page with the name Östen wouldn't be found by searching for lower case "ö". By the way, ö is not an accented o; it's a completely different character, so stripping accents isn't an option ("Osten" would mean "the cheese" in Swedish).
Since external egrep/fgrep is used and they get their locale settings from the environment of the web server process, I put these lines in TWiki.cfg as a quick fix, and it works.
$ENV{LC_CTYPE}="en_US.ISO8859-15";
$ENV{LC_COLLATE}="en_US.ISO8859-15";
This is related to a plugin I'm working on to allow
WikiWords that look ok in Swedish, since the
StudlyCaps imposed by default
WikiWords makes the text look really horrible in Swedish. This becomes very obvious if you cut and paste text from an internal TWiki page into an email as a response to a customer question. Instead of modifying the character cases throughout the text, the plugin will allow normal_words_like_these to become links. In the rendered version, the underscores are replaced by spaces and the text will cut & paste without any problems and become normal words like these. The link would look like this ->
normal words like these.
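A rough sketch of the kind of rendering rule such a plugin might use (the globals $scriptUrlPath and $webName are assumed here; the real plugin will certainly differ):
# turn words_like_these into a link whose text shows spaces instead of underscores
s{\b([[:alpha:]]+(?:_[[:alpha:]]+)+)\b}{
    my $topic = $1;
    ( my $text = $topic ) =~ s/_/ /g;   # "normal_words_like_these" -> "normal words like these"
    "<a href=\"$scriptUrlPath/view/$webName/$topic\">$text</a>";
}ge;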
One major hurdle in the plugin coding is that the code is sprinkled with regexps using [A-Za-z], which makes it difficult for links like Östens_ärliga_åsikt to make it through TWiki.pm in one piece (and possibly through other places as well). Site-specific changes to TWiki.pm et al. would work, but complicate upgrades and complicate coding and distribution of the functionality to other users/sites.
So, if the powers that be think that this is a topic to be worked on, I'd be happy to throw in my efforts. Is this the way to do it? I'm pretty new to this, so I don't even know how tightly monitored these pages are by the maintainers of the code.
--
StefanLindmark - 02 Feb 2002
IME, the TWiki developers, including
PeterThoeny, read the pages in the Codev web fairly regularly, so I think you'll see some response within the next week (or less). I have no major interest in internationalization at this point (except that Wikilearn might eventually attract an international audience?!), but I do think the whole
WikiWord thing (
StudlyCaps, etc.) is an impediment to some users and for some actions, and would prefer (I think) a TWiki that did not require
StudlyCaps, allowed spaces in topic names, and rendered such topic names with their spaces intact.
--
RandyKramer - 03 Feb 2002
For an example of how powerful i18n can be, look at
http://susning.nu
which is a Swedish site with both i18n and l10n that has gone from zero to 3000 pages in just 3 months. This would have been very difficult to accomplish if there had been problems using åäö. (Not a TWiki site.)
The very same would be true of a corporate intranet. I would never be able to launch a successful site on our network unless plain Swedish with åäö was working in every place it is possible for a user to enter text.
So, how about complementing the locale stuff I talked about above to form the following:
# StefanLindmark (stefanl): Code to enable different character sets
$charset = "en_US.ISO8859-15"; # this is my setting, should probably go into WebPreferences
$ENV{LC_CTYPE}=$charset; # For searching with grep & friends
$ENV{LC_COLLATE}=$charset; # For sorting
use locale;
use POSIX;
setlocale(LC_CTYPE, $charset); # Enables proper use of e.g. [[:upper:]] instead of [A-Z]
...and then go over the code, replacing
[A-Z] with [[:upper:]]
[a-z] with [[:lower:]]
[A-Za-z] with [[:alpha:]]
[A-Za-z0-9] with [[:alnum:]]
etc...
This will also make sure that the (already used) case-replace function \u in s/// works properly regardless of which character set is currently in use.
The suggested changes, even if not implemented verbatim, will also make the code much more readable. Just look at the following example from TWiki.pm:
# 'Web.TopicName#anchor' link: (with quick-fix i18n for just Swedish)
s/([\s\(])([A-ZÅÄÖÜ]+[a-zåäöü0-9]*)\.([A-ZÅÄÖÜ]+[a-zåäöü]+[A-ZÅÄÖÜ]+[a-zåäöüA-ZÅÄÖÜ0-9]*)(\#[a-zåäöüA-ZÅÄÖÜ0-9_]*)/&internalLink($1,$2,$3,"$TranslationToken$3$4$TranslationToken",$4,1)/geo;
that would become
s/([\s\(])([[:upper:]]+[[:lower:]\d]*)\.([[:upper:]]+[[:lower:]]+[[:upper:]]+[[:alnum:]]*)(\#[[:alnum:]_]*)/&internalLink($1,$2,$3,"$TranslationToken$3$4$TranslationToken",$4,1)/geo;
This would not only support all possible characters in the charset, but in fact also be more compact code.
--
StefanLindmark - 03 Feb 2002
Unfortunately [:upper:] etc. are only available in Perl 5.6 or higher, and since locales are broken on some systems the use locale pragma can't be applied unconditionally, but fortunately there are solutions for these issues

... See
InternationalisationEnhancements for more info and please comment there.
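One way the conditional part could look (just a sketch with assumed variable names, not the code from that topic):
# in TWiki.cfg (hypothetical switch and locale name)
$useLocale  = 1;
$siteLocale = "en_US.ISO8859-15";
# in TWiki.pm: build character classes as strings instead of hard-coding ranges
if( $useLocale ) {
    require POSIX;
    POSIX::setlocale( &POSIX::LC_CTYPE, $siteLocale );
    $upperAlpha = '[:upper:]';   # needs Perl 5.6+ and a working locale
    $lowerAlpha = '[:lower:]';
} else {
    $upperAlpha = 'A-Z';         # safe fallback for older Perls / broken locales
    $lowerAlpha = 'a-z';
}
# regexes are then written as e.g. /[$upperAlpha][$lowerAlpha]+/
# (making "use locale" itself conditional needs more care - see the topic above)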
--
RichardDonkin - 26 Nov 2002
Is there any need to localize variable names?
--
SamHasler - 17 Sep 2004