In my project WebTeach I would like to offer users a choice of the language in which pages are displayed. This affects the selection of the right template and, possibly, the text formatting rules: all languages but English use accented letters, and I would like to convert (for instance) e' to è.

Why would you want to do that? If you're in a non-English community, you can simply type the accented letter on your keyboard, and it will be stored as it is. If you're not in such a community, you don't need any way to express accented letters. [Main.JoachimDurchholz - 25 Nov 2001]

Any suggestion for the implementation?

I proceeded in this way: in WebPreferences, or in the preferences of a particular web, one can set the variable DEFAULTLANGUAGE to "it", "es", "en", etc. Moreover, each user can set the variable LANGUAGE to a different language code on their own page. Finally, this variable can even be set inside a topic.

In wiki.pm, inside the function initialize, after all the other preferences have been read, I call getPrefsList on the topic itself:


    getPrefsList( $wikiUserName );                       # user-level
#FB multi-language
#language can be defined inside the document
    getPrefsList( "$webName.$topicName" );
#/FB
    # some remaining init
    $TranslationToken= "\263";
    $code="";

At the beginning of the function internalLink I put:

#FB multi-language
        my $defaultLanguage = getPrefsValue("DEFAULTLANGUAGE");
        my $language = getPrefsValue("LANGUAGE"); # it should be it, es, etc.
        $language = $defaultLanguage unless $language;
        $language = "en" unless $language;
        $language = lc($language);
        # plural-to-singular guessing only applies to English
        if ($language eq "en") {
            if( $doPluralToSingular && $page =~ /s$/ && ! topicExists( $web, $page ) ) {
                # page is a non-existing plural
                my $tmp = $page;
                $tmp =~ s/ies$/y/;      # plurals like policy / policies
                $tmp =~ s/sses$/ss/;    # plurals like address / addresses
                $tmp =~ s/xes$/x/;      # plurals like box / boxes
                $tmp =~ s/([A-Za-rt-z])s$/$1/; # others, excluding ending ss like address
                if( topicExists( $web, $tmp ) ) {
                    $page = $tmp;
                }
            }
        }
#/FB

And I replaced the body of readTemplate with

#FB multi-language
# first try with language ext, then with defaultLanguage ext,
# finally resort to no ext (english)
# to really exploit per-user prefs you need to authenticate
# all users even to view a page
    my $defaultLanguage = lc(getPrefsValue("DEFAULTLANGUAGE"));
    my $language = lc(getPrefsValue("LANGUAGE")); # it should be it, es, etc
    # templates are view.tmpl (default), view-it.tmpl (italian) etc.
    $language = "-$language" if $language;
    $defaultLanguage = "-$defaultLanguage" if $defaultLanguage;
    my $lang;
    foreach $lang ($language, $defaultLanguage, "") {
        my $webtmpl = "$templateDir/$webName/$name.$topic$lang.tmpl";
        if( -e $webtmpl ) {
            return &readFile( $webtmpl );
        }
        $webtmpl = "$templateDir/$webName/$name$lang.tmpl";
        if( -e $webtmpl ) {
            return &readFile( $webtmpl );
        }
#AS
        # look for template in all path from the web to the templates root dir
        my $webPath = "$webName";
        foreach (split(/\//, $webPath)) {
            $webPath =~ s#(\.*)\/[^\/]*#$1#;
            $webtmpl = "$templateDir/$webPath/$name.$topic$lang.tmpl";
            if( -e $webtmpl ) {
                return &readFile( $webtmpl );
            }
            $webtmpl = "$templateDir/$webPath/$name$lang.tmpl";
            if( -e $webtmpl ) {
                return &readFile( $webtmpl );
            }
        }
#/AS
        $webtmpl = "$templateDir/$name.$topic$lang.tmpl";
        if( -e $webtmpl ) {
            return &readFile( $webtmpl );
        }
    }
#/FB

(Here I included also AndreaSterbini's MultiLevelWikiWebs).
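The lookup order used by the readTemplate replacement can be sketched as a plain list of candidate file names. This is only an illustration: template_candidates is a hypothetical helper, not part of wiki.pm, and it leaves out the MultiLevelWikiWebs path walk for brevity.

```perl
use strict;
use warnings;

# Hypothetical helper: list the template files in the order they would be tried.
sub template_candidates {
    my ( $templateDir, $webName, $name, $topic, $language, $defaultLanguage ) = @_;

    # "it" becomes "-it"; an unset language contributes no suffix
    my @suffixes = map { $_ ? "-" . lc($_) : "" } ( $language, $defaultLanguage );
    push @suffixes, "";    # last resort: no language extension

    my ( @paths, %seen );
    foreach my $lang (@suffixes) {
        next if $seen{$lang}++;    # skip duplicates, e.g. LANGUAGE equal to DEFAULTLANGUAGE
        push @paths,
            "$templateDir/$webName/$name.$topic$lang.tmpl",
            "$templateDir/$webName/$name$lang.tmpl",
            "$templateDir/$name.$topic$lang.tmpl";
    }
    return @paths;
}

# Example values are made up for illustration.
print "$_\n" for template_candidates( "templates", "Main", "view", "print", "IT", "es" );
```

The first existing file in this list would be the one returned by readTemplate.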

Finally, in wikicfg, in extendGetRenderedVersionOutsidePRE

#FB multi-language
        my $defaultLanguage = getPrefsValue("DEFAULTLANGUAGE");
        my $language = getPrefsValue("LANGUAGE"); # it should be it, es, etc.
        $language = $defaultLanguage unless $language;
        $language = "en" unless $language;
        $language = lc($language);
        if ($language eq "it") {
 
            # accented letters in italian
            # avoid accents with \'
            # first replace common apostrophes
                    s/(\s)po\'/$1po\\\'/go;
            #now regular accents
                    s/a\'/à/go;
                    s/i\'/ì/go;
                    s/o\'/ò/go;
                    s/u\'/ù/go;
                    s/he\'/hé/go;
                    s/e\'/è/go;
                    s/A\'/À/go;
                    s/I\'/Ì/go;
                    s/O\'/Ò/go;
                    s/U\'/Ù/go;
                    s/HE\'/HÉ/go;
                    s/E\'/È/go;
            #and finally the escaped apostrophes
                    s/\\\'/\'/go;
        }
 
#/FB    
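For trying out the Italian rules outside TWiki, here is a minimal stand-alone sketch. The function name italian_accents is hypothetical, and the grave-accent substitutions are folded into one lookup table; the order (protect "po'", handle he' before e', restore apostrophes last) follows the code above.

```perl
use strict;
use warnings;
use utf8;    # this source contains literal accented characters

# Hypothetical stand-alone version of the Italian rules shown above.
sub italian_accents {
    my ($text) = @_;
    $text =~ s/(\s)po'/${1}po\\'/g;    # protect the apostrophe in "un po'"
    $text =~ s/he'/hé/g;               # perche' -> perché (acute accent)
    $text =~ s/HE'/HÉ/g;
    my %grave = (
        a => 'à', e => 'è', i => 'ì', o => 'ò', u => 'ù',
        A => 'À', E => 'È', I => 'Ì', O => 'Ò', U => 'Ù',
    );
    $text =~ s/([aeiouAEIOU])'/$grave{$1}/g;    # remaining vowels get a grave accent
    $text =~ s/\\'/'/g;                         # restore the protected apostrophes
    return $text;
}

binmode STDOUT, ':encoding(UTF-8)';
print italian_accents("perche' e' cosi' un po' strano"), "\n";
```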

I attach wiki.pm and wikicfg.pm (which contain several other modifications, marked with #FB).

-- FrancoBagnoli - 26 Sep 2000

Very interesting idea for localization. Some thoughts to consider:

  • Is it necessary to offer two variables, DEFAULTLANGUAGE and LANGUAGE? How about just LANGUAGE that is defined on site-level and that can be overridden on web-level and user-level?
  • What is the reasoning to offer preferences per topic? E.g. your getPrefsList( "$webName.$topicName" ) addition in initialize. Is it necessary to set the language per topic, as opposed to per web?
  • Doing the $language variable query in function internalLink is expensive because this function is called very frequently. Probably better to do it once at the end of initialize, store it as a global variable, and use it in other places.
  • Above code includes AndreaSterbini's MultiLevelWikiWebs extensions. FWIW, I am still not convinced that the added benefit of sub-webs justifies the added complexity (for users and for TWiki programmers).
  • Once TWiki is modularized it will be easier to add language dependent custom rendering rules.

-- PeterThoeny - 01 Oct 2000
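The "resolve once, reuse everywhere" suggestion could look roughly like the sketch below. It keeps the LANGUAGE, then DEFAULTLANGUAGE, then "en" fallback from the original code. The names initializeLanguage and $language are hypothetical, and getPrefsValue is replaced by a stand-in with made-up values so that the example runs on its own.

```perl
use strict;
use warnings;

# Stand-in for TWiki's preference lookup; the values here are made up.
my %prefs = ( LANGUAGE => "", DEFAULTLANGUAGE => "IT" );
sub getPrefsValue { return $prefs{ $_[0] } // "" }

our $language;    # hypothetical global, set once at the end of initialize

sub initializeLanguage {
    $language = lc( getPrefsValue("LANGUAGE")
          || getPrefsValue("DEFAULTLANGUAGE")
          || "en" );
    return $language;
}

initializeLanguage();

# internalLink, readTemplate, etc. would then read $language
# instead of querying the preferences on every call.
print "language: $language\n";
```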

I was just experimenting a bit with the options, probably most of my experiments will be thrown away.

  • You are right, probably just one LANGUAGE variable is sufficient.
  • The per-topic preference was just an experiment; I'm not sure it is worthwhile for the language setting. However, I am considering an extension of it, see NewTemplateScheme. Anyhow, you are right to move it to the initialize function.
  • I found MultiLevelWikiWebs useful to organize the logical structure of the web, but I'll try to remove it.

-- FrancoBagnoli - 01 Oct 2000

(Refactored from SeveralIdeas by MartinCleaver)

It would be good if all English text were removed from the Perl code. It's trivial, but I don't like changing the code to localize the display (this concerns e.g. the edit link at the bottom). This is peanuts though.

-- MichaelUtech - 14 Nov 2001

I have noticed that WikiWords with accented letters don't work, apparently not by policy but because of explicit character ranges (A-Za-z etc.) in TWiki.pm. This means that words like TechniquesdIngénierie don't work, since the hyperlink breaks off at the first accented letter.

Not using accents is ugly but serviceable, but things break completely when using double-bracketed phrases like [[adéquation de la solution proposée]], since the word generated by TWiki, AdéquationDeLaSolutionProposée, doesn't work. (See it here: adéquation de la solution proposée.)
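As a rough illustration of how a bracketed phrase turns into a WikiWord-style topic name, the sketch below capitalizes each word and joins them. phrase_to_topic is a hypothetical name, not TWiki's actual routine, and the example relies on Perl's Unicode string handling (Perl 5.12 or later) so that ucfirst also works on accented initials.

```perl
use strict;
use warnings;
use utf8;                         # this source contains literal accented letters
use feature 'unicode_strings';    # Unicode rules for ucfirst

# Hypothetical helper: turn a bracketed phrase into a WikiWord-style topic name.
sub phrase_to_topic {
    my ($phrase) = @_;
    # split ' ' ignores leading whitespace and splits on runs of whitespace
    return join "", map { ucfirst($_) } split ' ', $phrase;
}

binmode STDOUT, ':encoding(UTF-8)';
print phrase_to_topic("adéquation de la solution proposée"), "\n";
print phrase_to_topic("école normale"), "\n";
```

The generated name contains accented letters, which is exactly what the ASCII-only ranges in the link-matching regexps then fail to recognize.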

Note: bracketed phrases are a Good Thing for readability in languages that naturally use lots of prepositions.

Writing all words with HTML entities would work, but is really painful and (it seems to me) goes against the Wiki spirit that text should be natural to type.

I made a quick hack to use locales in TWiki.pm: all ranges A-Z and a-z are replaced by [:upper:] and [:lower:], respectively, and I added a "use locale; setlocale($wikiLocale);" to TWiki.pm. The variable $wikiLocale is site-wide; it should be web-wide and defined by a variable in WebPreferences.

How does this fit in the larger issue of localization?

-- DavidSherman - 24 Nov 2001
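The effect of swapping explicit ranges for character classes can be checked with a small sketch. Note the assumption: instead of use locale and setlocale as in the hack described above, it uses Perl's built-in Unicode string semantics (Perl 5.12 or later), because that runs regardless of which locales are installed. The WikiWord pattern is taken from the topic part of the TWiki.pm link regexp; is_wikiword is a hypothetical helper.

```perl
use strict;
use warnings;
use utf8;                         # this source contains literal accented letters
use feature 'unicode_strings';    # Unicode rules for character classes

my %pattern = (
    ascii => qr/^[A-Z]+[a-z]+[A-Z]+[a-zA-Z0-9]*$/,
    posix => qr/^[[:upper:]]+[[:lower:]]+[[:upper:]]+[[:alnum:]]*$/,
);

# Hypothetical helper: does $word match the given WikiWord pattern?
sub is_wikiword {
    my ( $word, $kind ) = @_;
    return $word =~ $pattern{$kind} ? 1 : 0;
}

binmode STDOUT, ':encoding(UTF-8)';
for my $word ( "WebHome", "TechniquesdIngénierie" ) {
    printf "%s: ascii=%d posix=%d\n", $word,
        is_wikiword( $word, 'ascii' ), is_wikiword( $word, 'posix' );
}
```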

Searching is also a case where lack of i18n impacts functionality. I filed CaseInsensitiveSearchInternational when I noticed that a page with the name Östen wouldn't be found by searching for lower-case "ö". By the way, Ö is not an accented O; it's a completely different character, so stripping accents isn't an option (Osten would mean "the cheese").

Since external egrep/fgrep is used and they get their locale settings from the environment of the web server process, I put these lines in TWiki.cfg as a quick fix, and it works.

$ENV{LC_CTYPE}="en_US.ISO8859-15";
$ENV{LC_COLLATE}="en_US.ISO8859-15";

This is related to a plugin I'm working on to allow WikiWords that look ok in Swedish, since the StudlyCaps imposed by default WikiWords makes the text look really horrible in Swedish. This becomes very obvious if you cut and paste text from an internal TWiki page into an email as a response to a customer question. Instead of modifying the character cases throughout the text, the plugin will allow normal_words_like_these to become links. In the rendered version, the underscores are replaced by spaces and the text will cut & paste without any problems and become normal words like these. The link would look like this -> normal words like these.
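The underscore idea can be sketched as follows. This is not the plugin's code, which is not shown here; underscore_links is a hypothetical function that only finds words joined by underscores and derives the display text by replacing the underscores with spaces.

```perl
use strict;
use warnings;

# Hypothetical sketch: find words_joined_by_underscores and derive the link label.
sub underscore_links {
    my ($text) = @_;
    my @links;
    while ( $text =~ /\b([[:alpha:]]+(?:_[[:alpha:]]+)+)\b/g ) {
        my $target = $1;
        ( my $label = $target ) =~ tr/_/ /;    # rendered text uses spaces
        push @links, { target => $target, label => $label };
    }
    return @links;
}

for my $link ( underscore_links("Please read normal_words_like_these first.") ) {
    print "$link->{target} => $link->{label}\n";
}
```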

One major hurdle in the plugin coding is that the code is sprinkled with regexps using [A-Za-z], which makes it difficult for links like östens_ärliga_åsikt to make it through TWiki.pm (and possibly other places as well) in one piece. Site-specific changes to TWiki.pm et al. would work, but they complicate upgrades as well as the coding and distribution of the functionality to other users/sites.

So, if the powers that be think that this is a topic to be worked on, I'd be happy to throw in my efforts. Is this the way to do it? I'm pretty new to this, so I don't even know how tightly monitored these pages are by the maintainers of the code.

-- StefanLindmark - 02 Feb 2002

IME, the TWiki developers, including PeterThoeny, read the pages in the Codev web fairly regularly, so I think you'll see some response within the next week (or less). I have no major interest in internationalization at this point (except that Wikilearn might eventually attract an international audience?!), but I do think the whole WikiWord thing (StudlyCaps, etc.) is an impediment to some users and for some actions, and would prefer (I think) a TWiki that did not require StudlyCaps, allowed spaces in topic names, and rendered topic names with spaces with spaces.

-- RandyKramer - 03 Feb 2002

For an example of how powerful i18n can be, look at http://susning.nu, a Swedish site with both i18n and l10n that has gone from zero to 3000 pages in just 3 months. This would have been very difficult to accomplish if there had been problems using åäö. (Not a TWiki site.)

The very same would be true of a corporate intranet. I would never be able to launch a successful site on our network unless plain Swedish with åäö worked in every place where a user can enter text.

So, how about complementing the locale stuff I talked about above to form the following:

    # StefanLindmark (stefanl): Code to enable different character sets
    $charset = "en_US.ISO8859-15"; # this is my setting, should probably go into WebPreferences
    $ENV{LC_CTYPE}=$charset; # For searching with grep & friends
    $ENV{LC_COLLATE}=$charset; # For sorting
    use locale;
    use POSIX; 
    setlocale(LC_CTYPE, $charset); # Enables proper use of e.g. [[:upper:]] instead of [A-Z]
...and then go over the code, replacing
    [A-Z] with [[:upper:]]
    [a-z] with [[:lower:]]
    [A-Za-z] with [[:alpha:]]
    [A-Za-z0-9] with [[:alnum:]]
    etc...
This will also make sure that the (already used) case-replace function \u in s/// works properly regardless of which character set is currently in use.

The suggested changes, even if not implemented verbatim, will also make the code much more readable. Just look at the following example from TWiki.pm:

    # 'Web.TopicName#anchor' link: (with quick-fix i18n for just Swedish)
    s/([\s\(])([A-Z]+[a-z0-9]*)\.([A-Z]+[a-z]+[A-Z]+[a-zA-Z0-9]*)(\#[a-zA-Z0-9_]*)/&internalLink($1,$2,$3,"$TranslationToken$3$4$TranslationToken",$4,1)/geo;
that would become
    s/([\s\(])([[:upper:]]+[[:lower:]\d]*)\.([[:upper:]]+[[:lower:]]+[[:upper:]]+[[:alnum:]]*)(\#[[:alnum:]_]*)/&internalLink($1,$2,$3,"$TranslationToken$3$4$TranslationToken",$4,1)/geo;
Which would not only support all possible characters in the charset, but in fact also be more compact code.

-- StefanLindmark - 03 Feb 2002
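To see what the class-based 'Web.TopicName#anchor' pattern captures, here is a small sketch that applies it outside TWiki. Assumptions: parse_link is a hypothetical helper, the topic name in the example is made up, and the sketch relies on Perl's Unicode string semantics (Perl 5.12 or later) rather than use locale, so it does not depend on installed locales.

```perl
use strict;
use warnings;
use utf8;                         # this source contains a literal Swedish letter
use feature 'unicode_strings';    # Unicode rules for character classes

my $link_re = qr{
    ([\s\(])                                              # leading space or bracket
    ([[:upper:]]+[[:lower:]\d]*)                          # web name
    \.
    ([[:upper:]]+[[:lower:]]+[[:upper:]]+[[:alnum:]]*)    # WikiWord topic
    (\#[[:alnum:]_]*)                                     # anchor
}x;

# Hypothetical helper: return (web, topic, anchor), or an empty list if no link is found.
sub parse_link {
    my ($text) = @_;
    return $text =~ $link_re ? ( $2, $3, $4 ) : ();
}

binmode STDOUT, ':encoding(UTF-8)';
my ( $web, $topic, $anchor ) = parse_link("See Main.ÖstenSida#top for details");
print "web=$web topic=$topic anchor=$anchor\n";
```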

Unfortunately, [:upper:] etc. are only available in Perl 5.6 or higher, and since locales are broken on some systems, use locale can't be applied unconditionally. Fortunately, there are solutions for these issues. See InternationalisationEnhancements for more info and please comment there.

-- RichardDonkin - 26 Nov 2002

Is there any need to localize variable names?

-- SamHasler - 17 Sep 2004

Topic attachments
Attachment    Size     Date                Who            Comment
wiki.pm       43.2 K   2000-09-30 06:51    FrancoBagnoli  wiki.pm
wikicfg.pm    15.3 K   2000-09-30 06:51    FrancoBagnoli  wikicfg.pm
Topic revision: r16 - 2004-09-17 - SamHasler