
Using Perl locales the RightWay™

I've deleted the part of this topic that no longer applies -- RD.

(or: making Perl locales Work Like They Used To™ ...)

The current way of using Perl locales (the locale pragma) was temporarily broken. This topic was motivated by Bugs:Item772.

//Material Deleted//

References

For further information on this issue, see:

  • Bugs:Item772
  • man perllocale
  • locale.pm (inside Perl's core library directory).
  • $^H in man perlvar


I would like some feedback on this issue, especially about the performance hit involved.

-- AntonioTerceiro - 19 Nov 2005

The problem is that the Dakar code has regressed: it does a require locale in at least Render.pm (and possibly other modules) but never imports it. The correct code in Cairo does a require locale followed by an import locale(); the import is what turns this into a dynamic use locale. From perldoc perlmod:

Perl modules are included into your program by saying
    use Module;
or
    use Module LIST;
This is exactly equivalent to
    BEGIN { require Module; import Module; }
or
    BEGIN { require Module; import Module LIST; }
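One subtlety follows from this: per the perlfunc entry for use, an explicit empty list (use Module ();) means import is not called at all, so use locale (); would be equivalent to a bare require locale, which is exactly the Dakar bug. By contrast, import locale (); does call import, just with no arguments. The toy sketch below illustrates this with a hypothetical inline package, Demo, whose import only records how it was called; it is an illustration, not TWiki code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical toy module, defined inline: import() just records its arguments.
package Demo;
our @calls;
sub import { my $class = shift; push @calls, "import(" . join(",", @_) . ")"; }

package main;
# Mark Demo.pm as already loaded so that require/use do not look for a file.
BEGIN { $INC{'Demo.pm'} = __FILE__ }

use Demo;                                  # calls Demo->import()
use Demo qw(a b);                          # calls Demo->import('a', 'b')
use Demo ();                               # empty list: import is NOT called
BEGIN { require Demo; import Demo (); }    # import IS called, with no arguments

print "$_\n" for @Demo::calls;
```

Running it should print import(), import(a,b) and import() on three lines: the use Demo (); line contributes nothing.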

For example, this works (from Cairo TWiki.pm):

# Read the configuration file at compile time in order to set locale
BEGIN {
    do "TWiki.cfg";

    # Do a dynamic 'use locale' for this module
    if( $useLocale ) {
        require locale;
        import locale ();
    }
}

Of course, you only need to read the config file in one module, so in the other modules only the import line needs adding.
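To check which code actually ends up compiled with the locale hint, here is a minimal sketch, assuming the core locale.pm, which publishes its hint mask as $locale::hint_bits. It reads the caller's compile-time hints from (caller(0))[8]; the sub names are hypothetical. A sub compiled after require locale alone should report 0, and one compiled after import locale () should report 1.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# (caller(0))[8] holds the compile-time hints ($^H) of the statement that
# called this sub, so it tells us whether *that* code was compiled under
# 'use locale'.
sub caller_has_locale {
    my $hints = (caller(0))[8];
    return ($hints & $locale::hint_bits) ? 1 : 0;
}

# Dakar-style: locale.pm is loaded, but no hint is set.
BEGIN { require locale; }
sub after_require_only { return caller_has_locale() }

# Cairo-style: import() sets the hint for everything compiled from here on.
BEGIN { import locale (); }
sub after_import { return caller_has_locale() }

print "require only:     ", after_require_only(), "\n";
print "require + import: ", after_import(), "\n";
```

Because the hint is lexical, each module (and each file) has to do its own import; loading locale.pm once elsewhere does not switch the hint on here.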

This should fix the problem without any performance hit for non-I18N sites. No complex change is needed, as long as this is done in every module that uses I18N regexes, upper/lower-casing or sorting. It's important to do a use locale so that the Perl regex engine and sorting work with locales; setlocale only affects things done through the C library.
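A hedged sketch (not TWiki code) of the setlocale versus use locale difference: it takes the locale from the environment, then runs sort, lc and \w both outside and inside a lexical use locale block. Outside the block the results do not depend on the locale at all: on byte strings, absent use feature 'unicode_strings', Perl sorts by code point, leaves characters above 0x7F unchanged in lc, and matches \w only against ASCII. The results inside the block depend on which locale is installed, so no output is claimed for that half; with LANG=fr_FR.iso88591 you would expect é to sort near e and lc("\xc9") to give "\xe9".

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(setlocale LC_ALL);

# Take the locale from the environment (LANG, LC_ALL, ...).
setlocale(LC_ALL, "");

my $e_acute = "\xe9";    # e-acute as a single ISO-8859-1 byte
my $E_acute = "\xc9";    # E-acute
my @words   = ("zebra", "${e_acute}t${e_acute}", "apple", "Eagle");

# Outside 'use locale': setlocale() has no effect on these operations.
my @plain_sorted = sort @words;                 # plain code-point order
my $plain_lc     = lc $E_acute;                 # unchanged: no case mapping above 0x7F
my $plain_word   = ($e_acute =~ /\w/) ? 1 : 0;  # 0: \w is ASCII-only here

# Inside a 'use locale' block, the same operations consult the C library locale.
my (@loc_sorted, $loc_lc, $loc_word);
{
    use locale;
    @loc_sorted = sort @words;                  # collation via the locale (LC_COLLATE)
    $loc_lc     = lc $E_acute;                  # "\xe9" in an ISO-8859-1 locale
    $loc_word   = ($e_acute =~ /\w/) ? 1 : 0;   # 1 in an ISO-8859-1 locale
}

print "plain:  @plain_sorted\n";
print "locale: @loc_sorted\n";
print "lc changed E-acute? plain=", ($plain_lc ne $E_acute ? 1 : 0),
      " locale=", ($loc_lc ne $E_acute ? 1 : 0), "\n";
print "\\w matches e-acute? plain=$plain_word locale=$loc_word\n";
```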

Here is a simple test program (charset.pl, attached to this topic) that lets you check whether a locale has been loaded correctly in Perl terms, by inspecting $^H and testing some basic operations. I have now fixed it so that it works properly on Perl 5.8.4 (Debian), though on older Perls (perhaps 5.6) it may need tweaking to use 0x800 as the bitmask.
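The attached charset.pl is not reproduced here. As a hedged sketch of the same idea, the following reads $^H inside the BEGIN block right after the import and masks it with $locale::hint_bits rather than a hard-coded 0x4 or 0x800, so it should not need per-version tweaking, assuming your locale.pm defines that variable (the 5.8 one does). It then queries the current LC_CTYPE locale with the one-argument form of POSIX::setlocale. The variable names are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(setlocale LC_ALL LC_CTYPE);

our ($hints_seen, $locale_bit);

BEGIN {
    require locale;
    import locale ();
    # While this BEGIN block runs, $^H holds the hints of the code being
    # compiled, so the bit that import() just set is visible here.
    $hints_seen = $^H;
    # Use the mask that locale.pm itself publishes instead of a literal.
    $locale_bit = ($^H & $locale::hint_bits) ? 1 : 0;
}

setlocale(LC_ALL, "");              # adopt LANG / LC_* from the environment
my $ctype = setlocale(LC_CTYPE);    # one argument: query only, change nothing

print "LANG is ", ($ENV{LANG} // "(unset)"), "\n";
print "\$^H is $hints_seen\n";
print "Locale bit is $locale_bit (controls Perl regexes etc)\n";
print "Locale is ", ($ctype // "(unknown)"), "\n";
```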

Here is some output from this script (it needs an ISO-8859-1 display to view properly). I ran it on Perl 5.8.4 on a Debian Linux box with working locales. It also shows that require+import has a dynamic effect even when done in a BEGIN block: interestingly, the $^H setting becomes invisible (lost) outside that block, but the 'use locale' effect remains.

$ LANG=fr_FR.iso88591 perl charset.pl
LANG is fr_FR.iso88591
====== first part =============
$^H is 1798
Locale bit is 1 (controls Perl regexes etc)
locale.pm loaded
Locale is fr_FR.iso88591
Locale now is fr_FR.iso88591
Sorted: µ_0123456789aAªáAàAâAåÅäÄaAæÆbBcCçÇdDdDeEéÉèEêEëEfFgGhHiIíIìIîIïIjJkKlLmMnNñÑoOºóOòOôOöÖoOoOpPqQrRsSßtTuUúUùUûUüÜvVwWxXyYyYÿzZ__
Unsorted: 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyzªµºAAAAÄÅÆÇEÉEEIIIIDÑOOOOÖOUUUÜY_ßàáâaäåæçèéêëìíîïdñòóôoöoùúûüy_ÿ

I don't read TWiki.org very often at present, so please email me if I don't respond to I18N issues like this.

-- RichardDonkin - 20 Nov 2005

Richard, thank you for this very insightful tip.

I've fixed this issue in SVN 7571 by adding import locale(); just after each require locale;.

-- AntonioTerceiro - 21 Nov 2005

Topic attachments:
  • charset.pl.txt (r1, 2.4 K, 2005-11-20 10:19): Locale test script, now working on Perl 5.8.4
Topic revision: r5 - 2007-03-10 - RichardDonkin
Copyright © 1999-2026 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.