Using Perl locales the RightWay TM
I've deleted the part of this topic that no longer applies -- RD.
(or: making Perl locales Work Like They Used To TM ... )
The current manner of using Perl locales (the locale module) was temporarily broken. This topic was motivated by Bugs:Item772.
//Material Deleted//
References
For further information on this issue, see:
- Bugs:Item772
- man perllocale
- locale.pm (inside Perl's core library directory)
- $^H in man perlvar
I would like some feedback on this issue, especially about the performance hit involved.
-- AntonioTerceiro - 19 Nov 2005
The problem is that the code in Dakar has regressed where it does the require locale, in at least Render.pm and possibly other modules. The correct code in Cairo does a require locale followed by an import locale(); the latter is what turns this into a dynamic use locale. From perldoc perlmod:
Perl modules are included into your program by saying
    use Module;
or
    use Module LIST;
This is exactly equivalent to
    BEGIN { require Module; import Module; }
or
    BEGIN { require Module; import Module LIST; }
For example, this works (from Cairo TWiki.pm):
    # Read the configuration file at compile time in order to set locale
    BEGIN {
        do "TWiki.cfg";
        # Do a dynamic 'use locale' for this module
        if( $useLocale ) {
            require locale;
            import locale ();
        }
    }
Of course, you only need to read the config file in one module, so in the other modules only the import line needs adding.
This should fix the problem without any performance hit for non-I18N sites. There is no need for a complex change, as long as this is done in every module that uses I18N regexes, upper/lower-casing or sorting. It's important to do a use locale in order for the Perl regex engine and sorting to work with locales: setlocale only affects things done through the C library.
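To make the setlocale versus use locale distinction concrete, here is a minimal sketch (not code from TWiki; the word list and the sub names sort_plain and sort_with_locale are made up for illustration). It calls setlocale once, then sorts the same words in a scope without use locale and in a scope with it. What the second line of output looks like depends on which locales are installed and on LANG.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(setlocale LC_ALL);

# Adopt whatever locale the environment (LANG, LC_ALL, ...) requests.
# setlocale() returns undef if that locale is not available.
my $current = setlocale(LC_ALL, '');
print 'Locale is ', (defined $current ? $current : '(not available)'), "\n";

# Illustrative sample data; ASCII only, so the file encoding does not matter.
my @words = qw(zebra Apple banana Zulu apple Banana);

# No 'use locale' in scope: cmp compares by character code,
# regardless of the setlocale() call above.
sub sort_plain {
    my @sorted = sort { $a cmp $b } @_;
    return @sorted;
}

# 'use locale' in scope: cmp collates according to LC_COLLATE.
sub sort_with_locale {
    use locale;
    my @sorted = sort { $a cmp $b } @_;
    return @sorted;
}

print 'Plain:  ', join(' ', sort_plain(@words)), "\n";
print 'Locale: ', join(' ', sort_with_locale(@words)), "\n";
```

The plain sort puts all upper-case words before all lower-case ones, whatever the locale. If the second line differs from the first, the locale is installed and use locale is in effect for that scope.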
I have written a simple test program, charset.pl, that lets you see whether a locale has been correctly loaded in Perl terms by inspecting $^H and testing some basic operations. I have now fixed it so that it works properly on Perl 5.8.4 (Debian), though it may need tweaking to use 0x800 as the bitmask on older Perls (perhaps 5.6).
Here is some output of this script (requires ISO-8859-1 to view properly). I ran this on Perl 5.8.4 on a Debian Linux box with working locales. This also shows how the require+import has a dynamic effect even if done in a BEGIN block - interestingly, the $^H setting becomes invisible/lost outside that block, but the 'use locale' effect remains.
$ LANG=fr_FR.iso88591 perl charset.pl
LANG is fr_FR.iso88591
====== first part =============
$^H is 1798
Locale bit is 1 (controls Perl regexes etc)
locale.pm loaded
Locale is fr_FR.iso88591
Locale now is fr_FR.iso88591
Sorted: µ_0123456789aAªáAàAâAåÅäÄaAæÆbBcCçÇdDdDeEéÉèEêEëEfFgGhHiIíIìIîIïIjJkKlLm
MnNñÑoOºóOòOôOöÖoOoOpPqQrRsSßtTuUúUùUûUüÜvVwWxXyYyYÿzZ__
Unsorted: 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyzªµºAAAA
ÄÅÆÇEÉEEIIIIDÑOOOOÖOUUUÜY_ßàáâaäåæçèéêëìíîïdñòóôoöoùúûüy_ÿ
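For anyone who wants to try the $^H part without the full script, the following is a rough sketch in the same spirit. It is not charset.pl itself: the DEMO_NO_LOCALE environment switch, the %seen hash and the locale_bit helper are hypothetical names used only here. It takes the bitmask from $locale::hint_bits, which locale.pm sets, rather than hard-coding a value.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the $useLocale setting read from TWiki.cfg;
# set DEMO_NO_LOCALE in the environment to try the other branch.
our $useLocale;
our %seen;    # locale bit as observed at different points

BEGIN {
    $useLocale = $ENV{DEMO_NO_LOCALE} ? 0 : 1;
    if ($useLocale) {
        require locale;
        locale->import();    # arrow form of "import locale ();"
    }
}

# Return 1 if the locale hint bit is set in the given $^H value, else 0.
# If locale.pm was never loaded, the mask is undefined and the answer is 0.
sub locale_bit {
    my ($hints) = @_;
    no warnings 'once';
    my $mask = $locale::hint_bits;
    return (defined $mask && ($hints & $mask)) ? 1 : 0;
}

# A later BEGIN block still runs while this file is being compiled,
# so it sees the hints that apply to the surrounding code.
BEGIN { $seen{compile} = locale_bit($^H); }

# At run time $^H is not a reliable view of this scope's hints;
# it is printed here only for comparison.
$seen{run} = locale_bit($^H);

print 'locale.pm loaded: ', ($INC{'locale.pm'} ? 'yes' : 'no'), "\n";
print "Locale bit seen in a later BEGIN block: $seen{compile}\n";
print "Locale bit seen at run time:            $seen{run}\n";
```

Checking inside a second BEGIN block is a deliberate choice: $^H holds compile-time hints, so that is the point at which it describes the code being compiled around it.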
I don't read TWiki.org very often at present, so do email me if I don't respond to I18N issues like this.
-- RichardDonkin - 20 Nov 2005
Richard, thank you for this very insightful tip.
I've fixed this issue in SVN 7571, by adding import locale() just after each require locale;.
-- AntonioTerceiro - 21 Nov 2005