develooper Front page | perl.perl5.porters | Postings from March 2011

Re: use locale

From: Zsbán Ambrus
Date: March 13, 2011 03:46
Subject: Re: use locale
Message ID: AANLkTi=_p_QjLaoe3qZ1bCkpyFgcX2vjhF9zW=4XPqYC@mail.gmail.com

On Sat, Mar 12, 2011 at 11:27 PM, demerphq <demerphq@gmail.com> wrote:
> On 12 March 2011 23:00, Zsbán Ambrus <ambrus@math.bme.hu> wrote:
>> On Sat, Mar 12, 2011 at 12:37 PM, demerphq <demerphq@gmail.com> wrote:
>>> I consider "use locale" broken and preserved only for backwards
>>> compatibility. IMO we should get rid of it, deprecate it, whatever.
>>
>> Er what?  How do I do a locale-aware string compare without it?  I'm
>> not supposed to call POSIX::strcoll?
>
> It's a personal opinion. Please let us know how you use it and why.

Well, just because you ask.  I personally have exactly two uses for
locales (but I don't deny that other people may have more needs).


The first use is to let various programs guess the encoding of the
terminal from it.  This is important, because I'm using both
iso-8859-2 and utf-8 encoded terminals.  In particular, I usually have
LC_CTYPE set to one of hu_HU or hu_HU.utf8 depending on the encoding
of the terminal the program is using (or en_US.utf8 on some broken
machines where hu_HU.utf8 doesn't work), but all other locale
variables unset, because I want neither translated messages nor ls
sorting files case-insensitively.

This use doesn't come up in the perl scripts I write for myself, though,
because when a script of mine needs to know the encoding, I tell it
explicitly.  However, if I did want to write a script that behaves
this way, it still wouldn't involve use locale, but instead this:

$ LC_ALL=hu_HU perl -wMI18N::Langinfo -E 'say I18N::Langinfo::langinfo(I18N::Langinfo::CODESET());'
ISO-8859-2
$ LC_ALL=hu_HU.UTF8 perl -wMI18N::Langinfo -E 'say I18N::Langinfo::langinfo(I18N::Langinfo::CODESET());'
UTF-8
$
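
In script form, the same lookup could be used to pick the output
encoding.  This is only a sketch: terminal_encoding is a made-up helper
name, and falling back to ascii when Encode does not recognize the
codeset is an assumption, not something the one-liners above do.

```perl
use strict;
use warnings;
use Encode ();
use I18N::Langinfo qw(langinfo CODESET);

# terminal_encoding() is a hypothetical helper, not part of any module.
# It returns the Encode name for the codeset of the current locale, or
# the given fallback when Encode does not recognize that codeset.
sub terminal_encoding {
    my ($fallback) = @_;
    my $codeset = langinfo(CODESET);    # e.g. "ISO-8859-2" or "UTF-8"
    my $enc = Encode::find_encoding($codeset);
    return $enc ? $enc->name : $fallback;
}

# Assumption: ascii is an acceptable fallback for an unknown codeset.
my $name = terminal_encoding("ascii");
binmode STDOUT, ":encoding($name)";
print "output encoding: $name\n";
```

Note that no use locale appears anywhere in it.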


Tom Christiansen guessed the second use right: I want to alphabetize
lists in Hungarian.  It's usually lists of personal names.  Here's a
constructed example with difficult cases.

$ perl -we 'print pack "(H*)*", qw"6bf5726973206bf664206bf372206bf3
64206bf672206be164206bf674206373 f36b206b6f7661206375646172206bf5
72206b6f72206bf670206bf5206b7574 7961206b6f7474610a"' > a
$ LC_ALL=hu_HU perl -Mlocale -wanE 'say join" ", sort @F' a
cudar csók kád kód kor kór kotta kova kő köd köp kör kőr kőris köt kutya
$

(Append | iconv -f iso-8859-2 to the command if you want to see the
output on a utf-8 terminal.)
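
For use inside a longer script rather than a one-liner, the same sort
might be wrapped as below.  This is a sketch under assumptions:
sort_in_locale is a hypothetical helper name, the words are byte
strings in the locale's own encoding (iso-8859-2 for hu_HU), and the
named locale has to be installed on the machine.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_COLLATE);

# sort_in_locale() is a hypothetical helper.  It sorts byte strings with
# the collation rules of the named locale, and dies if that locale is
# not available.  The previous LC_COLLATE setting is restored afterwards.
sub sort_in_locale {
    my ($locale, @words) = @_;
    my $old = setlocale(LC_COLLATE);    # query the current setting
    defined setlocale(LC_COLLATE, $locale)
        or die "locale $locale is not available\n";
    # use locale is lexically scoped, so it only affects this sort.
    my @sorted = do { use locale; sort @words };
    setlocale(LC_COLLATE, $old) if defined $old;
    return @sorted;
}
```

Calling sort_in_locale("hu_HU", @words) on the words from file a should
then give the same order as the one-liner, provided hu_HU is installed.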

Before this thread I didn't know about the Unicode-Collate module, but
I have now installed it, and indeed it too lets me do this sorting.

$ perl -MUnicode::Collate::Locale -MEncode -wE 'say
encode("iso-8859-2", join" ", Unicode::Collate::Locale->new(locale =>
"hu")->sort(split " ", decode("iso-8859-2", <>)))' a
cudar csók kád kód kor kór kotta kova kő köd köp kör kőr kőris köt kutya
$

Tom mentions the level argument for the collator.  Do I understand
correctly that it only changes when strings are considered equal?  If
so, then I probably don't need it: it's impossible to automatically
compare people for equality by name anyway, since variants of the
same name are used randomly and several people can share a name.
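
One way to check this empirically is to compare strings that differ
only in an accent or only in case at each level.  The sketch below
assumes Unicode::Collate::Locale is installed; same_at_level is a
hypothetical helper name, and the test words are just examples.

```perl
use strict;
use warnings;
use Unicode::Collate::Locale;

# same_at_level() is a hypothetical helper.  It reports whether two
# character strings compare equal under Hungarian collation when the
# collator is limited to the given level (1 to 4).
sub same_at_level {
    my ($level, $x, $y) = @_;
    my $collator = Unicode::Collate::Locale->new(
        locale => "hu",
        level  => $level,
    );
    return $collator->eq($x, $y) ? 1 : 0;
}

# "k\x{f3}r" is "kor" with an acute accent on the o.
for my $level (1 .. 4) {
    printf "level %d: accent differs -> %s, case differs -> %s\n",
        $level,
        (same_at_level($level, "kor", "k\x{f3}r") ? "equal" : "distinct"),
        (same_at_level($level, "kor", "Kor")      ? "equal" : "distinct");
}
```

If the reading above is right, such pairs should be reported as equal
at the low levels and as distinct at the higher ones.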


Ambrus.
