perl.perl5.porters | Postings from March 2011

use locale

From: Tom Christiansen
Date: March 11, 2011 13:58
Subject: use locale
Message ID: 18782.1299880675@chthon

On a related note, aren't use locale and /pat/l 
also pretty foul dead-ends?

You're completely at the mercy of your vendor's locales, 
which aren't all they're cracked up to be.  

Sure, nobody wants this nonsense:

    % perl -CS -E 'say for sort "r\xE9d", "red", "rfd"'
    red
    rfd
    réd

But you can't count on locales working at all.  Compare host1
vs host2 here, with all environment settings and Perl versions 
identical:

    host1% perl -CS -Mlocale -E 'say for sort "r\xE9d", "red", "rfd"'
    red
    rfd
    réd

    host2% perl -CS -Mlocale -E 'say for sort "r\xE9d", "red", "rfd"'
    réd
    red
    rfd

Plus even on host2, that's the *wrong order*: "réd" belongs between
"red" and "rfd", not ahead of both.  This, though, works correctly,
and the same way, no matter where you are:

    % perl -CS -MUnicode::Collate -E 'say for Unicode::Collate->new->sort( "r\xE9d", "red", "rfd")'  
    red
    réd
    rfd
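
For anyone who wants the comparison in script form rather than as one-liners, here is a minimal sketch that puts the plain code point sort next to the Unicode::Collate sort. The variable names are only illustrative, and it assumes nothing beyond a Perl that ships Unicode::Collate:

```perl
use strict;
use warnings;
use Unicode::Collate;

# The same three words as in the one-liners above.
my @words = ("r\x{E9}d", "red", "rfd");

# Plain sort compares code points, so U+00E9 lands after every ASCII letter.
my @by_codepoint = sort @words;

# Unicode::Collate applies the Unicode Collation Algorithm with its
# default table, independent of the vendor's locale data.
my @by_uca = Unicode::Collate->new->sort(@words);

binmode(STDOUT, ":encoding(UTF-8)");
print "code point: @by_codepoint\n";
print "collated:   @by_uca\n";
```

The first line should come out as red, rfd, réd and the second as red, réd, rfd, matching the runs shown above.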

Pity it's only 100x slower and bigger:

    % head -2000 words.utf8 > /tmp/w

    % time perl -CSD -Mlocale           -E 'say for sort <>'                           /tmp/w > /dev/null
    0.054u 0.004s 0:00.05 100.0%    0+0k 0+0io 0pf+0w

    % time perl -CSD -MUnicode::Collate -E 'say for Unicode::Collate->new->sort( <> )' /tmp/w > /dev/null
    5.397u 0.105s 0:05.56 98.7%     0+0k 160+0io 3pf+0w
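
If the same strings get sorted more than once, one way to soften that cost might be to compute each collation key a single time with getSortKey and compare the cached keys afterwards. This is only a sketch under that assumption: the %key_for cache is a hypothetical name, and a one-off sort still pays for computing every key.

```perl
use strict;
use warnings;
use Unicode::Collate;

my $collator = Unicode::Collate->new;
my @words    = ("rfd", "r\x{E9}d", "red");

# Pay the collation cost once per string: map each word to its binary sort key.
# (%key_for is an illustrative cache, not part of the module.)
my %key_for = map { $_ => $collator->getSortKey($_) } @words;

# Later sorts are plain string comparisons of the cached keys.
my @sorted = sort { $key_for{$a} cmp $key_for{$b} } @words;

binmode(STDOUT, ":encoding(UTF-8)");
print "$_\n" for @sorted;
```

The cached keys can be reused for any later sort of the same words, which is where the saving would come from.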

But I figure that if it doesn't have to be correct, 
I can easily make it infinitely fast. :(

--tom
-- 
    Palindrome of the Day: 
        "Eva, can I pose as Aesop in a cave?"
