develooper Front page | perl.perl5.porters | Postings from November 2008

Re: Matching multi-character folds, and FMTEYEWTK on troubles thereof

From: Rafael Garcia-Suarez
Date: November 25, 2008 23:49
Subject: Re: Matching multi-character folds, and FMTEYEWTK on troubles thereof
Message ID: b77c1dce0811252349s422f811fg923e90a6154713e@mail.gmail.com
2008/11/25 Tom Christiansen <tchrist@perl.com>:
> *** Unicode CLDR Project: Common Locale Data Repository
>        http://unicode.org/cldr/
>
> *** CVS Snapshots for CLDR:
>        ftp://ftp.unicode.org/Public/cldr/cldr-repository-daily.tgz
>
> Yves, there're also French versions of some of the above, s'il te plaît,
> but I had trouble getting them to download.
>
> The last, CLDR, contains *VERY* interesting stuff.  I wish I could figure
> out how to auto-translate these into Unicode::Collation objects.  For
> example, here's cldr/common/collation/fr.xml, fycnrdths:

Strange, it doesn't seem to contain collations for the French "Œ" ("e
dans l'o", i.e. "e in the o"), which sorts exactly as "oe". Does that
mean that the CLDR still has bugs too?
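
Treating "Œ" as if it were spelled "oe" is an expansion: one character contributes the weights of two. The following is a minimal Python sketch of that behaviour only, not how CLDR or Unicode::Collate implement it. `EXPANSIONS` and `expansion_key` are hypothetical names, and breaking ties on raw code points is a simplification; a real collator would rank the ligature against "oe" at a secondary or tertiary level instead.

```python
# Hypothetical table: ligatures replaced by their two-letter spelling.
# str.maketrans accepts multi-character replacement strings in a dict.
EXPANSIONS = str.maketrans({"œ": "oe", "æ": "ae"})


def expansion_key(word):
    """Primary key: casefolded word with ligatures expanded.

    The original word is a tie-breaker (plain code point order) so that
    "oeuvre" and "œuvre" still get a deterministic relative order.
    """
    return (word.casefold().translate(EXPANSIONS), word)


words = ["offre", "œuvre", "ode", "oeuvre"]
print(sorted(words))                     # code point order: œ after every ASCII letter
print(sorted(words, key=expansion_key))  # œuvre lands next to oeuvre
```

With the key, "œuvre" files between "ode" and "offre" instead of after every word starting with a plain "o".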

Anyway, I don't think it's the core's job to handle localisation data
and collations. There are too many of them, not counting the ones you
might invent for specific purposes (for example, where do you put
Planck's constant in a quantum physics book index?). Let us begin by
trying to get the Turkish capitalisation right. And even for that, I'm
not sure we really want it in the core.
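
Turkish distinguishes dotted i/İ (U+0130) from dotless ı (U+0131)/I, so the locale-independent mappings i → I and I → i are wrong for it. Default full case mapping also lowercases İ to "i" followed by U+0307 COMBINING DOT ABOVE. The following is a minimal Python sketch of the tailoring; `turkish_upper` and `turkish_lower` are hypothetical helpers, not an existing API, and they ignore the related Azeri rules and the context-sensitive combining-dot cases.

```python
def turkish_upper(s):
    """Uppercase with the Turkish mapping i -> İ.

    Dotless ı already maps to plain I under the default mapping,
    so only the dotted i needs to be rewritten first.
    """
    return s.replace("i", "\u0130").upper()


def turkish_lower(s):
    """Lowercase with the Turkish mappings I -> ı and İ -> i.

    Both capitals are rewritten before the default lowering, which would
    otherwise give I -> i and İ -> "i" + U+0307.
    """
    return s.replace("I", "\u0131").replace("\u0130", "i").lower()


print(turkish_upper("istanbul"))    # İSTANBUL, versus the default ISTANBUL
print(turkish_lower("DİYARBAKIR"))  # diyarbakır
```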

> <?xml version="1.0" encoding="UTF-8" ?>
> <!DOCTYPE ldml SYSTEM "http://www.unicode.org/cldr/dtd/1.6/ldml.dtd">
> <ldml>
>        <identity>
>                <version number="$Revision: 1.23 $"/>
>                <generation date="$Date: 2008/03/10 02:27:54 $"/>
>                <language type="fr" />
>        </identity>
>        <collations  validSubLocales="fr_BE fr_CA fr_CH fr_FR fr_LU">
>                <collation type="standard" >
>                        <settings backwards="on"  />
>                        <rules>
>                                <reset>ae</reset>
>                                <s>æ</s>
>                                <t>Æ</t>
> <!--
>                                <reset>A</reset>
>                                <x><s>Æ</s><extend>E</extend></x>
>                                <reset>a</reset>
>                                <x><s>æ</s><extend>e</extend></x>
> -->
>                        </rules>
>                </collation  >
>        </collations>
> </ldml>
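
In LDML's XML rule syntax, `<reset>` fixes an anchor and `<p>`, `<s>`, `<t>` and `<i>` add characters after it with a primary, secondary, tertiary or identical difference. So the active rules above place "æ" just after "ae" at the secondary level and "Æ" just after "æ" at the tertiary level, and `backwards="on"` asks for secondary (accent) differences to be compared from the end of the string, the traditional French ordering. The following is a minimal Python sketch of the first step toward the auto-translation Tom wishes for: flattening such rules into an ICU-style rule string. `OPERATORS` and `ldml_rules_to_icu` are hypothetical names. The sketch handles only the simple elements, and producing Unicode::Collate's own tailoring parameters would be a further step not shown here.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from LDML rule elements to ICU rule-syntax operators.
OPERATORS = {"reset": "&", "p": "<", "s": "<<", "t": "<<<", "i": "="}


def ldml_rules_to_icu(xml_text):
    """Flatten the <collation> rules of an LDML document into ICU syntax.

    XML comments (such as the disabled <x>/<extend> block) are dropped by
    ElementTree's default parser, so only active rules are emitted.
    """
    root = ET.fromstring(xml_text)
    parts = []
    for collation in root.iter("collation"):
        settings = collation.find("settings")
        if settings is not None and settings.get("backwards") == "on":
            parts.append("[backwards 2]")  # secondary level compared backwards
        rules = collation.find("rules")
        if rules is None:
            continue
        for elem in rules:
            op = OPERATORS.get(elem.tag)
            if op is None:
                # <x>, <extend> and later compact forms are out of scope here.
                raise ValueError(f"unsupported rule element <{elem.tag}>")
            parts.append(f"&{elem.text}" if op == "&" else f"{op} {elem.text}")
    return " ".join(parts)
```

Applied to the active rules of fr.xml above, this should yield `[backwards 2] &ae << æ <<< Æ`.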

-- 
A system is nothing more than the subordination of all aspects of
the universe to any one such aspect.
    -- Borges


