Re: RFC: fc keyword API
From: Tom Christiansen
Date: November 26, 2011 05:13
Subject: Re: RFC: fc keyword API
Message ID: 23398.1322313156@chthon

> Is there any real-life advantage of using simple case folding over
> lowercasing? My impression is not, and if so the discussion is moot.

There is, although I hate to be the one to bring it up: simple casefolding
has caused me a lot of trouble in languages other than Perl, so, like the
Unicode Standard itself, I am advocating full casefolding.

You need Unicode casefolding because no other casemap suffices to determine
whether one string is caselessly equivalent to another. Indeed, caseless
equivalence is defined by fc, not by lc.

The three sigmas (capital Σ, medial σ, and final ς) are the most obvious
example. You can't reasonably match Σίσυφος caselessly without Unicode
casefolding, and here simple casefolding suffices.

    String   uc       lc       fc
    -------  -------  -------  -------
    Σίσυφος  ΣΊΣΥΦΟΣ  σίσυφος  σίσυφοσ

There are two lowercase sigmas, so you cannot just compare the lc of two
strings: the lc casemap will not turn the final sigma into a regular one,
whereas the fc casemap will.
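
As a rough illustration of the sigma case, here is a sketch in Python rather
than Perl, since fc is still only a proposal in this thread. It assumes
Python's str.casefold (full Unicode casefolding) as a stand-in for fc and
str.lower as a stand-in for lc; the helper name caseless_eq is made up for
the example and is not part of any proposed API.

```python
# Sketch only: str.casefold stands in for the proposed fc, str.lower for lc.
name = "Σίσυφος"

upper = name.upper()
lower = name.lower()
folded = name.casefold()

# Same columns as the table: String, uc, lc, fc.
print(name, upper, lower, folded)

# The three sigmas: capital, medial, and final.
sigmas = ["Σ", "σ", "ς"]

# Lowercasing leaves two distinct lowercase sigmas behind...
lowered_sigmas = {s.lower() for s in sigmas}
# ...whereas casefolding collapses all three into one.
folded_sigmas = {s.casefold() for s in sigmas}
print(len(lowered_sigmas), len(folded_sigmas))


def caseless_eq(a, b):
    """Hypothetical helper: compare two strings by their casefolds."""
    return a.casefold() == b.casefold()


print(caseless_eq(name, upper))
```

The point of the sketch is only that the lowercased form still ends in a
final sigma while the folded form does not, so the fold is the form to
compare.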

The many Greek characters with the iota subscript all require full
casefolding, though.
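
To make the iota-subscript point concrete, here is a hedged sketch, again in
Python with str.casefold standing in for fc. It picks U+1FB3 (alpha with
ypogegrammeni) as one representative character, which is my choice for the
example. Its full fold is a two-character sequence, alpha followed by iota,
which a one-to-one simple fold cannot produce.

```python
import unicodedata

# Sketch only: str.casefold stands in for the proposed fc, str.lower for lc.
alpha_ypo = "\u1fb3"  # GREEK SMALL LETTER ALPHA WITH YPOGEGRAMMENI

# Full casefolding expands the single character into two.
folded = alpha_ypo.casefold()
print(len(alpha_ypo), len(folded))
print([unicodedata.name(c) for c in folded])

# Uppercasing also expands it, so the lowercase forms no longer agree,
# but the casefolded forms do.
upper = alpha_ypo.upper()
same_by_lower = alpha_ypo.lower() == upper.lower()
same_by_fold = alpha_ypo.casefold() == upper.casefold()
print(same_by_lower, same_by_fold)
```
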
--tom