Re: Running Perl 5 on Turkish (Was : Is lc(\x{130}) -> i\x{307} abug?)
From: karl williamson
Date: May 26, 2010 12:39
Subject: Re: Running Perl 5 on Turkish (Was : Is lc(\x{130}) -> i\x{307} abug?)
Message ID: 4BFD7898.20701@khwilliamson.com

demerphq wrote:
> 2010/5/25 karl williamson <public@khwilliamson.com>:
>> ToFold doesn't work. There is currently no way to override case insensitive
>> matching. It's not clear at the moment what can be done in that direction.
>> Turkish is clearly a demonstration of why that would be something useful.
>> As 5.14 development progresses, we'll know more. I'll tell Yves who has
>> some ideas of how to make folding better, but has not had time to work much
>> on them, that it would be nice if there were a way to override the standard
>> mappings.
>
> This is a design error in Unicode. Probably the *best* option would be
> to petition them to create a "lowercase dotless i" (that has a dot of
> course).
I don't know the history of this. I'm not sure that it is a design
error, though they have certainly made some. It could have been for
compatibility with an existing standard; I just don't know. I do know
it has caused them many headaches, and they're not about to revisit it,
probably ever. They decided there was no good solution and, in
something like version 3.1 or 3.2, changed to the current approach as
the least awful one. They continue to pretend that their case folding
is not locale-dependent, but it is in this one instance.
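
To make the Turkish case concrete, here is a minimal sketch of the
default mappings next to a Turkish-aware lowercase. It is written in
Python only because its str methods apply the default Unicode
mappings; it is not how Perl implements lc(). The helper
turkish_lower() is hypothetical, and it ignores the rarer case of
"I" followed by a combining dot above.

```python
# Sketch: default (locale-independent) Unicode case mappings versus a
# hypothetical Turkish-aware lowercase. Illustration only.

DOTTED_CAPITAL_I = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE
DOTLESS_SMALL_I = "\u0131"   # LATIN SMALL LETTER DOTLESS I
COMBINING_DOT = "\u0307"     # COMBINING DOT ABOVE


def turkish_lower(text: str) -> str:
    """Hypothetical helper: map the two Turkish capitals first, then
    fall back to the default lowercase for everything else."""
    text = text.replace(DOTTED_CAPITAL_I, "i").replace("I", DOTLESS_SMALL_I)
    return text.lower()


# Default lowercase of U+0130 is two code points: "i" + U+0307.
default_lower = DOTTED_CAPITAL_I.lower()

# Default full case folding gives the same two code points.
default_fold = DOTTED_CAPITAL_I.casefold()

# Under the default fold, "I" and dotless "i" do not compare equal,
# although they are a case pair in Turkish.
folds_match = "I".casefold() == DOTLESS_SMALL_I.casefold()

city = "D" + DOTTED_CAPITAL_I + "YARBAKIR"
print(repr(city.lower()))          # default mapping
print(repr(turkish_lower(city)))   # Turkish-aware mapping
```

The point of the sketch is that the Turkish result cannot be obtained
from the default tables alone; something has to override them, which
is what the thread is asking for.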
>
> It appears this worked with latin-sharp-ess, as there is now a
> capitalized and lower case version even though the letter has always
> been considered to be "lowercase that is used in title case script".
I do know a little about the history of this. They generally require
evidence that something actually exists in the wild before they will
consider most things. And in fact, someone showed the Unicode folks
that E. German newspapers from around 50 years ago were using this
uppercase letter. The proposal was initially rejected, but revived with
more evidence, then accepted. So it wasn't because someone said
"wouldn't it be nice, because this is causing us all sorts of
implementation hassles"; it was because there was documented evidence
that the character, distinct from all other characters, really existed.
(They haven't always been so strict. There is a Unicode Tibetan
character that means -1/2. I was curious why, of all the languages in
the world, Tibetan would be the only one to have a single character
stand for a negative number, and a fraction at that! It turns out
there is no evidence that this character has ever existed. There was a
stamp issued in Tibet, in the 1930s I believe, whose value was, I
think, 7 - .5 coins (whatever the currency was). Someone in Unicode
used the rule that meant "subtract 1/2" to extrapolate back and create
characters for 5.5, 4.5, ... -0.5. If you're curious, you can google
it, as I did.)
>
> I'll think on a technical solution, but I must admit the plans I've
> been toying with go the other way, in that they would probably be
> compiled at perl build time.
The only thing that comes to me is a 'use re' option.
>
> cheers,
> Yves
>
>