
Re: Running Perl 5 on Turkish (Was : Is lc(\x{130}) -> i\x{307} a bug?)

May 26, 2010 14:39
On 26 May 2010 21:38, karl williamson wrote:
> demerphq wrote:
>> 2010/5/25 karl williamson:
>>> ToFold doesn't work.  There is currently no way to override case
>>> insensitive matching.  It's not clear at the moment what can be done in
>>> that direction.  Turkish is clearly a demonstration of why that would be
>>> something useful.  As 5.14 development progresses, we'll know more.  I'll
>>> tell Yves, who has some ideas of how to make folding better but has not
>>> had time to work much on them, that it would be nice if there were a way
>>> to override the standard mappings.
>> This is a design error in Unicode. Probably the *best* option would be
>> to petition them to create a "lowercase dotless i" (that has a dot of
>> course).
> I don't know the history of this.  I'm not sure that it is a design error,
> but certainly they've made those.  It could have been because of
> compatibility with an existing standard.  I just don't know.  I do know it
> has caused them many headaches, and they're not about to revisit it,
> probably ever.  They decided there was no good solution and changed in
> something like version 3.1 or 3.2 to the current one, as the least awful.
>  They continue to pretend that their case folding is not locale-dependent,
> but it is in this one instance.

Yes, which is why I suspect there might be the possibility to get them
to move on the subject.

Basically it completely breaks round tripping Turkish script.
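
To make the round-trip problem concrete, here is a minimal sketch. It is
written in Python rather than Perl, on the assumption that its str.upper
and str.lower follow the same default, locale-independent Unicode mappings
under discussion; the names DOTLESS_SMALL, DOTTED_CAPITAL and round_trips
are made up for the illustration.

```python
# Default (non-Turkish) Unicode case mappings, as exposed by Python's str methods.
DOTLESS_SMALL = "\u0131"   # LATIN SMALL LETTER DOTLESS I
DOTTED_CAPITAL = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE


def round_trips(text: str) -> bool:
    """Return True if upper-casing then lower-casing gives back the original."""
    return text.upper().lower() == text


# Dotless i upper-cases to plain ASCII "I" ...
upper_of_dotless = DOTLESS_SMALL.upper()
# ... which lower-cases to the ordinary dotted "i", not back to dotless i.
back_down = upper_of_dotless.lower()

# Capital I with dot lower-cases to "i" followed by COMBINING DOT ABOVE (U+0307).
lower_of_dotted = DOTTED_CAPITAL.lower()

# A Turkish word containing a dotless i does not survive the round trip.
word = "k\u0131z"
print(round_trips(word))
print([hex(ord(ch)) for ch in lower_of_dotted])
```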

>> It appears this worked with latin-sharp-ess, as there is now a
>> capitalized and lower case version even though the letter has always
>> been considered to be "lowercase that is used in title case script".
> I do know a little about the history of this.  They generally require
> evidence that something actually exists in the wild before they will
> consider most things.  And in fact, someone showed the Unicode folks that E.
> German newspapers from around 50 years ago were using this uppercase letter.
>  The proposal was initially rejected, but revived with more evidence, then
> accepted.  So it wasn't because someone said "wouldn't it be nice, because
> this is causing us all sorts of implementation hassles", it was because
> there was documented evidence that  the character, different from all other
> characters, really existed.

The thing is tho, latin-sharp-ess at least in German, was/is a ligature.

It was _never_ _ever_ an uppercase letter, it is/was a lower case
ligature (of sz) that had no uppercase equivalent so that in signs,
which are normally uppercase, it was _also_ used.

So even if it /was/ used in signs it was never uppercase.

And IMO this is similar to the case with Turkish dotless I.

No doubt Burak can find loads of photos of Turkish signs using the
lowercase i, and since we know the Turkish rule for uppercasing is
"special" we thus can prove that the dotless I should have a lower
case equivalent.

Anyway, I suppose getting the Unicode group to change things is
unlikely, but really that is the right solution (from the point of
view of folding).

> (They haven't always been so tight.  There is a Unicode Tibetan character
> that means -1/2.  I was curious about, why of all the languages in the
> world, would Tibetan be the only one that had thought to have a single
> letter stand for a negative number, and a fraction at that!  It turns out
> there is no evidence that this has ever existed. There was a stamp issued in
> Tibet in the 1930's, I believe, that meant I think it was 7 - .5 coins
> (whatever the currency was).  Someone in Unicode used the rule that meant
> "subtract 1/2" to extrapolate back to create characters for 5.5, 4.5, ...
> -0.5.  If you're curious, you can google it, as I did.)

Yikes. :-)

>> I'll think on a technical solution, but I must admit the plans I've
>> been toying with go the other way, in that they would probably be
>> compiled at perl build time.
> The only thing that comes to me is a 'use re' option.

Yeah, it would hook in there, but still, deferring compilation of fold
tables to run time is not the way I wanted to go. I suppose we have no
choice tho.
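
For what a run-time override of the fold table might look like, here is a
rough sketch, again in Python and not a proposal for how perl should do it.
The two overrides are the Turkic ("T") entries from Unicode's
CaseFolding.txt; TURKIC_OVERRIDES and fold are hypothetical names, and
str.casefold stands in for the default fold table.

```python
# Turkic "T" entries from CaseFolding.txt, applied ahead of the default folding.
TURKIC_OVERRIDES = {
    "I": "\u0131",       # U+0049 folds to dotless i (U+0131)
    "\u0130": "i",       # U+0130 folds to plain i (U+0069)
}
_TURKIC_TABLE = str.maketrans(TURKIC_OVERRIDES)


def fold(text: str, turkic: bool = False) -> str:
    """Case-fold text, optionally applying the Turkic overrides first."""
    if turkic:
        # Handle the two special letters before the default table sees them.
        text = text.translate(_TURKIC_TABLE)
    return text.casefold()


def same_ignoring_case(a: str, b: str, turkic: bool = False) -> bool:
    """Caseless comparison built on the (possibly overridden) fold."""
    return fold(a, turkic) == fold(b, turkic)


upper_name = "D\u0130YARBAKIR"   # upper-case Turkish spelling
lower_name = "diyarbak\u0131r"   # lower-case Turkish spelling

print(same_ignoring_case(upper_name, lower_name))
print(same_ignoring_case(upper_name, lower_name, turkic=True))
```

The override is a per-call switch here only to keep the sketch short; where
such a switch would live in perl (a 'use re' option, as suggested above, or
something else) is left open.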

perl -Mre=debug -e "/just|another|perl|hacker/"
