perl.perl5.porters | Postings from February 2008

Re: use encoding 'utf8' bug for Latin-1 range

Juerd Waalboer
February 27, 2008 02:47
demerphq wrote on 2008-02-27 11:41 (+0100):
> This is tough. It could be done (with a lot of work). But the
> implications I suspect are a lot deeper than you realize. Imagine
> people's surprise when uc(chr(0xDF)) ends up being "SS".

It's easy to implement; I have done it in pseudocode in Perl, and just
lack the knowledge of C and of Perl's source to actually patch pp.c.

All you need is two arrays of 128 bytes each, and exactly one special
case, namely ß (U+00DF).
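The table-driven approach described above can be sketched in C roughly as follows. This is an illustrative sketch under stated assumptions, not a patch to pp.c: the table and function names are hypothetical, it operates on raw Latin-1 bytes rather than on Perl SVs, and it leaves U+00B5 (µ) and U+00FF (ÿ) unchanged when uppercasing, because their uppercase forms (U+039C and U+0178) lie outside Latin-1 and cannot be stored in a byte table.

```c
#include <stddef.h>

/* Hypothetical case tables for the upper half of Latin-1
 * (bytes 0x80..0xFF), indexed by (byte - 0x80). */
static unsigned char uc_hi[128];
static unsigned char lc_hi[128];

/* Fill both 128-byte tables. Letters U+00C0..U+00DE (except U+00D7,
 * the multiplication sign) pair with U+00E0..U+00FE (except U+00F7,
 * the division sign) at a distance of 0x20. Every other byte maps to
 * itself, including U+00DF (handled separately in latin1_uc) and, in
 * this sketch, U+00B5 and U+00FF. */
static void latin1_init_tables(void)
{
    for (int i = 0; i < 128; i++) {
        unsigned c = 0x80u + (unsigned)i;
        uc_hi[i] = (unsigned char)c;
        lc_hi[i] = (unsigned char)c;
        if (c >= 0xE0 && c <= 0xFE && c != 0xF7)
            uc_hi[i] = (unsigned char)(c - 0x20);
        if (c >= 0xC0 && c <= 0xDE && c != 0xD7)
            lc_hi[i] = (unsigned char)(c + 0x20);
    }
}

/* Uppercase a Latin-1 byte string into `out`, which must have room
 * for 2 * len bytes because each U+00DF expands to "SS".
 * Returns the number of bytes written. */
static size_t latin1_uc(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = in[i];
        if (c == 0xDF) {                 /* the one special case: sharp s */
            out[n++] = 'S';
            out[n++] = 'S';
        } else if (c >= 0x80) {          /* upper half: table lookup */
            out[n++] = uc_hi[c - 0x80];
        } else if (c >= 'a' && c <= 'z') {
            out[n++] = (unsigned char)(c - ('a' - 'A'));
        } else {
            out[n++] = c;
        }
    }
    return n;
}

/* Lowercase a Latin-1 byte string in place; the length never changes. */
static void latin1_lc(unsigned char *s, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        unsigned char c = s[i];
        if (c >= 0x80)
            s[i] = lc_hi[c - 0x80];
        else if (c >= 'A' && c <= 'Z')
            s[i] = (unsigned char)(c + ('a' - 'A'));
    }
}
```

In this sketch ß is the only byte whose uppercase form is longer than one character, so it is the only reason the uppercasing output may grow; lowercasing can work in place.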

People WILL get over the initial shock. uc($string_containing_0xDF)
already results in ß being changed to SS if $string_containing_0xDF
happens to be stored internally as UTF-8 (has the UTF8 flag on).

Currently, uc($string_containing_0xDF) will produce EITHER "ß" OR "SS"
there. Glenn calls this a "guessing game", and I think that's a very
apt description of the current practice.

Kind regards,

  Juerd Waalboer:  Perl hacker
  Convolution:     ICT solutions and consultancy
