
Re: use encoding 'utf8' bug for Latin-1 range

From: Juerd Waalboer
Date: February 27, 2008 02:47
Subject: Re: use encoding 'utf8' bug for Latin-1 range
Message ID: 20080227104411.GV13615@c4.convolution.nl

demerphq wrote 2008-02-27 11:41 (+0100):
> This is tough. It could be done (with a lot of work). But the
> implications I suspect are a lot deeper than you realize. Imagine
> peoples surprise when uc(chr(0xDF)) ends up being "SS".

It's easy to implement; I have done it in pseudocode in Perl, and just
lack the knowledge of C and Perl's source to actually patch pp.c.

All you need is 2 arrays of 128 bytes each, and exactly one special
case, indeed for ß (U+00DF).
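To make the idea concrete, here is a rough sketch in C of the two 128-entry tables plus the single ß special case. This is not code from pp.c; the names latin1_upper, latin1_lower, latin1_init_tables, latin1_uc and latin1_lc are hypothetical, and the tables are filled by a loop rather than written out as literals. One assumption to flag: ÿ (U+00FF) and µ (U+00B5) have uppercase forms outside Latin-1 (U+0178 and U+039C), so this byte-only sketch simply leaves them unchanged.

```c
#include <stddef.h>

/* Hypothetical tables for code points 0x80..0xFF, indexed by (c - 0x80). */
static unsigned char latin1_upper[128];
static unsigned char latin1_lower[128];

/* Fill both tables. Must be called once before latin1_uc / latin1_lc. */
static void latin1_init_tables(void)
{
    for (int i = 0; i < 128; i++) {
        unsigned char c = (unsigned char)(0x80 + i);
        latin1_upper[i] = c;            /* default: maps to itself */
        latin1_lower[i] = c;
        /* 0xE0..0xFE are lowercase letters, except 0xF7 (division sign). */
        if (c >= 0xE0 && c <= 0xFE && c != 0xF7)
            latin1_upper[i] = (unsigned char)(c - 0x20);
        /* 0xC0..0xDE are uppercase letters, except 0xD7 (multiplication sign). */
        if (c >= 0xC0 && c <= 0xDE && c != 0xD7)
            latin1_lower[i] = (unsigned char)(c + 0x20);
    }
}

/* Uppercase len bytes of Latin-1 text from in into out.
 * out needs room for 2 * len bytes, because 0xDF expands to "SS".
 * Assumption: 0xFF and 0xB5 are left as-is (see the note above).
 * Returns the number of bytes written. */
static size_t latin1_uc(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = in[i];
        if (c == 0xDF) {                /* the one special case */
            out[n++] = 'S';
            out[n++] = 'S';
        } else if (c >= 0x80) {
            out[n++] = latin1_upper[c - 0x80];
        } else if (c >= 'a' && c <= 'z') {
            out[n++] = (unsigned char)(c - 0x20);
        } else {
            out[n++] = c;
        }
    }
    return n;
}

/* Lowercase len bytes of Latin-1 text; output length equals input length. */
static size_t latin1_lc(const unsigned char *in, size_t len, unsigned char *out)
{
    for (size_t i = 0; i < len; i++) {
        unsigned char c = in[i];
        if (c >= 0x80)
            out[i] = latin1_lower[c - 0x80];
        else if (c >= 'A' && c <= 'Z')
            out[i] = (unsigned char)(c + 0x20);
        else
            out[i] = c;
    }
    return len;
}
```

With something along these lines, uppercasing the bytes 'a', 0xDF, 0xE9 would give "ASS" followed by 0xC9, whatever the string's internal encoding happens to be.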

People WILL get over the initial shock. uc($string_containing_0xDF)
already results in ß being changed to SS if $string_containing_0xDF
happens to be INTERNALLY ENCODED AS UTF-8.

Currently uc($string_containing_0xDF) will EITHER produce "ß" there, OR
"SS". Glenn calls this a "guessing game" and I think that's a very apt
way to describe the current practice.
-- 
Met vriendelijke groet,  Kind regards,  Korajn salutojn,

  Juerd Waalboer:  Perl hacker  <#####@juerd.nl>  <http://juerd.nl/sig>
  Convolution:     ICT solutions and consultancy <sales@convolution.nl>
