develooper Front page | perl.perl5.porters | Postings from February 2008

Re: use encoding 'utf8' bug for Latin-1 range

Juerd Waalboer
February 26, 2008 14:34
demerphq wrote 2008-02-26 23:22 (+0100):
> I specifically meant that \N{U+...} should always result in a utf8
> upgraded string regardless of codepoint.
> (...)
> i think any string containing an \N escape (which is documented as
> unicode named sequences) should always return a utf8 string.
> (...)

No, it shouldn't. It could, can, and should certainly be allowed to, but
there's no reason to specify that it should, except compatibility with
existing bugs.

> Part of the reason i think this is because something like
> \n{LATIN-SHARP-ESS} (or whatever the hell its called, ive had a few
> beers tonite, i mean german sharp-s YKWIM) DOES return a utf8 string
> despite it being in a codepoint range where latin-1 overlaps.
> There is an inconsistancy if \N{U+HEX} does not return a utf8 string
> when the same codepoint refered to by name does.

This inconsistency is purely in the implementation, not at the Perl level:

juerd@nano:~$ perl -Mcharnames=:full -le'print "\N{LATIN SMALL LETTER SHARP S}" eq "\N{U+00DF}"'

Even though the internal encoding is different, conceptually it's
exactly the same string.
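
The same point can be checked without depending on how a particular perl compiles \N{...}. The following is a minimal sketch that assumes only the core utf8::upgrade and utf8::is_utf8 functions; the variable names are illustrative.

```perl
use strict;
use warnings;

# Two copies of the same one-character string, U+00DF (sharp s).
my $native   = "\xDF";    # stored as a single byte, UTF8 flag off
my $upgraded = "\xDF";
utf8::upgrade($upgraded); # same character, now stored as UTF-8 internally

# The internal representations differ ...
printf "native flagged:   %d\n", utf8::is_utf8($native)   ? 1 : 0;
printf "upgraded flagged: %d\n", utf8::is_utf8($upgraded) ? 1 : 0;

# ... but at the Perl level the strings are equal and have the same length.
print "equal\n"       if $native eq $upgraded;
print "same length\n" if length($native) == length($upgraded);
```

Only the flag differs between the two variables; comparison and length operate on characters, not on the internal bytes.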

Except, of course, for the known bugs in case and character-class
operations, which assume that a string without the UTF8 flag is not
Unicode (!UTF8 ==> !unicode). That assumption is inconsistent with the
way upgrading was later defined.
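
One way to observe the kind of bug meant here is to run the same character-class test on two equal strings that differ only in internal encoding. This is a sketch, not a statement about any specific perl release: whether the first match succeeds depends on the perl version and on later opt-ins such as "use feature 'unicode_strings'", so no output is claimed for it.

```perl
use strict;
use warnings;

# U+00E9 (e with acute) is a letter in Unicode.
my $native   = "\xE9";    # UTF8 flag off
my $upgraded = "\xE9";
utf8::upgrade($upgraded); # UTF8 flag on, same character

# On perls with the bug described above, the first match may fail while
# the second succeeds, although the two strings compare equal.
my $native_is_word   = $native   =~ /\w/ ? 1 : 0;
my $upgraded_is_word = $upgraded =~ /\w/ ? 1 : 0;

print "strings equal:    ", ($native eq $upgraded ? 1 : 0), "\n";
print "native   =~ /\\w/: $native_is_word\n";
print "upgraded =~ /\\w/: $upgraded_is_word\n";
```

If the two match results differ while the strings compare equal, the operation is keying on the internal flag rather than on the characters.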

> As i said above, i am talking about whether the string has its utf8
> bit enabled or not.

The internal encoding of a string is irrelevant, except for performance
reasons, and for bug-by-bug compatibility.

I hope the aforementioned bugs are gone in 5.12. Working around them in
Perl 5 is easy (see Unicode::Semantics) but tedious.

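
A hand-rolled form of such a workaround might look like the sketch below. It does not use Unicode::Semantics itself; it assumes that upgrading a copy of the string with the core utf8::upgrade before the operation is enough, and uc_unicode is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: upper-case with Unicode semantics regardless of the
# string's internal encoding, by upgrading a copy first.
sub uc_unicode {
    my ($s) = @_;        # copy, so the caller's string is left untouched
    utf8::upgrade($s);   # changes the representation only, not the characters
    return uc $s;
}

my $e_acute = "\xE9";                 # U+00E9, UTF8 flag off
my $result  = uc_unicode($e_acute);

print "upper-cased to U+00C9\n" if $result eq "\xC9";
```

Having to wrap every case or character-class operation this way is the tedious part.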
Met vriendelijke groet,  Kind regards,  Korajn salutojn,

  Juerd Waalboer:  Perl hacker
  Convolution:     ICT solutions and consultancy
