
Re: Future Perl development

Gerard Goossen
February 6, 2007 09:25
On Mon, Feb 05, 2007 at 10:41:15PM +0100, Juerd Waalboer wrote:
> Gerard Goossen skribis 2007-02-05 20:39 (+0100):
> > Sometimes you need to have a byte-string.
> Indeed.
> > But \x.. generates a character.
> (Note that \xFF and \x{ff} are the same, for any capitalization of ff.)
> Or a byte. Because of the clever Unicode implementation in Perl, you get
> a character if you use the return value in a unicode string, and a byte
> if you use the return value in a byte string.
> This is not a matter of context, by the way. Instead, the value "\xFF"
> is polymorphic. It's both a unicode string representing code point
> U+00FF, and the single byte 0xFF.
No. \xFF creates a character represented by FF according to the native
encoding. If your native encoding is EBCDIC this does NOT correspond to
U+00FF (instead it corresponds to U+007E or U+009F, depending on the
flavor of EBCDIC you're on). 
You also assume (like everybody else) that in the native encoding a
character corresponds to a byte with the same numeric value.
This assumption is what makes the transition to UTF-8 so difficult,
because in the UTF-8 encoding, the assumption is NOT correct. 
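
To make the point concrete, here is a small sketch. It is written in
Python rather than Perl and says nothing about Perl's internals; the
codec names "latin-1", "utf-8" and "cp037" are just stand-ins I picked
for "a native encoding". It shows that the character with number 0xFF
is one byte 0xFF in Latin-1, two different bytes in UTF-8, and that the
byte 0xFF read as EBCDIC (code page 037) is not U+00FF at all.

```python
# Illustration only: Python codecs stand in for the "native encoding".
# This is an assumption made for the example, not how Perl stores strings.

CODE_POINT = 0xFF
char = chr(CODE_POINT)  # the character U+00FF

# Latin-1: the character is one byte with the same numeric value.
latin1_bytes = char.encode("latin-1")

# UTF-8: the same character becomes two bytes, neither of them 0xFF.
utf8_bytes = char.encode("utf-8")

# EBCDIC code page 037: the byte 0xFF is a different character entirely.
ebcdic_char = bytes([0xFF]).decode("cp037")

print("latin-1 bytes:", latin1_bytes.hex())
print("utf-8 bytes:  ", utf8_bytes.hex())
print("cp037 0xFF is: U+%04X" % ord(ebcdic_char))
```

Only in the Latin-1 case does "character number" equal "byte value".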

Gerard Goossen