
Re: Future Perl development

From: Gerard Goossen
Date: February 6, 2007 09:25
Subject: Re: Future Perl development
Message ID: 20070206172759.GC4494@ostwald

On Mon, Feb 05, 2007 at 10:41:15PM +0100, Juerd Waalboer wrote:
> Gerard Goossen skribis 2007-02-05 20:39 (+0100):
> > Sometimes you need to have a byte-string.
> 
> Indeed.
> 
> > But \x.. generates a character.
> 
> (Note that \xFF and \x{ff} are the same, for any capitalization of ff.)
> 
> Or a byte. Because of the clever Unicode implementation in Perl, you get
> a character if you use the return value in a unicode string, and a byte
> if you use the return value in a byte string.
>     
> This is not a matter of context, by the way. Instead, the value "\xFF"
> is polymorphic. It's both a unicode string representing code point
> U+00FF, and the single byte 0xFF.
 
No. \xFF creates a character represented by FF according to the native
encoding. If your native encoding is EBCDIC, this does NOT correspond to
U+00FF (instead it corresponds to U+007E or U+009F, depending on the
flavor of EBCDIC you're on).
You also assume (like everybody else) that in the native encoding a
character corresponds to a byte with the same numeric value.
This assumption is what makes the transition to UTF-8 so difficult,
because in the UTF-8 encoding the assumption is NOT correct.
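
To make the point concrete, here is a small sketch. It is written in Python
rather than Perl, purely as a neutral illustration, and it assumes that
Python's "cp037" codec stands in for one EBCDIC flavor. The helper names
char_for_byte and bytes_for_char are hypothetical and not part of any Perl
API. It shows that the number FF names different characters under different
native encodings, and that the character U+00FF is not the single byte FF
once it is encoded as UTF-8.

```python
# Illustrative sketch (Python, not Perl): the relationship between the
# number 0xFF, a character, and a byte depends on the encoding in use.

def char_for_byte(byte_value: int, encoding: str) -> str:
    """Return the character that a single byte represents in an 8-bit encoding."""
    return bytes([byte_value]).decode(encoding)

def bytes_for_char(code_point: int, encoding: str) -> bytes:
    """Return the byte sequence that encodes one code point."""
    return chr(code_point).encode(encoding)

# Byte FF interpreted under two different "native" encodings.
latin1_char = char_for_byte(0xFF, "latin-1")   # ISO-8859-1
ebcdic_char = char_for_byte(0xFF, "cp037")     # assumption: cp037 as the EBCDIC flavor

# Code point U+00FF serialized under two encodings.
latin1_bytes = bytes_for_char(0xFF, "latin-1")
utf8_bytes = bytes_for_char(0xFF, "utf-8")

print(f"byte FF as latin-1 -> U+{ord(latin1_char):04X}")
print(f"byte FF as cp037   -> U+{ord(ebcdic_char):04X}")
print(f"U+00FF in latin-1  -> {latin1_bytes.hex()}")
print(f"U+00FF in utf-8    -> {utf8_bytes.hex()}")
```

Only in the Latin-1 case do the character's number and the byte's number
coincide. Under cp037 the byte FF is a different character, and under UTF-8
the character U+00FF takes two bytes.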


Gerard Goossen



