develooper Front page | perl.perl5.porters | Postings from February 2007

Re: Future Perl development

From: Gerard Goossen
Date: February 5, 2007 11:36
Subject: Re: Future Perl development
Message ID: 20070205193950.GD9642@ostwald

On Mon, Feb 05, 2007 at 12:54:32AM +0100, Dr.Ruud wrote:
> Gerard Goossen schreef:
> 
> > If we make \x{?..}? really insert codepoints, and not sometimes
> > bytes, we need an escape sequence for bytes.
> 
> That is thinking the wrong way around, because it should only depend on
> the encoding at hand. And the encoding of the source file does not have
> to be equal to the encoding of the referenced data, for example a file
> that is written to. So if the source file is in UTF-8 and the data is in
> Latin-1, then an "Ä" will be built from multiple bytes for the source
> file but be only a single byte in the data.
 
Sometimes you need a byte string, but \x.. generates a character.
In Perl 5 \xFF generates a byte. But if your target encoding is UTF-8,
\xFF becomes two bytes, and there is no way to insert the byte FF into
the string, because a lone FF byte is not valid UTF-8. So I proposed
using \x[FF] in Perl 7 to insert the byte FF. In Perl 5 \xFF inserts a byte
because 0xFF is smaller than 256, but having \x[FF] to state explicitly that
you want a byte would be nice.
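
To make the distinction concrete, here is a minimal Perl 5 sketch using the
core Encode module. It shows the same "\xFF" ending up as one byte or two
depending on the target encoding, and uses pack('C', ...) as a stand-in for
what the proposed \x[FF] would mean. The \x[FF] syntax itself is only a
proposal and does not exist in Perl 5; the helper hex_bytes is made up for
this illustration.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: render a byte string as space-separated hex octets.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord($_) } split //, $bytes;
}

# In Perl 5, "\xFF" is the single character with ordinal 0xFF.
my $char = "\xFF";

# The same character takes one byte in Latin-1 but two bytes in UTF-8.
my $latin1 = encode('ISO-8859-1', $char);
my $utf8   = encode('UTF-8',      $char);

# An explicit octet, roughly what the proposed \x[FF] would denote.
my $byte = pack('C', 0xFF);

# A lone FF octet is not valid UTF-8, so a strict decode is expected to fail.
# decode() may modify its input when a CHECK value is given, so use a copy.
my $copy     = $byte;
my $is_valid = eval { decode('UTF-8', $copy, Encode::FB_CROAK); 1 } ? 1 : 0;

print "latin1: ", hex_bytes($latin1), "\n";   # latin1: FF
print "utf8:   ", hex_bytes($utf8),   "\n";   # utf8:   C3 BF
print "byte:   ", hex_bytes($byte),   "\n";   # byte:   FF
print "lone FF is valid UTF-8? ", ($is_valid ? "yes" : "no"), "\n";
```

The point of the sketch is only that "character FF" and "byte FF" stop being
the same thing once the target encoding is UTF-8, which is why a separate
escape for bytes would be useful.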

PS. This would also solve some EBCDIC problems, where in Perl 5 \x41 does not
generate an 'A' on EBCDIC platforms.
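
As a rough illustration of the EBCDIC point, the sketch below compares the
octet for 'A' in ASCII and in an EBCDIC code page. It runs on an ordinary
ASCII platform and assumes the cp1047 table shipped with Encode is a
reasonable stand-in for an EBCDIC platform; it does not reproduce how a
native EBCDIC perl handles \x escapes.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Encode the same letter into an ASCII and an EBCDIC (cp1047) byte string.
my $ascii  = encode('ascii',  'A');
my $ebcdic = encode('cp1047', 'A');

printf "ASCII  'A' = 0x%02X\n", ord($ascii);    # ASCII  'A' = 0x41
printf "EBCDIC 'A' = 0x%02X\n", ord($ebcdic);   # EBCDIC 'A' = 0xC1
```

So a numeric escape that names a platform-native code cannot mean the same
letter on both kinds of platform, while a codepoint-based escape could.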

 
Gerard Goossen



