perl.perl5.porters | Postings from February 2007

Re: Future Perl development

Gerard Goossen
February 5, 2007 11:36
On Mon, Feb 05, 2007 at 12:54:32AM +0100, Dr.Ruud wrote:
> Gerard Goossen wrote:
> > If we make \x{..} really insert codepoints, and not sometimes
> > bytes, we need an escape sequence for bytes.
> That is thinking the wrong way around, because it should only depend on
> the encoding at hand. And the encoding of the source file does not have
> to be equal to the encoding of the referenced data, for example a file
> that is written to. So if the source file is in UTF-8 and the data is in
> Latin-1, then an "Ä" will be built from multiple bytes for the source
> file but be only a single byte in the data.
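To make the point about source encoding versus data encoding concrete, here is a minimal Perl 5 sketch. It assumes only the core Encode module and shows the same character "Ä" (U+00C4) serialized once for UTF-8 output and once for Latin-1 output:

```perl
use strict;
use warnings;
use Encode qw(encode);

# "\x{C4}" is the character U+00C4 (LATIN CAPITAL LETTER A WITH DIAERESIS),
# regardless of how this source file itself is encoded.
my $char = "\x{C4}";

# The same character serialized for two different target encodings.
my $utf8_bytes   = encode('UTF-8',      $char);
my $latin1_bytes = encode('ISO-8859-1', $char);

# Print each result as hex together with its length in bytes.
printf "UTF-8:   %s (%d bytes)\n", unpack('H*', $utf8_bytes),   length $utf8_bytes;
printf "Latin-1: %s (%d bytes)\n", unpack('H*', $latin1_bytes), length $latin1_bytes;
```

The UTF-8 form is the two bytes c3 84, while the Latin-1 form is the single byte c4, which is exactly the mismatch described above.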
Sometimes you need a byte string, but \x.. generates a character.
In Perl 5, \xFF generates a byte. But if your target encoding is UTF-8,
\xFF generates two bytes, and there is no way to insert the byte FF into
the string, because FF is never a valid byte in UTF-8. So I proposed to
use \x[FF] in Perl 7 to insert the byte FF. In Perl 5, \xFF inserts a byte
because 0xFF is smaller than 256, but having \x[FF] to be explicit that
you want a byte would be nice.
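The Perl 5 behaviour described above can be checked with a short sketch. The proposed \x[FF] syntax never existed, so this uses only "\xFF", pack and the core Encode module: it shows that "\xFF" is one character with ordinal 255, that encoding it as UTF-8 yields two bytes, and that a lone byte FF is rejected by a strict UTF-8 decode.

```perl
use strict;
use warnings;
use Encode qw(encode decode FB_CROAK LEAVE_SRC);

# In Perl 5, "\xFF" produces a single character whose ordinal is 255.
my $str = "\xFF";
printf "length: %d, ord: %d\n", length $str, ord $str;

# Serialized as UTF-8, that character (U+00FF) becomes two bytes, C3 BF.
my $utf8 = encode('UTF-8', $str);
printf "as UTF-8: %s\n", unpack('H*', $utf8);

# A byte string holding the single byte FF.
my $raw = pack('C', 0xFF);

# FF never occurs in well-formed UTF-8, so a strict decode croaks.
# LEAVE_SRC keeps $raw unmodified even though a CHECK flag is given.
my $ok = eval { decode('UTF-8', $raw, FB_CROAK | LEAVE_SRC); 1 };
print $ok ? "decoded\n" : "FF is not valid UTF-8\n";
```

Run as-is, it reports one character with ordinal 255, the UTF-8 bytes c3bf, and a rejected decode of the raw byte.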

PS. This would also solve some EBCDIC problems, where in Perl 5 \xA4 does
not generate an 'A' on EBCDIC platforms.

Gerard Goossen