On Mon, Feb 05, 2007 at 12:54:32AM +0100, Dr.Ruud wrote:
> Gerard Goossen schreef:
>
> > If we make \x{?..}? really insert codepoints, and not sometimes
> > bytes, we need an escape sequence for bytes.
>
> That is thinking the wrong way around, because it should only depend on
> the encoding at hand. And the encoding of the source file does not have
> to be equal to the encoding of the referenced data, for example a file
> that is written to. So if the source file is in UTF-8 and the data is in
> Latin-1, then an "Ä" will be built from multiple bytes for the source
> file but be only a single byte in the data.

Sometimes you need to have a byte-string. But \x.. generates a character.
In Perl 5 \xFF generates a byte. But if your target encoding is UTF-8,
\xFF generates two bytes (C3 BF). And there is no way to insert the byte
FF into the string, because a lone FF byte is never valid UTF-8. So I
proposed to use \x[FF] in Perl7 to insert the byte FF. In Perl 5 \xFF
inserts a byte, because 0xFF is smaller than 256, but having \x[FF] to be
explicit that you want a byte would be nice.
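
To make the character/byte difference concrete, here is a rough Perl 5
sketch. It assumes an ASCII platform and the core Encode module; the
helper hex_bytes is made up for this illustration only, and pack('C', ...)
just stands in for the explicit byte escape that does not exist yet.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: show each byte of a byte-string as two hex digits.
sub hex_bytes { join ' ', map { sprintf '%02X', ord($_) } split //, $_[0] }

# "\xFF" is a single character, codepoint U+00FF.
my $char = "\xFF";

# The same character is one byte in Latin-1 but two bytes in UTF-8.
my $latin1 = encode('ISO-8859-1', $char);
my $utf8   = encode('UTF-8', $char);

# An explicit raw byte FF, built with pack instead of an escape.
my $byte = pack('C', 0xFF);

# A lone FF byte is not valid UTF-8, so a strict decode fails.
# decode may modify its input when a CHECK value is given, so use a copy.
my $copy = $byte;
my $is_valid_utf8 = eval { decode('UTF-8', $copy, Encode::FB_CROAK); 1 } ? 1 : 0;

printf "latin1: %s\n", hex_bytes($latin1);
printf "utf8:   %s\n", hex_bytes($utf8);
printf "byte:   %s\n", hex_bytes($byte);
printf "FF alone is valid UTF-8: %s\n", $is_valid_utf8 ? 'yes' : 'no';
```

On an ASCII platform this should show FF for Latin-1, C3 BF for UTF-8, and
report that FF alone is not valid UTF-8, which is the gap an explicit byte
escape would fill.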

PS. This would also solve some EBCDIC problems, where in Perl 5 \x41 does
not generate an 'A' on EBCDIC platforms.

Gerard Goossen