Re: Future Perl development

From: Dr.Ruud
Date: February 5, 2007 14:10
Subject: Re: Future Perl development
Message ID: 20070205221035.9228.qmail@lists.develooper.com

Gerard Goossen wrote:
> Dr.Ruud:
>> Gerard Goossen:

>>> If we make \x{?..}? really insert codepoints, and not sometimes
>>> bytes, we need an escape sequence for bytes.
>>
>> That is thinking the wrong way around, because it should only depend
>> on the encoding at hand. And the encoding of the source file does
>> not have to be equal to the encoding of the referenced data, for
>> example a file that is written to. So if the source file is in UTF-8
>> and the data is in Latin-1, then an "Ä" will be built from multiple
>> bytes for the source file but be only a single byte in the data.
>
> Sometimes you need to have a byte string. But \x.. generates a character.

perl -wle '
  # pack "H*" turns a string of hex digits into the corresponding raw bytes
  print pack "H*",
  "4a75737420616e6f74686572205065726c206861636b65722c"
'


> In Perl 5 \xFF generates a byte. But if your target encoding is UTF-8,
> \xFF generates two bytes. And there is no way to insert the byte FF
> into the string, because this isn't a valid codepoint in UTF-8.

Doing something like that should turn it into a byte buffer, because it
is no longer valid UTF-8. So just use unpack.


> In Perl 5 \xFF inserts a byte, because 0xFF is smaller than 256

Not.

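perl -wle '
  # \x{100} makes the whole literal a character string; keep only the \xFF part
  $s = substr "\x{100}\xFF", 1;
  # print the length in characters and a hex dump of the string
  print length $s, ":", unpack "H*", $s;
'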

-- 
Affijn, Ruud

"Gewoon is een tijger."
