Gerard Goossen wrote:
> Dr.Ruud:
>> Gerard Goossen:

>>> If we make \x{?..}? really insert codepoints, and not sometimes
>>> bytes, we need an escape sequence for bytes.
>>
>> That is thinking the wrong way around, because it should only depend
>> on the encoding at hand. And the encoding of the source file does
>> not have to be equal to the encoding of the referenced data, for
>> example a file that is written to. So if the source file is in UTF-8
>> and the data is in Latin-1, then an "Ä" will be built from multiple
>> bytes for the source file but be only a single byte in the data.
>
> Sometimes you need to have a byte-string. But \x.. generates a character.

  perl -wle '
    print pack "H*", "4a75737420616e6f74686572205065726c206861636b65722c"
  '

> In Perl 5 \xFF generates a byte. But if your target encoding is UTF-8,
> \xFF generates two bytes. And there is no way to insert the byte FF
> into the string, because this isn't a valid codepoint in UTF-8.

Doing something like that should turn it into a byte buffer, because it
is no longer valid UTF-8. So just use unpack.

> In Perl 5 \xFF inserts a byte, because 0xFF is smaller than 256

Not.

  perl -wle '
    $s = substr "\x{100}\xFF", 1;
    print length $s, ":", unpack "H*", $s;
  '

-- 
Affijn, Ruud

"Gewoon is een tijger."
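
A minimal sketch of the distinction the last one-liner is after, assuming a Perl 5 where the core utf8:: functions are available (5.8.1 or later). What unpack "H*" reports for an internally UTF-8 string may depend on the Perl version, so this version makes both representations explicit instead of relying on unpack alone. The variable names are only illustrative.

```perl
use strict;
use warnings;

# Same construction as in the one-liner: U+0100 forces Perl's internal
# UTF-8 representation, and substr keeps it for the remaining U+00FF.
my $s = substr "\x{100}\xFF", 1;

my $chars   = length $s;                  # number of characters
my $ord     = ord $s;                     # ordinal of that character
my $flagged = utf8::is_utf8($s) ? 1 : 0;  # internal UTF-8 flag set?

# Explicitly encode a copy to UTF-8 octets.
my $octets = $s;
utf8::encode($octets);
my $hex_utf8 = unpack "H*", $octets;

# Explicitly downgrade a copy to a single-byte representation.
# This works here only because the character is below 256.
my $byte = $s;
utf8::downgrade($byte);
my $hex_byte = unpack "H*", $byte;

printf "chars=%d ord=0x%X flagged=%d utf8=%s byte=%s\n",
    $chars, $ord, $flagged, $hex_utf8, $hex_byte;
```

So the string holds one character, U+00FF, whichever way it is stored; whether that ends up as one octet or two is decided at the point where it is explicitly encoded or downgraded.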