On Mon, Feb 05, 2007 at 08:39:50PM +0100, Gerard Goossen wrote:
> Sometimes you need to have a byte-string. But \x.. generates a character.
> In Perl 5 \xFF generates a byte. But if your target encoding is UTF-8,
> \xFF generates two bytes. And there is no way to insert the byte FF into
> the string, because this isn't a valid codepoint in UTF-8. So I proposed to
> use \x[FF] in Perl7 to insert the byte FF. In Perl 5 \xFF inserts a byte,
> because 0xFF is smaller than 256, but having \x[FF] to be explicit that
> you want a byte would be nice.

I think this confuses UTF-8 strings with byte strings. Why would
you care about the representation in memory? Will the string be passed to
a C function that expects bytes, and not UTF-8?

> PS. This would also solve some EBCDIC problems where in Perl5 \xA4 does not
> generate an 'A', on EBCDIC platforms.

I don't understand. If the string needs to be translated from UTF-8 to EBCDIC
when it is output to the screen, then that is where the translation should
happen.

Cheers,
mark

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com
Neighbourhood Coder
Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/
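
The character-versus-byte distinction argued over in this message can be sketched roughly as follows. This is an illustration only, written in Python rather than Perl, and it shows neither Perl's internal behaviour nor the proposed \x[FF] syntax. The variable names and the `is_valid_utf8` helper are hypothetical, introduced just for this example.

```python
# Sketch: the character U+00FF versus the single byte 0xFF.
# All names here are illustrative assumptions, not from the thread.

char_string = "\xff"                      # one character, code point U+00FF
utf8_bytes = char_string.encode("utf-8")  # encoded form: two bytes, C3 BF
raw_byte = b"\xff"                        # one byte with value 0xFF


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    print(len(char_string), len(utf8_bytes), len(raw_byte))
    print(is_valid_utf8(utf8_bytes), is_valid_utf8(raw_byte))
```

The lone byte FF is not well-formed UTF-8 by itself, which seems to be the case the quoted proposal is trying to cover. Whether the in-memory representation should matter to the programmer is the question the reply raises.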