develooper Front page | perl.perl5.porters | Postings from February 2008

use encoding 'utf8' bug for Latin-1 range

From: Jarkko Hietaniemi
Date: February 20, 2008 18:23
Subject: use encoding 'utf8' bug for Latin-1 range
Message ID: 47BCE092.3010109@iki.fi

$ cat bug.pl
use encoding 'utf8';
my $x = "\x{ff}";
use Devel::Peek;
Dump($x);
$ perl -w bug.pl
SV = PV(0x801038) at 0x80f5c0
  REFCNT = 1
  FLAGS = (PADMY,POK,pPOK,UTF8)
  PV = 0x201cb0 "\357\277\275"\0 [UTF8 "\x{fffd}"]
  CUR = 3
  LEN = 4
$

The \x{fffd} is U+FFFD, the Unicode REPLACEMENT CHARACTER (the "lost in
translation" character), in case people are wondering.
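
For anyone who wants to check the bytes in the dump, here is a small sketch using the core Encode module. It only shows that a lone 0xFF byte, when decoded leniently as UTF-8, turns into U+FFFD, and that U+FFFD encodes to the octal bytes \357\277\275. Whether this is what the pragma does internally is an assumption, not something verified here; the helper names decode_lone_byte and octal_bytes are made up for this illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: decode a byte string as UTF-8 without a CHECK
# argument, so malformed input is replaced rather than rejected.
sub decode_lone_byte {
    my ($bytes) = @_;
    return decode('UTF-8', $bytes);
}

# Hypothetical helper: render each byte as a backslash-octal escape,
# in the same style Devel::Peek uses for the PV field.
sub octal_bytes {
    my ($bytes) = @_;
    return join '', map { sprintf '\\%03o', ord } split //, $bytes;
}

# 0xFF can never appear in well-formed UTF-8, so it gets substituted.
my $char = decode_lone_byte("\xff");
printf "decoded code point: U+%04X\n", ord $char;

# Encoding U+FFFD back to UTF-8 gives the three bytes seen in the dump.
printf "UTF-8 bytes: %s\n", octal_bytes(encode('UTF-8', "\x{fffd}"));
```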

I don't think it makes much sense for "use encoding 'utf8'" to break
the "Latin-1 range" of 0x80-0xff like this.  Note that there is no
literal UTF-8 anywhere in the script, only the \x notation.  (The {}
in \x{ff} make no difference: plain \xff behaves the same.  The {}
just make it easier to try e.g. \x{fff} and see how Things Should Work.)
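
As a point of comparison, here is a sketch of How Things Should Work when the pragma is not loaded: \x{ff} and \xff are both the single character U+00FF, and \x{fff} is U+0FFF. It uses the core Encode module to show the UTF-8 bytes one would expect to find in the PV; the helper name utf8_hex is made up for this illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: UTF-8 encode a character string and return the
# resulting bytes as space-separated uppercase hex.
sub utf8_hex {
    my ($string) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, encode('UTF-8', $string);
}

# No "use encoding" here, so the \x escapes are plain code points.
for my $string ("\x{ff}", "\xff", "\x{fff}") {
    printf "U+%04X (length %d) -> %s\n",
        ord($string), length($string), utf8_hex($string);
}
```

With the pragma out of the picture, the first two lines should both report U+00FF with the bytes C3 BF, which is what one would hope to see in the Dump output above instead of U+FFFD.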

This seems to happen with both 5.8.8 and 5.10.0; I don't have a blead
or a maint conveniently compiled.



