perl.perl5.porters | Postings from February 2008

use encoding 'utf8' bug for Latin-1 range

Jarkko Hietaniemi
February 20, 2008 18:23
$ cat
use encoding 'utf8';
my $x = "\x{ff}";
use Devel::Peek;
Dump $x;
$ perl -w
SV = PV(0x801038) at 0x80f5c0
  REFCNT = 1
  PV = 0x201cb0 "\357\277\275"\0 [UTF8 "\x{fffd}"]
  CUR = 3
  LEN = 4

The \x{fffd} is U+FFFD REPLACEMENT CHARACTER, the Unicode "lost in
translation" character, in case people are wondering.
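A small check of the bytes in the Devel::Peek dump, as a sketch using the core Encode module: the octal PV bytes \357\277\275 are exactly the UTF-8 encoding of U+FFFD, and decoding a lone 0xFF byte as UTF-8 with Encode's default CHECK setting substitutes that same character. That the encoding pragma effectively performs an equivalent decode on the literal is an assumption suggested by the output, not something this sketch proves.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# The replacement character's UTF-8 form, rendered as octal escapes
# the same way Devel::Peek prints the PV buffer.
my $fffd_bytes = encode('UTF-8', "\x{fffd}");
my $octal      = join '', map { sprintf '\\%o', ord } split //, $fffd_bytes;
print "$octal\n";    # \357\277\275

# Assumption: the pragma effectively does a decode like this one.
# 0xFF can never appear in well-formed UTF-8, so decoding it with the
# default CHECK setting substitutes U+FFFD instead of dying.
my $bytes   = "\xff";
my $decoded = decode('UTF-8', $bytes);
printf "U+%04X\n", ord $decoded;    # U+FFFD
```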

I don't think it makes much sense for "use encoding 'utf8'" to break
the "Latin-1 range" of 0x80-0xff like this, does it?  Note that there
is no literal UTF-8 anywhere, only the \x notation. (The {} in \x{ff}
make no difference: plain \xff behaves the same. The {} just make it
easier to try e.g. \x{fff} and see how Things Should Work.)
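For comparison, a sketch run without "use encoding 'utf8'" of what one would expect instead: the braced and unbraced escapes give the same single character U+00FF, whose UTF-8 form is \303\277 rather than \357\277\275, and \x{fff} gives U+0FFF. The helper utf8_octal is hypothetical, written only to print bytes in the octal style Devel::Peek uses.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Without the encoding pragma the braces make no difference:
# both escapes denote the single code point U+00FF.
my $braced = "\x{ff}";
my $plain  = "\xff";
my $same   = $braced eq $plain ? 'same' : 'different';
print "$same\n";                                              # same
printf "length %d, U+%04X\n", length $braced, ord $braced;    # length 1, U+00FF

# utf8_octal is a hypothetical helper, not part of Perl or Encode:
# it renders a string's UTF-8 bytes as octal escapes.
sub utf8_octal {
    my ($str) = @_;
    return join '', map { sprintf '\\%o', ord } split //, encode('UTF-8', $str);
}

print utf8_octal("\x{ff}"),  "\n";    # \303\277
print utf8_octal("\x{fff}"), "\n";    # \340\277\277
```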

It seems to happen with both 5.8.8 and 5.10.0; I don't have a blead
or a maint build conveniently compiled.
