Re: use encoding 'utf8' bug for Latin-1 range

From: Ben Morrow
Date: February 21, 2008 01:12
Subject: Re: use encoding 'utf8' bug for Latin-1 range
Message ID: 1vdv85-6g8.ln1@osiris.mauzo.dyndns.org

Quoth jhi@iki.fi:
> $ cat bug.pl
> use encoding 'utf8';
> my $x = "\x{ff}";
> use Devel::Peek;
> Dump($x);
> $ perl -w bug.pl
> SV = PV(0x801038) at 0x80f5c0
>   REFCNT = 1
>   FLAGS = (PADMY,POK,pPOK,UTF8)
>   PV = 0x201cb0 "\357\277\275"\0 [UTF8 "\x{fffd}"]
>   CUR = 3
>   LEN = 4
> $
> 
> The \x{fffd} is the Unicode "lost in translation" character,
> in case people are wondering.
> 
> I don't think it makes much sense
> for "use encoding 'utf8'" to break the "Latin-1 range" of 0x80-0xff
> like this?

'use encoding' seems to be completely broken with both 5.10.0 and 5.8
anyway; for instance

    #!/usr/bin/perl -l

    use encoding 'utf8';

    sub ι { $_[0] }
    print ι("foo");

(that's a sub named with a lowercase Greek iota) gives the error
'Illegal declaration of anonymous subroutine' whereas it works perfectly
with 'use utf8'. 
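
For comparison, here is a minimal sketch of the variant that works, with
'use utf8' in place of 'use encoding'. It assumes the file is saved as
UTF-8; apart from adding strict and warnings and an explicit newline,
nothing here goes beyond the snippet above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;    # source is UTF-8, so non-ASCII identifiers are allowed

# A sub named with a lowercase Greek iota (U+03B9); returns its first argument.
sub ι { $_[0] }

print ι("foo"), "\n";
```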

bug.pl above also works with 'use utf8' instead of 'use encoding'. There
seem to be three cases, going by the PV that Dump reports:

    no encoding specified, or just 'use utf8': "\377"\0

    use encoding 'iso-8859-1': "\303\277"\0 [UTF8 "\x{ff}"]

    anything else: "\357\277\275"\0 [UTF8 "\x{fffd}"]
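
The three byte sequences can be reproduced with Encode directly, without
the pragma. This is only a sketch of where the bytes come from, on the
assumption that the pragma ends up decoding the single byte 0xFF in the
named encoding; it does not show what 'use encoding' does internally.
octal_bytes is a hypothetical helper that prints bytes the way Dump
shows the PV.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: render a byte string as octal escapes, Dump-style.
sub octal_bytes {
    my ($bytes) = @_;
    return join '', map { sprintf '\\%03o', ord($_) } split //, $bytes;
}

my $byte = "\xff";    # the single byte that "\x{ff}" denotes

# Treat the byte as ISO-8859-1: it maps to U+00FF.
my $latin1 = decode('iso-8859-1', $byte);

# Treat the byte as UTF-8: 0xFF is never valid there, so Encode's
# default CHECK behaviour substitutes U+FFFD.
my $utf8 = decode('UTF-8', $byte);

printf "raw:     %s\n", octal_bytes($byte);
printf "latin-1: U+%04X -> %s\n", ord($latin1),
    octal_bytes(encode('UTF-8', $latin1));
printf "utf-8:   U+%04X -> %s\n", ord($utf8),
    octal_bytes(encode('UTF-8', $utf8));
```

The last line is the interesting one: the replacement character appears
as soon as 0xFF is interpreted as UTF-8 input rather than as a code
point.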

Ben

