perl.perl5.porters | Postings from May 2008

Re: on broken manpages, trolling, inconsistent implementation and the difficulty to fix bugs

From: John Peacock
Date: May 20, 2008 04:05
Subject: Re: on broken manpages, trolling, inconsistent implementation and the difficulty to fix bugs
Message ID: 4832B0A4.80709@havurah-software.org
Marc Lehmann wrote:
> Easy, it is the only way for perl to internally represent characters with a
> value of 500.
> 
> If that 500 is for example the second character in v5.500, then this might
> simply be the perl 5.500 version string (or part of an ip address in the
> game "uplink" stored in a compact string form, e.g. v478.321.571.277).

Just to be a pedant, v5.500 is a v-string, and since Perl 5.8.1 v-strings have
been magical (note the v-string 'V' magic, which records the literal as written):

$ perl -MDevel::Peek -e '$v = v5.500; Dump($v);'
SV = PVMG(0x81ae810) at 0x816e8b8
   REFCNT = 1
   FLAGS = (RMG,POK,pPOK,UTF8)
   IV = 0
   NV = 0
   PV = 0x81874c8 "\5\307\264"\0 [UTF8 "\x{5}\x{1f4}"]
   CUR = 3
   LEN = 4
   MAGIC = 0x818ee80
     MG_VIRTUAL = &PL_vtbl_utf8
     MG_TYPE = PERL_MAGIC_utf8(w)
     MG_LEN = 2
   MAGIC = 0x81b0b78
     MG_VIRTUAL = 0
     MG_TYPE = PERL_MAGIC_v-string(V)
     MG_LEN = 6
     MG_PTR = 0x8184710 "v5.500"
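The same scalar can be inspected from the Perl side. This is a minimal sketch, assuming Perl 5.8.1 or later and the core Scalar::Util module: the string holds exactly two characters, with ordinals 5 and 500, and isvstring() reports the 'V' magic shown in the dump.

```perl
use strict;
use warnings;
use Scalar::Util qw(isvstring);

my $v = v5.500;

# The v-string is just a two-character string: chr(5) . chr(500).
my @ords = map { ord } split //, $v;
print "ordinals: @ords\n";
print "length: ", length($v), "\n";

# %vd prints each character's ordinal, joined with '.'.
printf "%vd\n", $v;

# The 'V' magic (5.8.1+) is what lets code tell this apart from an ordinary string.
print isvstring($v) ? "v-string\n" : "plain string\n";
```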

Perl version objects (native in v5.10.0 and available from CPAN for earlier
releases) use a very different storage format:

$ perl -MDevel::Peek -Mversion -e '$v = qv(v5.500); Dump($v);'
SV = PV(0x816eae8) at 0x816dcdc
   REFCNT = 1
   FLAGS = (ROK,OVERLOAD)
   RV = 0x816e8ac
   SV = PVHV(0x8172760) at 0x816e8ac
     REFCNT = 1
     FLAGS = (OBJECT,SHAREKEYS)
     IV = 3
     NV = 0
     STASH = 0x8197220   "version"
     ARRAY = 0x81b6c20  (1:1, 2:1)
     hash quality = 90.0%
     KEYS = 3
     FILL = 2
     MAX = 1
     RITER = 0
     EITER = 0x0
     Elt "original" HASH = 0xb45e44f2
     SV = PV(0x816eba8) at 0x81b4228
       REFCNT = 1
       FLAGS = (POK,pPOK)
       PV = 0x8197760 "v5.500"\0
       CUR = 6
       LEN = 8
     Elt "qv" HASH = 0x18c4b28a
     SV = IV(0x818de44) at 0x816dcf4
       REFCNT = 1
       FLAGS = (IOK,pIOK)
       IV = 1
     Elt "version" HASH = 0x68c27e33
     SV = RV(0x819a54c) at 0x81c6730
       REFCNT = 1
       FLAGS = (ROK)
       RV = 0x816dd90
       SV = PVAV(0x8172c64) at 0x816dd90
         REFCNT = 1
         FLAGS = ()
         IV = 0
         NV = 0
         ARRAY = 0x81c0988
         FILL = 2
         MAX = 3
         ARYLEN = 0x0
         FLAGS = (REAL)
         Elt No. 0
         SV = IV(0x818e05c) at 0x816e930
           REFCNT = 1
           FLAGS = (IOK,pIOK)
           IV = 5
         Elt No. 1
         SV = IV(0x818e060) at 0x816e900
           REFCNT = 1
           FLAGS = (IOK,pIOK)
           IV = 500
         Elt No. 2
         SV = IV(0x818e068) at 0x81b421c
           REFCNT = 1
           FLAGS = (IOK,pIOK)
           IV = 0
   PV = 0x816e8ac ""
   CUR = 0
   LEN = 0
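What that storage buys you can be sketched through the version module's public methods rather than its hash internals. A minimal sketch, assuming the version module (core since 5.10.0, from CPAN before that); the specific version numbers compared here are illustrative only. The parsed integer components make comparison component-by-component, which a plain string comparison of the same text gets wrong.

```perl
use strict;
use warnings;
use version;    # exports qv()

# qv() builds a dotted-decimal version object, as in the dump above.
my $v = qv("v5.500");

# normal() gives the normalized form: leading 'v', at least three components.
print $v->normal, "\n";

# Version objects overload comparison and compare component by component.
my $older = qv("v5.9.5");
my $newer = qv("v5.10.0");
print "v5.10.0 is newer than v5.9.5\n" if $newer > $older;

# A naive string comparison of the same text disagrees: '1' sorts before '9'.
print "string compare says 5.10.0 lt 5.9.5\n" if "5.10.0" lt "5.9.5";
```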

Using the former (v-strings) as $VERSION initializers for a while was a
historical aberration, a failure of imagination that relied on the behavior
of Perl's internals in a somewhat unhealthy fashion.

> Of course, this gets you in trouble:
> 
>    my $s = chr 200; # not unicode, but native 8-bit(??)
>    substr $s, 0, 0, chr 500;
>    $s =~ /ü/; # now interpreted as unicode
> 
> This is the insane part - I wouldn't expect even an expert perl programmer
> to predict how $s gets interpreted here.

This is a contrived example: you are going out of your way to manufacture bad
code.  Just because you *can* call chr() with values > 255, and Perl turns on
the UTF8 flag in the supreme hope that you knew what you were doing, doesn't
make this irredeemably broken.  You broke $s by mixing your string types using
a low-level function that has no knowledge of Unicode semantics, *nor should it*.
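A minimal sketch of what the substr() call in the quoted example actually does to $s, assuming Perl 5.8.1 or later for utf8::is_utf8(): the UTF8 flag is switched on, but the characters themselves (ordinals 500 and 200) are unchanged. The flag describes the internal representation, not the character values.

```perl
use strict;
use warnings;

my $s = chr 200;                            # one character, stored as a byte string
my $before = utf8::is_utf8($s) ? 1 : 0;

substr $s, 0, 0, chr 500;                   # insert a character > 255 at the front

my $after = utf8::is_utf8($s) ? 1 : 0;
printf "UTF8 flag: %d -> %d\n", $before, $after;

# The ordinals survive the upgrade untouched.
printf "ordinals: %s\n", join ' ', map { ord } split //, $s;
```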

A more realistic example is a PV containing ASCII text that has a UTF8 string
concatenated to it.  This works as designed: the original string is upgraded to
UTF8, the second string is appended, and well-formed UTF8 is assured.
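A minimal sketch of that case, assuming Perl 5.8.1 or later; the smiley character U+263A is an arbitrary choice of a wide character. Appending with .= upgrades the ASCII string in place, and the result is four characters with the expected code points.

```perl
use strict;
use warnings;

my $text = "abc";                          # plain ASCII, UTF8 flag off
my $before = utf8::is_utf8($text) ? 1 : 0;

$text .= "\x{263A}";                       # append U+263A WHITE SMILING FACE

my $after = utf8::is_utf8($text) ? 1 : 0;
printf "UTF8 flag: %d -> %d\n", $before, $after;
printf "length: %d\n", length $text;
printf "code points: %s\n", join ' ', map { sprintf 'U+%04X', ord } split //, $text;
```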

John


