Marc Lehmann wrote:
> Easy, it is the only way for perl to internally represent characters with a
> value of 500.
>
> If that 500 is for example the second character in v5.500, then this might
> simply be the perl 5.500 version string (or part of an ip address in the
> game "uplink" stored in a compact string form, e.g. v478.321.571.277).
Just to be a pedant, v5.500 is a v-string, and since 5.8.1 has been magical:
$ perl -MDevel::Peek -e '$v = v5.500; Dump($v);'
SV = PVMG(0x81ae810) at 0x816e8b8
REFCNT = 1
FLAGS = (RMG,POK,pPOK,UTF8)
IV = 0
NV = 0
PV = 0x81874c8 "\5\307\264"\0 [UTF8 "\x{5}\x{1f4}"]
CUR = 3
LEN = 4
MAGIC = 0x818ee80
MG_VIRTUAL = &PL_vtbl_utf8
MG_TYPE = PERL_MAGIC_utf8(w)
MG_LEN = 2
MAGIC = 0x81b0b78
MG_VIRTUAL = 0
MG_TYPE = PERL_MAGIC_v-string(V)
MG_LEN = 6
MG_PTR = 0x8184710 "v5.500"
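The same thing can be seen from plain Perl without Devel::Peek. This is only a minimal sketch (the variable names are mine): it shows that the v-string is a two-character string whose characters have the ordinals 5 and 500.

```perl
use strict;
use warnings;

my $v = v5.500;

# Split the v-string into characters and look at each ordinal.
my @ords = map { ord } split //, $v;

# sprintf's "%vd" joins the ordinals of a string with dots.
my $dotted = sprintf "%vd", $v;

print "characters: ", length($v), "\n";
print "ordinals:   @ords\n";
print "dotted:     $dotted\n";
```

The 500 is what forces the UTF8 flag on in the dump above, since it does not fit in a single byte.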
Perl version objects (native in v5.10.0 and available from CPAN for earlier
releases) use a very different storage format:
$ perl -MDevel::Peek -Mversion -e '$v = qv(v5.500); Dump($v);'
SV = PV(0x816eae8) at 0x816dcdc
REFCNT = 1
FLAGS = (ROK,OVERLOAD)
RV = 0x816e8ac
SV = PVHV(0x8172760) at 0x816e8ac
REFCNT = 1
FLAGS = (OBJECT,SHAREKEYS)
IV = 3
NV = 0
STASH = 0x8197220 "version"
ARRAY = 0x81b6c20 (1:1, 2:1)
hash quality = 90.0%
KEYS = 3
FILL = 2
MAX = 1
RITER = 0
EITER = 0x0
Elt "original" HASH = 0xb45e44f2
SV = PV(0x816eba8) at 0x81b4228
REFCNT = 1
FLAGS = (POK,pPOK)
PV = 0x8197760 "v5.500"\0
CUR = 6
LEN = 8
Elt "qv" HASH = 0x18c4b28a
SV = IV(0x818de44) at 0x816dcf4
REFCNT = 1
FLAGS = (IOK,pIOK)
IV = 1
Elt "version" HASH = 0x68c27e33
SV = RV(0x819a54c) at 0x81c6730
REFCNT = 1
FLAGS = (ROK)
RV = 0x816dd90
SV = PVAV(0x8172c64) at 0x816dd90
REFCNT = 1
FLAGS = ()
IV = 0
NV = 0
ARRAY = 0x81c0988
FILL = 2
MAX = 3
ARYLEN = 0x0
FLAGS = (REAL)
Elt No. 0
SV = IV(0x818e05c) at 0x816e930
REFCNT = 1
FLAGS = (IOK,pIOK)
IV = 5
Elt No. 1
SV = IV(0x818e060) at 0x816e900
REFCNT = 1
FLAGS = (IOK,pIOK)
IV = 500
Elt No. 2
SV = IV(0x818e068) at 0x81b421c
REFCNT = 1
FLAGS = (IOK,pIOK)
IV = 0
PV = 0x816e8ac ""
CUR = 0
LEN = 0
The fact that the former (v-strings) were used for a while as $VERSION
initializers is an historical aberration - a failure of imagination - that
relied on the behavior of Perl's internals in a somewhat unhealthy fashion.
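As a rough illustration of the difference (a sketch, assuming version.pm is installed; "v5.6.1" is just an arbitrary version to compare against), a version object compares component by component as integers and never looks at string encoding:

```perl
use strict;
use warnings;
use version;

my $new = qv("v5.500");
my $old = qv("v5.6.1");

# Overloaded comparison works on the integer components (5, 500, 0).
my $newer = $new > $old ? 1 : 0;

# normal() gives the dotted form padded to at least three components.
my $normal = $new->normal;

print "newer:  $newer\n";
print "normal: $normal\n";
```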
> Of course, this gets you in trouble:
>
> my $s = chr 200; # not unicode, but native 8-bit(??)
> substr $s, 0, 0, chr 500;
> $s =~ /ü/; # now interpreted as unicode
>
> This is the insane part - I wouldn't expect even an expert perl programmer
> to predict how $s gets interpreted here.
This is a contrived example because you are going out of your way to manufacture
bad code. Just because you *can* use chr() with values > 255 and Perl turns on
the UTF8 flag in the supreme hope that you knew what you were doing, doesn't
make this irredeemably broken. You broke $s by mixing your string-types using a
low-level function that has no knowledge of unicode semantics, *nor should it*.
A more realistic example is a PV containing ASCII text that has a UTF8 string
concatenated to it. This works as designed: the original string is upgraded to
UTF8, the second string is appended, and well-formed UTF8 is assured.
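A small sketch of that case, using utf8::is_utf8() only to peek at the internal flag (the variable names are made up for the example):

```perl
use strict;
use warnings;

my $ascii = "abc";            # plain ASCII, UTF8 flag off
my $wide  = "x" . chr(500);   # chr(500) cannot be stored in one byte

my $before = utf8::is_utf8($ascii) ? "on" : "off";

# Concatenation upgrades the result; the characters are unchanged.
my $joined = $ascii . $wide;

my $after = utf8::is_utf8($joined) ? "on" : "off";

print "flag before: $before\n";
print "flag after:  $after\n";
print "length:      ", length($joined), "\n";
print "last ord:    ", ord(substr($joined, -1)), "\n";
```

The length is counted in characters, not bytes, so the upgrade is invisible unless you go looking for the flag.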
John