Marc Lehmann wrote:
> Easy, it is the only way for perl to internally represent characters with a
> value of 500.
>
> If that 500 is for example the second character in v5.500, then this might
> simply be the perl 5.500 version string (or part of an ip address in the
> game "uplink" stored in a compact string form, e.g. v478.321.571.277).

Just to be a pedant, v5.500 is a v-string, and v-strings have been magical
since 5.8.1:

$ perl -MDevel::Peek -e '$v = v5.500; Dump($v);'
SV = PVMG(0x81ae810) at 0x816e8b8
  REFCNT = 1
  FLAGS = (RMG,POK,pPOK,UTF8)
  IV = 0
  NV = 0
  PV = 0x81874c8 "\5\307\264"\0 [UTF8 "\x{5}\x{1f4}"]
  CUR = 3
  LEN = 4
  MAGIC = 0x818ee80
    MG_VIRTUAL = &PL_vtbl_utf8
    MG_TYPE = PERL_MAGIC_utf8(w)
    MG_LEN = 2
  MAGIC = 0x81b0b78
    MG_VIRTUAL = 0
    MG_TYPE = PERL_MAGIC_v-string(V)
    MG_LEN = 6
    MG_PTR = 0x8184710 "v5.500"

Perl version objects (native in v5.10.0 and available from CPAN for earlier
releases) use a very different storage format:

$ perl -MDevel::Peek -Mversion -e '$v = qv(v5.500); Dump($v);'
SV = PV(0x816eae8) at 0x816dcdc
  REFCNT = 1
  FLAGS = (ROK,OVERLOAD)
  RV = 0x816e8ac
  SV = PVHV(0x8172760) at 0x816e8ac
    REFCNT = 1
    FLAGS = (OBJECT,SHAREKEYS)
    IV = 3
    NV = 0
    STASH = 0x8197220 "version"
    ARRAY = 0x81b6c20 (1:1, 2:1)
    hash quality = 90.0%
    KEYS = 3
    FILL = 2
    MAX = 1
    RITER = 0
    EITER = 0x0
    Elt "original" HASH = 0xb45e44f2
    SV = PV(0x816eba8) at 0x81b4228
      REFCNT = 1
      FLAGS = (POK,pPOK)
      PV = 0x8197760 "v5.500"\0
      CUR = 6
      LEN = 8
    Elt "qv" HASH = 0x18c4b28a
    SV = IV(0x818de44) at 0x816dcf4
      REFCNT = 1
      FLAGS = (IOK,pIOK)
      IV = 1
    Elt "version" HASH = 0x68c27e33
    SV = RV(0x819a54c) at 0x81c6730
      REFCNT = 1
      FLAGS = (ROK)
      RV = 0x816dd90
      SV = PVAV(0x8172c64) at 0x816dd90
        REFCNT = 1
        FLAGS = ()
        IV = 0
        NV = 0
        ARRAY = 0x81c0988
        FILL = 2
        MAX = 3
        ARYLEN = 0x0
        FLAGS = (REAL)
        Elt No. 0
        SV = IV(0x818e05c) at 0x816e930
          REFCNT = 1
          FLAGS = (IOK,pIOK)
          IV = 5
        Elt No. 1
        SV = IV(0x818e060) at 0x816e900
          REFCNT = 1
          FLAGS = (IOK,pIOK)
          IV = 500
        Elt No. 2
        SV = IV(0x818e068) at 0x81b421c
          REFCNT = 1
          FLAGS = (IOK,pIOK)
          IV = 0
  PV = 0x816e8ac ""
  CUR = 0
  LEN = 0

The fact that the former (v-strings) were used for a while as $VERSION
initializers is an historical aberration - a failure of imagination - that
relied on the behavior of Perl's internals in a somewhat unhealthy fashion.

> Of course, this gets you in trouble:
>
> my $s = chr 200; # not unicode, but native 8-bit(??)
> substr $s, 0, 0, chr 500;
> $s =~ /ü/; # now interpreted as unicode
>
> This is the insane part - I wouldn't expect even an expert perl programmer
> to predict how $s gets interpreted here.

This is a contrived example because you are going out of your way to
manufacture bad code. Just because you *can* use chr() with values > 255, and
Perl turns on the UTF8 flag in the supreme hope that you knew what you were
doing, doesn't make this irredeemably broken. You broke $s by mixing your
string types using a low-level function that has no knowledge of Unicode
semantics, *nor should it*.

A more realistic example is a PV containing ASCII text that has a UTF8 string
concatenated to it. This works as designed: the original string is upgraded
to UTF8, the second string is appended, and well-formed UTF8 is assured.

John
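
The concatenation case described above can be sketched in a few lines. This is a minimal illustration, not code from the original thread: the variable names are made up for the example, and it assumes a perl of 5.8.1 or later so that utf8::is_utf8() is available.

```perl
use strict;
use warnings;

# Plain ASCII text: stored as bytes, with the UTF8 flag off.
my $ascii = "version ";

# chr() with a value above 255 yields a string with the UTF8 flag on.
my $wide = chr(500);

# Record the flag on the ASCII string before anything is appended.
my $before = utf8::is_utf8($ascii) ? 1 : 0;

# Concatenating a flagged string to an unflagged one upgrades the
# result to UTF8; the characters themselves are unchanged.
my $joined = $ascii . $wide;
my $after  = utf8::is_utf8($joined) ? 1 : 0;

printf "before: utf8=%d, after: utf8=%d, length=%d, last char=%d\n",
    $before, $after, length($joined), ord(substr($joined, -1));
```

The point of the sketch is that length() and ord() keep reporting characters, not bytes, so the upgrade is invisible unless you inspect the flag or the internals with Devel::Peek.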