develooper Front page | perl.perl5.porters | Postings from July 2021

elevator pitch for IV vs PV serialisation

Thread Next
From:
Nicholas Clark
Date:
July 3, 2021 13:40
Subject:
elevator pitch for IV vs PV serialisation
Message ID:
20210703134006.GP11066@etla.org
So, we currently say in "What needs an RFC? What can just be a PR?"

1. Language changes (feature changes to the parser, tokeniser)
2. Command line options
3. Adding/removing warnings (entries in `perldiag.pod`)
4. Significant changes to when an existing warning triggers

https://github.com/Perl/RFCs/blob/master/docs/process.md


The previous PSC had wondered whether we should add

5. Bug fixes that have the potential for significant runtime behaviour changes
   that can't be warned about and break things


So, here goes

Problem - perl caches string, integer and floating point representations of
values, and does this on reads. Hence serialisers can't distinguish between
values that started as numbers but got used as strings, and the other way.

$ perl -MDevel::Peek -e '$a = 42; $b = "$a"; Dump $a'
SV = PVIV(0x556b723ebad0) at 0x556b723ee9b8
  REFCNT = 1
  FLAGS = (IOK,POK,pIOK,pPOK)
  IV = 42
  PV = 0x556b723df720 "42"\0
  CUR = 2
  LEN = 10
$ perl -MDevel::Peek -e '$a = "42"; $b = $a + 0; Dump $a'
SV = PVIV(0x556abce6aa10) at 0x556abce6d8f8
  REFCNT = 1
  FLAGS = (IOK,POK,IsCOW,pIOK,pPOK)
  IV = 42
  PV = 0x556abceaa690 "42"\0
  CUR = 2
  LEN = 10
  COW_REFCNT = 1

(ignore that IsCOW. You can't rely on that)


Proposed solution:

Don't set SVf_POK in Perl_sv_2pv_flags() when converting IVs to PVs.
Basically this:

diff --git a/sv.c b/sv.c
index 3c69dece01..307846fb45 100644
--- a/sv.c
+++ b/sv.c
@@ -3141,7 +3141,19 @@ Perl_sv_2pv_flags(pTHX_ SV *const sv, STRLEN *const lp, const U32 flags)
         Move(ptr, s, len, char);
         s += len;
         *s = '\0';
-        SvPOK_on(sv);
+        /* We used to call SvPOK_on(). Whilst this is fine for (most) Perl code,
+           it means that after this stringification is cached, there is no way
+           to distinguish between values originally assigned as $a = 42; and
+           $a = "42"; (or results of string operators vs numeric operators)
+           where the value has subsequently been used in the other "context"
+           and had a value cached.
+           This (somewhat) hack means that we retain the cached stringification,
+           but the existing SvPV() macros end up entering this function (which
+           returns the cached value) instead of using it directly inline.
+           However, the result is that if a value is SVf_IOK|SVf_POK then it
+           originated as "42", whereas if it's SVf_IOK then it originated as 42.
+           (ignore SVp_IOK and SVp_POK) */
+        SvPOKp_on(sv);
     }
     else if (SvNOK(sv)) {
         if (SvTYPE(sv) < SVt_PVNV)


Benefits of this:

$ ./perl -Ilib -MDevel::Peek -e '$a = 42; $b = "$a"; Dump $a'
SV = PVIV(0x557eacb3b2d0) at 0x557eacb3a7a0
  REFCNT = 1
  FLAGS = (IOK,pIOK,pPOK)
  IV = 42
  PV = 0x557eacb841d0 "42"\0
  CUR = 2
  LEN = 10
$ ./perl -Ilib -MDevel::Peek -e '$a = "42"; $b = $a + 0; Dump $a'
SV = PVIV(0x564aae2d82d0) at 0x564aae2d77a0
  REFCNT = 1
  FLAGS = (IOK,POK,IsCOW,pIOK,pPOK)
  IV = 42
  PV = 0x564aae320f60 "42"\0
  CUR = 2
  LEN = 10
  COW_REFCNT = 1


if (flags & SVf_IOK|SVf_NOK) && !(flags & SVf_POK)
    serealise as number
else if (flags & SVf_POK)
    serealise as string
else
    the existing guesswork ...


Potential problems:

Something is bound to break.

As to existing core code, the branch I pushed also has this hunk:

diff --git a/dist/Data-Dumper/Dumper.xs b/dist/Data-Dumper/Dumper.xs
index 6d73beca30..78005d3726 100644
--- a/dist/Data-Dumper/Dumper.xs
+++ b/dist/Data-Dumper/Dumper.xs
@@ -1569,7 +1569,7 @@ Data_Dumper_Dumpxs(href, ...)
                    else
                        (void)SvOK_off(name);

-                   if (SvPOK(name)) {
+                   if (SvPOKp(name)) {
                        if ((SvPVX_const(name))[0] == '*') {
                            if (SvROK(val)) {
                                switch (SvTYPE(SvRV(val))) {


because, sigh, Data::Dumper is testing SvPOK.


However, it's wrong. The change exposes an existing bug. It's easier to
demonstrate with NVs:

$ /media/Tux/perls/bin/perl5.18.0 -MData::Dumper -e 'print Data::Dumper->Dumpxs(["pi"], [4.5/1.5])'
$3 = 'pi';
$ /media/Tux/perls/bin/perl5.20.0 -MData::Dumper -e 'print Data::Dumper->Dumpxs(["pi"], [4.5/1.5])'
$VAR1 = 'pi';


but it also applies to string overloading. I've just uploaded Data::Dumper
2.182_51 to CPAN with a general fix.


Anyway, changing this is likely to expose other problems, although I'll
guess that 50% will be existing bugs that can be tickled on 5.34.0 with
carefully crafted test casses.


Prototype implementation in https://github.com/nwc10/perl5/tree/IV-vs-PV

Please don't merge this (yet), as the DD fix should go into blead first
(once 2.183 is released) then these two commits get rebased on that.


Nicholas Clark

Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About