develooper Front page | perl.perl5.porters | Postings from May 2017

Re: proof-of-concept short-string PVs

Thread Previous | Thread Next
Dave Mitchell
May 5, 2017 12:06
Re: proof-of-concept short-string PVs
Message ID:
On Mon, Mar 27, 2017 at 11:15:37AM +0100, Dave Mitchell wrote:
> TD;DR: I tried storing short strings directly in the SV head: a bit faster,
> but probably not practical. A variant might be practical.
> I've just pushed the branch davem/shpv_poc which contains some
> proof-of-concept code for adding a new SV type, SVt_SHPV, which uses the
> sv_any and sv_u fields of an SV head as a buffer to hold short strings
> (<=14 chars on 64-bit builds), eliminating the need for an SV body and
> malloced PVX buffer.

After a lot of thought, I've come up with a plan B. This plan allows for
short PVs and should  also allow COW to be improved. I have pushed a
branch which does the first part of this:


I'll await feedback before doing any more work on this branch.

The basic idea is to add an extra pointer-sized field to the XPV structure
(and all those SV types which inherit from it); so there's now (xpv_cur,
xpv_len, extra) instead of just (xpv_cur, xpv_len) in every string-capable
body type.

This field has two main purposes.

First, it allows short strings. If a new flag, SVf_SHORTPV, is set, then
sv_any points to the body as normal, but sv_u.svu_pv
points into the middle of the same body, where the (xpv_len, extra) fields
are used as a 16-byte buffer. xpv_cur still holds the current string
length. This is the bit I've implemented so far, and I'll discuss it
further below.

The second use for the extra field is to provide an alternative to the
current COW refcount, which is stored in a spare byte, if any, after the
end of the string. The two main problems with that approach is that
strings "randomly" becomes uncowable if there isn't a spare byte. So code
that works ok, suddenly becomes slow for particular string lengths.
Secondly, COW can't be used on strings which won't allow a writable
refcount byte, such as SvLEN(()==0 strings and mmap()ed R/O strings.
[There are still some issues with SvLEN()==0 SVs however even with mt new

Instead, by using the extra field in the XPV we can avoid ever needing to
modify the string buffer itself. I was thinking of reviving a variant of
Nicholas' original COW implementation ("COW1"). In that, rather than
having a reference count in the shared referent (the string buffer), he
stored a circular chain of pointers linking all the referers (i.e. the
SVs) together. When an SV was freed, it would remove itself from the
chain; if the chain became empty, it would free the string buffer.

COW1 used the IV field of a non-IOK PVIV body to store the chain pointer.
This obviously restricted which SVs were cowable. My hypothetical COW3
would instead use the "extra" field available on all SV string types to
store the chain pointer, so virtually any string would be COWable.

There was a second issue with COW1: for high COW numbers, the chain would
be long, and to free all N SVs in the chain would require a full traversal
of the chain for every sv_clear() call - you have to follow the chain
right round to find who in the chain is pointing at you. So freeing N SVs
took O(N^2) time.

If this is an issue, I intend to use a hybrid approach. If the COW count
is 2, then use the "extra" field in the 2 bodies to link the two SVs
together in a chain of length 2 a la COW1. (I'm not sure what to do yet
for a COW count of 1: make "extra" point to itself, or set to NULL? - this
is an implementation detail.) I think a COW count of <= 2 will be very

If the COW count is 3+, then instead, an integer field is allocated from
an arena, which holds the ref count,and all the "extra" fields instead
point to that ref count field.  .

I also think it might be possible to hijack the SVf_SHORTPV code paths to
make shared HEKs include both SV body and string buffer, so that 'keys %h'
would still still have to allocate N SvTEMP SVs, but they would all share
the same body and HEK, rather than each requiring a different body as

Adding the "extra" field is a classic tradeoff. It makes COW easier (and
possibly more performant), and for short strings uses a lot less memory
and is a lot faster. For other strings (and string-capable SVs that don't
happen to be POK), it uses slightly more memory and is slightly slower.

The rest of this email will discuss in more detail what I've implemented
so far in the davem/cow3 branch. This branch adds the extra field and the
SVf_SHORTPV flag, and implements short strings.

Note that  while the SvLEN() macro has been changed to be a conditional
which returns a constant value if SVf_SHORTPV or xpv_len otherwise, the
SvCUR() and SvPVX() macros are unchanged. This is significant for
performance, since SvLEN() is used less frequently than the other two.

Since derived string types like XPVIV also have the extra field, they can
hold a short string as well as an integer value.

When upgrading a SVf_SHORTPV SV to a new body type, the old body isn't
discarded but remains as the SV's string buffer, so sv_any and sv_u.svu_pv
now point to two different bodies. This means that code calling sv_upgrade
doesn't see the buffer address change.

Note that nothing under ext/, dist/ or cpan/ needed fixing apart from
ext/PerlIO-encoding/, which did some direct PVX buffer stealing.
(And Peek.t, obviously.)

The main breakages in core itself were places that directly moved a PVX
buffer from one SV to another: typically in the implementation of matching
and substitution - places such as pp_subst.

That branch is just work in progress - once the COW bits are implemented
too, I'll need to go back and look more closely at things such as when it
should prefer COW, and when it should prefer copying a SHORTPV etc. i.e.
code in that branch isn't fully optimised yet.

Here are various crude bits of benchmarking against my branch built with
and without -DPERL_COPY_ON_WRITE3. They seem to indicate that non-SHORTPV
SVs are penalised slightly in performance and memory usage, while SHORTPV
SVs see big gains. Note particularly the C columns below, which show
around 30% improvements when just creating many short strings.

p0: davem/cow3 branch - non-threaded, -O2. Linux x86_64.
p1: as p0, but with -DPERL_COPY_ON_WRITE3

"make test" over 3 runs:

    p0: 736.84 741.08 744.26 CPU seconds
    p1: 735.09 739.82 746.29 CPU seconds

    So this appears to be just noise.

Some random benchmarks:

    A: the code below with n=12 (so short strings)
    B: with n=24:

        my $n = 12; # or 24
        for (1..1_000_000) {
            my $x = 'a' x $n;
            my $y = substr($x, 3, $n-5);
            my $z = $y . "xyz";

    C: create lots of short strings:
        perl -e'my @a; push @a, "aaa$_" for 1..1_000_000;'

    D: create lots of medium strings:
        perl -e'my @a; push @a, "aaaaaaaaaaaaaaaaaaaaaa$_" for 1..1_000_000;'

    E: run mktables:
        ./miniperl -Ilib -Idist/Cwd -Idist/Cwd/lib -Idist/Carp/lib \
                    lib/unicore/mktables -C lib/unicore -P pod \
                    -maketest -makelist -p -w

    "perf stat" results in millions of CPU cycles (lower better):

                 A     B    C     D      E
        p0    1204  1311  864   935  57902
        p1    1064  1399  612   974  56763
              ----  ----  ---  ----  -----
               88%  107%  71%  104%    98%

    Max resident memory in Kbytes (lower better):

                 A     B      C       D       E
        p0    4272  4260  83476   99284  133648
        p1    4364  4436  60424  107216  136320
              ----  ----  -----  ------  ------
              102%  104%    72%    107%    102%

    In terms of Ir, about 20% of tests are slower (97.63..100.00)
    about 60% are unchanged (100.00) and 20% are faster: (100.00..147.78)
    with an average of 101.07. A similar spread of results is seen for the
    other fields.

The crew of the Enterprise encounter an alien life form which is
surprisingly neither humanoid nor made from pure energy.
    -- Things That Never Happen in "Star Trek" #22

Thread Previous | Thread Next Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at | Group listing | About