proof-of-concept short-string PVs
From: Dave Mitchell
Date: March 27, 2017 10:15
Subject: proof-of-concept short-string PVs
Message ID: 20170327101537.GI3342@iabyn.com
TL;DR: I tried storing short strings directly in the SV head: a bit faster,
but probably not practical. A variant might be practical.
I've just pushed the branch davem/shpv_poc which contains some
proof-of-concept code for adding a new SV type, SVt_SHPV, which uses the
sv_any and sv_u fields of an SV head as a buffer to hold short strings
(<=14 chars on 64-bit builds), eliminating the need for an SV body and
malloced PVX buffer.
The upside of this concept is reduced memory and CPU usage for short
strings; the downside is that SvCUR(), SvPVX() etc are now more expensive,
requiring an (SvTYPE(sv) == SVt_SHPV ? ... : ...) conditional. Also,
SVt_SHPVs can't be used where the string is also used in numeric context
or needs magic attaching; in those cases it will get upgraded to an
SVt_PVIV or SVt_PVMG and a real string buffer will be malloc()ed.
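As a rough illustration of that trade-off, here is a toy model in C. It is not perl's real SV layout and not the code on the branch: the names toy_sv, toy_init, toy_cur and toy_pvx are made up for this sketch, and the split of the 16 bytes into one length byte, up to 14 characters and a trailing NUL is only an assumption that happens to match the 14-character limit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy model only: NOT perl's real SV layout or the branch's code. */
enum toy_type { TOY_SHPV, TOY_PV };

#define TOY_INLINE_MAX 14   /* assumed: 16 bytes minus a length byte and a NUL */

typedef struct toy_sv {
    enum toy_type type;
    union {
        struct {                          /* TOY_SHPV: string lives in the head */
            unsigned char len;
            char buf[TOY_INLINE_MAX + 1];
        } sh;
        struct {                          /* TOY_PV: malloc()ed buffer */
            char  *pv;
            size_t cur;
        } body;
    } u;
} toy_sv;

/* Every accessor now needs a type test before it can read anything. */
static size_t toy_cur(const toy_sv *sv) {
    return sv->type == TOY_SHPV ? sv->u.sh.len : sv->u.body.cur;
}

static const char *toy_pvx(const toy_sv *sv) {
    return sv->type == TOY_SHPV ? sv->u.sh.buf : sv->u.body.pv;
}

/* Initialise a fresh toy_sv, using the inline form when the string fits.
 * Returns 0 on success, -1 if malloc() fails. */
static int toy_init(toy_sv *sv, const char *s) {
    assert(sv != NULL && s != NULL);
    size_t len = strlen(s);
    if (len <= TOY_INLINE_MAX) {
        sv->type = TOY_SHPV;
        sv->u.sh.len = (unsigned char)len;
        memcpy(sv->u.sh.buf, s, len + 1);   /* copies the NUL too */
        return 0;
    }
    char *p = malloc(len + 1);
    if (p == NULL)
        return -1;
    memcpy(p, s, len + 1);
    sv->type = TOY_PV;
    sv->u.body.pv = p;
    sv->u.body.cur = len;
    return 0;
}

/* Release the heap buffer, if there is one. */
static void toy_free(toy_sv *sv) {
    if (sv->type == TOY_PV)
        free(sv->u.body.pv);
}
```

In this model a 14-character string costs no allocation at all, while a 15-character one takes the malloc() path; both are read through the same conditional accessors.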
The branch has progressed enough that miniperl builds and very simple test
programs can run. The following:
    my $n = 12; # or 24
    for (1..1_000_000) {
        my $x = 'a' x $n;
        my $y = substr($x, 3, $n-5);
        my $z = $y . "xyz";
    }
gives these CPU cycle counts under 'perf stat' (lower is better):
    A  blead;
    B  as A, but with SvCUR() etc slowed down with the
       (SvTYPE(sv) == SVt_SHPV) condition;
    C  as B, but with SVt_SHPV PVs implemented.
            n=12            n=24
        -------------   -------------
    A   1,227,902,617   1,323,312,098
    B   1,443,647,414   1,474,730,101
    C   1,149,521,154   1,322,818,824
The n=12/24 variation represents whether the calculated values will fit
in an SVt_SHPV or not.
My provisional conclusion from those timings is that the more complex
definitions of SvCUR() etc add about 10%; with a mix of short and longer
strings we claw back that time by implementing SVt_SHPVs; and if
everything is short, we more than claw back the 10%, resulting in an
overall 10% speedup.
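For anyone who wants the exact ratios behind those round figures, the small C sketch below works them out from the cycle counts in the table. The helper names pct_change and print_changes are made up for this example. Relative to A, it gives roughly +18% (n=12) and +11% (n=24) for B, and roughly -6% (n=12) and no measurable change (n=24) for C.

```c
#include <assert.h>
#include <stdio.h>

/* Percentage change of val relative to base; negative means fewer cycles. */
static double pct_change(double base, double val) {
    assert(base > 0.0);
    return (val - base) / base * 100.0;
}

/* Cycle counts copied from the table: rows A, B, C; columns n=12, n=24. */
static const double cycles[3][2] = {
    { 1227902617.0, 1323312098.0 },   /* A: blead */
    { 1443647414.0, 1474730101.0 },   /* B: slower SvCUR() etc */
    { 1149521154.0, 1322818824.0 },   /* C: SVt_SHPV implemented */
};

/* Print how B and C compare with A for both string lengths. */
static void print_changes(void) {
    static const char *name[3] = { "A", "B", "C" };
    for (int row = 1; row < 3; row++) {
        printf("%s vs A: n=12 %+.1f%%, n=24 %+.1f%%\n",
               name[row],
               pct_change(cycles[0][0], cycles[row][0]),
               pct_change(cycles[0][1], cycles[row][1]));
    }
}
```

Calling print_changes() from a main() prints one line each for B and C.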
However, the issues with this implementation are:
1) anything that directly manipulates the PV body's PVX pointer will
break; there's quite a bit of code in core which does things like buffer
stealing, e.g.:
    SvPV_free(nsv);
    SvPV_set(nsv, SvPVX(sv));
    SvPV_set(sv,NULL);
these could probably all be fixed in one go by adding a generic 'steal PVX
buffer' function.
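A minimal sketch of what such a helper might look like, again on a toy model rather than perl's real structures. toy_sv, toy_init, toy_pvx, toy_free and toy_steal_pv are hypothetical names, and leaving the source as an empty string (rather than with a NULL buffer) is a simplification chosen for this example. The point is only that once the three-step idiom lives in one function, that function can handle the "nothing to steal, the bytes are inline" case in a single place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy model only: not perl's real SV structures. */
enum toy_type { TOY_SHPV, TOY_PV };
#define TOY_INLINE_MAX 14

typedef struct toy_sv {
    enum toy_type type;
    size_t cur;                          /* string length */
    union {
        char  inl[TOY_INLINE_MAX + 1];   /* TOY_SHPV: bytes held in the SV itself */
        char *pv;                        /* TOY_PV: malloc()ed buffer */
    } u;
} toy_sv;

/* Initialise a fresh toy_sv. Returns 0 on success, -1 if malloc() fails. */
static int toy_init(toy_sv *sv, const char *s) {
    size_t len = strlen(s);
    sv->cur = len;
    if (len <= TOY_INLINE_MAX) {
        sv->type = TOY_SHPV;
        memcpy(sv->u.inl, s, len + 1);
        return 0;
    }
    sv->type = TOY_PV;
    sv->u.pv = malloc(len + 1);
    if (sv->u.pv == NULL)
        return -1;
    memcpy(sv->u.pv, s, len + 1);
    return 0;
}

static const char *toy_pvx(const toy_sv *sv) {
    return sv->type == TOY_SHPV ? sv->u.inl : sv->u.pv;
}

static void toy_free(toy_sv *sv) {
    if (sv->type == TOY_PV)
        free(sv->u.pv);
}

/* Hypothetical "steal buffer" helper: dst takes over src's string.
 * A heap buffer changes owner; an inline string is simply copied. */
static void toy_steal_pv(toy_sv *dst, toy_sv *src) {
    assert(dst != src);
    toy_free(dst);            /* drop dst's old buffer, if any */
    *dst = *src;              /* moves the heap pointer or the inline bytes */
    src->type = TOY_SHPV;     /* src no longer owns anything: make it "" */
    src->cur = 0;
    src->u.inl[0] = '\0';
}
```

Callers would then write toy_steal_pv(nsv, sv) instead of manipulating the buffer pointer themselves.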
However, a more subtle and tricky problem is that sv_upgrade(), when
upgrading from an SVt_SHPV to SVt_PV or higher, allocates a buffer and so
changes the value of SvPVX. So code like this:
    s = SvPV(sv,len);
    if (...)
        SvUPGRADE(sv, SVt_PVMG);
    ... do stuff with s ...
will break because after the upgrade, s is no longer valid. I can't see
any way to fix this.
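The effect can be reproduced in a toy model (hypothetical names throughout, not perl's real structures or sv_upgrade() itself). Here toy_upgrade() moves an inline string into a malloc()ed buffer, and toy_pointer_goes_stale() reports whether a pointer fetched before the upgrade still matches the string's location afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy model only: not perl's real SV structures. */
enum toy_type { TOY_SHPV, TOY_PV };
#define TOY_INLINE_MAX 14

typedef struct toy_sv {
    enum toy_type type;
    size_t cur;                          /* string length */
    union {
        char  inl[TOY_INLINE_MAX + 1];   /* TOY_SHPV: bytes held in the SV itself */
        char *pv;                        /* TOY_PV: malloc()ed buffer */
    } u;
} toy_sv;

/* Initialise a fresh toy_sv with a short (inline) string. */
static void toy_init_short(toy_sv *sv, const char *s) {
    size_t len = strlen(s);
    assert(len <= TOY_INLINE_MAX);
    sv->type = TOY_SHPV;
    sv->cur = len;
    memcpy(sv->u.inl, s, len + 1);
}

static const char *toy_pvx(const toy_sv *sv) {
    return sv->type == TOY_SHPV ? sv->u.inl : sv->u.pv;
}

/* Stand-in for upgrading to a type with a real buffer.
 * Returns 0 on success, -1 if malloc() fails. */
static int toy_upgrade(toy_sv *sv) {
    if (sv->type != TOY_SHPV)
        return 0;                        /* already has a heap buffer */
    char *p = malloc(sv->cur + 1);
    if (p == NULL)
        return -1;
    memcpy(p, sv->u.inl, sv->cur + 1);
    sv->type = TOY_PV;
    sv->u.pv = p;                        /* overwrites the start of the inline bytes */
    return 0;
}

/* Returns 1 if a pointer taken before the upgrade no longer matches the
 * string's location after it, 0 if it still matches, -1 on failure. */
static int toy_pointer_goes_stale(toy_sv *sv) {
    const char *s = toy_pvx(sv);         /* like s = SvPV(sv,len) */
    if (toy_upgrade(sv) != 0)            /* like SvUPGRADE(sv, SVt_PVMG) */
        return -1;
    return s != toy_pvx(sv);             /* s must not be dereferenced now */
}
```

For a string that starts out inline, the old pointer refers to bytes that now hold the heap pointer, so anything that goes on reading through it sees garbage.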
I can think of two other ways to implement short PVs. The first is to
store the string in the head as before, but rather than adding a new type,
SVt_SHPV, use a flag. This will suffer from the same problems as described
above, but in addition, code which assumes that SVt_PV and greater have a
body will break.
The second way is to allocate a body, then use the xpv_cur and
xpvlenu_len fields of the body to store the short string. This will
probably solve the sv_upgrade issue, but not the SvPV_set() issue, and of
course has the expense of requiring a body. On the other hand, it can be
upgraded to SVt_PVIV, SVt_PVMG etc without issue. So it's probably the way
forward.
--
The warp engines start playing up a bit, but seem to sort themselves out
after a while without any intervention from boy genius Wesley Crusher.
-- Things That Never Happen in "Star Trek" #17