Re: proof-of-concept short-string PVs
March 30, 2017 17:12
Message ID: firstname.lastname@example.org
Dave Mitchell wrote:
> TL;DR: I tried storing short strings directly in the SV head: a bit faster,
> but probably not practical. A variant might be practical.
> I've just pushed the branch davem/shpv_poc which contains some
> proof-of-concept code for adding a new SV type, SVt_SHPV, which uses the
> sv_any and sv_u fields of an SV head as a buffer to hold short strings
> (<=14 chars on 64-bit builds), eliminating the need for an SV body and
> malloced PVX buffer.
> The upside of this concept is reduced memory and CPU usage for short
> strings; the downside is that SvCUR(), SvPVX() etc are now more expensive,
> requiring an (SvTYPE(sv) == SVt_SHPV ? ... : ...) conditional. Also,
> SVt_SHPVs can't be used where the string is also used in numeric context
> or needs magic attaching; in those cases it will get upgraded to an
> SVt_PVIV or SVt_PVMG and a real string buffer will be malloc()ed.
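To make the trade-off concrete, here is a minimal toy model of the idea in plain C. It is not perl's real sv.h: all names (my_sv, my_cur(), my_pvx(), MY_SVt_SHPV, etc.) are hypothetical, and for simplicity the two pointer-sized slots sit side by side, whereas the real SV head has sv_refcnt and sv_flags between sv_any and sv_u. It keeps up to 2 * sizeof(void *) - 2 bytes (14 on 64-bit) in the head, stores the length in the last byte, and shows the type test that every CUR/PVX access now pays for.

```c
/* Toy model of the SVt_SHPV idea; NOT perl's real SV layout or API.
 * All names here are hypothetical. */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum { MY_SVt_PV = 4, MY_SVt_SHPV = 15 };   /* arbitrary type codes */

#define MY_SHBUF_SIZE (2 * sizeof(void *))  /* 16 bytes on 64-bit */
#define MY_SHPV_MAX   (MY_SHBUF_SIZE - 2)   /* room for NUL + length byte */

struct my_xpv { size_t cur; size_t len; };  /* simplified PV body */

typedef struct {
    union {
        struct {
            struct my_xpv *any;    /* stands in for sv_any */
            char *pv;              /* stands in for sv_u.svu_pv */
        } p;
        char shbuf[MY_SHBUF_SIZE]; /* short string; last byte = length */
    } u;
    uint32_t refcnt;
    uint32_t flags;                /* low 8 bits hold the type */
} my_sv;

static unsigned my_type(const my_sv *sv) { return sv->flags & 0xffu; }

/* The extra conditional every SvCUR()-style access now needs. */
static size_t my_cur(const my_sv *sv) {
    return my_type(sv) == MY_SVt_SHPV
        ? (size_t)(unsigned char)sv->u.shbuf[MY_SHBUF_SIZE - 1]
        : sv->u.p.any->cur;
}

/* ...and every SvPVX()-style access. */
static char *my_pvx(my_sv *sv) {
    return my_type(sv) == MY_SVt_SHPV ? sv->u.shbuf : sv->u.p.pv;
}

/* Set the string on an EMPTY SV (nothing is freed here): short strings
 * need no body and no malloc(). */
static void my_setpv(my_sv *sv, const char *s, size_t n) {
    if (n <= MY_SHPV_MAX) {
        memcpy(sv->u.shbuf, s, n);
        sv->u.shbuf[n] = '\0';
        sv->u.shbuf[MY_SHBUF_SIZE - 1] = (char)n;
        sv->flags = (sv->flags & ~0xffu) | MY_SVt_SHPV;
    } else {
        struct my_xpv *body = malloc(sizeof *body);
        char *buf = malloc(n + 1);
        assert(body != NULL && buf != NULL);
        memcpy(buf, s, n);
        buf[n] = '\0';
        body->cur = n;
        body->len = n + 1;
        sv->u.p.any = body;
        sv->u.p.pv = buf;
        sv->flags = (sv->flags & ~0xffu) | MY_SVt_PV;
    }
}
```

The branch in my_cur()/my_pvx() is the cost measured as roughly 10% in the timings below; the saving is the body and buffer allocation skipped in the short path of my_setpv().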
> My provisional conclusion from those timings is that the more complex
> definitions of SvCUR() etc add about 10%; with a mix of short and longer
> strings we claw back that time by implementing SVt_SHPVs; and if
> everything is short, we more than claw back the 10%, resulting in an
> overall 10% speedup.
> However, the issues with this implementation are:
> 1) anything that directly manipulates the PV body's PVX pointer will
> break; there's quite a bit of code in core which does things like buffer
> stealing, e.g.:
> SvPV_set(nsv, SvPVX(sv));
> these could probably all be fixed in one go by adding a generic 'steal PVX
> buffer' function.
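A generic "steal PVX buffer" helper might look something like the sketch below. It works on a hypothetical toy struct (toy_sv and toy_steal_pv() are illustrative names, not perl API): a heap buffer is handed over by pointer, while an in-head short string has no buffer to hand over and must be copied into a fresh malloc()ed one, which is the case a bare SvPV_set(nsv, SvPVX(sv)) gets wrong.

```c
/* Hypothetical "steal the string buffer" helper on a toy SV; not perl API. */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    int    is_short;   /* 1: bytes live in shbuf, no malloc()ed buffer */
    size_t cur;        /* string length, excluding the NUL */
    char  *pv;         /* owned heap buffer when !is_short */
    char   shbuf[16];  /* in-head storage for short strings */
} toy_sv;

static const char *toy_pvx(const toy_sv *sv) {
    return sv->is_short ? sv->shbuf : sv->pv;
}

/* Move src's string into dst (assumed empty), leaving src empty.
 * A heap buffer is handed over by pointer; a short string has no buffer
 * to hand over, so it is copied into a fresh malloc()ed one. */
static void toy_steal_pv(toy_sv *dst, toy_sv *src) {
    char *buf;
    if (src->is_short) {
        buf = malloc(src->cur + 1);
        assert(buf != NULL);
        memcpy(buf, src->shbuf, src->cur + 1);   /* includes the NUL */
    } else {
        buf = src->pv;                          /* plain pointer transfer */
    }
    dst->is_short = 0;
    dst->cur = src->cur;
    dst->pv = buf;
    src->is_short = 0;                          /* exactly one owner now */
    src->cur = 0;
    src->pv = NULL;
}
```

Funnelling every steal through one function like this means only one place has to know about the short-string representation.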
The SvPV_set() macro is very high risk and barely public (it's marked
"Am" for perlapi in sv.h, so it's technically public, but its docs say
it is high risk to use). It will wreck COWs: it is just a synonym for
"SvPVX(sv) = newpvbuf". Any public XS code that uses it is suspect
anyway due to probably flawed COW handling, so I wouldn't worry about
breaking it further, since code that uses it is probably broken
already. Remember that perl also supports static C global strings
through the SvLEN = 0 trick.
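For reference, the SvLEN = 0 convention means perl frees a string buffer only when SvLEN is non-zero, so a buffer perl doesn't own (such as a static C string) is never passed to free(). A minimal toy model of that rule follows; the names (toy_pv, toy_set_static(), etc.) are hypothetical, but any buffer-stealing or short-string scheme would have to preserve the same rule.

```c
/* Toy model of the "LEN == 0 means not owned" rule; not perl API. */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char  *pv;
    size_t cur;
    size_t len;   /* allocated size; 0 means "not ours, never free it" */
} toy_pv;

/* Point at a string we do not own, e.g. a static C global. */
static void toy_set_static(toy_pv *sv, const char *s) {
    sv->pv  = (char *)s;   /* caller must never write through this */
    sv->cur = strlen(s);
    sv->len = 0;
}

/* Take a private malloc()ed copy. */
static void toy_set_copy(toy_pv *sv, const char *s) {
    size_t n = strlen(s);
    sv->pv = malloc(n + 1);
    assert(sv->pv != NULL);
    memcpy(sv->pv, s, n + 1);
    sv->cur = n;
    sv->len = n + 1;
}

/* Release the string; returns 1 if a buffer was actually freed. */
static int toy_clear(toy_pv *sv) {
    int freed = 0;
    if (sv->len != 0) {
        free(sv->pv);
        freed = 1;
    }
    sv->pv = NULL;
    sv->cur = sv->len = 0;
    return freed;
}
```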
> However, a more subtle and tricky problem is that sv_upgrade(), when
> upgrading from an SVt_SHPV to SVt_PV or higher, allocates a buffer and so
> changes the value of SvPVX. So code like this:
> s = SvPV(sv,len);
> if (...)
> SvUPGRADE(sv, SVt_PVMG);
> ... do stuff with s ...
> will break because after the upgrade, s is no longer valid. I can't see
> any way to fix this.
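The hazard can be reproduced in a small toy model (hypothetical names, not perl code): the short string lives in the same bytes that the upgrade reuses for the heap pointer, so a char * fetched before the upgrade ends up pointing at pointer bytes rather than at the string. In this model the only safe pattern is to upgrade first and fetch the pointer afterwards.

```c
/* Toy model of the stale-pointer problem; not perl's real sv_upgrade(). */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    int is_short;
    size_t cur;
    union {
        char shbuf[16];                          /* short string lives here */
        struct { char *pv; size_t len; } heap;   /* same bytes after upgrade */
    } u;
} toy_sv;

static const char *toy_pvx(const toy_sv *sv) {
    return sv->is_short ? sv->u.shbuf : sv->u.heap.pv;
}

/* Stand-in for SvUPGRADE(): move a short string out to the heap. */
static void toy_upgrade(toy_sv *sv) {
    char *buf;
    if (!sv->is_short)
        return;
    buf = malloc(sv->cur + 1);
    assert(buf != NULL);
    memcpy(buf, sv->u.shbuf, sv->cur + 1);
    sv->u.heap.pv  = buf;          /* overwrites the old string bytes */
    sv->u.heap.len = sv->cur + 1;
    sv->is_short = 0;
}
```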
Builds with DEBUGGING (common enough) or PERL_POISON (not really
common) should write the freed-memory poison pattern over the old short
string when returning it to the arena, so that a stale pointer like s
above reads obvious junk.
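A sketch of that suggestion, again as a hypothetical toy (a static arena with a free list, not perl's real arena code): when a small body is returned to the arena, the old short-string bytes are overwritten with a poison byte, so a stale char * reads obvious junk instead of plausible old text. The suggestion ties this to DEBUGGING/PERL_POISON builds; here it is a run-time flag so both cases can be exercised, and the byte value 0xEF is an arbitrary choice.

```c
/* Toy arena with optional poisoning on release; not perl's arena code. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_POISON_BYTE 0xEF   /* arbitrary, easy to spot in a debugger */
enum { TOY_SLOTS = 4, TOY_SHBUF = 16 };

typedef struct toy_body {
    union {
        struct toy_body *next;    /* used while the body is on the free list */
        char shbuf[TOY_SHBUF];    /* used while the body holds a short string */
    } u;
} toy_body;

static toy_body  toy_arena[TOY_SLOTS];
static toy_body *toy_free_list;

static void toy_arena_init(void) {
    toy_free_list = NULL;
    for (int i = TOY_SLOTS - 1; i >= 0; i--) {
        toy_arena[i].u.next = toy_free_list;
        toy_free_list = &toy_arena[i];
    }
}

static toy_body *toy_body_get(void) {
    toy_body *b = toy_free_list;
    if (b != NULL)
        toy_free_list = b->u.next;
    return b;
}

/* Return a body to the arena, optionally poisoning the old string first. */
static void toy_body_put(toy_body *b, int poison) {
    if (poison)
        memset(b->u.shbuf, TOY_POISON_BYTE, sizeof b->u.shbuf);
    b->u.next = toy_free_list;   /* clobbers only the first pointer-sized bytes */
    toy_free_list = b;
}
```

Without poisoning, everything past the free-list link still holds the old string, which is exactly what lets a stale pointer appear to work.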
> I can think of two other ways to implement short PVs. The first is to
> store the string in the head as before, but rather than adding a new type,
> SVt_SHPV, use a flag. This will suffer from the same problems as described
> above, but in addition, code which assumes that SVt_PV and greater have a
> body will break.
> The second way is to allocate a body, then use the xpv_cur and
> xpvlenu_len fields of the body to store the short string. This will
> probably solve the sv_upgrade issue, but not the SvPV_set() issue, and of
> course has the expense of requiring a body.
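In the second approach the short string overlays the body's CUR and LEN slots. A toy sketch of that layout (hypothetical names; the "is short" bit would presumably live in the head's flags and is modelled here as a plain int): two size_t slots give 2 * sizeof(size_t) bytes, so with a NUL and a length byte the capacity is again 14 on 64-bit builds.

```c
/* Toy layout for "short string in the body's CUR/LEN"; not perl's XPV. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_SHBUF_SIZE (2 * sizeof(size_t))
#define TOY_SHORT_MAX  (TOY_SHBUF_SIZE - 2)   /* NUL + length byte */

typedef struct {
    union {
        struct { size_t cur; size_t len; } n;  /* normal PV: cur / len */
        char shbuf[TOY_SHBUF_SIZE];            /* short PV: the string itself */
    } u;
} toy_body;

typedef struct {
    toy_body *any;      /* a body is always allocated in this variant */
    int       is_short; /* would be a flag bit in sv_flags */
    char     *pv;       /* heap buffer when !is_short */
} toy_head;

static size_t toy_cur(const toy_head *h) {
    return h->is_short
        ? (size_t)(unsigned char)h->any->u.shbuf[TOY_SHBUF_SIZE - 1]
        : h->any->u.n.cur;
}

static char *toy_pvx(toy_head *h) {
    return h->is_short ? h->any->u.shbuf : h->pv;
}

/* Store a string that is known to fit. */
static void toy_set_short(toy_head *h, const char *s) {
    size_t n = strlen(s);
    assert(n <= TOY_SHORT_MAX);
    memcpy(h->any->u.shbuf, s, n + 1);
    h->any->u.shbuf[TOY_SHBUF_SIZE - 1] = (char)n;
    h->is_short = 1;
}
```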
That means all scalar SV types, SVt_PV through SVt_PVLV, can
potentially hold a short string at any time, right? What happens to
sv_u in the SV head if CUR/LEN are in a union to make a short string?
Is it unused? Do we go back to the pre-5.10 3-field (12/16 byte) SV
head, as it was before sv_u was added in 5.10?
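For scale, here are simplified declarations of the two head shapes being compared. The field names follow sv.h, but these are illustrative stand-ins, not perl's actual declarations. On common ABIs the 3-field head is 12 bytes on 32-bit and 16 on 64-bit, and adding the sv_u union costs exactly one more pointer.

```c
/* Illustrative SV head shapes; not perl's real declarations. */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Pre-5.10 style head: body pointer, refcount, flags. */
struct head3 {
    void    *sv_any;
    uint32_t sv_refcnt;
    uint32_t sv_flags;
};

/* 5.10+ style head: the same plus the pointer-sized sv_u union. */
struct head4 {
    void    *sv_any;
    uint32_t sv_refcnt;
    uint32_t sv_flags;
    union {
        char    *svu_pv;
        intptr_t svu_iv;
        void    *svu_rv;
    } sv_u;
};

static void print_head_sizes(void) {
    printf("3-field head: %zu bytes, 4-field head: %zu bytes\n",
           sizeof(struct head3), sizeof(struct head4));
}
```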
> On the other hand, it can be
> upgraded to SVt_PVIV, SVt_PVMG etc without issue. So it's probably the way
Won't the body be reallocated during sv_upgrade() (the small body
returned to its arena and a larger one checked out of another arena),
and therefore move the SvPVX pointer, since it points into the CUR/LEN
union?
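The concern can be illustrated with a toy upgrade. The types are hypothetical, and real sv_upgrade() is far more involved, but it likewise allocates a new, larger body, copies the old body's contents across and releases the old one. If the short string lives inside the body, its address, and hence SvPVX, moves with the body:

```c
/* Toy upgrade showing a body-resident short string changing address. */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef union {
    struct { size_t cur; size_t len; } n;
    char shbuf[2 * sizeof(size_t)];    /* short string overlays CUR/LEN */
} toy_curlen;

typedef struct { toy_curlen cl; } toy_body_pv;                           /* small */
typedef struct { toy_curlen cl; intptr_t iv; double nv; } toy_body_pvnv; /* larger */

typedef struct { void *any; } toy_head;   /* flags etc. omitted */

/* The short string's address is derived from the current body. */
static char *toy_short_pvx(toy_head *h) {
    return ((toy_curlen *)h->any)->shbuf;
}

/* Toy upgrade: new bigger body, copy the shared prefix, free the old one. */
static void toy_upgrade_to_pvnv(toy_head *h) {
    toy_body_pvnv *nb = malloc(sizeof *nb);
    assert(nb != NULL);
    memcpy(&nb->cl, h->any, sizeof nb->cl);
    nb->iv = 0;
    nb->nv = 0.0;
    free(h->any);
    h->any = nb;
}
```

Any char * taken before the upgrade would still point into the freed small body.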
Some general thoughts, although this saves less memory than your idea:
why not use an SVt_PVNV body to store short strings? The CUR and LEN
members then remain binary compatible with all other PV types, and both
are still read with plain 32/64-bit size_t loads. Your implementation
stores CUR in a single char (it reminds me of the OOK hack) and makes
LEN a compile-time constant. union _xnvu is almost always 8 bytes
(64-bit FPs), 12 (80-bit long double on x86-32) or 16 (80-bit long
double on x64). (4 bytes would require perl NVs to be single-precision
floats, a build config I don't think anyone has ever tried with perl,
plus a 32-bit CPU so that HV *xgv_stash is 4 bytes.) union _xivu is 4
or 8 bytes. SvPVX would point to
&((struct xpvnv*)sv->sv_any)->xiv_u.xivu_iv. In theory my idea would
also allow an SVt_PVLV to hold even bigger short strings. However, the
first 2 fields (xmg_u/xmg_stash) of an SVt_PVNV body aren't allocated
(they are ghost fields), whereas in an SVt_PVLV they are allocated and
would go unused (wasted) when it contains a short string.
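A toy sketch of that layout follows. The field names are modelled on sv.h but the struct is illustrative. It assumes NV is double, so xnv_u is 8 bytes here. The two leading xmg_* fields are kept in the struct only so that offsetof() matches the declaration, whereas real perl allocates PVNV bodies without them. CUR and LEN stay ordinary size_t fields, and the short string occupies the IV and NV slots. How a real implementation would keep sv_clear() from trying to free a buffer that lives inside the body is left open; this sketch simply records the capacity in LEN as an assumption.

```c
/* Toy PVNV-style body holding a short string in its IV/NV slots. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct toy_xpvnv {
    void  *xmg_stash;   /* "ghost" in a real PVNV body: not allocated */
    void  *xmg_u;       /* "ghost" in a real PVNV body: not allocated */
    size_t xpv_cur;     /* stays a normal size_t, so CUR needs no branch */
    size_t xpv_len;
    union { intptr_t xivu_iv; uintptr_t xivu_uv; } xiv_u;
    union { double xnv_nv; void *xgv_stash; } xnv_u;
};

#define TOY_SHORT_OFF offsetof(struct toy_xpvnv, xiv_u)
#define TOY_SHORT_CAP (sizeof(struct toy_xpvnv) - TOY_SHORT_OFF)

/* SvPVX for a short string: the bytes starting at xiv_u. */
static char *toy_short_pvx(struct toy_xpvnv *b) {
    return (char *)b + TOY_SHORT_OFF;
}

static void toy_set_short(struct toy_xpvnv *b, const char *s) {
    size_t n = strlen(s);
    assert(n < TOY_SHORT_CAP);          /* must leave room for the NUL */
    memcpy(toy_short_pvx(b), s, n + 1);
    b->xpv_cur = n;
    b->xpv_len = TOY_SHORT_CAP;         /* assumption, see lead-in */
}
```

With a 64-bit IV and a double NV this gives a 16-byte buffer, so up to 15 characters plus the NUL.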