
Re: proof-of-concept short-string PVs

March 30, 2017 17:12
Dave Mitchell wrote:
> TL;DR: I tried storing short strings directly in the SV head: a bit faster,
> but probably not practical. A variant might be practical.
> I've just pushed the branch davem/shpv_poc which contains some
> proof-of-concept code for adding a new SV type, SVt_SHPV, which uses the
> sv_any and sv_u fields of an SV head as a buffer to hold short strings
> (<=14 chars on 64-bit builds), eliminating the need for an SV body and
> malloced PVX buffer.
> The upside of this concept is reduced memory and CPU usage for short
> strings; the downside is that SvCUR(), SvPVX() etc are now more expensive,
> requiring an (SvTYPE(sv) == SVt_SHPV ? ... : ...) conditional. Also,
> SVt_SHPV's can't be used where the string is also used in numeric context
> or needs magic attaching; in those cases it will get upgraded to an
> SVt_PVIV or SVt_PVMG and a real string buffer will be malloc()ed.
> My provisional conclusion from those timings is that the more complex
> definitions of SvCUR() etc add about 10%; with a mix of short and longer
> strings we claw back that time by implementing SVt_SHPVs; and if
> everything is short, we more than claw back the 10%, resulting in an
> overall 10% speedup.
> However, the issues with this implementation are:
> 1) anything that directly manipulates the PV body's PVX pointer will
> break; there's quite a bit of code in core which does things like buffer
> stealing, e.g.:
>     SvPV_free(nsv);
>     SvPV_set(nsv, SvPVX(sv));
>     SvPV_set(sv,NULL);
> these could probably all be fixed in one go by adding a generic 'steal PVX
> buffer' function.

The SvPV_set() macro is very high risk and barely public (its perlapi
entry is marked "Am" in sv.h, so it is technically public, but its docs
say it is high risk to use). It will wreck COWs. It is just a synonym
for "SvPVX(sv) = newpvbuf". Any public XS code that uses it is suspect
anyway, because its COW handling is probably flawed. I wouldn't worry
about breaking it further, since code that uses it is probably broken
already. Remember that perl also supports static C global strings
through the SvLEN = 0 trick.
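
To make the ownership point concrete, here is a toy sketch in plain C of
what a generic "steal PVX buffer" helper has to check. Nothing in it is
perl's real API: toy_sv, toy_pv_free() and toy_steal_pv() are
hypothetical names, the struct is not the real SV layout, and COW is not
modelled at all. The only convention borrowed from perl is that a LEN of
0 means the buffer is not owned and must not be freed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a string SV; NOT perl's real struct layout. */
typedef struct {
    char  *pvx;  /* string buffer, in the role of SvPVX */
    size_t cur;  /* bytes in use, in the role of SvCUR */
    size_t len;  /* bytes allocated, in the role of SvLEN; 0 = not owned */
} toy_sv;

/* Build an SV that owns a malloc()ed copy of s. */
static toy_sv toy_from_copy(const char *s)
{
    toy_sv sv;
    sv.cur = strlen(s);
    sv.len = sv.cur + 1;
    sv.pvx = malloc(sv.len);
    if (!sv.pvx) abort();
    memcpy(sv.pvx, s, sv.len);
    return sv;
}

/* Build an SV that merely points at a static C string (len == 0). */
static toy_sv toy_from_static(const char *s)
{
    toy_sv sv;
    sv.pvx = (char *)s;
    sv.cur = strlen(s);
    sv.len = 0;
    return sv;
}

/* Release the buffer only if this SV owns it. */
static void toy_pv_free(toy_sv *sv)
{
    if (sv->len != 0)
        free(sv->pvx);
    sv->pvx = NULL;
    sv->cur = sv->len = 0;
}

/* Hypothetical helper: give src's string to dst and leave src empty.
 * Returns 1 if the buffer was moved, 0 if it had to be copied because
 * src did not own it. */
static int toy_steal_pv(toy_sv *dst, toy_sv *src)
{
    int moved = (src->len != 0);
    assert(dst != src);
    toy_pv_free(dst);
    if (moved)
        *dst = *src;                       /* take over the owned buffer */
    else if (src->pvx != NULL)
        *dst = toy_from_copy(src->pvx);    /* never adopt a static buffer */
    src->pvx = NULL;
    src->cur = src->len = 0;
    return moved;
}
```

A raw three-line pointer swap skips the ownership check, so in this
model it would hand a static buffer to an SV that later tries to free
it.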

> However, a more subtle and tricky problem is that sv_upgrade(), when
> upgrading from an SVt_SHPV to SVt_PV or higher, allocates a buffer and so
> changes the value of SvPVX. So code like this:
>     s = SvPV(sv,len);
>     if (...)
>         SvUPGRADE(sv, SVt_PVMG);
>     ... do stuff with s ...
> will break because after the upgrade, s is no longer valid. I can't see
> any way to fix this.

DEBUGGING builds (common enough) or PERL_POISON builds (not really
common) should write a freed-memory poison pattern over the old short
string when returning it to the arena, so that code still holding the
stale s at least fails visibly.
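
A toy sketch of that poisoning idea, in plain C rather than perl's real
structs. Everything here is an assumption made for the demo: toy_sv and
its helpers are hypothetical, the 0xEF fill byte is just an illustrative
choice, and the inline buffer and heap pointer are kept as separate
fields (not a union) so that the stale bytes stay legally readable.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_POISON   0xEF  /* illustrative fill byte for dead storage */
#define TOY_SHORTMAX 14    /* short-string limit quoted for 64-bit builds */

/* Toy SV; NOT perl's layout. */
typedef struct {
    int    is_short;              /* 1 while the string lives inline */
    char   inl[TOY_SHORTMAX + 2]; /* inline string plus NUL */
    char  *heap;                  /* malloc()ed buffer after upgrade */
    size_t cur;                   /* string length */
} toy_sv;

/* Store a short string inline. */
static void toy_set_short(toy_sv *sv, const char *s)
{
    size_t n = strlen(s);
    assert(n <= TOY_SHORTMAX);
    memcpy(sv->inl, s, n + 1);
    sv->cur = n;
    sv->is_short = 1;
    sv->heap = NULL;
}

/* In the role of SvPVX: note the conditional on the storage kind. */
static char *toy_pv(toy_sv *sv)
{
    return sv->is_short ? sv->inl : sv->heap;
}

/* In the role of the upgrade: move the string to the heap and poison
 * the old inline bytes so stale pointers read garbage, not old data. */
static void toy_upgrade(toy_sv *sv)
{
    if (!sv->is_short)
        return;
    sv->heap = malloc(sv->cur + 1);
    if (!sv->heap) abort();
    memcpy(sv->heap, sv->inl, sv->cur + 1);
    memset(sv->inl, TOY_POISON, sizeof sv->inl);
    sv->is_short = 0;
}
```

With this in place, the "s = SvPV(sv,len); ...upgrade...; use s" pattern
does not silently keep working on leftover bytes: the stale s reads the
poison pattern.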

> I can think of two other ways to implement short PVs. The first is to
> store the string in the head as before, but rather than adding a new type,
> SVt_SHPV, use a flag. This will suffer from the same problems as described
> above, but in addition, code which assumes that SVt_PV and greater have a
> body will break.
> The second way is to allocate a body, then use the xpv_cur and
> xpvlenu_len fields of the body to store the short string. This will
> probably solve the sv_upgrade issue, but not the SvPV_set() issue, and of
> course has the expense of requiring a body.

That means that all scalar SV types, SVt_PV through SVt_PVLV, can
potentially be short strings at all times, right? What happens to sv_u
in the SV head if CUR/LEN are in a union to make a short string? Is it
unused? Do we go back to the pre-5.10 3-field (12/16 bytes) SV head
(prior to sv_u being implemented in 5.10)?

> On the other hand, it can be
> upgraded to SVt_PVIV, SVt_PVMG etc without issue. So it's probably the way
> forward.

Won't the body ptr be realloced (the small body returned to the arena
and a larger one checked out of the arena) during the sv_upgrade(), and
therefore change the SvPVX ptr, which points into the union of CUR/LEN?
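
A toy model of the concern, assuming the short string is kept in the
bytes that would otherwise hold CUR and LEN. The types and functions
below (toy_small_body, toy_big_body, toy_head, toy_upgrade) are
hypothetical and much simpler than perl's real bodies and arenas; plain
calloc()/free() stands in for the arena.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy small body; NOT perl's layout. The inline string overlays the
 * two size_t fields. */
typedef struct {
    union {
        struct { size_t cur, len; } f;
        char inl[2 * sizeof(size_t)];
    } u;
} toy_small_body;

/* Toy larger body: same leading layout plus one extra slot. */
typedef struct {
    toy_small_body pv;
    long           iv;
} toy_big_body;

typedef struct {
    void *any;     /* body pointer, in the role of sv_any */
    int   is_big;
} toy_head;

/* In the role of SvPVX: the string lives inside the body. */
static char *toy_pvx(toy_head *sv)
{
    return ((toy_small_body *)sv->any)->u.inl;
}

static void toy_init(toy_head *sv, const char *s)
{
    toy_small_body *b = calloc(1, sizeof *b);
    if (!b) abort();
    assert(strlen(s) < sizeof b->u.inl);
    strcpy(b->u.inl, s);
    sv->any = b;
    sv->is_big = 0;
}

/* Upgrade: check out a bigger body, copy, return the small one. */
static void toy_upgrade(toy_head *sv)
{
    toy_big_body *nb;
    if (sv->is_big)
        return;
    nb = calloc(1, sizeof *nb);  /* allocated while the old body is live */
    if (!nb) abort();
    memcpy(&nb->pv, sv->any, sizeof nb->pv);
    free(sv->any);
    sv->any = nb;
    sv->is_big = 1;
}
```

In this model the string survives the upgrade, but its address does not:
any pointer obtained from toy_pvx() before toy_upgrade() dangles
afterwards, which is the same failure as with the head-stored variant.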

General thoughts. Although this gives less mem savings than your idea,
why not use an SVt_PVNV body to store short strings? The CUR and LEN
members remain binary compatible with all other PV types. Both are read
with 32/64-bit size_t CPU instructions. Your implementation implements
CUR as a char read (reminds me of the OOK hack) and LEN is a compile
time constant. union _xnvu is almost always 8 bytes (64-bit FPs) or 12
(80-bit long double on x86-32) or 16 (80-bit long double on x64). (A
4-byte union would require a 4-byte NV type, a build config nobody has
ever tried with perl I think, plus a 32-bit CPU so that HV *xgv_stash is
4 bytes.) union _xivu is 4 or 8 bytes. SvPVX would point to
&((struct xpvnv*)sv->sv_any)->xiv_u.xivu_iv. In theory my idea would
also allow an SVt_PVLV to be used for even bigger short strings. The
first 2 fields (xmg_u/xmg_stash) of an SVt_PVNV body aren't allocated
(ghost fields), but in an SVt_PVLV they are allocated, and they would go
unused (wasted) when it contains a short string.
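
A rough sketch of that layout idea in plain C. The struct below only
approximates a PVNV body: it borrows the field names mentioned above,
but toy_xpvnv, TOY_INLINE_CAP and the helpers are hypothetical, the
ghost fields are left out, and setting LEN to 0 for an inline string
(reusing the "not owned, don't free" convention) is my assumption, not
something stated in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy approximations of the IV and NV unions; NOT perl's definitions. */
typedef union { long   xivu_iv; void *xivu_ptr;  } toy_xivu;
typedef union { double xnvu_nv; void *xgv_stash; } toy_xnvu;

typedef struct {
    size_t   xpv_cur;  /* ordinary size_t CUR, same as other PV bodies */
    size_t   xpv_len;  /* ordinary size_t LEN */
    toy_xivu xiv_u;    /* the short string starts here ... */
    toy_xnvu xnv_u;    /* ... and may run on into here */
} toy_xpvnv;

/* Bytes available from the start of xiv_u to the end of the body. */
#define TOY_INLINE_CAP (sizeof(toy_xpvnv) - offsetof(toy_xpvnv, xiv_u))

/* In the role of SvPVX for an inline string: points into the body. */
static char *toy_inline_pvx(toy_xpvnv *b)
{
    return (char *)b + offsetof(toy_xpvnv, xiv_u);
}

/* Store s inline if it fits (including its NUL).
 * Returns 1 on success, 0 if the string is too long. */
static int toy_set_inline(toy_xpvnv *b, const char *s)
{
    size_t n;
    assert(s != NULL);
    n = strlen(s);
    if (n + 1 > TOY_INLINE_CAP)
        return 0;
    memcpy(toy_inline_pvx(b), s, n + 1);
    b->xpv_cur = n;
    b->xpv_len = 0;  /* assumption: 0 marks "buffer is not malloc()ed" */
    return 1;
}
```

On a typical LP64 platform TOY_INLINE_CAP works out to 16 bytes, so the
capacity is in the same range as the head-stored variant, while CUR
stays a normal size_t read with no type conditional.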
