Front page | perl.perl5.porters |
Postings from November 2016
Re: A possible new approach to COW - COW_META.
Thread Previous
|
Thread Next
From:
Father Chrysostomos
Date:
November 6, 2016 20:51
Subject:
Re: A possible new approach to COW - COW_META.
Message ID:
72BEF3C7-76F1-474F-8B62-5E743CD62B26@cpan.org
On Nov 6, 2016, at 7:13 AM, demerphq <demerphq@gmail.com> wrote:
> Today I pushed a branch, yves/cow_meta.
>
> This branch is my take on reworking COW so it does not store the
> refcount in the string.
>
> The idea is to introduce a new struct cow_meta:
>
> struct cow_meta {
> U32 cm_flags;
> U32 cm_refcnt;
> STRLEN cm_len;
> };
>
> typedef struct cow_meta COW_META;
>
> This new type is currently allocated out the HE arenas.
>
> The xpv_len_u union in the _XPV_HEAD struct fragment gets modified to
> support a new member:
>
> union { \
> STRLEN xpvlenu_len; /* allocated size */ \
> char * xpvlenu_pv; /* regexp string */ \
> COW_META * xpvlenu_cow_meta; /* ref to refcount struct */ \
> } xpv_len_u
>
> We then change COW so that it works as follows.
>
> To "en-COW" an existing SVPV we allocate a new COW_META structure and
> then, after copying the length from the SVPV to the COW_META
> structure, we set the xmp_len_u slot of the SVPV to point at the
> COW_META.
>
> To copy a COW structure we copy the PV, and copy the COW_META
> structure, and increment the refcount.
>
> The immediate advantage of this new structure is that we can COW *ANY*
> buffer, without worrying about whether it is overallocated or not, and
> regardless of whether it is RO or RW. The downside is far more
> bookkeeping when we initially COW a string, as we have to allocate and
> initialize the COW_META structure. However I consider the fact that we
> lose most of the weird edge cases where a buffer could not be COW'ed
> combined with the large range of the refcount to be good justification
> for this change.
>
> The code as pushed passes all tests.
Have you benchmarked it at all? I tried something similar at first (but not quite as clean), and found that it made things slower than no COW at all.
>
> Next steps...
> I believe this structural change brings further benefits, as we can
> switch the refcount data in PL_strtab to hold COW_META references, and
> unify the logic for managing sharing strings from keys, and strings
> from arbitrary buffers. I think this will greatly simplify our string
> sharing logic once it is done. I am working on this now, but please
> don't let that stop anyone from improving the branch as pushed.
> For instance it occurs to me that the COW_META structure could be
> inlined into the shared_he structure so that entries in PL_strtab dont
> need independent allocation.
>
> For now...
> Take a look at the new branch and see what you think. As a patch
> sequence it is a little raw,
My perennial complaint: Unnecessary whitespace changes make it harder to see the actual code changes (and harder to blame, etc.).
PerlIO::encoding:
- SvLEN_set(e->dataSV, 0); /* Hands off sv.c - it isn't yours */
+ SvLEN_set(e->dataSV, 0); /* Hands off sv.c - it isn't yours (XXX: PERL_ANY_COW)*/
I don’t understand that comment. I non-COW string with SvLEN==0 has a ‘static’ buffer that sv_clear should not free. Simple enough. A cowy SvLEN==0 is a shared hash key.
hv.c:
+ /* how does the following line work when the entry holds an SV?!! */
if (memNE(HeKEY(entry),key,klen)) /* is this it? */
SVs only come into play for tied hashes, if I understand correctly.
op.c:ck_svconst:
- if (!SvREADONLY(sv) && !SvIsCOW(sv) && SvCANCOW(sv)) {
- SvIsCOW_on(sv);
- CowREFCNT(sv) = 0;
+ if (
+ !SvREADONLY(sv) &&
+ !SvIsCOW(sv) &&
+ SvCANCOW(sv)
+ ) {
+ sv_cow_meta_setup(sv);
Based on your description, that code can just be deleted now. The COW setup should not be necessary here.
newCOW_META (rather than new_COW_META) would match existing conventions more closely.
sv.c:sv_setsv_flags:
+ : ( len /* XXX PERL_ANY_COW
+ len == 0 means private buffer which
+ for we will assume we cannot cow.
+ see ext/PerlIO-encoding/encoding.xs */
s/XXX |for //g;
In case you meant ‘for now’, I think this is a permanent state of affairs and should remain that way. A ‘static’ PV in an SV is a very useful feature. But it may not be truly static, so we cannot do COW with it, as some XS module might free the buffer while we are still hanging on to it.
sv.c:sv_setsv_cow:
+ /* NEW COW XXX - why mutable, why sv_buf_to_ro? */
new_pv = SvPVX_mutable(sstr);
sv_buf_to_ro(sstr);
Why mutable? Ask Nicholas. I’ve never really understood why we need a distinction between SvPVX_const and SvPVX_mutable. That’s not my line of code.
We need sv_buf_to_ro here if we turned it off to increment the reference count. It can be deleted, as long as the sv_buf_to_rw a few lines earlier is also deleted.
sv.c:S_sv_uncow:
+ sv_buf_to_rw(sv); /* NOOP if RO-ing not supported *//* XXX: is this require anymore */
+ if (cm->cm_refcnt) {
+ cm->cm_refcnt--;
+ sv_buf_to_ro(sv);
+ } else {
+ del_COW_META(cm);
+ }
sv_buf_to_rw is required if the SV is not going to be COW after this. It should go in the else.
These comments are just from a quick skim; I have not tested or confirmed any of this. If the whitespace is screwy, it’s because I copied and pasted from gitweb.
That’s all I have time for.
> but I wanted to push it out to a wider
> audience so people could take a look.
>
> Thanks...
> To FC for getting it working in the first place, changing it once it
> works is a lot easier than doing it in the first place.
I’m happy that I got the ball rolling. Thank you for continuing.
Thread Previous
|
Thread Next