develooper Front page | perl.perl5.porters | Postings from October 2016

Re: hv.h hek definition

Thread Previous | Thread Next
From:
demerphq
Date:
October 22, 2016 09:15
Subject:
Re: hv.h hek definition
Message ID:
CANgJU+XejU9U7UBm8LO44z2fAaR4_WJoWThLfztVcNHC0LcY7g@mail.gmail.com
On 14 October 2016 at 05:46, Todd Rinaldo <toddr@cpanel.net> wrote:
>
>> On Sep 28, 2016, at 10:27 PM, Father Chrysostomos <sprout@cpan.org> wrote:
>>
>> Todd Rinaldo wrote:
>>> IMO, the declaration for hek_key is VERY misleading. In the context of a
>>> struct, what you end up with when you do char hek_key[1]; is a char
>>> pointer followed by a char.
>>
>> char hek_key[1] stores the char array's members directly in the
>> struct.  There is no pointer stored in the struct.
>>
>
> Right this was my first time seeing a char[1] hanging off the end of a struct so one could dynamically allocate a variable size string off the end of the struct without the need for a pointer. Neat trick.

FWIW: It is a common pattern in the core.

> Digging further I now have a few questions:
>
> 1. The only value of HvHASKFLAGS is for Perl_hv_assert to make sure the flag has been set properly. It does this by checking HeKUTF8 and HeKWASUTF8 against every he/hek in the hash and determining if any of them are set. The only other use of this I can find is to turn the flag on and off. I can find no evidence in perl or on grep.cpan.me that anything is using the flag. Does it actually have value?

It is used by serializers like Storable, and if I had noticed it
previously I would have used it in Sereal, and likely will in a new
release. :-)

Basically it tells a serializer "this hash has weird keys, dont use
optimised code to serialize it".

Quite useful actually. :-)

> 2. HEK_FLAGS seems too complicated. Rather than being stored as a "char hek_flags;" in the hek struct, It hides off the tail end of the hek_key. From what I've been able to determine, it is this way because the hek_key used to be a PV way back when it wasn't in its own struct. So instead of the HEK_FLAGS macro being something simple like:
>
> (hek)->hek_flags
>
> It instead has to be something complicated like:
>
> (*((unsigned char *)(HEK_KEY(hek))+HEK_LEN(hek)+1))
>
> I can only imagine that the former would be much cheaper computationally. It is true that when an SV rather than the normal C string hangs off the end of a HEK, the flag is wasted. However from what I can tell, this is an extreme rarity (related to magic) that an SV is ever used in a hek. I could probably produce some anecdotal proof of the rarity of SVs if it was needed.
>
> I have attached a patch in this email to show what I think would need to be changed to simplify HEK_FLAGS.

Beside the concern that bulk88 had about the compiler possibly padding
the struct because of this, I do agree with would be saner if the
flags were the first byte and not the last.

On the other hand, given PL_strtab (see below), and how malloc blocks
tend to be managed, I wonder if making heks 3 or 4 bytes larger would
actually make much of a  difference in actual memory use. I wonder if
just making the flags a U32 or something efficient for the CPU would
be better.

> 3. This is still stewing in my head but it occurs to me that if you no longer had a flag hanging off the end of your hek_key, you could store a COW counter. I'm not sure what use cases that would provide but it might be worth considering.

As bulk88 somewhat cryptically mentioned, most HEK's live inside of
PL_strtab, a hash whose values are the refcounts of the HEK, and which
are then shared amongst all hashes. So a store in an empty hash goes
through the following psuedo steps:

normalize key and set flags
calculate hash value
lookup hash value in PL_strtab
if (found) bump_refcount and return found HEK
create new HEK, insert into PL_strtab with a refcount of 1
return HEK.

I imagine there are ways to share the PL_strtab logic such that you
could use them inside of special SvPV's as part of a COW
infrastructure. Whether doing so would be a win is not clear to me. It
sounds like bulk88 did something along those lines and might have an
opinion. :-)

> This patch seems to pass tests. It seems too simple, but then that's the beauty of using macros, right?

Personally I think the patch is probably safe with the assert added
that bulk88 mentioned, but I kinda wonder why it wasn't done that way
in the first place, and would want to dig a bit to see why not before
I applied it. On the other hand I like the idea of applying it, it
definitely is simpler code.

Yves



-- 
perl -Mre=debug -e "/just|another|perl|hacker/"

Thread Previous | Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About