
Re: hv.h hek definition

bulk 88
October 22, 2016 03:04
Disclaimer I've poked around in this code,

Todd Rinaldo wrote:
>> On Sep 28, 2016, at 10:27 PM, Father Chrysostomos wrote:
>> Todd Rinaldo wrote:
>>> IMO, the declaration for hek_key is VERY misleading. In the
>>> context of a struct, what you end up with when you do char
>>> hek_key[1]; is a char pointer followed by a char.
>> char hek_key[1] stores the char array's members directly in the 
>> struct.  There is no pointer stored in the struct.
> Right this was my first time seeing a char[1] hanging off the end of
> a struct so one could dynamically allocate a variable size string off
> the end of the struct without the need for a pointer. Neat trick.

IIRC, C99's flexible array member feature lets you write "char foo[];" as
the last member in a C struct declaration. Because P5 is C89-compatible,
the [1] has to be there. Member hek_key is a flexible array member but in
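
A minimal sketch of the C89 trailing-array trick described above. The
struct and function names (demo_key, demo_key_new) are made up for
illustration and are not Perl's real definitions; the point is only that
the header and the string share one allocation and no pointer is stored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct hek; NOT Perl's real definition. */
struct demo_key {
    unsigned long hash;
    long          len;
    char          key[1];   /* C89 spelling; C99 would allow "char key[];" */
};

/* Allocate the header and the string bytes in a single block. */
static struct demo_key *demo_key_new(const char *s, unsigned long hash)
{
    size_t len;
    struct demo_key *k;

    assert(s != NULL);
    len = strlen(s);
    /* offsetof(..., key) is the header size; len + 1 covers the
     * string and its terminating NUL. */
    k = malloc(offsetof(struct demo_key, key) + len + 1);
    if (k == NULL)
        return NULL;
    k->hash = hash;
    k->len  = (long)len;
    memcpy(k->key, s, len + 1);
    return k;
}
```

The caller releases the whole thing with a single free(), since there is
only one allocation.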

> 2. HEK_FLAGS seems too complicated. Rather than being stored as a
> "char hek_flags;" in the hek struct, It hides off the tail end of the
> hek_key. From what I've been able to determine, it is this way
> because the hek_key used to be a PV way back when it wasn't in its
> own struct. So instead of the HEK_FLAGS macro being something simple
> like:
> (hek)->hek_flags
> It instead has to be something complicated like:
> (*((unsigned char *)(HEK_KEY(hek))+HEK_LEN(hek)+1))
> I can only imagine that the former would be much cheaper
> computationally. It is true that when an SV rather than the normal C
> string hangs off the end of a HEK, the flag is wasted. However from
> what I can tell, this is an extreme rarity (related to magic) that an
> SV is ever used in a hek. I could probably produce some anecdotal
> proof of the rarity of SVs if it was needed.
> I have attached a patch in this email to show what I think would need
> to be changed to simplify HEK_FLAGS.
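
For readers following along, here is a small sketch of the layout the
quoted macro implies: key bytes, a terminating NUL, then one flags byte.
The names demo_hek, DEMO_KEY, DEMO_LEN and DEMO_FLAGS are hypothetical
stand-ins that only mirror the shape of the quoted HEK_FLAGS expression.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct hek; NOT Perl's real definition. */
struct demo_hek {
    unsigned int hash;
    int          len;
    char         key[1];
};

#define DEMO_KEY(h)   ((h)->key)
#define DEMO_LEN(h)   ((h)->len)
/* Same shape as the quoted macro: the byte just past the key's NUL. */
#define DEMO_FLAGS(h) (*((unsigned char *)(DEMO_KEY(h)) + DEMO_LEN(h) + 1))

static struct demo_hek *demo_hek_new(const char *s, unsigned char flags)
{
    size_t len;
    struct demo_hek *h;

    assert(s != NULL);
    len = strlen(s);
    /* key bytes + NUL + one trailing flags byte */
    h = malloc(offsetof(struct demo_hek, key) + len + 2);
    if (h == NULL)
        return NULL;
    h->hash = 0;
    h->len  = (int)len;
    memcpy(h->key, s, len + 1);
    DEMO_FLAGS(h) = flags;
    return h;
}
```

The flags lookup costs one extra add on top of the member access, which
is the overhead the quoted paragraph is weighing against a plain member.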

#define HeKEY_sv(he)		(*(SV**)HeKEY(he))

diff --git a/hv.h b/hv.h
index 0e773f2..2f79f03 100644
--- a/hv.h
+++ b/hv.h
@@ -45,6 +45,7 @@ struct he {
  struct hek {
      U32		hek_hash;	/* hash of key */
      I32		hek_len;	/* length of hash key */
+    char    hek_flags;  /* The flags associated with this key */
      char	hek_key[1];	/* variable-length hash key */
      /* the hash-key is \0-terminated */
      /* after the \0 there is a byte for flags, such as whether the key

What about alignment? HeKEY_sv() contains a 32/64-bit read. What about
non-x86 CPUs that don't support unaligned reads without special
measures? (bulk88 has been unsuccessful in the past in arguing on P5P
for the use of unaligned memory reads, which would all have to go
through a future P5P C API anyway, because the options are: just do
*(U32*)0x1ABC0001; do *(__unaligned U32*)0x1ABC0001; do
*(__attribute__((__packed__)) U32*)0x1ABC0001; do
__aeabi_uread4(0x1ABC0001); roll your own "inline" with four 1-byte
reads for braindead CCs; or use a non-inlined version with a switch()
(i.e. a jump table in ASM) on the last 2 bits of the pointer that does
one U32 read (the pointer was really aligned), two U16 reads, or four
U8 reads.)
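
To make the alignment concern concrete, here is a rough sketch. The two
structs are hypothetical simplifications (4-byte ints assumed), not
Perl's real layouts, and the helper names are invented. With a char
member inserted before the key, the key's offset is no longer a multiple
of the pointer size, so a pointer stored there needs a byte-wise copy
(memcpy here) rather than a direct cast-and-dereference.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layouts, assuming 4-byte int; NOT Perl's real structs. */
struct layout_current { unsigned int hash; int len; char key[1]; };
struct layout_patched { unsigned int hash; int len; char flags; char key[1]; };

/* Would a pointer stored at this offset be naturally aligned,
 * given a suitably aligned struct base? */
static int offset_is_pointer_aligned(size_t offset)
{
    return offset % sizeof(void *) == 0;
}

/* Portable read of a pointer from a possibly misaligned address.
 * Compilers commonly turn this memcpy into a plain load on CPUs
 * that tolerate unaligned access. */
static void *read_ptr_unaligned(const char *p)
{
    void *v;
    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}

/* Matching portable write. */
static void write_ptr_unaligned(char *p, void *v)
{
    assert(p != NULL);
    memcpy(p, &v, sizeof v);
}
```

This is only one of the approaches listed above; the compiler-specific
qualifiers and the jump-table variant are not shown.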

+    char    hek_flags;  /* The flags associated with this key */
      char	hek_key[1];	/* variable-length hash key */

I also worry some CC will get the bright idea that you need 3 padding 
bytes between a single char and an inline char array for "performance" 
reasons. I would write a static assert that

STATIC_ASSERT_STMT(offsetof(struct hek, hek_key) - offsetof(struct hek,
hek_flags) == 1);

to make sure no CC/platform starts wasting 3 bytes there.
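
Outside of perl's own headers, the same check can be sketched in plain
C89 with the negative-array-size trick. DEMO_STATIC_ASSERT, demo_hek2
and the helper functions are hypothetical names for illustration; this
is not how perl.h defines STATIC_ASSERT_STMT.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C89-friendly compile-time assertion: if cond is false
 * the array size is -1, which is a compile error. */
#define DEMO_STATIC_ASSERT(cond, name) \
    typedef char demo_static_assert_##name[(cond) ? 1 : -1]

/* Hypothetical patched layout; NOT Perl's real struct hek. */
struct demo_hek2 {
    unsigned int hash;
    int          len;
    char         flags;
    char         key[1];
};

/* Compilation fails if any padding appears between flags and key. */
DEMO_STATIC_ASSERT(offsetof(struct demo_hek2, key)
                   - offsetof(struct demo_hek2, flags) == 1,
                   key_follows_flags);

/* Distance in bytes from flags to key, for use in a test suite. */
static size_t demo_key_gap(void)
{
    return offsetof(struct demo_hek2, key)
         - offsetof(struct demo_hek2, flags);
}

/* Runtime equivalent of the compile-time check above. */
static void demo_check_layout(void)
{
    assert(demo_key_gap() == 1);
}
```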

> 3. This is still stewing in my head but it occurs to me that if you
> no longer had a flag hanging off the end of your hek_key, you could
> store a COW counter. I'm not sure what use cases that would provide
> but it might be worth considering.

Often there is already a "COW counter" for most HEKs, when they are
inlined into a shared HEK; see "struct shared_he" and "Size_t
hent_refcount". In my CopFILE branch I turned HEKs into basically an
immutable string type, like JavaScript strings. On a PP level you can't
tell whether a scalar is backed by a HEK or not: shared, unshared, COW,
or whatever, it is de-COWed automatically on a PP level.

Don't forget rurban's complaint that the C strings in HEKs and Father C's
COW code can't be stored in const RO memory in binaries made by a "perl 
compiler" because the refcounts are right next to the C string in 
memory, and the refcounts must be RW through the whole process lifetime. 
I have no solutions off the top of my head to rurban's complaint.
