Re: hv.h hek definition
From: bulk 88
Date: October 22, 2016 03:04
Subject: Re: hv.h hek definition
Message ID: CY4PR04MB06639296AFD42DE71E76C9C3DFD70@CY4PR04MB0663.namprd04.prod.outlook.com
Disclaimer: I've poked around in this code; see:
https://rt.perl.org/Public/Bug/Display.html?id=125296
http://perl5.git.perl.org/perl.git/shortlog/refs/heads/smoke-me/bulk88/rt125296-wip-COPFILE-threads
Todd Rinaldo wrote:
>> On Sep 28, 2016, at 10:27 PM, Father Chrysostomos <sprout@cpan.org>
>> wrote:
>>
>> Todd Rinaldo wrote:
>>> IMO, the declaration for hek_key is VERY misleading. In the
>>> context of a struct, what you end up with when you do char
>>> hek_key[1]; is a char pointer followed by a char.
>> char hek_key[1] stores the char array's members directly in the
>> struct. There is no pointer stored in the struct.
>>
>
> Right this was my first time seeing a char[1] hanging off the end of
> a struct so one could dynamically allocate a variable size string off
> the end of the struct without the need for a pointer. Neat trick.
IIRC, C99's flexible array member feature lets you write "char foo[];" as
the last member of a C struct declaration. Because Perl 5 stays
C89-compatible, the [1] has to be there. The member hek_key is a flexible
array member, just written in C89-ese. See
https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
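To make the trick concrete, here is a minimal sketch of the C89 "struct hack" with the flags byte stored after the key's trailing \0, as the current hv.h layout does. The names demo_hek, demo_hek_new and DEMO_HEK_FLAGS are made up for illustration and are not perl's; hashing is left out. Strictly speaking, indexing past key[1] relies on the long-standing idiom rather than on a guarantee in the C89 text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct hek; not perl's real definition. */
struct demo_hek {
    unsigned int hash;   /* hash of key (not computed in this sketch) */
    int          len;    /* length of key, excluding the trailing NUL */
    char         key[1]; /* C89-ese flexible array: really len + 2 bytes */
};

/* Allocate header, key bytes, NUL terminator and one flags byte
 * in a single block. Returns NULL if malloc fails. */
static struct demo_hek *
demo_hek_new(const char *str, int len, unsigned char flags)
{
    size_t size;
    struct demo_hek *h;

    assert(str != NULL && len >= 0);
    /* offsetof(..., key) is the header size without the placeholder byte */
    size = offsetof(struct demo_hek, key) + (size_t)len + 2;
    h = (struct demo_hek *)malloc(size);
    if (h == NULL)
        return NULL;
    h->hash = 0;
    h->len = len;
    memcpy(h->key, str, (size_t)len);
    h->key[len] = '\0';             /* the key is NUL-terminated */
    h->key[len + 1] = (char)flags;  /* flags byte lives after the NUL */
    return h;
}

/* Same shape as the HEK_FLAGS expression quoted in this thread. */
#define DEMO_HEK_FLAGS(h) \
    (*((unsigned char *)((h)->key) + (h)->len + 1))
```

A caller would do something like demo_hek_new("abc", 3, 5), read the flags back with DEMO_HEK_FLAGS(h), and release the whole thing with a single free(h).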
> 2. HEK_FLAGS seems too complicated. Rather than being stored as a
> "char hek_flags;" in the hek struct, It hides off the tail end of the
> hek_key. From what I've been able to determine, it is this way
> because the hek_key used to be a PV way back when it wasn't in its
> own struct. So instead of the HEK_FLAGS macro being something simple
> like:
>
> (hek)->hek_flags
>
> It instead has to be something complicated like:
>
> (*((unsigned char *)(HEK_KEY(hek))+HEK_LEN(hek)+1))
>
> I can only imagine that the former would be much cheaper
> computationally. It is true that when an SV rather than the normal C
> string hangs off the end of a HEK, the flag is wasted. However from
> what I can tell, this is an extreme rarity (related to magic) that an
> SV is ever used in a hek. I could probably produce some anecdotal
> proof of the rarity of SVs if it was needed.
>
> I have attached a patch in this email to show what I think would need
> to be changed to simplify HEK_FLAGS.
----------------------------------------------------------
#define HeKEY_sv(he) (*(SV**)HeKEY(he))
----------------------------------------------------------
----------------------------------------------------------
diff --git a/hv.h b/hv.h
index 0e773f2..2f79f03 100644
--- a/hv.h
+++ b/hv.h
@@ -45,6 +45,7 @@ struct he {
struct hek {
U32 hek_hash; /* hash of key */
I32 hek_len; /* length of hash key */
+ char hek_flags; /* The flags associated with this key */
char hek_key[1]; /* variable-length hash key */
/* the hash-key is \0-terminated */
/* after the \0 there is a byte for flags, such as whether the key
----------------------------------------------------------
What about alignment? HeKEY_sv() contains a 32/64-bit read. What
about non-x86 CPUs that don't support unaligned reads without special
measures? (bulk88 has been unsuccessful in the past in arguing on P5P
for the use of unaligned memory reads. They would all have to be done
through a future P5P C API anyway, because you either just do
*(U32*)0x1ABC0001, do *(__unaligned U32*)0x1ABC0001, do
*(__attribute__((__packed__)) U32*)0x1ABC0001, do
__aeabi_uread4(0x1ABC0001), or roll your own "inline" with four 1-byte
reads for braindead CCs, or a non-inlined version with a switch() (i.e.
a jump table in ASM) based on the last 2 bits of the pointer that does
one U32 read (the ptr was really aligned), two U16 reads, or four U8
reads.)
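For illustration, here is a rough sketch of two of the "roll your own" options: a memcpy-based read, which many compilers turn into a plain load where the CPU allows it, and the four 1-byte reads. The function names are hypothetical and none of this is an existing perl API. It uses C99's <stdint.h> for brevity, where perl itself would use U32.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 32-bit value in native byte order from an
 * address that may be unaligned. memcpy has no alignment requirement. */
static uint32_t
demo_read_u32(const void *p)
{
    uint32_t v;

    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical helper: four 1-byte reads, assembled as little-endian.
 * Works on any CPU because it never issues a wide load. */
static uint32_t
demo_read_u32_le(const void *p)
{
    const unsigned char *b = (const unsigned char *)p;

    assert(p != NULL);
    return  (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}
```

Note that the byte-wise version fixes the byte order, while the memcpy version returns whatever the host's native order gives, so the two only agree on little-endian machines.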
----------------------------------------------------------
+ char hek_flags; /* The flags associated with this key */
char hek_key[1]; /* variable-length hash key */
----------------------------------------------------------
I also worry some CC will get the bright idea that you need 3 padding
bytes between a single char and an inline char array for "performance"
reasons. I would write a static assert:
STATIC_ASSERT_STMT(offsetof(struct hek, hek_key) - offsetof(struct hek,
hek_flags) == 1);
to make sure no CC/platform starts wasting 3 bytes there.
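As a standalone sketch of that check, outside of perl's headers: the struct below only mirrors the patched layout, and DEMO_STATIC_ASSERT is a hypothetical C89-style substitute (a negative array size on failure) for perl's own static-assert macros. It assumes the compiler accepts offsetof() in a constant expression, which the mainstream ones do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of the patched struct hek; not perl's definition. */
struct demo_flagged_hek {
    unsigned int hash;
    int          len;
    char         flags;  /* the proposed hek_flags member */
    char         key[1]; /* variable-length key */
};

/* C89-compatible compile-time check: the array size becomes -1,
 * which is a compile error, when cond is false. */
#define DEMO_STATIC_ASSERT(cond, name) \
    typedef char demo_static_assert_##name[(cond) ? 1 : -1]

/* Fails to compile if a CC inserts padding between flags and key. */
DEMO_STATIC_ASSERT(
    offsetof(struct demo_flagged_hek, key)
        - offsetof(struct demo_flagged_hek, flags) == 1,
    key_directly_follows_flags);

/* Runtime view of the same quantity, handy for debugging output. */
static size_t
demo_key_gap(void)
{
    size_t gap = offsetof(struct demo_flagged_hek, key)
               - offsetof(struct demo_flagged_hek, flags);

    assert(gap >= 1);
    return gap;
}
```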
> 3. This is still stewing in my head but it occurs to me that if you
> no longer had a flag hanging off the end of your hek_key, you could
> store a COW counter. I'm not sure what use cases that would provide
> but it might be worth considering.
Often there is already a "COW counter" for HEKs: when they are inlined
into a shared HEK, see "struct shared_he" and "Size_t hent_refcount". In
my CopFILE branch I turned HEKs into basically an immutable string type,
like JavaScript strings. On a PP level you can't tell whether a PP
scalar holds a HEK or not, be it shared, unshared, COW, or whatever; no
matter what kind of COW it is, it is de-COWed automatically on a PP
level.
Don't forget rurban's complaint that the C strings in HEKs and Father C's
COW code can't be stored in const RO memory in binaries made by a "perl
compiler", because the refcounts are right next to the C string in
memory, and the refcounts must be RW through the whole process lifetime.
I have no solutions off the top of my head to rurban's complaint.