develooper Front page | perl.perl5.porters | Postings from October 2003

Re: 5.8.2-RC1 and mp2

Nicholas Clark
October 30, 2003 11:40
On Thu, Oct 30, 2003 at 01:37:27AM -0800, Stas Bekman wrote:
> Nicholas Clark wrote:
> [...]

> >HE *
> >Perl_hv_fetch_ent(pTHX_ HV *hv, SV *keysv, I32 lval, register U32 hash)
> >{
> >    ...
> >
> >    if (HvREHASH(hv)) {
> >	PERL_HASH_INTERNAL(hash, key, klen);
> >	/* Yes, you do need this even though you are not "storing" because
> >	   you can flip the flags below if doing an lval lookup.  (And that
> >	   was put in to give the semantics Andreas was expecting.)  */
> >	flags |= HVhek_REHASH;
> >    } else if (!hash) {
> >	PERL_HASH(hash, key, klen);
> >    }
> >
> >
> >hv_fetch_ent knows how to ignore the passed in hash value.
> >
> >(But the thing I was worrying about some days ago appears to have come
> >to pass) - mod_perl is then storing the hash value from the returned HE *
> >?
> That's exactly what I was repeatedly asking. Whether it'll ignore that 
> value once or all the time, once the attack is detected. Earlier you said 
> once, now I understand that only for the first time.

No. Either you are misunderstanding me, or I am misunderstanding you.
All hv.c functions ignore a passed-in hash value every time HvREHASH()
is true. And HvREHASH() is true for any hash that has switched to randomised
hashing because of pathological data.

Please look at the source of hv_fetch_ent in hv.c to see the logic, if my
explanation is failing to make sense.
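
The selection logic can be modelled outside perl. What follows is only a
sketch: toy_hv, toy_hash and toy_effective_hash are hypothetical stand-ins
invented for illustration, not the real HV, PERL_HASH or PERL_HASH_INTERNAL,
and the mixing function is just a simple one-at-a-time style hash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins, NOT the real perl structures or macros. */
typedef struct {
    int      rehash;  /* models HvREHASH(hv) */
    uint32_t seed;    /* models the random seed used once rehashing is on */
} toy_hv;

/* One-at-a-time style mixer, used here for both the normal and the
   randomised case; only the starting seed differs. */
static uint32_t toy_hash(const char *key, size_t klen, uint32_t seed)
{
    uint32_t h = seed;
    for (size_t i = 0; i < klen; i++) {
        h += (unsigned char)key[i];
        h += h << 10;
        h ^= h >> 6;
    }
    h += h << 3;
    h ^= h >> 11;
    h += h << 15;
    return h;
}

/* Returns the hash value a lookup would actually use. */
static uint32_t toy_effective_hash(const toy_hv *hv, const char *key,
                                   size_t klen, uint32_t passed_in)
{
    assert(hv != NULL && key != NULL);
    if (hv->rehash)
        return toy_hash(key, klen, hv->seed); /* caller's value ignored, every time */
    if (!passed_in)
        return toy_hash(key, klen, 0);        /* nothing cached: compute it */
    return passed_in;                         /* trust the caller's cached value */
}
```

In this model a caller that cached a hash value before the switch can keep
passing it in: once the rehash flag is set the value is simply never looked at.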

> >I could think of ways round this, but none of them felt efficient.
> >Two were:
> >
> >1: Using the pool of temporary HEs that the tied hashes use to return
> >   HE *s with "proper" data
> You mean on the "client" side (mod_perl)? So that the passed hash is 
> modified internally and the client should re-cache it?

er, yes but.

> >2: Not using temporaries, but storing both real and rehashed HE* in
> >   hashes "under attack"
> >   (not quite sure how - I can see a good way to hang the data of the
> >    existing structures. Not sure how much extra data this represents,
> >    either)
> And it makes things too complex, besides wasted memory and some extra CPU 
> overhead

I can see a way to do it that may actually help.
(Hang a pointer to a shared HEK in the hash key beyond the flag byte.
It has to be copied out, as it's not aligned.
The benefit is that the linked lists stay the same length, and the unshared
key can use a pointer into the shared string table. The hash routines
compare pointers first and skip the memcmp if they match.
This may actually use less memory than the current implementation.)
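
Two pieces of that idea are easy to show in isolation: copying a pointer out
of an unaligned position with memcpy, and comparing key pointers before
falling back to memcmp. This is a rough sketch only; toy_write_ptr,
toy_read_ptr and toy_key_eq are hypothetical names, and the byte layout is
assumed for illustration, not taken from the real HEK.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Store a pointer at an arbitrary (possibly unaligned) byte position. */
static void toy_write_ptr(unsigned char *bytes, const char *p)
{
    assert(bytes != NULL);
    memcpy(bytes, &p, sizeof p);
}

/* Copy the pointer back out; dereferencing a cast of `bytes` directly
   could fault or misbehave on platforms that require alignment. */
static const char *toy_read_ptr(const unsigned char *bytes)
{
    const char *p;
    assert(bytes != NULL);
    memcpy(&p, bytes, sizeof p);
    return p;
}

/* Key comparison: identical pointers (same shared string) skip memcmp. */
static int toy_key_eq(const char *a, size_t alen, const char *b, size_t blen)
{
    if (alen != blen)
        return 0;
    if (a == b)
        return 1;
    return memcmp(a, b, alen) == 0;
}
```

If both entries point into the same shared string table, the pointer test
succeeds and the byte-by-byte comparison is never reached.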

> >A third way would be welcome.
> 3. Make GV's HV special and never rehash those?

Not sure how to do this.

> 4. Introduce a new API which will tell the client that the hash has been 
> changed and needs to be re-cached? Or maybe better, introduce a new flags 
> macro which will tell the client that the HV had its keys rehashed, and 
> the client needs to recache it?

I don't want to introduce a new API in haste.
(want, not won't)

> At the moment I tend to think that exempting GVs from rehashing is the 
> safest approach to start with. Though someone will now say: "what if 
> an external source which is not under the control of the developer is 
> used to create GVs?"

Except that the GVs call the regular hash functions, and I'm not sure
how to make the hash functions aware that it's a GV.

After I've eaten I'll have a proper look at the mod_perl source.

Nicholas Clark
