develooper Front page | perl.perl5.porters | Postings from October 2003

Re: 5.8.2-RC1 and mp2

From: Nicholas Clark
Date: October 30, 2003 00:26
Subject: Re: 5.8.2-RC1 and mp2
Message ID: 20031030082526.GT6287@plum.flirble.org

On Wed, Oct 29, 2003 at 11:42:35PM -0800, Stas Bekman wrote:
> Nicholas Clark wrote:
> >On Tue, Oct 28, 2003 at 04:10:34PM -0800, Stas Bekman wrote:
> >
> >>Nicholas Clark wrote:
> >>
> >>>On Tue, Oct 28, 2003 at 03:44:56PM -0800, Stas Bekman wrote:

> >I think that I've failed to be clear. mod_perl will still need an
> >analogous workaround for 5.8.2. All 5.8.1 hashes are randomised.
> >Some 5.8.2 hashes are randomised. (But only the hashes being fed
> >pathological data)
> 
> I don't think so. If we do that we abolish the protection. Besides it 
> doesn't work. it has no effect at all if I preset:
> 
>         PL_new_hash_seed_set = TRUE;
>         PL_new_hash_seed = MP_init_hash_seed;
> 
> like I did for 5.8.1 (which works):
> 
>         PL_hash_seed_set = TRUE;
>         PL_hash_seed = MP_init_hash_seed;

Hmmm. Is mod_perl pulling the hash value directly from the HEK anywhere?

/* entry in hash value chain */
struct he {
    HE		*hent_next;	/* next entry in chain */
    HEK		*hent_hek;	/* hash key */
    SV		*hent_val;	/* scalar value that was hashed */
};

/* hash key -- defined separately for use as shared pointer */
struct hek {
    U32		hek_hash;	/* hash of key */
    I32		hek_len;	/* length of hash key */
    char	hek_key[1];	/* variable-length hash key */
    /* the hash-key is \0-terminated */
    /* after the \0 there is a byte for flags, such as whether the key
       is UTF-8 */
};
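
To make the layout described in those comments concrete, here is a minimal
standalone sketch. It does not use perl's headers: struct toy_hek,
toy_hek_new, toy_hek_flags and TOY_FLAG_UTF8 are hypothetical names, and a
C99 flexible array member stands in for the hek_key[1] idiom. The point is
only that the hash is cached next to the key, and that the flags byte sits
just past the terminating \0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for struct hek; NOT perl's real definition. */
struct toy_hek {
    uint32_t hash;  /* cached hash of key */
    int32_t  len;   /* length of key */
    char     key[]; /* key bytes, then \0, then one flags byte */
};

#define TOY_FLAG_UTF8 0x01 /* hypothetical flag value, for illustration */

/* Allocate header + key bytes + terminating \0 + trailing flags byte. */
static struct toy_hek *toy_hek_new(const char *key, int32_t len,
                                   uint32_t hash, unsigned char flags)
{
    struct toy_hek *h;

    assert(len >= 0);
    h = malloc(offsetof(struct toy_hek, key) + (size_t)len + 2);
    if (!h)
        return NULL;
    h->hash = hash;
    h->len = len;
    memcpy(h->key, key, (size_t)len);
    h->key[len] = '\0';
    h->key[len + 1] = (char)flags;
    return h;
}

/* The flags byte lives one past the \0 that ends the key. */
static unsigned char toy_hek_flags(const struct toy_hek *h)
{
    return (unsigned char)h->key[h->len + 1];
}
```

Anything that copies the hash field out of such a structure holds a snapshot
of whatever hash function produced it at the time.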

> >It would not be impossible for mod_perl to have some real data that was
> >similar. But it seems unlikely.
> 
> As you can see below, it's not only likely, it happens in our ever-growing 
> test suite ;)

Ah

> >I'm going to get the mod_perl source after I send this message.
> 
> Please ask me any questions directly if you have any problems to get it 
> running.

Not tried yet. Won't get a chance after I send this message until this
evening. I can't do any of this on work time.


> >My understanding of GVs may be failing me here. You keep mentioning
> >fetch_gv, but that's not part of the perl API. As far as I can see all
> >the functions in gv.c in perl call hv.c functions to do the lookups, and
> >hv.c functions now know when to rehash.
> 
> sorry, mp2 implements a light-weighted version of gv_fetchpv (it doesn't 
> use it). It uses PERL_HASH to cache the hash values and then it directly 
> retrieves those GVs via hv_fetch_he(stash, mgv->name, mgv->len, mgv->hash);

Which still shouldn't be a problem on retrieve

HE *
Perl_hv_fetch_ent(pTHX_ HV *hv, SV *keysv, I32 lval, register U32 hash)
{
    ...

    if (HvREHASH(hv)) {
	PERL_HASH_INTERNAL(hash, key, klen);
	/* Yes, you do need this even though you are not "storing" because
	   you can flip the flags below if doing an lval lookup.  (And that
	   was put in to give the semantics Andreas was expecting.)  */
	flags |= HVhek_REHASH;
    } else if (!hash) {
	PERL_HASH(hash, key, klen);
    }


hv_fetch_ent knows how to ignore the passed in hash value.
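
The branch above can be modelled outside perl with a toy sketch. Everything
here is hypothetical (struct toy_table, toy_hash_public, toy_hash_internal,
toy_effective_hash); the two hash functions are simple stand-ins, not the
real PERL_HASH or PERL_HASH_INTERNAL. It only illustrates the decision: a
table flagged for rehashing recomputes the hash and ignores whatever the
caller passed in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the ordinary (unseeded) hash. */
static uint32_t toy_hash_public(const char *key, size_t len)
{
    uint32_t h = 0;
    size_t i;

    for (i = 0; i < len; i++)
        h = h * 33 + (unsigned char)key[i];
    return h;
}

/* Stand-in for the seeded hash used once a table is flagged. */
static uint32_t toy_hash_internal(const char *key, size_t len, uint32_t seed)
{
    uint32_t h = seed;
    size_t i;

    for (i = 0; i < len; i++)
        h = h * 33 + (unsigned char)key[i];
    return h;
}

struct toy_table {
    int      rehash; /* non-zero once the table has switched hash function */
    uint32_t seed;   /* seed for the internal hash */
};

/* Return the hash value a lookup on this table would actually use. */
static uint32_t toy_effective_hash(const struct toy_table *t, const char *key,
                                   size_t len, uint32_t caller_hash)
{
    assert(t != NULL && key != NULL);
    if (t->rehash)
        return toy_hash_internal(key, len, t->seed); /* caller_hash ignored */
    if (caller_hash == 0)
        return toy_hash_public(key, len);            /* nothing precomputed */
    return caller_hash;                              /* trust the caller */
}
```

In this model a precomputed hash is harmless on the way in, but a hash value
read back out of a flagged table no longer matches the public one.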

But the thing I was worrying about some days ago appears to have come
to pass: is mod_perl then storing the hash value from the returned HE *?

I could think of ways round this, but none of them felt efficient.
Two were:

1: Using the pool of temporary HEs that the tied hashes use to return
   HE *s with "proper" data
2: Not using temporaries, but storing both real and rehashed HE* in
   hashes "under attack"
   (not quite sure how - I can see a good way to hang the data of the
    existing structures. Not sure how much extra data this represents,
    either)

A third way would be welcome.

> >2: Change
> >   #define HV_MAX_LENGTH_BEFORE_SPLIT 4
> >   in hv.c from 4 to (say) 16 to attempt to put off randomisation as long
> >   as possible (in case something in the GVs or elsewhere is triggering
> >   randomisation).
> 
> works!
> 
> So the conclusion is that the randomization kicks in too early? If so that 
> doesn't sound efficient at all. I don't think we get more than 40 keys in 
> each HV. Next I'm going to try your scripts that reproduce the attack 
> conditions.

"Attack condition" is only per hash, so you'll need to feed those keys into
every hash you wish to attack.

(Ignoring mod_perl) I'm not convinced that it kicks in way too early.
MJD calculates that the average linked list size on an HV should be 1.6
entries. (ie there usually shouldn't be a linear search)
It's kicking in at 4, *after* a hash split failed to partition the longest
list below 4. So it's saving linear searches of 4 elements.
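
As a rough way to explore these numbers, here is a small simulation sketch.
It assumes uniformly random hash values and a power-of-two bucket count,
which real keys and perl's hash function need not match; simulate_chains,
struct chain_stats and next_random are hypothetical names, and the generator
is a splitmix64 step rather than anything perl uses. With as many keys as
buckets, the expected length of a non-empty chain under these assumptions is
1/(1 - 1/e), about 1.58, which is close to the 1.6 quoted above (whether that
is how the figure was derived is a guess).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct chain_stats {
    double   avg_nonempty; /* mean length of the non-empty chains */
    unsigned longest;      /* length of the longest chain */
};

/* One splitmix64 step: a small mixing generator, used here for determinism. */
static uint64_t next_random(uint64_t *state)
{
    uint64_t z = (*state += 0x9E3779B97F4A7C15ULL);

    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

/* Drop nkeys random hash values into nbuckets (must be a power of two)
   and report chain-length statistics.  Returns zeros if allocation fails. */
static struct chain_stats simulate_chains(size_t nbuckets, size_t nkeys,
                                          uint64_t seed)
{
    struct chain_stats s = {0.0, 0};
    unsigned *count;
    size_t i, nonempty = 0;

    assert(nbuckets > 0 && (nbuckets & (nbuckets - 1)) == 0);
    count = calloc(nbuckets, sizeof *count);
    if (!count)
        return s;

    for (i = 0; i < nkeys; i++)
        count[next_random(&seed) & (nbuckets - 1)]++;

    for (i = 0; i < nbuckets; i++) {
        if (count[i])
            nonempty++;
        if (count[i] > s.longest)
            s.longest = count[i];
    }
    if (nonempty)
        s.avg_nonempty = (double)nkeys / (double)nonempty;
    free(count);
    return s;
}
```

Calling simulate_chains(65536, 65536, 42) and comparing the longest field
against a threshold such as 4 gives a feel for how often purely random data
produces chains of that length, though it says nothing about "typical" keys.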

We don't have data on the right number. I forget the number of people
subscribed to this list - 200?

HELLO THERE

Anyway, no-one (yet) has been able and willing to calculate what we should
expect in linked list sizes for random (er, and more importantly "typical")
data.

At this point I have to go to work, and radio silence.

Nicholas Clark

PS Does mod_perl have to be binary compatible with 5.8.0? If not, for
   mod_perl we could use the 5.8.1 scheme (which works) and for the rest
   of perl the 5.8.2 scheme. (As a general case 5.8.2 install needs to be
   binary compatible with 5.8.0)
