develooper Front page | perl.perl5.porters | Postings from October 2003

Re: 5.8.2-RC1 and mp2

Thread Previous | Thread Next
From:
Stas Bekman
Date:
October 30, 2003 01:39
Subject:
Re: 5.8.2-RC1 and mp2
Message ID:
3FA0DBD7.30503@stason.org
Nicholas Clark wrote:
[...]
>>>I think that I've failed to be clear. mod_perl will still need an
>>>analogous workaround for 5.8.2. All 5.8.1 hashes are randomised.
>>>Some 5.8.2 hashes are randomised. (But only the hashes being fed
>>>pathological data)
>>
>>I don't think so. If we do that we abolish the protection. Besides it 
>>doesn't work. it has no effect at all if I preset:
>>
>>        PL_new_hash_seed_set = TRUE;
>>        PL_new_hash_seed = MP_init_hash_seed;
>>
>>like I did for 5.8.1 (which works):
>>
>>        PL_hash_seed_set = TRUE;
>>        PL_hash_seed = MP_init_hash_seed;
> 
> 
> Hmmm. Is mod_perl pulling the hash value directly from the HEK anywhere?
> 
> /* entry in hash value chain */
> struct he {
>     HE		*hent_next;	/* next entry in chain */
>     HEK		*hent_hek;	/* hash key */
>     SV		*hent_val;	/* scalar value that was hashed */
> };
> 
> /* hash key -- defined separately for use as shared pointer */
> struct hek {
>     U32		hek_hash;	/* hash of key */
>     I32		hek_len;	/* length of hash key */
>     char	hek_key[1];	/* variable-length hash key */
>     /* the hash-key is \0-terminated */
>     /* after the \0 there is a byte for flags, such as whether the key
>        is UTF-8 */
> };

It calls PERL_HASH() to precache hash values. The problem is that we have to 
use that function, because we pre-hash the GVs before perl is even started.

It calls:

   HE *he = hv_fetch_he(stash, mgv->name, mgv->len, mgv->hash);

to get the HE, and then the corresponding GV.

See src/modules/perl/modperl_mgv.c
http://cvs.apache.org/viewcvs.cgi/*checkout*/modperl-2.0/src/modules/perl/modperl_mgv.c?rev=HEAD&content-type=text/plain

find hv_fetch_he.

>>>My understanding of GVs may be failing me here. You keep mentioning
>>>fetch_gv, but that's not part of the perl API. As far as I can see all
>>>the functions in gv.c in perl call hv.c functions to do the lookups, and
>>>hv.c functions now know when to rehash.
>>
>>sorry, mp2 implements a light-weighted version of gv_fetchpv (it doesn't 
>>use it). It uses PERL_HASH to cache the hash values and then it directly 
>>retrieves those GVs via hv_fetch_he(stash, mgv->name, mgv->len, mgv->hash);
> 
> 
> Which still shouldn't be a problem on retrieve
> 
> HE *
> Perl_hv_fetch_ent(pTHX_ HV *hv, SV *keysv, I32 lval, register U32 hash)
> {
>     ...
> 
>     if (HvREHASH(hv)) {
> 	PERL_HASH_INTERNAL(hash, key, klen);
> 	/* Yes, you do need this even though you are not "storing" because
> 	   you can flip the flags below if doing an lval lookup.  (And that
> 	   was put in to give the semantics Andreas was expecting.)  */
> 	flags |= HVhek_REHASH;
>     } else if (!hash) {
> 	PERL_HASH(hash, key, klen);
>     }
> 
> 
> hv_fetch_ent knows how to ignore the passed in hash value.
> 
> (But the thing I was worrying about some days ago appears to have come
> to pass) - mod_perl is then storing the hash value from the returned HE *
> ?

That's exactly what I was repeatedly asking. Whether it'll ignore that value 
once or all the time, once the attack is detected. Earlier you said once, now 
I understand that only for the first time.

> I could think of ways round this, but none of them felt efficient.
> Two were:
> 
> 1: Using the pool of temporary HEs that the tied hashes use to return
>    HE *s with "proper" data

You mean on the "client" side (mod_perl)? So that the passed hash is modified 
internally and the client should re-cache it?

> 2: Not using temporaries, but storing both real and rehashed HE* in
>    hashes "under attack"
>    (not quite sure how - I can see a good way to hang the data of the
>     existing structures. Not sure how much extra data this represents,
>     either)

And it makes things too complex, besides wasted memory and some extra CPU overhead

> A third way would be welcome.

3. Make GV's HV special and never rehash those?

4. introduce a new API which will tell the client that the hash has been 
changed and needs to be re-cached? Or may be better, introduce an new flags 
macro which will tell the client, that the HV had its keys rehashed, and the 
client needs to recache it?

> 
>>>2: Change
>>>  #define HV_MAX_LENGTH_BEFORE_SPLIT 4
>>>  in hv.c from 4 to (say) 16 to attempt to put off randomisation as long
>>>  as possible (in case something in the GVs or elsewhere is triggering
>>>  randomisation.
>>
>>works!
>>
>>So the conclusion is that the randomization kicks in too early? If so that 
>>doesn't sound efficient at all. I don't think we get more than 40 keys in 
>>each HV. Next I'm going to try your scripts that reproduce the attack 
>>conditions.
> 
> 
> "Attack condition" is only per hash, so you'll need to feed those keys into
> every hash you wish to attack.
> 
> (ignoring mod perl) I'm not convinced that it kicks in too way early.
> MJD calculates that the average linked list size on an HV should be 1.6
> entries. (ie there usually shouldn't be a linear search)
> It's kicking in at 4, *after* a hash split failed to partition the longest
> list below 4. So it's saving linear searches of 4 elements.

But without ignoring mod_perl you can clearly see that this abnormal condition 
happens under pretty normal conditions. You can see in my other post all the 
250 GVs and the biggest number of entries for a single GV is 27. So any 
program with several packages may potentially find itself under a non-existing 
attack (could break if caching hash value/adds an unnecessary overhead)

Here is the further data, continued from the other post:

  perl -lne '/\b((?:\w+::)+)\w+\b/ and $c{$1}++; \
   END { print "$c{$_} $_" for sort { $c{$b} <=> $c{$a} } keys %c }' gvs
27 TestModperl::
19 TestFilter::
15 APR::URI::
15 TestAPI::
12 TestAPR::
12 TestDirective::
12 TestHooks::
10 TestApache::
9 TestCompat::
9 TestHooks::push_handlers::
8 Apache::
6 TestHooks::stacked_handlers2::
5 TestFilter::in_init_basic::
4 TestError::
4 TestFilter::both_str_req_mix::
4 TestHooks::stacked_handlers::
3 TestFilter::out_str_remove::
3 TestFilter::out_str_declined::
3 TestFilter::out_init_basic::
3 TestHooks::init::
3 TestModperl::current_callback::
3 TestFilter::both_str_con_add::
3 TestProtocol::
3 TestModules::
2 TestFilter::in_str_consume::
2 TestAPI::lookup_uri2::
2 TestModules::proxy::
2 Apache::TestHandler::
2 TestHooks::cleanup::
2 TestHooks::access::
2 TestFilter::out_str_req_eos::
2 ModPerl::Test::
2 TestAPI::rflush::
2 TestFilter::in_bbs_underrun::
2 TestAPI::internal_redirect::
2 TestPerl::
2 TestFilter::both_str_req_add::
1 TestHooks::fixup::
1 Chatbot::
1 TestFilter::in_str_sandwich::
1 TestFilter::out_str_api::
1 TestFilter::out_bbs_ctx::
1 ModPerl::
1 TestPreConnection::
1 TestFilter::out_str_reverse::
1 TestFilter::in_str_declined::
1 TestHooks::authz::
1 Apache::RequestRec::
1 TestFilter::in_str_lc::
1 Compress::
1 TestFilter::in_bbs_consume::
1 TestFilter::in_bbs_msg::
1 TestHooks::stacked_handlers3::
1 TestFilter::out_bbs_basic::
1 TestError::runtime::
1 TestFilter::out_str_req_mix::
1 TestModperl::cookie2::
1 TestFilter::in_bbs_body::
1 TestHooks::headerparser::
1 TestPreConnection::note::
1 TestFilter::out_str_ctx::
1 APR::
1 TestFilter::in_bbs_inject_header::
1 TestModperl::cookie::
1 TestFilter::in_str_msg::

> PS Does mod_perl have to be binary compatible with 5.8.0? If not, for
>    mod_perl we could use the 5.8.1 scheme (which works) and for the rest
>    perl the 5.8.2 scheme. (As a general case 5.8.2 install needs to be
>    binary compatible with 5.8.0)

Since mod_perl 2 (mod_perl 1 doesn't have this problem) hasn't been released 
yet, we don't have to be binary compatible. But while we could use what 5.8.1 
did, it's going be just a pain as perl evolves. Besides I'm sure there will be 
other users having similar problems. I'd rather see this solved on the API 
level, giving more control to users, rather than taking it away ;)

At the moment I tend to think that exempting GVs from rehashing is the safest 
approach to start with. Though someone will now say: "what if external source 
which is not under control of the developer is used to create GVs?"

Frankly I don't know what's the best solution. I have spent so much time 
wrestling with this issue with 5.8.1 (and Jarkko!) and I felt so relieved when 
it all was over. Now it all starts again, and it seems to be much worse this 
time. Hopefully we find a solid solution so we won't have to return to this 
issue for some longer time than just 3 months ;)

__________________________________________________________________
Stas Bekman            JAm_pH ------> Just Another mod_perl Hacker
http://stason.org/     mod_perl Guide ---> http://perl.apache.org
mailto:stas@stason.org http://use.perl.org http://apacheweek.com
http://modperlbook.org http://apache.org   http://ticketmaster.com


Thread Previous | Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About