develooper Front page | perl.perl5.porters | Postings from March 2013

NWCLARK TPF grant report #77

Nicholas Clark
March 27, 2013 20:22
NWCLARK TPF grant report #77
Message ID:
[Hours]		[Activity]
2013/02/18	Monday
 1.50		PERL_HASH
 0.25		RT #116615
 0.25		RT #116789
 0.25		RT #116795
 0.50		RT #116833
 0.25		RT #116839
 0.50		Unicode Names
 1.00		inversion lists
 0.50		process, scalability, mentoring
 2.00		reading/responding to list mail

2013/02/19	Tuesday
 0.25		PL_interp_size_5_18_0
 0.25		RT #116833
 0.50		RT #116943
 0.25		RT #59802
 0.50		SvOK
 0.75		Unicode Names
 3.25		pahole on the interpreter struct

2013/02/20	Wednesday
 0.25		PL_interp_size_5_18_0
 0.25		RT #113620
 0.25		RT #116362
 0.25		lexical $_
 0.25		process, scalability, mentoring
 4.00		reading/responding to list mail
 3.50		yves/hv_h_split

2013/02/21	Thursday
 0.75		RT #116899
 3.75		hv_ksplit/hsplit merge
 1.25		reading/responding to list mail
 1.25		yves/hv_h_split

2013/02/22	Friday
 1.25		MVS
 0.25		abolish STRANGE_MALLOC
 1.25		android
 1.00		hv_ksplit/hsplit merge

2013/02/24	Sunday
 0.25		MVS
 1.75		Unicode Names

Which I calculate is 34.25 hours

This week has the flattest spread of hours that I can remember.

Aside from working on a lot of different small things, there was one common
theme to the week. I reviewed the branch Yves had pushed with his proposed
hash changes to protect against hash key discovery. As-was, it added another
member to XPVHV, the structure used for every hash. As Perl 5 only has one
hashes type, it's used for symbol tables, (most) objects, and, well, hashes,
as in associative arrays storing user data, accessed by keys. Increase XPVHV
and (nearly) every object gets bigger, which isn't ideal if the increase is
only needed for iterating the hash. (Most code doesn't poke into objects in
order to iterate over their attributes.)

So I looked to find a way to move the member into the on-demand allocated
structure which holds iterator state. This revealed some fun - there are
actually two functions for "splitting" a hash. A static function,
S_hsplit(), which doubles the size of the array, only used by hash key
insertion code, and an API function, Perl_hv_ksplit(), which can grow the
hash array to "any" size requested (which is rounded up to the next power of
two). The former makes simplifying assumptions because it knows that it is
always only doubling, whereas the latter can not, as it needs to be able to
multiply the array size by any power of two.

It's not clear why there are two functions. S_hsplit() dates back to 5.000.
Perl_hv_ksplit() was added in September 1996. Archives exist for
Perl5-Porters from 1996, but the patch from back then was actually a
reworking of an earlier patch. Unfortunately *that* patch, and any
subsequent discussion, was sent to the list before beginning of recorded
time (21st August 1995), so it's a bit of a mystery as to why it duplicated

Fortunately, with a bit of massaging, it turned out to be relatively easy to
refactor the two functions so that their implementations converged. This was
helped by S_hsplit() being a static function, so I had complete control over
both its callers, and was able to push some work out to them. (Actually,
most usefully, just to one of them. As that work wasn't needed for the other
call path. Eliminating work is the only sure form of optimisation.) When I
got them close enough, I was able to replace the guts of Perl_hv_ksplit()
with a call to S_split(), and the fork was healed.

Once it smoked clean, I merged this change back to blead, as it was
independent of the other changes. With the change made, it became a lot
easier to fix the original problem of moving the structure member, and Yves
subsequently incorporated a variant of my suggested changes into his branch.

Nicholas Clark Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at | Group listing | About