develooper Front page | perl.perl5.porters | Postings from August 2012

Re: Seeking guidance on Benchmarking and potentially removing swashcaching

Thread Previous | Thread Next
From:
Karl Williamson
Date:
August 21, 2012 09:09
Subject:
Re: Seeking guidance on Benchmarking and potentially removing swashcaching
Message ID:
5033B2BB.9050100@khwilliamson.com
On 08/20/2012 04:22 AM, Dave Mitchell wrote:
> On Sun, Aug 19, 2012 at 12:14:39PM -0600, Karl Williamson wrote:
>> I would prefer to benchmark more realistic real-world data, and
>> wonder if anyone has some ideas on such, or any other thoughts on
>> this issue.
>
> Well, one thing would be to benchmark a worst-case scenario. Pick a
> swatch that has a large inversion list: i.e. something that is maximally
> bad for binary search; then run a loop where only the same character is
> repeatedly looked up (so minimally small cache). And see whether the
> result is significantly bad.
>
> PS: If a swatch has only a small, fixed number of fields, would it be
> better to use an array rather than a hash to represent it?
>


I'm not understanding you here very well.

You use the term 'swatch'.  I think you meant what I called 'swash'. 
The comments in the code refer to a 'swatch' as the 64-bit bit map (8 
bytes) that each value in swash hashes is.

If so, then I still don't understand.  My email gave the results of 
using binary search with and without caching the last look-up result, 
and caching gave somewhat better results, so my proposal included 
caching, and so your test would just keep returning that cached value, 
so would be fine.

I also don't understand the postscript.  An inversion list is an ordered 
array, so that would seem to fit your description.  Perhaps you meant an 
adaptive algorithm, which I hadn't thought of.  Use the inversion list 
if the number of elements is small, and use a swash if it is larger. 
(The largest inversion list for an official Unicode property is about 
1600 elements, which is 11 probes to search, BTW.) regcomp.c could 
decide for any given property or bracketed character class which one to 
use.  Possibly, also, a swash that was getting too large and losing 
performance could be switched to an inversion list.  That seems probably 
more trouble than the possible gain.

Thread Previous | Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About