develooper Front page | perl.perl5.porters | Postings from August 2012

Re: Seeking guidance on Benchmarking and potentially removing swashcaching

Thread Previous | Thread Next
From:
Dave Mitchell
Date:
August 21, 2012 14:45
Subject:
Re: Seeking guidance on Benchmarking and potentially removing swashcaching
Message ID:
20120821214500.GE4366@iabyn.com
On Tue, Aug 21, 2012 at 10:09:31AM -0600, Karl Williamson wrote:
> On 08/20/2012 04:22 AM, Dave Mitchell wrote:
> >On Sun, Aug 19, 2012 at 12:14:39PM -0600, Karl Williamson wrote:
> >>I would prefer to benchmark more realistic real-world data, and
> >>wonder if anyone has some ideas on such, or any other thoughts on
> >>this issue.
> >
> >Well, one thing would be to benchmark a worst-case scenario. Pick a
> >swatch that has a large inversion list: i.e. something that is maximally
> >bad for binary search; then run a loop where only the same character is
> >repeatedly looked up (so minimally small cache). And see whether the
> >result is significantly bad.
> >
> >PS: If a swatch has only a small, fixed number of fields, would it be
> >better to use an array rather than a hash to represent it?
> >
> 
> 
> I'm not understanding you here very well.
> 
> You use the term 'swatch'.  I think you meant what I called 'swash'.
> The comments in the code refer to a 'swatch' as the 64-bit bit map
> (8 bytes) that each value in swash hashes is.

Brain fart on my part. I meant swash.

> If so, then I still don't understand.  My email gave the results of
> using binary search with and without caching the last look-up
> result, and caching gave somewhat better results, so my proposal
> included caching, and so your test would just keep returning that
> cached value, so would be fine.

I wasn't paying close enough attention. Ignore me.
> 
> I also don't understand the postscript.  An inversion list is an
> ordered array, so that would seem to fit your description.  Perhaps
> you meant an adaptive algorithm, which I hadn't thought of.  Use the
> inversion list if the number of elements is small, and use a swash
> if it is larger. (The largest inversion list for an official Unicode
> property is about 1600 elements, which is 11 probes to search, BTW.)
> regcomp.c could decide for any given property or bracketed character
> class which one to use.  Possibly, also, a swash that was getting
> too large and losing performance could be switched to an inversion
> list.  That seems probably more trouble than the possible gain.

From your original email: "A swash is a hash with a few keys.". I was just
randomly wondering if the key set is small and well known, whether swashes
could be implemented using arrays, e.g.

    $swash{BITS}
becomes
    use constant BITS => 3;
    $swash[BITS]

But knowing almost nothing about swashes, I have no idea whether that
would provide any benefit.

-- 
I've often wanted to drown my troubles, but I can't get my wife to go
swimming.

Thread Previous | Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About