perl.perl5.porters | Postings from August 2012

Re: Seeking guidance on Benchmarking and potentially removing swashcaching

From: Nicholas Clark
Date: August 29, 2012 08:02
Subject: Re: Seeking guidance on Benchmarking and potentially removing swashcaching
Message ID: 20120829150240.GX9583@plum.flirble.org

On Tue, Aug 21, 2012 at 02:26:42PM -0600, Karl Williamson wrote:

> It got me to being more realistic about what real-world applications 
> look like.  Most things are written in just one language, or at most a 
> few.  And so processing will be of just a relatively few code points. 
> The hash implementation shines in this regard.  Modern Cyrillic has 32 
> characters IIRC, times 2 for upper/lower case.  Swash hashes will handle 
> these in just a couple of keys.  Chinese has a lot more characters, so I 
> tried it on a Chinese wikipedia page, and got similar results.  So even 
> though there are more keys in the hash, it's not enough to degrade hash 
> performance.

IIRC this was Larry's rough assumption about 10-12 years ago when he chose
the implementation he did - most programs won't be dealing with more than
one language, code points used by a language are in clusters, so most
programs won't ever have to load more than a few bits of Unicode.

Hence:

> knowledge of how hashes work, which is decades old.)  But I don't think 
> there are that many real world situations where this likely happens. 

> In the long run, it would be best to get most or all of the standard 
> Unicode properties into memory, using the techniques that ICU does; then 
> this wouldn't much matter.

Agree.
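
To make the "just a couple of keys" point concrete, here is a rough sketch
(in Python, purely for illustration; it is not the actual swash code). It
assumes each cache key covers a block of 64 consecutive code points. That
block size and the helper name block_keys are assumptions made for this
example, not taken from the perl source.

```python
# Assumption: one cache key covers a block of 64 consecutive code points.
BLOCK_SIZE = 64

def block_keys(text, block_size=BLOCK_SIZE):
    """Return the set of block indices needed to cover every code point in text."""
    return {ord(ch) // block_size for ch in text}

# Modern Russian alphabet: U+0410..U+044F, plus U+0401 and U+0451 (upper/lower "yo").
russian = "".join(chr(cp) for cp in range(0x0410, 0x0450)) + "\u0401\u0451"

keys = block_keys(russian)
print(len(russian), "characters ->", len(keys), "cache keys")
```

Under that assumption the whole alphabet, both cases, falls into two
blocks, so a program handling only Russian text would touch only two keys.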

Nicholas Clark
