
Re: Seeking guidance on Benchmarking and potentially removing swashcaching

From: Karl Williamson
Date: August 21, 2012 14:36
Subject: Re: Seeking guidance on Benchmarking and potentially removing swashcaching
Message ID: 5033FF50.2020508@khwilliamson.com

On 08/21/2012 02:50 PM, Jarkko Hietaniemi wrote:
>>
>> Thank you for this idea.  I did it for Russian, and it showed the current
>> scheme had between 20-25% advantage over my proposed one, so I won't be
>> pursuing the proposal as-is.
>
> Glad it gave you some results.  In the meanwhile I remembered another
> source for more Unicode text,
> but this time it is much shorter (though you can probably just
> self-concat it enough times), and at least
> in principle the same text:
>
> http://www.unicode.org/standard/WhatIsUnicode-more.html
>

In the meantime, I looked at the Unicode 6.2 properties.  There are 842
that match distinct sets of code points, not counting the ones that are
complements of them.  (Some Unicode properties match exactly the same set
of code points as others.  For example, Line_Break=CR and
Grapheme_Cluster_Break=CR both match just a carriage return; there
are other duplicates that are non-trivial.)

Of those, 47% have just one or two elements in their inversion lists.
          55% have up to 4
          63% have up to 8
          70% have up to 16
          77% have up to 32
          81% have up to 64
          85% have up to 128
          90% have up to 256
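
For readers unfamiliar with the term, here is a minimal sketch of an inversion list, assuming the usual definition: a sorted list of code points at which set membership flips, so even-indexed elements start a range that is in the set and odd-indexed elements start a range that is not. The function name and the sample lists are made up for illustration; this is Python, not the C code perl uses internally.

```python
from bisect import bisect_right

def in_inversion_list(inv, cp):
    """Return True if code point cp is in the set described by inv.

    inv is a sorted list of code points where membership toggles:
    inv[0] starts an "in" range, inv[1] starts an "out" range, and so on.
    """
    # Count the toggle points at or below cp; an odd count means "in".
    return bisect_right(inv, cp) % 2 == 1

# Illustrative two-element list: matches exactly a carriage return (U+000D).
CR = [0x0D, 0x0E]

# Illustrative four-element list: ASCII letters A-Z and a-z.
ASCII_LETTERS = [0x41, 0x5B, 0x61, 0x7B]

if __name__ == "__main__":
    print(in_inversion_list(CR, 0x0D))
    print(in_inversion_list(CR, 0x0A))
    print(in_inversion_list(ASCII_LETTERS, ord("q")))
```

A property that matches a single code point, such as Line_Break=CR, needs only two elements in this representation, which is why so many properties fall into the first row of the table.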

This indicates that we shouldn't be generating swashes for most official
Unicode properties: inversion lists this short are cheap to search
directly.
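
To put rough numbers on the list sizes above, the following sketch counts the worst-case number of probes a plain binary search needs over a sorted list of n elements. It is only an illustration under that assumption: the helper names are hypothetical, and it measures nothing about swashes or about perl's actual lookup code.

```python
def probes(inv, cp):
    """Count loop iterations of a standard binary search for cp in sorted inv."""
    lo, hi, count = 0, len(inv), 0
    while lo < hi:
        mid = (lo + hi) // 2
        count += 1
        if cp < inv[mid]:
            hi = mid
        else:
            lo = mid + 1
    return count

def worst_case(n):
    """Maximum probe count over every possible search position in an n-element list."""
    inv = list(range(1, 2 * n, 2))  # n sorted stand-in values: 1, 3, 5, ...
    return max(probes(inv, cp) for cp in range(0, 2 * n + 1))

if __name__ == "__main__":
    # Same size thresholds as the table in the message.
    for n in (2, 4, 8, 16, 32, 64, 128, 256):
        print(f"{n:4d} elements: at most {worst_case(n)} probes")
```

Each doubling of the list size adds one probe to the worst case, so even the 256-element lists stay in single digits.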
