develooper Front page | perl.perl5.porters | Postings from December 2008

Re: Even faster Unicode character counting

David Nicol
December 30, 2008 14:34
On Tue, Dec 30, 2008 at 12:39 PM, David Nicol wrote:
> Here's an ifdeffed version that attempts to use vectorization to skip
> runs of invariants quickly while still using the information from the
> start bytes to skip ahead, for the EBCDIC channel.  And I corrected
> the off-by-one in the comment.  This compiles, but I have neither
> tested nor benchmarked.

open note to self:
Sent a patch without even testing?  What kind of idiot are you trying
to pass yourself off as, anyway?
I tested your patch and discovered that you had an off-by-one -- your
initializing count to -1 was based on a misunderstanding of how the
function is used!
end open note to self

But there's a thing I don't quite get -- the UTF8SKIP lookup table
assigns a length of 13 to a 0xFF start byte.  Could someone explain
how that works?  The tests pertaining to length now pass, but tests
that presumably exercise this 13-byte form are now failing, since it
apparently relies on a mechanism that counting continuation bytes
cannot handle.
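
For anyone following along, here is a minimal sketch of the two counting
strategies in question.  It is not the patch itself: skip_len() is a
hypothetical stand-in for the real lookup table, the function names are
made up for illustration, and the length of 7 for a 0xFE start byte is my
assumption (only the 13 for 0xFF comes from the table entry mentioned
above).  Note that both counters start at 0, not -1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the skip table: the sequence length implied
 * by a start byte.  The 13 for 0xFF is the entry discussed above; the 7
 * for 0xFE is an assumption made for this sketch. */
static size_t skip_len(unsigned char b)
{
    if (b < 0x80) return 1;   /* invariant (ASCII) byte */
    if (b < 0xC0) return 1;   /* stray continuation byte: step over it */
    if (b < 0xE0) return 2;
    if (b < 0xF0) return 3;
    if (b < 0xF8) return 4;
    if (b < 0xFC) return 5;
    if (b < 0xFE) return 6;
    if (b == 0xFE) return 7;
    return 13;                /* 0xFF */
}

/* Strategy 1: trust each start byte and jump ahead by its length. */
size_t count_by_skip(const unsigned char *s, size_t len)
{
    size_t count = 0, i = 0;
    assert(s != NULL || len == 0);
    while (i < len) {
        i += skip_len(s[i]);
        count++;
    }
    return count;
}

/* Strategy 2: look at every byte and count the ones that are not
 * continuation bytes (i.e. not of the form 10xxxxxx). */
size_t count_by_start_bytes(const unsigned char *s, size_t len)
{
    size_t count = 0, i;
    assert(s != NULL || len == 0);
    for (i = 0; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

On well-formed input the two should agree.  They can only differ when the
bytes after a start byte are not all of the 10xxxxxx form, and whether
that is what the failing tests rely on is exactly what I don't know.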
