develooper Front page | perl.perl5.porters | Postings from December 2008

Re: Even faster Unicode character counting

From: demerphq
Date: December 30, 2008 14:52
Subject: Re: Even faster Unicode character counting
Message ID: 9b18b3110812301452u1c150622ie36c24a27153ab60@mail.gmail.com
2008/12/30 David Nicol <davidnicol@gmail.com>:
> On Tue, Dec 30, 2008 at 12:39 PM, David Nicol <davidnicol@gmail.com> wrote:
>>
>> Here's an ifdeffed version that attempts to use vectorization to skip
>> runs of invariants quickly while still using the information from the
>> start bytes to skip ahead, for the EBCDIC channel.  And I corrected
>> the off-by-one in the comment.  This compiles, but I have neither
>> tested nor benchmarked.
>
> open note to self:
> Sent a patch without even testing?  What kind of idiot are you trying
> to pass yourself off as, anyway?
> I tested your patch and discovered that you had an off-by-one -- your
> initializing count to -1 was based on a misunderstanding of how the
> function is used!
> end open note to self
>
> But there's a thing I don't quite get -- the UTF8SKIP lookup table
> assigns a length of 13 to a 0xFF start byte.  Could someone explain
> how that works?  The tests pertaining to length now work, but tests
> presumably based on the functionality involved in this 13-long thing
> are now failing, as that uses a different mechanism than counting
> continuation bytes can handle.

grep -C 5 13 utf8.c utf8.h

See also UTF8_MAXBYTES.
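
To make the 13 concrete: in Perl's extended (never official) UTF-8, a 0xFF start byte is followed by 12 continuation bytes of the form 10xxxxxx. Each carries 6 bits, giving 72 bits of room, enough for a full 64-bit UV plus reserved bits. The sketch below encodes and decodes such a sequence. It is illustrative only: the helper names and `EXT_SEQ_LEN` are hypothetical, and putting the unused high bits in the leading continuation bytes is an assumption to check against uvuni_to_utf8_flags() in utf8.c. It also shows that a counter which only distinguishes start bytes from continuation bytes still sees the whole sequence as one character.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXT_SEQ_LEN 13  /* hypothetical name; compare UTF8_MAXBYTES */

/* Encode uv as 0xFF followed by 12 continuation bytes (10xxxxxx).
 * Payload bits are placed most significant first; 12 * 6 = 72 bits of
 * room, so the topmost 8 bits are always zero. */
static void encode_ff(uint64_t uv, unsigned char out[EXT_SEQ_LEN])
{
    out[0] = 0xFF;
    for (int i = 1; i < EXT_SEQ_LEN; i++) {
        int shift = 6 * (EXT_SEQ_LEN - 1 - i);   /* 66, 60, ..., 6, 0 */
        /* shifting a uint64_t by >= 64 is undefined, so treat as zero */
        unsigned bits = shift >= 64 ? 0u : (unsigned)((uv >> shift) & 0x3F);
        out[i] = (unsigned char)(0x80 | bits);
    }
}

/* Decode a 13-byte 0xFF sequence back into a 64-bit value. */
static uint64_t decode_ff(const unsigned char in[EXT_SEQ_LEN])
{
    uint64_t uv = 0;

    assert(in[0] == 0xFF);
    for (int i = 1; i < EXT_SEQ_LEN; i++) {
        assert((in[i] & 0xC0) == 0x80);          /* continuation byte */
        uv = (uv << 6) | (uint64_t)(in[i] & 0x3F);
    }
    return uv;
}

/* Count bytes that are not continuation bytes (i.e. start bytes). */
static size_t count_non_continuation(const unsigned char *s, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if ((s[i] & 0xC0) != 0x80)
            count++;
    return count;
}
```

For UINT64_MAX the bytes come out as 0xFF, 0x80, 0x8F, then ten 0xBF bytes: 4 + 10 * 6 = 64 payload bits after the always-zero leading continuation byte.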

-- 
perl -Mre=debug -e "/just|another|perl|hacker/"



nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org