On Tue, Dec 23, 2008 at 5:51 AM, Nicholas Clark <nick@ccl4.org> wrote:
> Jarkko alerted me to this
> http://www.daemonology.net/blog/2008-06-05-faster-utf8-strlen.html
>
> which references our very own Aristotle Pagaltzis.
>
> Is anyone interested in experimenting with his bit-smashing approach and
> seeing whether it can be used in Perl_utf8_length(), and what sort of a
> speedup it gives? It's not the world's largest function:

Well, it can't be used, because perl's arbitrary-length extension apparently does not follow the all-continuation-bytes-and-only-continuation-bytes-are-in-the-0x80-0xBF-range convention.

Well, maybe it could be used by checking for and special-casing 0xFF before using the walk-then-run approach in the patch. Although there are then scary alignment issues, and the 0b100..... bytes that are invariants in I8 for the EBCDIC port.

What can be done, or what I have done, is to use masking to accelerate skipping aligned runs of invariants. Haven't benchmarked it, but the attached patch passes all tests in t/uni both with and without -DNEVERMINDABOUTTHEWORDSKIPPINGTHING, which disables the accelerated skipping while keeping the subtractive rather than additive approach to length calculation.

Happy 2009.

-- 
Lucky Cap'n Rabbit King Nuggets: For the Irish seafaring nobleman in YOU!
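
For anyone reading without the attachment, here is a rough sketch of the idea described above: masking to skip aligned runs of invariants, combined with a subtractive length count. It is not the attached patch. The names sketch_utf8_length() and seq_len() are made up for illustration, the sequence lengths assumed for the 0xFE and 0xFF lead bytes are assumptions of the sketch, and it only considers ASCII platforms (the I8/EBCDIC concern is left out entirely).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: number of bytes implied by a lead byte.
 * The values for 0xFE and 0xFF are assumptions made for this sketch. */
static size_t seq_len(unsigned char c)
{
    if (c < 0x80) return 1;      /* invariant (ASCII) */
    if (c < 0xC0) return 1;      /* stray continuation byte: count it alone */
    if (c < 0xE0) return 2;
    if (c < 0xF0) return 3;
    if (c < 0xF8) return 4;
    if (c < 0xFC) return 5;
    if (c < 0xFE) return 6;
    return c == 0xFE ? 7 : 13;   /* assumed extended-sequence lengths */
}

/* Count characters in [s, e). Start from the byte count and subtract the
 * extra bytes of each multi-byte sequence (the subtractive approach). */
size_t sketch_utf8_length(const unsigned char *s, const unsigned char *e)
{
    assert(s <= e);

    const size_t W = sizeof(uintptr_t);
    /* 0x80 repeated in every byte of a word, e.g. 0x8080808080808080. */
    const uintptr_t high_bits = (uintptr_t)-1 / 0xFF * 0x80;
    size_t len = (size_t)(e - s);

    while (s < e) {
        if (*s < 0x80) {
            if ((uintptr_t)s % W == 0) {
                /* Aligned: skip whole words that contain only invariants.
                 * Each skipped byte is one character, so len is untouched. */
                while ((size_t)(e - s) >= W) {
                    uintptr_t w;
                    memcpy(&w, s, W);
                    if (w & high_bits)
                        break;
                    s += W;
                }
                if (s >= e)
                    break;
                if (*s >= 0x80)
                    continue;    /* handle the variant byte below */
            }
            s++;                 /* single invariant byte */
            continue;
        }

        size_t n = seq_len(*s);
        if (n > (size_t)(e - s))
            n = (size_t)(e - s); /* truncated sequence: clamp to the end */
        len -= n - 1;            /* one character, n bytes */
        s += n;
    }
    return len;
}
```

The point of counting subtractively is that the word-skipping loop needs no bookkeeping: a run of invariants contributes exactly one character per byte, which the initial byte count already covers. Because the walk still goes lead byte by lead byte through the variant sequences, it does not rely on continuation bytes being confined to 0x80-0xBF.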