2008/12/31 karl williamson <public@khwilliamson.com>:
> David Nicol wrote:
>>
>> On Tue, Dec 30, 2008 at 12:39 PM, David Nicol <davidnicol@gmail.com>
>> wrote:
>>>
>>> Here's an ifdeffed version that attempts to use vectorization to skip
>>> runs of invariants quickly while still using the information from the
>>> start bytes to skip ahead, for the EBCDIC channel. And I corrected
>>> the off-by-one in the comment. This compiles, but I have neither
>>> tested nor benchmarked.
>>
>> open note to self:
>> Sent a patch without even testing? What kind of idiot are you trying
>> to pass yourself off as, anyway?
>> I tested your patch and discovered that you had an off-by-one: your
>> initializing count to -1 was based on a misunderstanding of how the
>> function is used!
>> end open note to self
>>
>> But there's a thing I don't quite get: the UTFSKIP lookup table
>> assigns a length of 13 to a 0xFF start byte. Could someone explain
>> how that works? The tests pertaining to length now work, but tests
>> presumably based on the functionality involved in this 13-long thing
>> are now failing, as that uses a different mechanism than counting
>> continuation bytes can handle.
>>
>>
> 0xff is an illegal start byte. Here's some info:
> http://en.wikipedia.org/wiki/UTF-8

Don't forget that Perl's utf8 != UTF-8. The latter is a subset of the
former. You have to read the comments in utf8.h to see what I mean, and
I'm not sure if it impacts your statement, but some aspects of true
UTF-8 don't apply to Perl's internal implementation.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"
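The two ways of measuring a string's length that come up in this thread (jumping ahead by the length a start byte declares, versus subtracting the continuation bytes from the byte count) can be sketched in C. This is a hypothetical illustration, not Perl's code: every function name is invented, the strict table follows standard UTF-8, and the extended table only assumes what the thread reports, namely that Perl's skip table maps 0xFF to 13. Mapping continuation bytes to 1 in the extended table is a further assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Number of leading 1 bits in a byte (0..8). */
static int leading_ones(unsigned char b)
{
    int n = 0;
    while (n < 8 && (b & (0x80u >> n)))
        n++;
    return n;
}

/* Standard UTF-8: bytes per character for a start byte, or 0 if the
 * byte can never start a character. */
static size_t strict_utf8_skip(unsigned char b)
{
    if (b < 0x80) return 1;                       /* ASCII, invariant */
    if (b < 0xC2) return 0;                       /* continuation, or overlong 0xC0/0xC1 */
    if (b < 0xF5) return (size_t)leading_ones(b); /* 2, 3 or 4 */
    return 0;                                     /* 0xF5..0xFF are illegal */
}

/* Hypothetical Perl-style extended table (assumption): the leading-ones
 * count is used up to 0xFE (7 bytes) and 0xFF is mapped to 13, as the
 * thread reports for the skip table. Continuation bytes map to 1 so a
 * scanner always advances. */
static size_t extended_skip(unsigned char b)
{
    int n = leading_ones(b);
    if (n <= 1) return 1;
    if (n == 8) return 13;
    return (size_t)n;
}

/* Count characters by jumping ahead by each start byte's declared length.
 * An invalid byte (skip 0) is counted as one character; a truncated final
 * sequence still counts as one character. */
static size_t count_by_skip(const unsigned char *s, size_t len,
                            size_t (*skip)(unsigned char))
{
    size_t i = 0, chars = 0;
    while (i < len) {
        size_t step = skip(s[i]);
        i += step ? step : 1;
        chars++;
    }
    return chars;
}

/* Count characters as total bytes minus continuation bytes (10xxxxxx). */
static size_t count_by_continuation(const unsigned char *s, size_t len)
{
    size_t cont = 0;
    for (size_t i = 0; i < len; i++)
        if ((s[i] & 0xC0) == 0x80)
            cont++;
    return len - cont;
}
```

On well-formed input, where each start byte is followed by exactly the declared number of continuation bytes, both counters agree: the 6-byte string "h\xC3\xA9llo" yields 5 either way. They can diverge on malformed or truncated input. For the two bytes {0xFF, 'a'}, count_by_skip with extended_skip reports 1, because the 13-byte jump swallows the 'a', while count_by_continuation reports 2.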