
Re: Even faster Unicode character counting

David Nicol
January 29, 2009 19:27
On Thu, Jan 29, 2009 at 8:32 PM, karl williamson wrote:
> David Nicol wrote:
>> I don't know about the endianness issues, the patch uses the U64 macro
>> which should be an appropriate size even if it has to be char[8] or
>> such.
> Why then does the code have a HAS_QUAD macro to say whether the machine even
> accepts 64 bits or not, and other macros to declare a constant suffixed with
> an L, for example, or not?

Those macros exist to determine what U64 is on any particular
platform.  I'm not entirely certain that it's available everywhere, and
I'm hoping someone else will be pleased to take this little optimization
project over.  Without the massive speed gain from counting continuation
bytes in parallel and subtracting them, and given that the additional
tests will slow down operations on all-extended-character data, I'm not
sure this is worthwhile at all.  It might be better to do deeper
reengineering to create a "pure utf8" data type that would be guaranteed
to hold valid utf8, or some similarly grand unfunded mandate.
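
For anyone who hasn't seen the trick, here is a minimal sketch of
counting continuation bytes eight at a time and subtracting them from
the byte length.  It is not the patch under discussion: the function
name utf8_count_chars is made up for illustration, it uses C99's
uint64_t where the perl source would use U64, and it assumes the input
is already valid utf8.  The word is loaded with memcpy, so alignment
does not matter, and since each byte is tested independently the result
should not depend on endianness.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the perl source): returns the number of
 * UTF-8 characters in s[0..len), assuming the bytes are valid UTF-8. */
static size_t utf8_count_chars(const unsigned char *s, size_t len)
{
    /* 0x01 in every byte lane of a 64-bit word */
    const uint64_t ones = UINT64_C(0x0101010101010101);
    size_t continuation = 0;
    size_t i = 0;

    /* Main loop: examine eight bytes per iteration. */
    for (; i + 8 <= len; i += 8) {
        uint64_t w, x;

        memcpy(&w, s + i, 8);   /* safe for unaligned input */

        /* A continuation byte looks like 10xxxxxx: bit 7 set, bit 6
         * clear.  Move both tests down to bit 0 of each byte lane. */
        x = (w >> 7) & (~w >> 6) & ones;

        /* Each lane now holds 0 or 1.  Multiplying by 'ones' adds all
         * eight lanes into the top byte (at most 8, so no overflow). */
        continuation += (size_t)((x * ones) >> 56);
    }

    /* Tail: fewer than eight bytes left, test them one at a time. */
    for (; i < len; i++)
        continuation += ((s[i] & 0xC0) == 0x80);

    /* Every byte that is not a continuation byte starts a character. */
    return len - continuation;
}
```

On all-ASCII input x is always zero, so the loop only pays for the
shifts and the multiply; whether that beats a table-driven byte loop on
a given platform is something that would have to be measured.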
