On Thu, Jan 29, 2009 at 8:32 PM, karl williamson <public@khwilliamson.com> wrote:
> David Nicol wrote:
>>
>> I don't know about the endianness issues, the patch uses the U64 macro
>> which should be an appropriate size even if it has to be char[8] or
>> such.
>
> Why then does the code have a HAS_QUAD macro to say whether the machine even
> accepts 64 bits or not, and other macros to declare a constant suffixed with
> an L, for example, or not.

Those macros exist to determine what U64 is on any particular platform. I'm not entirely certain that it's available everywhere, and I'm hoping someone else will be pleased to take this little optimization project over.

Without the massive speed gain from counting continuation bytes in parallel and subtracting them, and given that the additional tests will slow down operations on all-extended-character data, I'm not sure this is worthwhile at all; better to do deeper reengineering to create a "pure utf8" data type that would be guaranteed to hold valid UTF-8, or some similarly grand unfunded mandate.
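
For what it's worth, here is a minimal sketch of the parallel-counting idea, using a plain uint64_t rather than perl's U64/HAS_QUAD machinery; the function name and the multiply-to-sum trick are my own choices for the example, not lifted from the patch.

#include <stdint.h>
#include <string.h>
#include <stddef.h>
#include <stdio.h>

/* Count characters in a buffer of (assumed valid) UTF-8 by subtracting
 * continuation bytes (10xxxxxx) from the total byte count.  Eight bytes
 * are examined per iteration; the tail is handled one byte at a time. */
static size_t utf8_char_count(const char *buf, size_t len)
{
    const unsigned char *s = (const unsigned char *)buf;
    size_t continuations = 0;
    size_t i = 0;

    for (; i + 8 <= len; i += 8) {
        uint64_t word;
        memcpy(&word, s + i, 8);            /* avoids alignment problems */

        /* A byte is a continuation byte when bit 7 is set and bit 6 is
         * clear; this leaves a 1 in bit 7 of each such byte, 0 elsewhere. */
        uint64_t cont = word & (~word << 1) & UINT64_C(0x8080808080808080);

        /* Move each flag to bit 0 of its byte, then sum all eight bytes
         * at once by multiplying and taking the top byte of the product. */
        continuations += ((cont >> 7) * UINT64_C(0x0101010101010101)) >> 56;
    }

    for (; i < len; i++)                    /* leftover tail bytes */
        if ((s[i] & 0xC0) == 0x80)
            continuations++;

    return len - continuations;
}

int main(void)
{
    const char *sample = "r\xC3\xA9" "sum\xC3\xA9";   /* "resume" with two accented e's */
    printf("%zu\n", utf8_char_count(sample, strlen(sample)));  /* prints 6 */
    return 0;
}

Note that counting flag bits this way doesn't care about byte order within the word, which is why I think the endianness question mostly goes away for the counting itself; the open question remains whether a 64-bit U64 is actually available everywhere.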