With this in place, the same tests as an unpatched blead pass, which is, all of them except the one that passes 38 tests even though it thinks its supposed to only do 37 tests. Acceleration is supposed to be provided by skipping bytes less than 128 eight at a time, by examining them as a U64. When there is a high bit, three tests are performed to determine which of the eight bytes has the high bit set and normal counting proceeds from that point, instead of starting over at s=alignment-point regardless of which byte had the high bit. The benefit is: (1) faster passing over runs of bytes all less than or equal to 127, which all have lengths of 1 in the lookup table. The costs are (1) one extra comparison each "walking" go-around to check if we are at a 64-bit alignment (2) extra examinations in the "running" loop will consume extra ticks when the string is mostly characters greater than 127 (3) unknown performance of bitmasking a U64 on a 32-bit architecture cost #1 could be mitigated by unrolling the "walking" test into an 8-wide Duff's device, but hopefully I can restrain myself.Thread Previous | Thread Next