On 10/31/2017 05:47 AM, Dave Mitchell wrote:
> So I think the conclusion is that trying to second-guess the CPU is
> probably a waste of time!

You might be interested in the branchless decoder then,
http://nullprogram.com/blog/2017/10/06/

It was faster than the dfa under some circumstances, but not in all. He
prefers the dfa to his own, in part because his own avoids branches by
requiring every input string to be padded with 3 trailing bytes.

Since I first looked at it, a link at the bottom has been added to a SIMD
decoder, which works faster on some architectures, slower on others.

>
>> The total size of ./perl increased 100K bytes. There were 98 calls in it
>> that got inlined; only 6 needed the full generality.
> Hmmm, that seems a bit excessive. I would be tempted to see what
> performance drop you get by not inlining the function, or only inlining
> it in a few critical places, or (if technically feasible) only inlining
> a smaller subset of the function.
>
>> I have pushed the branch as
>>
>> https://perl5.git.perl.org/perl.git/shortlog/refs/heads/smoke-me/khw-utf8
>>
>> in case anyone wants to look at it.
> It looks good, except I don't like the 'real' function being named the
> same as the inline function but with an extra _ prefix. It confused me a
> lot trying to review the diff. Maybe rename the 'real' function to
> Perl_utf8n_to_uvchr_full() (non-API but exported)?

I wondered about that name, so certainly I will take your advice.

>
>> If we did go with this, I don't know how it would work with the license,
>> which reads:
> IANAL but I can't see any reason not to include it the way you have.

It took me a bit to figure out this acronym, but it makes sense given
their reputation.