develooper Front page | perl.perl5.porters | Postings from January 2018

Re: Another idea for speeding up /i matching on ASCII characters

Thread Previous | Thread Next
Karl Williamson
January 25, 2018 22:31
Re: Another idea for speeding up /i matching on ASCII characters
Message ID:
On 01/25/2018 01:49 PM, Father Chrysostomos wrote:
> Karl Williamson wrote:
>> This however is not space efficient.
>> If we have an EXACTFish node consisting entirely of ASCII characters
>> that don't have folds outside of ASCII (so Kk are out because they match
>> the KELVIN SIGN), it would be possible to create a parallel string to be
>> used as a mask, specially constructed so that each byte is the
>> appropriate mask for /i on the corresponding byte in the main string.
>> When doing the match, you load a words-worth
> I wandered lonely as a cloud that floats on high o'er vales and hills,
> And all at once I saw a crowd....
> Sorry, I got sidetracked.

It never occurred to me before that his name is somewhat apt
>> of the string, and the
>> corresponding portion of the mask, AND them together and compare.  If
>> then fail or go on to the next word full.
>> Thus you wouldn't have to look up the fold for every byte as we do now.
>> If you've ever watched an EXACTFish node execute with backtracking, it's
>> depressing how often we have to do this over and over.
>> This could also work on many UTF-8 strings as well.
>> It essentially doubles the length of the pattern, but my guess is that
>> it's around an order of magnitude faster on a 64-bit machine.
> Maybe we could actually make 'use less "memory"' do something.
> How much faster is it on 32-bit machines.  (I am no longer in a posi-
> tion to provide 32-bit benchmarks.  Machines bitten 32 times seem to
> be getting rarer these days.)

I think I was a little optimistic.  On an 8 byte word, we do a 
conditional every 8 bytes instead of every single one.  That's a 
difference of 7 conditionals per word.  And indeed, in the recent 
word-at-a-time changes I've made, we see that conditionals ratio 
approaching 800%, but the improvement is only 700%; still quite 
significant.  The number of instructions executed doesn't have this high 
a curve, as we do extra masks, etc per word that don't have to be done 
when looking at a single byte.  But reducing the number of conditionals 
is a far bigger deal.

I was thinking 800%, and then that you don't have to fold, to come up 
with my estimate, but that folding doesn't have to involve extra 
conditionals, so that's not as big a deal as I had thought.

On 4 byte words, you save 3 conditionals per word that gets processed. 
So the savings are roughly half what you get with an 8 byte word.

There is extra overhead at the beginning and end of the string, as you 
have to get to a word boundary before you can start reading 
word-at-a-time (unless we wrote a good probe that somehow also knew that 
unaligned reads weren't slower than aligned ones), and you have to 
process any straggler bytes that don't fill a full word at the end.  And 
you have to have extra tests to determine that you have or don't have 
these conditions that may need to be handled.

Thread Previous | Thread Next Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at | Group listing | About