develooper Front page | perl.perl5.porters | Postings from February 2018

Re: Another idea for speeding up /i matching on ASCII characters

Thread Previous
Karl Williamson
February 2, 2018 05:19
Re: Another idea for speeding up /i matching on ASCII characters
Message ID:
On 01/25/2018 03:30 PM, Karl Williamson wrote:
> On 01/25/2018 01:49 PM, Father Chrysostomos wrote:
>> Karl Williamson wrote:
>>> This however is not space efficient.
>>> If we have an EXACTFish node consisting entirely of ASCII characters
>>> that don't have folds outside of ASCII (so Kk are out because they match
>>> the KELVIN SIGN), it would be possible to create a parallel string to be
>>> used as a mask, specially constructed so that each byte is the
>>> appropriate mask for /i on the corresponding byte in the main string.
>>> When doing the match, you load a words-worth
>> I wandered lonely as a cloud that floats on high o'er vales and hills,
>> And all at once I saw a crowd....
>> Sorry, I got sidetracked.
> It never occurred to me before that his name is somewhat apt
>>> of the string, and the
>>> corresponding portion of the mask, AND them together and compare.  If
>>> then fail or go on to the next word full.
>>> Thus you wouldn't have to look up the fold for every byte as we do now.
>>> If you've ever watched an EXACTFish node execute with backtracking, it's
>>> depressing how often we have to do this over and over.
>>> This could also work on many UTF-8 strings as well.
>>> It essentially doubles the length of the pattern, but my guess is that
>>> it's around an order of magnitude faster on a 64-bit machine.
>> Maybe we could actually make 'use less "memory"' do something.
>> How much faster is it on 32-bit machines.  (I am no longer in a posi-
>> tion to provide 32-bit benchmarks.  Machines bitten 32 times seem to
>> be getting rarer these days.)
> I think I was a little optimistic.  On an 8 byte word, we do a 
> conditional every 8 bytes instead of every single one.  That's a 
> difference of 7 conditionals per word.  And indeed, in the recent 
> word-at-a-time changes I've made, we see that conditionals ratio 
> approaching 800%, but the improvement is only 700%; still quite 
> significant.  The number of instructions executed doesn't have this high 
> a curve, as we do extra masks, etc per word that don't have to be done 
> when looking at a single byte.  But reducing the number of conditionals 
> is a far bigger deal.
> I was thinking 800%, and then that you don't have to fold, to come up 
> with my estimate, but that folding doesn't have to involve extra 
> conditionals, so that's not as big a deal as I had thought.
> On 4 byte words, you save 3 conditionals per word that gets processed. 
> So the savings are roughly half what you get with an 8 byte word.
> There is extra overhead at the beginning and end of the string, as you 
> have to get to a word boundary before you can start reading 
> word-at-a-time (unless we wrote a good probe that somehow also knew that 
> unaligned reads weren't slower than aligned ones), and you have to 
> process any straggler bytes that don't fill a full word at the end.  And 
> you have to have extra tests to determine that you have or don't have 
> these conditions that may need to be handled.

Yves had another idea which is to just do this for now on the pattern's 
start class, so the memory use isn't really a concern.  (I have 
forwarded his offlist reply, to this thread)

But I note that c05cc3b625744ca35b9d2edaf1c108c901eaad86 speeds up the 
finding of where to start matching by the promised order of magnitude 
(on a 64-bit system)

I also researched all the Unicode case folds, and about half would work 
with this technique.  So it's something I will investigate

Thread Previous Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at | Group listing | About