develooper Front page | perl.perl5.porters | Postings from February 2018

Re: Another idea for speeding up /i matching on ASCII characters

Thread Previous
From:
Karl Williamson
Date:
February 2, 2018 05:19
Subject:
Re: Another idea for speeding up /i matching on ASCII characters
Message ID:
ed0ecde6-7ed4-7403-1443-46771b28e875@khwilliamson.com
On 01/25/2018 03:30 PM, Karl Williamson wrote:
> On 01/25/2018 01:49 PM, Father Chrysostomos wrote:
>> Karl Williamson wrote:
>>> This however is not space efficient.
>>>
>>> If we have an EXACTFish node consisting entirely of ASCII characters
>>> that don't have folds outside of ASCII (so Kk are out because they match
>>> the KELVIN SIGN), it would be possible to create a parallel string to be
>>> used as a mask, specially constructed so that each byte is the
>>> appropriate mask for /i on the corresponding byte in the main string.
>>>
>>> When doing the match, you load a words-worth
>>
>> I wandered lonely as a cloud that floats on high o'er vales and hills,
>> And all at once I saw a crowd....
>>
>> Sorry, I got sidetracked.
> 
> It never occurred to me before that his name is somewhat apt
>>
>>> of the string, and the
>>> corresponding portion of the mask, AND them together and compare.  If
>>> then fail or go on to the next word full.
>>>
>>> Thus you wouldn't have to look up the fold for every byte as we do now.
>>> If you've ever watched an EXACTFish node execute with backtracking, it's
>>> depressing how often we have to do this over and over.
>>>
>>> This could also work on many UTF-8 strings as well.
>>>
>>> It essentially doubles the length of the pattern, but my guess is that
>>> it's around an order of magnitude faster on a 64-bit machine.
>>
>> Maybe we could actually make 'use less "memory"' do something.
>>
>> How much faster is it on 32-bit machines.  (I am no longer in a posi-
>> tion to provide 32-bit benchmarks.  Machines bitten 32 times seem to
>> be getting rarer these days.)
>>
> 
> I think I was a little optimistic.  On an 8 byte word, we do a 
> conditional every 8 bytes instead of every single one.  That's a 
> difference of 7 conditionals per word.  And indeed, in the recent 
> word-at-a-time changes I've made, we see that conditionals ratio 
> approaching 800%, but the improvement is only 700%; still quite 
> significant.  The number of instructions executed doesn't have this high 
> a curve, as we do extra masks, etc per word that don't have to be done 
> when looking at a single byte.  But reducing the number of conditionals 
> is a far bigger deal.
> 
> I was thinking 800%, and then that you don't have to fold, to come up 
> with my estimate, but that folding doesn't have to involve extra 
> conditionals, so that's not as big a deal as I had thought.
> 
> On 4 byte words, you save 3 conditionals per word that gets processed. 
> So the savings are roughly half what you get with an 8 byte word.
> 
> There is extra overhead at the beginning and end of the string, as you 
> have to get to a word boundary before you can start reading 
> word-at-a-time (unless we wrote a good probe that somehow also knew that 
> unaligned reads weren't slower than aligned ones), and you have to 
> process any straggler bytes that don't fill a full word at the end.  And 
> you have to have extra tests to determine that you have or don't have 
> these conditions that may need to be handled.
> 

Yves had another idea which is to just do this for now on the pattern's 
start class, so the memory use isn't really a concern.  (I have 
forwarded his offlist reply, to this thread)

But I note that c05cc3b625744ca35b9d2edaf1c108c901eaad86 speeds up the 
finding of where to start matching by the promised order of magnitude 
(on a 64-bit system)

I also researched all the Unicode case folds, and about half would work 
with this technique.  So it's something I will investigate

Thread Previous


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About