Front page | perl.perl5.porters |
Postings from February 2018
Re: Another idea for speeding up /i matching on ASCII characters
Thread Previous
From:
Karl Williamson
Date:
February 2, 2018 05:19
Subject:
Re: Another idea for speeding up /i matching on ASCII characters
Message ID:
ed0ecde6-7ed4-7403-1443-46771b28e875@khwilliamson.com
On 01/25/2018 03:30 PM, Karl Williamson wrote:
> On 01/25/2018 01:49 PM, Father Chrysostomos wrote:
>> Karl Williamson wrote:
>>> This however is not space efficient.
>>>
>>> If we have an EXACTFish node consisting entirely of ASCII characters
>>> that don't have folds outside of ASCII (so Kk are out because they match
>>> the KELVIN SIGN), it would be possible to create a parallel string to be
>>> used as a mask, specially constructed so that each byte is the
>>> appropriate mask for /i on the corresponding byte in the main string.
>>>
>>> When doing the match, you load a words-worth
>>
>> I wandered lonely as a cloud that floats on high o'er vales and hills,
>> And all at once I saw a crowd....
>>
>> Sorry, I got sidetracked.
>
> It never occurred to me before that his name is somewhat apt
>>
>>> of the string, and the
>>> corresponding portion of the mask, AND them together and compare. If
>>> then fail or go on to the next word full.
>>>
>>> Thus you wouldn't have to look up the fold for every byte as we do now.
>>> If you've ever watched an EXACTFish node execute with backtracking, it's
>>> depressing how often we have to do this over and over.
>>>
>>> This could also work on many UTF-8 strings as well.
>>>
>>> It essentially doubles the length of the pattern, but my guess is that
>>> it's around an order of magnitude faster on a 64-bit machine.
>>
>> Maybe we could actually make 'use less "memory"' do something.
>>
>> How much faster is it on 32-bit machines. (I am no longer in a posi-
>> tion to provide 32-bit benchmarks. Machines bitten 32 times seem to
>> be getting rarer these days.)
>>
>
> I think I was a little optimistic. On an 8 byte word, we do a
> conditional every 8 bytes instead of every single one. That's a
> difference of 7 conditionals per word. And indeed, in the recent
> word-at-a-time changes I've made, we see that conditionals ratio
> approaching 800%, but the improvement is only 700%; still quite
> significant. The number of instructions executed doesn't have this high
> a curve, as we do extra masks, etc per word that don't have to be done
> when looking at a single byte. But reducing the number of conditionals
> is a far bigger deal.
>
> I was thinking 800%, and then that you don't have to fold, to come up
> with my estimate, but that folding doesn't have to involve extra
> conditionals, so that's not as big a deal as I had thought.
>
> On 4 byte words, you save 3 conditionals per word that gets processed.
> So the savings are roughly half what you get with an 8 byte word.
>
> There is extra overhead at the beginning and end of the string, as you
> have to get to a word boundary before you can start reading
> word-at-a-time (unless we wrote a good probe that somehow also knew that
> unaligned reads weren't slower than aligned ones), and you have to
> process any straggler bytes that don't fill a full word at the end. And
> you have to have extra tests to determine that you have or don't have
> these conditions that may need to be handled.
>
Yves had another idea which is to just do this for now on the pattern's
start class, so the memory use isn't really a concern. (I have
forwarded his offlist reply, to this thread)
But I note that c05cc3b625744ca35b9d2edaf1c108c901eaad86 speeds up the
finding of where to start matching by the promised order of magnitude
(on a 64-bit system)
I also researched all the Unicode case folds, and about half would work
with this technique. So it's something I will investigate
Thread Previous