Re: regrepeat()

From: demerphq
Date: January 31, 2018 22:56
Subject: Re: regrepeat()
Message ID: CANgJU+VeO9+NPM0CYXyZMf3Wv3Foma1XS5invm17fDo7C-6sSQ@mail.gmail.com

On 30 January 2018 at 19:41, Karl Williamson <public@khwilliamson.com> wrote:
> On 01/02/2018 05:37 AM, Dave Mitchell wrote:
>>
>> On Mon, Jan 01, 2018 at 07:08:10PM +0100, demerphq wrote:
>>>
>>> On 31 December 2017 at 22:36, Karl Williamson <public@khwilliamson.com> wrote:
>>>>
>>>> This function is called during regular expression pattern matching for
>>>> things like
>>>>
>>>>   (foo)+
>>>>
>>>> to match as many 'foo's as there are.  There is special code to handle
>>>> the case where foo is a single byte, such as in
>>>>
>>>>   a+
>>>>
>>>> It turns out that these cases can be sped up dramatically if what we are
>>>> matching is a long string of 'a's in a row.  We simply load a word with
>>>> 4 or 8 a's and look at the string a word-at-a-time, which uses 1/4 or
>>>> 1/8 the number of instructions.  By using a mask, this can be extended
>>>> to work for
>>>>
>>>>   [aA]+
>>>>
>>>> as well.  The code in each case is just over 20 lines of C.
>>>>
>>>> My question is, does this happen often enough in real life to justify
>>>> the extra code?
>>>>
>>>> Leon pointed out that in DNA, there may be longish strings of 'A's.
>>>
>>>
>>> I think this is highly likely to be worth it.
>>
>>
>> +1
>>
>
> Now done by
> ab1efbdc1f74b2f4db076efa0b4d54f387d74efe
> 070e8b2ef4f827a7e0d3199f7b37883a09545802
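
For readers who want to picture the word-at-a-time trick Karl describes
above, here is a minimal sketch in C.  It is not the code from those
commits; count_run(), count_run_masked() and the "keep" parameter are
hypothetical names made up for illustration.  It assumes the string is a
plain byte buffer, and it falls back to a byte loop for the tail and for
the word that contains the first mismatch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (illustration only): count how many leading bytes
 * of s[0..len) equal c once both are ANDed with 'keep'.
 * keep == 0xFF gives an exact match; keep == 0xDF ignores the ASCII
 * case bit, so 'a' and 'A' compare equal. */
static size_t count_run_masked(const unsigned char *s, size_t len,
                               unsigned char c, unsigned char keep)
{
    uintptr_t pattern, mask;
    size_t i = 0;

    /* Fill every byte of a word with the masked target and the mask. */
    memset(&pattern, c & keep, sizeof pattern);
    memset(&mask, keep, sizeof mask);

    /* Compare one whole word (4 or 8 bytes) per iteration. */
    while (len - i >= sizeof(uintptr_t)) {
        uintptr_t word;
        memcpy(&word, s + i, sizeof word);   /* safe for unaligned input */
        if ((word & mask) != pattern)
            break;                           /* mismatch is in this word */
        i += sizeof word;
    }

    /* Finish byte by byte: the tail, or the word with the mismatch. */
    while (i < len && (s[i] & keep) == (c & keep))
        i++;
    return i;
}

/* Exact single-byte case, as in a+ */
static size_t count_run(const unsigned char *s, size_t len, unsigned char c)
{
    return count_run_masked(s, len, c, 0xFF);
}
```

Under these assumptions, count_run() covers the a+ case and
count_run_masked(s, len, 'a', 0xDF) covers [aA]+, since 'a' and 'A'
differ only in the 0x20 bit.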

Thank you Karl, not just for this, but all the work you put into the
regex engine and Perl as a whole. ++ to you.

Yves

-- 
perl -Mre=debug -e "/just|another|perl|hacker/"
