develooper Front page | perl.perl5.porters | Postings from January 2018

Re: regrepeat()

Thread Previous | Thread Next
Karl Williamson
January 20, 2018 18:53
Re: regrepeat()
Message ID:
On 01/18/2018 08:52 AM, Abigail wrote:
> On Sun, Dec 31, 2017 at 04:21:04PM -0700, Karl Williamson wrote:
>> On 12/31/2017 4:13 PM, Andy Lester wrote:
>>>> My question is, does this happen often enough in real life to justify the extra code?
>>> I'd bet that the case of the object of the + being a single character is the most common use of +, starting with things like s/ +$//;
>>> --
>>> Andy Lester =>
>> Yes, but the issue is in the strings it matches, are there a lot of
>> spaces in a row?  Only in that case is there a noticeable speed up
> I think multiple spaces in a row is the most common in practise.
> However, I also think most people would not write / +/, but /\s+/
> to match a sequence of spaces.
> Some questions:
>    - What is the effect of this proposed patch if /a+/ is used, and it
>      matches just one or two characters (or even fails to match?).
>    - Can it still partially match a string of 'a'? "aaaaa" =~ /a+(?=..)/
>      should match just three a's.
>    - Is there any effect on modifiers? /a+?/, /a++/

The patch just adds vectorization to the current operation, so it should 
have zero effect, except for performance, on anything.

Effectively what is happening is that we are looking for the end of a 
span of a given repeated byte.

There is extra code executed, which is more than balanced by the 
performance gain, should the data be such that there are enough bytes to 
be spanned in a row.  But these instructions are wasted on data that has 
too few such bytes.  Which is why I asked the question.

You can see the existing code here:

Note that this is now a static function, instead of an inline loop. 
There is the risk that the compiler will not inline it, so that there is 
added function call overhead.  I think that modern compilers will DTRT.

We don't know ahead of time where the span will end, but we do know the 
maximum potential span.  We shouldn't bother with trying this code 
unless the potential gain is worth it.  I have set an arbitrary value 
for now for what that is.

Otherwise the single loop currently used becomes potentially 3 loops.

The first loop advances to the next word boundary if not already there, 
returning from the function if the match fails.  So if the span is too 
short to reach the next word boundary, the only extra code executed is 
the initial test that the potential gain is significant enough.

Otherwise, there is an integer multiply to set up a word-full of copies 
of the spanned byte.

Then the second loop does the word-at-a-time parallelism.  This 
continues until we reach the end of input, or more likely, the span is 

Finally is the third loop to handle bytes after the final word boundary. 
  Even if there are none, there is an extra conditional required to 
discover that fact.

The final loop is also used to determine which byte terminated the span 
when the 2nd loop drops out after finding such a byte.  As I write this, 
I realize that there is a way for the 2nd loop to figure out which byte 
it is without conditionals.  The fastest way is to use xor and ffs(), if 
available (which on modern computers becomes a single machine 
instruction), but there are other ways, using shift, mask and integer 
multiply, similar to what S__variant_byte_number() in inline.h currently 

So, there's extra work involved when the span is short, but just a few 

Thread Previous | Thread Next Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at | Group listing | About