develooper Front page | perl.perl5.porters | Postings from January 2018

Re: Name that regnode

Thread Previous
From:
Karl Williamson
Date:
January 30, 2018 19:43
Subject:
Re: Name that regnode
Message ID:
111f810e-1009-5f26-2a7e-f7123adb6fcf@khwilliamson.com
On 01/30/2018 11:44 AM, Karl Williamson wrote:
> On 01/29/2018 10:35 PM, Karl Williamson wrote:
>> On 01/25/2018 10:37 AM, Father Chrysostomos wrote:
>>> Karl Williamson wrote:
>>>> But I don't like the name I came up with for this regnode, MASKED.  I'm
>>>> open to suggestions for a better name.
>>>
>>> Existing char classes use ANYOF.  Maybe ANYOFMASK?  ANYPERMASK?
>>>
>>
>> I modified your first suggestion slightly to ANYOFM.
>>
>> I now have some performance numbers:
>>
>>   Key:
>>           Ir   Instruction read
>>           Dr   Data read
>>           Dw   Data write
>>           COND conditional branches
>>           IND  indirect branches
>>
>>       The numbers represent raw counts per loop iteration.
>>
>>       Results of ('b' x 10000) . 'a' =~ /[Aa]/
>>
>>                 blead    mask Ratio %
>>              -------- ------- -------
>>           Ir 153132.0 25636.0   597.3
>>           Dr  40909.0  2155.0  1898.3
>>           Dw  20593.0   593.0  3472.7
>>         COND  20529.0  3028.0   678.0
>>          IND     22.0    22.0   100.0
>>

Suppose the set matched by an ANYOF node is shy 1 code point of being a 
complete set that would allow it to be optimized into this node.  With 
these kinds of gains, it would still be faster to pretend for the moment 
that the missing code point is part of the set, and then at run time, 
make sure that it isn't the one matched.  This would require a 
specialized ANYOFM node to do, and I'm not planning to do it, but I 
wanted to document the idea.

> 
> And pushed as
> 
> commit 2813d4adc971fbaa124b5322d4bccaa73e9df8e2
>   Author: Karl Williamson <khw@cpan.org>
>   Date:   Mon Jan 29 20:47:56 2018 -0700
> 
>       Add ANYOFM regnode
> 
>       This is a specialized ANYOF node for use when the code points in it
>       have characteristics that allow them to be matched with a mask 
> instead
>       of a bit map.
> 
>    ...
> 
>       The set of ASCII characters also could be done with this node 
> instead of
>       having the special ASCII regnode, reducing code size and complexity.
>       I haven't investigated the speed loss of doing so.
> 
>       A NANYOFM node could be created for matching the complements this one
>       matches.
> 
>       A pattern like /A/i is not affected by this commit, but the regex
>       optimizer could be changed to take advantage of this commit.  What 
> would
>       need to be done is for it to look at the first byte of an 
> EXACTFish node
>       and if its one of the case pairs this handles, to generate a 
> synthetic
>       start class for it.  This would automatically invoke the sped up 
> code.
> 

Thread Previous


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About