
Re: Anomalies in parsing regex quantifiers

From: James E Keenan
Date: February 11, 2020 17:08
Subject: Re: Anomalies in parsing regex quantifiers
Message ID: 20200211170838.25896.qmail@lists-nntp.develooper.com

On 2/11/20 10:58 AM, Karl Williamson wrote:
> I have been looking at the code in regcomp.c in regpiece() that deals 
> with quantifiers.
> 
> After reordering things so that gotos don't cause it to jump back and 
> forth, some anomalies became clear.  I also found some potential easy 
> optimizations.
> 
> I would expect that the results of parsing {1,} would be the same as 
> '+', and they both do generate the PLUS regnode, but the flags passed to 
> the higher level aren't set the same.  This is true of the other 
> shortcuts '*' and '?' as well.
> 
> I then tried to figure out what the consequences of those differences 
> are.  Two of the flags, WORST and SPSTART, do not appear to ever be 
> looked at.  Should we remove them, dig to find out how they used to be 
> used, or set them consistently in case they come back again?
> 
> regpiece assumes that any quantifier whose upper limit is non-zero 
> causes the construct to not match the null string, and sets HASWIDTH. 
> That simply isn't true when quantifying a zero-width assertion.  I 
> didn't look at what the optimizer does with that, but when I change that 
> a higher level warning is emitted:
> 
>   "Quantifier unexpected on zero-length expression "
> 
> Now to the optimizations:  I believe the quantifier {1,1} can simply be 
> optimized out.  There are occurrences in our test suite of this; I 
> believe from Abigail.  And I can see machine-generated or interpolated 
> code ending up with this.  So we don't need to create a loop that gets 
> executed precisely once.  The possessive form {1,1}+ also has to be 
> considered, but that's easy to do.
> 
> Generally, in {m,m}? the ? is a no-op and can be omitted.
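
For readers following along, the equivalences described in the quoted message can be sketched in C, the language of regcomp.c. This is only an illustration under stated assumptions: every name below (struct quant, classify, normalize_mode, quantified_has_width, the Q_* constants) is hypothetical and none of it is taken from regcomp.c. It assumes that a possessive {1,1}+ behaves like an atomic group around the operand, as perlre describes for possessive quantifiers in general, and it reads HASWIDTH as "cannot match the null string".

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical names throughout; nothing here is copied from regcomp.c. */
enum q_mode { Q_GREEDY, Q_LAZY, Q_POSSESSIVE };
enum q_kind { Q_NONE, Q_ATOMIC, Q_STAR, Q_PLUS, Q_OPT, Q_CURLY };

#define Q_INF INT_MAX   /* stands in for an omitted upper bound, as in {1,} */

struct quant { int min; int max; enum q_mode mode; };

/* In {m,m}? the trailing '?' is a no-op: with a fixed repeat count there
 * is no choice between matching fewer or more, so treat it as greedy. */
static struct quant normalize_mode(struct quant q)
{
    if (q.min == q.max && q.mode == Q_LAZY)
        q.mode = Q_GREEDY;
    return q;
}

/* Map a parsed quantifier to the simplest equivalent construct. */
static enum q_kind classify(struct quant q)
{
    assert(q.min >= 0);
    q = normalize_mode(q);

    if (q.min == 1 && q.max == 1) {
        /* {1,1} needs no loop.  {1,1}+ cannot simply be dropped: it
         * still forbids backtracking into the operand (assumption:
         * it is equivalent to an atomic group around it). */
        return q.mode == Q_POSSESSIVE ? Q_ATOMIC : Q_NONE;
    }
    if (q.min == 0 && q.max == Q_INF) return Q_STAR;   /* {0,}  is '*' */
    if (q.min == 1 && q.max == Q_INF) return Q_PLUS;   /* {1,}  is '+' */
    if (q.min == 0 && q.max == 1)     return Q_OPT;    /* {0,1} is '?' */
    return Q_CURLY;
}

/* Assumption: the quantified construct is known not to match the null
 * string only if the operand itself has width and at least one
 * repetition is required.  A quantified zero-width assertion therefore
 * never qualifies, whatever its upper limit. */
static int quantified_has_width(int operand_has_width, struct quant q)
{
    return operand_has_width && q.min > 0;
}
```

Because both the shortcut and the braced spelling pass through the same classify() call in this sketch, any flags derived from its result would agree for '+' and {1,}, which is the consistency the quoted message asks about.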

I think that research like this is important -- but we need to be 
mindful of where we are in our annual development cycle.  Your findings 
and recommendations are likely to be at such a deep level inside the 
codebase that their implications will take time to learn.  So we 
shouldn't be expecting to implement such recommendations in perl-5.32.0.

Thank you very much.
Jim Keenan
