develooper Front page | perl.perl5.porters | Postings from February 2020

Re: Anomalies in parsing regex quantifiers

From: hv
Date: February 11, 2020 23:55
Subject: Re: Anomalies in parsing regex quantifiers
Message ID: 202002112336.01BNanP15429@crypt.org

Karl Williamson <public@khwilliamson.com> wrote:
:I have been looking at the code in regcomp.c in regpiece() that deals 
:with quantifiers.
:
:After reordering things so that goto's don't cause it to jump back and 
:forth, some anomalies became clear.  I also found some potential easy 
:optimizations.
:
:I would expect that the results of parsing {1,} would be the same as 
:'+', and they both do generate the PLUS regnode, but the flags passed to 
:the higher level aren't set the same.  This is true of the other 
:shortcuts '*' and '?' as well.
:
:I then tried to figure out what the consequences of those differences 
:are.  Two of the flags WORST and SPSTART do not appear to ever be looked 
:at.  Should we remove them, or dig to find out how they used to be used, 
:or might they come back again, and we should set them consistently?
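
For reference, the equivalence being described can be written down as a small table. This is only a sketch, not the code in regpiece(): the function name and the QUANT_INFTY constant are made up for illustration, and it covers only which shorthand a brace quantifier corresponds to, not the flags, which are the part that differs.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for "no upper bound"; not the constant regcomp.c uses. */
#define QUANT_INFTY INT_MAX

/* Return the one-character shorthand equivalent to {min,max},
 * or '\0' when the brace form has no shorthand. */
static char quant_shorthand(int min, int max)
{
    assert(min >= 0 && min <= max);
    if (min == 0 && max == QUANT_INFTY) return '*';   /* {0,}  */
    if (min == 1 && max == QUANT_INFTY) return '+';   /* {1,}  */
    if (min == 0 && max == 1)           return '?';   /* {0,1} */
    return '\0';
}
```

So quant_shorthand(1, QUANT_INFTY) gives '+', while something like {2,5} gives '\0' and has to stay a general counted loop.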

I definitely think there's value in some digging, and I'm happy to give
that a go, time permitting. But of those I'm sure at least WORST would be
from Ilya, quite likely SPSTART too, so digging is not guaranteed to
lead to light.

:regpiece assumes that any quantifier whose upper limit is non-zero 
:causes the construct to not match the null string, and sets HASWIDTH. 
:That simply isn't true when quantifying a zero-width assertion.  I 
:didn't look at what the optimizer does with that, but when I change that, 
:a higher-level warning is emitted:
:
:  "Quantifier unexpected on zero-length expression "
:
:Now to the optimizations:  I believe the quantifier {1,1} can simply be 
:optimized out.  There are occurrences in our test suite of this; I 
:believe from Abigail.  And I can see machine generated or interpolated 
:code ending up with this.  So we don't need to create a loop that gets 
:executed precisely once.  But there is also {1,1}+, which has to be 
:considered; and that's easy to do.
:
:Generally, in {m,m}? the ? is a no-op and can be omitted.
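
The two proposed simplifications can be sketched as a decision function. Everything here is hypothetical (the enum names, the function, and the choice to leave the possessive form untouched are assumptions for illustration); it is not how regpiece() is written, only a restatement of the rules quoted above.

```c
#include <assert.h>

/* Hypothetical names for illustration; none of these exist in regcomp.c. */
typedef enum { Q_GREEDY, Q_LAZY, Q_POSSESSIVE } quant_mode;  /* {..}  {..}?  {..}+ */
typedef enum { Q_KEEP, Q_DROP_LAZY_MARK, Q_DROP_QUANTIFIER } quant_action;

/* Decide what could be simplified for a quantifier {min,max} with the given mode. */
static quant_action simplify_quant(int min, int max, quant_mode mode)
{
    assert(min >= 0 && min <= max);
    if (min != max)
        return Q_KEEP;              /* a real range: nothing to simplify */
    if (mode == Q_POSSESSIVE)
        return Q_KEEP;              /* {1,1}+ needs separate handling; left alone here */
    if (min == 1)
        return Q_DROP_QUANTIFIER;   /* X{1,1} and X{1,1}? both match exactly one X */
    if (mode == Q_LAZY)
        return Q_DROP_LAZY_MARK;    /* X{m,m}? has no choice of count to be lazy about */
    return Q_KEEP;
}
```

For example, simplify_quant(1, 1, Q_GREEDY) returns Q_DROP_QUANTIFIER and simplify_quant(3, 3, Q_LAZY) returns Q_DROP_LAZY_MARK. The sketch takes no account of what the quantified atom is, so it says nothing about cases such as \K{1,1}.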

I think it is worth trying to understand why \K{1,1} fails before
eliding the general {1,1} case, since I suspect there's something
fundamental going wrong there that will shed light on more than itself.

Hugo
