
Re: Is this a /^*/ bug?

Aristotle Pagaltzis
March 26, 2013 16:15
* <> [2013-03-26 10:30]:
> "A. Pagaltzis" wrote:
> > Should zero-width assertions in general be quantifiable at all?
> I believe so. The canonical example is optional lookahead with capture
> /(?=x(y)z)?/ - the side-effect of capture makes it a not necessarily
> useless thing to do.

Is there a similar case to make for `*` instead of `?`?
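
To make the quoted `?` example concrete, here is a minimal sketch of the
capture side effect. The helper name `lookahead_capture` and the test
strings are made up for illustration; the pattern is the quoted one with
an anchor and a trailing `x` added so that the match has something to
consume.

```perl
use strict;
use warnings;

# Hypothetical helper: returns what the optional lookahead captured,
# '(undef)' if the lookahead was skipped, or nothing if the match failed.
sub lookahead_capture {
    my ($str) = @_;
    return unless $str =~ /^(?=x(y)z)?x/;
    return defined $1 ? $1 : '(undef)';
}

# The assertion consumes no characters either way, but when it succeeds
# it still sets $1, so the quantified form is not a no-op.
for my $str ('xyz', 'xqz') {
    my $cap = lookahead_capture($str);
    print "$str: ", (defined $cap ? $cap : 'no match'), "\n";
}
```

Both strings match, since the `?` lets the lookahead be skipped; only the
first should leave anything in `$1`.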

> > What would be the fallout of making that a compile-time error?
> A bunch of stuff would fall over at runtime; in a small number of
> cases that would highlight a bug that might not otherwise have been
> found, or found as quickly; in most cases it will already have been
> doing what was wanted, and people will either change the pattern to
> something equivalent-but-legal (if a fixed pattern), or put in extra
> (probably buggy) logic to detect and change to equivalent-but-legal
> for a constructed pattern.

The case with captures within zero-width assertions essentially makes it

One could debate the utility of still disallowing quantifiers after
non-generic zero-width assertions like `^`, `$`, `\A`, `\b` etc., which
inherently cannot enclose captures within (nor (?{}) constructs, for
that matter).

But then we get a conditional rule with an ad-hoc inclusion list, rather
than a general principle, which is exactly what Zefram argued against
with good reason.

Oh well.

> I say "probably buggy", because detecting this in the general case is
> quite hard: I think you can't do it with anything short of a full
> regexp parser.

I did mean for the pattern compiler to detect this during REx program
construction, when it creates a node for a quantifier, and finds its
previous node to be some zero-width assertion construction. I did not
mean doing it by way of some clumsy string matching against the pattern.
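
As a rough illustration of the kind of check I mean, here is a toy node
builder. It is only a sketch under loud assumptions: it handles a tiny
subset of pattern syntax (no groups, no character classes), the names
`quantified_assertions` and `%ASSERTION` are invented, and it does not
mirror how regcomp.c is actually structured. The point is just that the
check happens when the quantifier node is created, by looking at the
previous node, not by matching against the pattern string.

```perl
use strict;
use warnings;

# Simple (non-generic) zero-width assertions known to this toy builder.
my %ASSERTION = map { $_ => 1 } qw(^ $ \A \z \Z \b \B \G);

# Hypothetical sketch: build a flat node list for a tiny regex subset and
# report every quantifier whose previous node is a simple assertion.
sub quantified_assertions {
    my ($pattern) = @_;
    my (@nodes, @problems);
    while ($pattern =~ /\G(\\.|.)/gs) {
        my $tok = $1;
        if ($tok =~ /^[*+?]$/) {
            # Creating a quantifier node: inspect the node it applies to.
            my $prev = $nodes[-1];
            push @problems, "$prev->{text}$tok"
                if $prev && $prev->{type} eq 'assertion';
            push @nodes, { type => 'quantifier', text => $tok };
        }
        else {
            my $type = $ASSERTION{$tok} ? 'assertion' : 'atom';
            push @nodes, { type => $type, text => $tok };
        }
    }
    return @problems;
}

print join(', ', quantified_assertions('^+f')), "\n";
print join(', ', quantified_assertions('a\b?c$*')), "\n";
```

A real implementation would of course sit in the pattern compiler and
know about every construct; the toy only shows where the check would
hook in.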

> > If too bad, what of optimising them away with a specially worded
> > warning?
> That's exactly what we do now. The specially worded warning is "matches null
> string many times in regex", and you can see the optimization under debug:
> % perl -Mre=debug -wle '"foo" =~ /^+f/' 2>&1 | grep 'while'
>        whilem: matched 0 out of 1..32767
>          whilem: matched 1 out of 1..32767
>          whilem: empty match detected, trying continuation...
> %
> Without optimization, it would match 32767 of them before continuing.

I thought it was a runtime warning. Mea culpa.

* demerphq [2013-03-26 11:15]:
> On 26 March 2013 11:01, Zefram wrote:
> > There's no warning for /^?/.
> I don't think there should be.
> ?
> is syntactic sugar for
> (?:X|)
> so
> ^?
> means:
> (?:^|)
> and IMO at that point it is perfectly reasonable. Anything could be in
> the second half of that alternation.

No it couldn’t. If you write the long form with the alternation, then
sure. And there are lots of times I have written `(?:^|/)` in URL
rewrite rules, say. But I have never literally written `(?:^|)` (with
nothing on the right side). Because, err, what would be the point? But
since we’re not trying to solve the halting problem, a literal `(?:^|)`
is reasonable to not warn on.

However there is no way to write `^?` and then add something to it such
that it equates to `(?:^|$whatever)`. So I don’t see how a literal `^?`
would ever make sense to show up in a pattern. (Well – I suppose it *is*
a cute way of golfing `(?:)`…) And so in the name of assisting the user
it would be reasonable to warn or even error on that construct.
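
A quick sketch of the golfing remark, under the assumption that the
three spellings are interchangeable in the middle of a pattern. The
helper name `agree` and the sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the quantified anchor, the empty-branch
# alternation, and the empty group all give the same match result.
sub agree {
    my ($str) = @_;
    my $quantified  = $str =~ /b^?c/     ? 1 : 0;
    my $alternation = $str =~ /b(?:^|)c/ ? 1 : 0;
    my $empty_group = $str =~ /b(?:)c/   ? 1 : 0;
    return $quantified == $alternation && $alternation == $empty_group;
}

for my $str (qw(abc bc b-c cb)) {
    print "$str: ", (agree($str) ? "same" : "differ"), "\n";
}
```

Since `^?` can always fall back to matching nothing, it never constrains
the match, which is what makes a literal `^?` look like a mistake.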

But the inconsistency required between simple and compound zero-width
assertions makes me disinclined toward the idea anyhow. So consider my
tentative suggestion retracted.

Aristotle Pagaltzis // <>
