
Re: Is this a /^*/ bug?

From: Aristotle Pagaltzis
Date: March 26, 2013 16:15
Subject: Re: Is this a /^*/ bug?
Message ID: 20130326161532.GA473@fernweh.plasmasturm.org

* hv@crypt.org <hv@crypt.org> [2013-03-26 10:30]:
> "A. Pagaltzis" <pagaltzis@gmx.de> wrote:
> > Should zero-width assertions in general be quantifiable at all?
>
> I believe so. The canonical example is optional lookahead with capture
> /(?=x(y)z)?/ - the side-effect of capture makes it a not necessarily
> useless thing to do.

Is there a similar case to make for `*` instead of `?`?
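
For concreteness, the capture side effect of the `?` case can be sketched as below. This is a minimal illustration rather than code from the thread: the helper name `probe` and the sample strings are made up, and the pattern is anchored with `^` only to keep the two cases easy to compare.

```perl
use strict;
use warnings;

# Hypothetical helper: apply an optional lookahead that contains a capture
# group and return whatever ended up in $1.
sub probe {
    my ($str) = @_;
    # The quantified assertion consumes no text and may match zero times,
    # so the overall match is expected to succeed either way; only the
    # capture distinguishes the two cases.
    return 'no match' unless $str =~ /^(?=x(y)z)?/;
    my $captured = $1;    # copy before another match can clobber it
    return $captured;
}

for my $str (qw(xyz abc)) {
    my $got = probe($str);
    printf "%-3s => %s\n", $str, defined $got ? $got : 'undef';
}
```

The capture is the only observable difference between the two inputs, which is why the quantified assertion is not necessarily useless.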

> > What would be the fallout of making that a compile-time error?
>
> A bunch of stuff would fall over at runtime; in a small number of
> cases that would highlight a bug that might not otherwise have been
> found, or found as quickly; in most cases it will already have been
> doing what was wanted, and people will either change the pattern to
> something equivalent-but-legal (if a fixed pattern), or put in extra
> (probably buggy) logic to detect and change to equivalent-but-legal
> for a constructed pattern.

The case of captures inside zero-width assertions is essentially what
makes a blanket compile-time error unappealing.

One could debate the utility of still disallowing quantifiers after
non-generic zero-width assertions like `^`, `$`, `\A`, `\b` etc., which
inherently cannot enclose captures (nor `(?{})` constructs, for that
matter).

But then we get a conditional rule with an ad-hoc inclusion list, rather
than a general principle, which is exactly what Zefram argued against
with good reason.

Oh well.

> I say "probably buggy", because detecting this in the general case is
> quite hard: I think you can't do it with anything short of a full
> regexp parser.

I did mean for the pattern compiler to detect this during REx program
construction, when it creates a node for a quantifier and finds the
preceding node to be some zero-width assertion construct. I did not
mean doing it by way of some clumsy string matching against the pattern.

> > If too bad, what of optimising them away with a specially worded
> > warning?
>
> That's exactly what we do now. The specially worded warning is "matches null
> string many times in regex", and you can see the optimization under debug:
>
> % perl -Mre=debug -wle '"foo" =~ /^+f/' 2>&1 | grep 'while'
>        whilem: matched 0 out of 1..32767
>          whilem: matched 1 out of 1..32767
>          whilem: empty match detected, trying continuation...
> %
>
> Without optimization, it would match 32767 of them before continuing.

I thought it was a runtime warning. Mea culpa.
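
One way to check when the warning fires is to count warnings after compiling the pattern and again after matching with it. This is a rough sketch, not something from the thread: the variable names are invented, the pattern is interpolated from a string so that it is compiled while the script runs (after the handler is installed), and the exact warning text and whether it appears at all may differ between perl versions.

```perl
use strict;
use warnings;

# Collect warnings instead of printing them to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# Interpolating a variable defers pattern compilation until this
# statement executes, so the handler above is already in place.
my $source = '^+f';
my $re     = qr/$source/;
my $after_compile = @warnings;

# Matching with the already compiled pattern should add nothing new
# if the diagnostic comes from the pattern compiler.
my $matched     = 'foo' =~ $re;
my $after_match = @warnings;

print "matched: ", ($matched ? 'yes' : 'no'), "\n";
print "warnings after compile: $after_compile\n";
print "warnings after match:   $after_match\n";
print "  $_" for @warnings;
```

If the count does not change between the two checkpoints, any warning was produced when the pattern was compiled rather than when it was matched.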


* demerphq <demerphq@gmail.com> [2013-03-26 11:15]:
> On 26 March 2013 11:01, Zefram <zefram@fysh.org> wrote:
> > hv@crypt.org wrote:
> > There's no warning for /^?/.
>
> I don't think there should be
>
> ?
>
> is syntactic sugar for
>
> (?:X|)
>
> so
>
> ^?
>
> means:
>
> (?:^|)
>
> and IMO at that point it is perfectly reasonable. Anything could be in
> the second half of that alternation.

No it couldn’t. If you write the long form with the alternation, then
sure. And there are lots of times I have written `(?:^|/)` in URL
rewrite rules, say. But I have never literally written `(?:^|)` (with
nothing on the right side). Because, err, what would be the point? But
since we’re not trying to solve the halting problem, a literal `(?:^|)`
is reasonable to not warn on.

However there is no way to write `^?` and then add something to it such
that it equates to `(?:^|$whatever)`. So I don’t see how a literal `^?`
would ever make sense to show up in a pattern. (Well – I suppose it *is*
a cute way of golfing `(?:)`…) And so in the name of assisting the user
it would be reasonable to warn or even error on that construct.
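
The difference between the two spellings can be sketched as follows. This is only an illustration under assumed inputs: the pattern names and the sample paths are invented, with `bar` standing in for whatever follows the anchor in a real rewrite rule.

```perl
use strict;
use warnings;

# Anchored alternation: "bar" must start the string or follow a slash.
my $segment = qr{(?:^|/)bar};

# Quantified anchor: "?" lets the anchor match zero times, so it
# constrains nothing and acts like an empty group in front of "bar".
my $optional_anchor = qr{^?bar};
my $empty_group     = qr{(?:)bar};

for my $path (qw(bar foo/bar foobar)) {
    printf "%-8s segment=%s optional_anchor=%s empty_group=%s\n",
        $path,
        map { $path =~ $_ ? 'yes' : 'no' }
            $segment, $optional_anchor, $empty_group;
}
```

Only the alternation form can tell `foo/bar` apart from `foobar`; the `^?` form accepts both, just as the empty group does.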

But the inconsistency required between simple and compound zero-width
assertions makes me disinclined toward the idea anyhow. So consider my
tentative suggestion retracted.

Regards,
-- 
Aristotle Pagaltzis // <http://plasmasturm.org/>
