develooper Front page | perl.perl5.porters | Postings from September 2017

regex conundrum: \G not at the start of a string.

September 27, 2017 13:50
Someone just pointed out the following bug to me:

$ perl -le'my $str="foobar"; $str=~/foo/g and print "ok";
$str=~/12\G|foo|\Gbar/gc and print $&'
$ perl -le'my $str="foobar"; $str=~/foo/g and print "ok";
$str=~/123\G|foo|\Gbar/gc and print $&'
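In both commands above, the first statement leaves pos($str) at 3, and that is the offset every \G in the second pattern refers to. A minimal sketch of that bookkeeping, using only the leading-\G form, whose behavior is well defined:

```perl
use strict;
use warnings;

my $str = "foobar";

# A successful //g match in scalar context records where it ended in pos().
$str =~ /foo/g or die "no match\n";
print pos($str), "\n";    # 3: a \G in the next match refers to this offset

# A leading \G anchors the next match exactly at pos(), so only 'bar' can match.
if ($str =~ /\Gbar/gc) {
    print "$&\n";         # bar
}
```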

It comes from the fact that we allow \G to be used at positions other than
the beginning of a pattern. We calculate a nominal offset from how far
into the pattern the \G occurs, and when we match we subtract that offset
from pos() to get the position to start matching from.

However, this case illustrates the problem. The 123\G tells the engine
to start trying to match three characters before pos(), which happens to
allow the second branch of the alternation to match, but foo should not
match except at pos() or later.
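To make that arithmetic concrete, here is a deliberately simplified Perl model of the behavior described. It is not the regex engine's code: the helper `model_match` and its branch-hash layout are hypothetical, and it only handles literal branches. It takes the offset to be the longest literal that must precede \G, starts scanning at pos() minus that offset, and returns the first branch that matches.

```perl
use strict;
use warnings;

# Hypothetical helper, a toy model only. Each branch is one of:
#   { before  => 'xyz' }   for  xyz\G
#   { after   => 'xyz' }   for  \Gxyz
#   { literal => 'xyz' }   for  xyz
sub model_match {
    my ($str, $pos, @branches) = @_;

    # Nominal offset: the longest literal that must precede \G (3 for '123\G').
    my $gofs = 0;
    for my $br (@branches) {
        $gofs = length $br->{before}
            if defined $br->{before} && length($br->{before}) > $gofs;
    }

    # Start trying at pos() minus that offset, clamped at 0.
    my $start = $pos - $gofs;
    $start = 0 if $start < 0;

    # Leftmost match wins; at each start position branches are tried in order.
    for my $s ($start .. length $str) {
        for my $br (@branches) {
            if (defined $br->{before}) {
                my $len = length $br->{before};
                return $br->{before}
                    if $s + $len == $pos
                    && substr($str, $s, $len) eq $br->{before};
            }
            elsif (defined $br->{after}) {
                return $br->{after}
                    if $s == $pos
                    && substr($str, $s, length $br->{after}) eq $br->{after};
            }
            else {
                # Nothing stops a plain literal from matching before pos().
                return $br->{literal}
                    if substr($str, $s, length $br->{literal}) eq $br->{literal};
            }
        }
    }
    return undef;    # no branch matched
}

my @with_12  = ({ before => '12' },  { literal => 'foo' }, { after => 'bar' });
my @with_123 = ({ before => '123' }, { literal => 'foo' }, { after => 'bar' });
print model_match("foobar", 3, @with_12),  "\n";   # bar
print model_match("foobar", 3, @with_123), "\n";   # foo
```

With 12\G the model starts scanning at offset 1, too late for foo to match at 0, so \Gbar wins. With 123\G it starts at offset 0 and foo matches before pos(), which is the misbehavior described above.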

I don't see how we can provide sane semantics in this kind of case, and I
also don't really see why we have to support this kind of insanity:

perl -le'my $str="foobar"; $str=~/foo/g and print "ok";
$str=~/foo\Gbar/gc and print $&'
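For this particular case, one possible spelling that keeps \G at the start is a lookbehind. This is only a sketch, not something proposed here; note that $& then holds just "bar", because the lookbehind is zero-width.

```perl
use strict;
use warnings;

my $str = "foobar";
$str =~ /foo/g or die "no match\n";   # pos($str) is now 3

# Instead of /foo\Gbar/, keep \G first and assert the preceding text with a
# lookbehind. The match is anchored at pos(), so it cannot start earlier.
my $matched = $str =~ /\G(?<=foo)bar/gc ? $& : undef;
print defined $matched ? "$matched\n" : "no match\n";    # bar
```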

I believe we have simple tests for this kind of behavior, but I think
they exist to catch regressions rather than to assert that the behavior
is desirable. I am inclined to say that we should forbid use of \G
except at the very start of a pattern (where it can be very useful).
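By contrast, a \G at the very start of a pattern has clear semantics: with /gc the match is attempted exactly at pos(), and a failed attempt leaves pos() where it was. That is what makes the common lexer idiom below work. The `tokenize` helper and its token names are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: split a small arithmetic expression into tokens.
sub tokenize {
    my ($src) = @_;
    my @tokens;
    pos($src) = 0;
    while (pos($src) < length($src)) {
        # Each alternative is anchored at pos(); /c keeps pos() on failure,
        # so the next alternative is tried at the same offset.
        if    ($src =~ /\G\s+/gc)            { next }
        elsif ($src =~ /\G(\d+)/gc)          { push @tokens, "NUM($1)" }
        elsif ($src =~ /\G([A-Za-z_]\w*)/gc) { push @tokens, "ID($1)" }
        elsif ($src =~ /\G([-+*\/()])/gc)    { push @tokens, "OP($1)" }
        else {
            die "unexpected character at offset " . pos($src) . "\n";
        }
    }
    return @tokens;
}

print join(' ', tokenize("x1 + 42*(y - 7)")), "\n";
# ID(x1) OP(+) NUM(42) OP(*) OP(() ID(y) OP(-) NUM(7) OP())
```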

What do folks think here?


perl -Mre=debug -e "/just|another|perl|hacker/"
