perl.perl5.porters | Postings from June 2019

Re: [perl #134209] Regex stores wrong value in $1, $^N etc., thoroughly corrupting parsers

Jamie Lokier
June 20, 2019 17:27
Tony Cook via RT wrote:
> On Tue, 18 Jun 2019 13:37:38 -0700, jamie wrote:
> > The following short Perl program shows the error:
> > 
> > sub S { "A" =~ /(.)(?{})/; }
> > "xyz" =~ /(?:(.)(?{say $1;S()}))*/;
> > 
> > Output should be:
> > 
> > x
> > y
> > z
> > 
> > Actual output is:
> > 
> > x
> > A
> > A
> > 
> > This is because the value stored in $1 is incorrect.
> I'm not suggesting this isn't a bug, but has it ever worked the way you expect?

No, I wouldn't have expected everything to be fine prior to Perl 5.18.
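For anyone who wants to try the quoted reproduction on a particular perl, a self-contained version might look like the sketch below. It assumes perl 5.18 or later (for `use v5.18`, which also enables `say`) and records $1 on each iteration rather than printing it immediately; per the report, affected perls yield x, A, A where x, y, z is expected.

```perl
use v5.18;      # enables strict and the 'say' feature
use warnings;

# S() runs an unrelated match whose first capture group holds "A".
sub S { "A" =~ /(.)(?{})/; }

# On each pass through the quantified group, record the outer $1,
# then call S() from inside the code block, as in the report.
my @seen;
"xyz" =~ /(?:(.)(?{ push @seen, $1; S() }))*/;

say for @seen;    # expected x, y, z; affected perls reportedly give x, A, A
```

Only the later entries differ between affected and unaffected perls; the first is "x" either way.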

I had thought regex variables and (?{...}) blocks in general to be
largely reliable since the big regex engine update of Perl 5.18, and
quite buggy prior to that.  Before that we couldn't trust lexicals
inside (?{...}) at all, and I used to use only global variables and
very simple code fragments within those scopes.  Certainly no calling
subroutines in those days, let alone pattern matching on $^N.
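As a rough illustration of what "trusting lexicals" inside (?{...}) means on 5.18 and later, the sketch below accumulates captures into a lexical that belongs to each call of a sub, rather than into a global. The sub name `collect_chars` is hypothetical, and the sketch assumes perl 5.18+, where a literal code block sees the enclosing sub's current lexicals.

```perl
use v5.18;
use warnings;

# Hypothetical helper: collect each character of $str from inside a
# (?{...}) block into @chars, a lexical private to this call.
sub collect_chars {
    my ($str) = @_;
    my @chars;
    $str =~ /(?:(.)(?{ push @chars, $1 }))*/;
    return @chars;
}

my @first  = collect_chars("ab");
my @second = collect_chars("xyz");
say "@first | @second";    # a b | x y z
```

The second call starting from an empty @chars is the behaviour that, before 5.18, pushed people towards global variables instead.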

From perl5180delta(1):

  - The implementation of code blocks in regular expressions, such as
    "(?{})" and "(??{})", has been heavily reworked to eliminate a
    whole slew of bugs.

      - Lexical variables are now sane as regards scope, recursion and
        closure behavior.
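The closure part of that delta entry can be sketched with a qr// whose code block closes over a lexical. This is a minimal example assuming perl 5.18+ closure semantics for code blocks in qr// objects; `make_char_counter` is a hypothetical name.

```perl
use v5.18;
use warnings;

# Hypothetical factory: each call builds a qr// whose code block
# closes over that call's own $n, counting characters matched.
sub make_char_counter {
    my $n  = 0;
    my $re = qr/(?:.(?{ $n++ }))*/s;
    return sub { $_[0] =~ $re; return $n };
}

my $c1 = make_char_counter();
my $c2 = make_char_counter();
my @results = ($c1->("abc"), $c2->("xy"), $c1->("d"));
say "@results";    # 3 2 4: each counter keeps its own $n
```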

So I've always thought lexical variables and scopes generally were
fixed in 5.18.  I haven't noticed other problems with them, and I use
them quite a bit.  (This made Perl 5.18 the minimum version my code
requires, until I adopted subroutine signatures.)

Variables like $^N and $1 are so much more "straightforward" and
"core" than closures that, with someone having put in the effort to do
closures and (?{...}) correctly, I was really surprised to notice
bogus values in them recently.
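For reference, here is a small sketch of what $1 and $^N are documented to hold inside code blocks when no nested match is involved: $1 is the text of group 1, while $^N is the text of whichever group closed most recently.

```perl
use v5.18;
use warnings;

# Log "$1/$^N" after each group closes.
my @log;
"ab" =~ /(a)(?{ push @log, join('/', $1, $^N) })(b)(?{ push @log, join('/', $1, $^N) })/;
say for @log;    # a/a, then a/b
```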

So surprised, in fact, that I didn't notice it with one parser alone,
until I started finding errors in the data when multiple parsers were
used, with one calling another; this was years after getting used to
$1, @-, @+ and $^N "doing what they say on the tin".
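As a point of comparison, the sketch below shows the ordinary case where one parser calls another outside any code block. @- and @+ hold start and end offsets of the whole match and each group, and because match variables are dynamically scoped to the enclosing block, the caller's $1 survives the nested match in the called sub. The report above concerns calls made from inside (?{...}) while the outer match is still running, which this sketch does not exercise. `inner_parse` is a hypothetical name.

```perl
use v5.18;
use warnings;

# Hypothetical "inner parser" that runs its own match.
sub inner_parse {
    my ($text) = @_;
    my ($num) = $text =~ /(\d+)/;   # list-context match returns captures
    return $num;
}

my ($outer_1, $inner, $after_1, @starts, @ends);
if ("key=42" =~ /(\w+)=(\d+)/) {
    $outer_1 = $1;                  # "key"
    @starts  = @-;                  # start offsets: match, group 1, group 2
    @ends    = @+;                  # end offsets:   match, group 1, group 2
    $inner   = inner_parse("n=7");  # nested match in another sub
    $after_1 = $1;                  # still "key" after inner_parse returns
}
say "$outer_1 $inner $after_1 @starts | @ends";   # key 7 key 0 0 4 | 6 3 6
```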

-- Jamie
