
Re: Regex match variables, threads, and recursion

From:
Mark-Jason Dominus
Date:
November 5, 1999 10:31
Subject:
Re: Regex match variables, threads, and recursion
Message ID:
19991105183147.5190.qmail@plover.com

> A couple of days ago someone (mjd?) was wondering if the problem that
> threads have with the regex match variables (where the regex variables are
> tied to pieces of the code and reflect the last thread that executed it)
> occurred with non-threaded code too. 

Well, it's more that I already knew there was this problem with
matching, and I was trying to find out if this other problem, that
regexes don't work properly under threads, was actually the same.

> Just for chuckles, I decided to try it this afternoon and, as
> expected, the problem can be duplicated with a non-threaded perl
> build. Witness:

Here's a simpler version:

	sub foo {
	  my $s = shift;
	  return unless $s =~ /(.)/;
	  print "$1";
	  foo(substr($s, 1));
	  print "$1";
	}

	foo('ouch');


We'd like this to emit `ouchhcuo', because we would expect the two
`print' calls in each invocation of `foo' to each print the same
thing.  But instead the backreference variables get clobbered by the
recursive call to `foo' and you get `ouchhhhh'.
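A minimal sketch of the usual workaround, assuming you only need the captured text and not `$1' itself: copy the capture into a lexical before recursing. The names `$first' and `$out' are introduced here for illustration and are not from the original example.

```perl
use strict;
use warnings;

my $out = '';    # collect the output so it is easy to inspect

sub foo {
    my $s = shift;
    return unless $s =~ /(.)/;
    my $first = $1;          # private copy, taken before the recursive call
    $out .= $first;
    foo(substr($s, 1));      # may overwrite $1, but not $first
    $out .= $first;
}

foo('ouch');
print "$out\n";              # prints "ouchhcuo"
```

Each invocation gets its own `$first', so both appends in one invocation use the same character whatever the recursive call does to `$1'.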

Sarathy:
> I'm sad to see that it hasn't been fixed in more than a year. 

Almost two years since I brought it up, and at the time Chip called it
a `known limitation'.  

Here's my example from January 1998:

        # Given a pattern, return an anonymous function which
        # checks to see if its argument matches that pattern
        sub make_matcher {
            my $pat = shift;
            sub { my $target = shift;
                  $target =~ /$pat/o;
                };
        }

        my $a = make_matcher('a');
        my $b = make_matcher('b');

        print ($a->('aa') ? "matched\n" : "did not match.\n");   #1
        print ($b->('bb') ? "matched\n" : "did not match.\n");   #2
        print ($b->('aa') ? "matched\n" : "did not match.\n");   #3

You would like for #1 and #2 to match, and for #3 to not match.  But
instead, #3 matches and #2 does not.  You think you are returning two
anonymous functions, but they share code, and because the regex that
is cached by /o is cached in the shared code, the two functions share
the cached regex also.

(http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/1998-01/msg02163.html)
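One way to sidestep the shared /o cache is sketched below, assuming a perl that provides the qr// operator (5.005 or later). Each call to `make_matcher' compiles its own regex object and the closure captures it in a lexical. The names `$re', `$match_a' and `$match_b' are introduced here for illustration.

```perl
use strict;
use warnings;

# Given a pattern, return an anonymous function which
# checks to see if its argument matches that pattern.
sub make_matcher {
    my $pat = shift;
    my $re  = qr/$pat/;      # compiled once per call to make_matcher
    return sub {
        my $target = shift;
        return $target =~ $re ? 1 : 0;
    };
}

my $match_a = make_matcher('a');
my $match_b = make_matcher('b');

my @results = ($match_a->('aa'), $match_b->('bb'), $match_b->('aa'));

print( $results[0] ? "matched\n" : "did not match.\n" );   #1 matched
print( $results[1] ? "matched\n" : "did not match.\n" );   #2 matched
print( $results[2] ? "matched\n" : "did not match.\n" );   #3 did not match.
```

The compiled regex lives in a lexical that belongs to each closure, not in the code the closures share, so #1 and #2 match and #3 does not.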

Most of the discussion occurred in February.  The solution here is
that instead of storing the cached regex (or the pointer to it) into
the op tree, you use an extra layer of indirection.  The op tree
should have an offset into the pad, and the cached regex is pointed to
from the pad.  The pad is not shared between threads / recursive
subroutine invocations / anonymous functions, so each one gets its own
cached regex.  Similarly, s/cached regex/backreference variables/.
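The difference between the two layouts can be sketched with a toy model in plain Perl. This is only an illustration of the indirection, not perl's real internals: `%shared_op' stands in for an op that every closure shares, `@pad' for a per-closure pad, and all the names are hypothetical.

```perl
use strict;
use warnings;

# One "op" shared by every closure built from the same code.
my %shared_op = (cache => undef, pad_offset => 0);

# Layout 1: the compiled regex is cached in the shared op itself.
sub make_matcher_shared {
    my $pat = shift;
    return sub {
        my $target = shift;
        $shared_op{cache} = qr/$pat/ unless defined $shared_op{cache};
        return $target =~ $shared_op{cache} ? 1 : 0;
    };
}

# Layout 2: the op holds only an offset; the cache lives in a
# pad that each closure owns.
sub make_matcher_padded {
    my $pat = shift;
    my @pad;                                  # one pad per closure
    return sub {
        my $target = shift;
        my $slot = $shared_op{pad_offset};    # indirection through the pad
        $pad[$slot] = qr/$pat/ unless defined $pad[$slot];
        return $target =~ $pad[$slot] ? 1 : 0;
    };
}

my ($sa, $sb) = (make_matcher_shared('a'), make_matcher_shared('b'));
my ($pa, $pb) = (make_matcher_padded('a'), make_matcher_padded('b'));

my @shared = ($sa->('aa'), $sb->('bb'), $sb->('aa'));
my @padded = ($pa->('aa'), $pb->('bb'), $pb->('aa'));

print "cache in shared op: @shared\n";   # 1 0 1
print "cache in own pad:   @padded\n";   # 1 1 0
```

In the first layout whichever closure runs first fills the single cache slot and every other closure reuses it. In the second, the shared op still contributes only one offset, but each closure looks that offset up in its own pad.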

There are a handful of other features that suffer from the same
problem.  Some of these were discussed in the thread from 1998
(Subject: Shared OPs among closures).  They may include stateful
scalar operators such as glob(), `..', and `...'.  Chip and Sarathy
spent some time discussing these, but I was not able to figure out
who was right.  I just wrote a test for glob(), though, and it
appears that glob() does exhibit the bad behavior.

> Given that this is going to be fixed in one way or another for threaded
> perl, do we want to go all the way and fix it so non-threaded perl does
> lexical scoping for the match variables too?

I think the consensus in 1998 was that it should be fixed by
indirecting through the pad.

Other notes:

Tim Bunce: ``The same issue applies to threads.''
http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/1998-02/msg00101.html



