Re: Patch for Threading and Regexps
From: Ilya Zakharevich
November 20, 1999 01:42
Message ID: 199911200941.EAA26081@monk.mps.ohio-state.edu
Simon Cozens writes:
> However, I've read through your postings on this thread, and, like
> Tom, I'm unclear on what you think the larger bug is; I've read
> through them again, as you suggested, but I still can't find it.
No wonder it was not clear to you. It has probably been 3 or 4 months
since I last wrote more than a couple of pages about it on p5p. ;-)
But frankly speaking, I was sure that I had written a regularly
scheduled treatise only a week ago. As it turned out, this was in a
personal email, not on p5p! So sorry for implying that the message was
easily available (though I did write about it on p5p many times too).
From firstname.lastname@example.org Wed Nov 3 17:46:11 1999
Date: Wed, 3 Nov 1999 17:46:11 -0500
From: Ilya Zakharevich <email@example.com>
To: Mark-Jason Dominus <firstname.lastname@example.org>
Subject: Re: Summary for last week up
Reply-To: Ilya Zakharevich <email@example.com>
References: <199911031910.OAA08418@monk.mps.ohio-state.edu> <firstname.lastname@example.org>
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0pre2i
On Wed, Nov 03, 1999 at 04:41:37PM -0500, Mark-Jason Dominus wrote:
> > This all has no relation to reality.
> Yes, thanks. I thought that the way the regex engine decided which
> regex to operate on and what the flags were in the op tree, and the
> pointer to the backreference variables were all in global PL_*
> variables. Is this mistaken? I would like to make a correction. Can
> you explain the real problem to me?
I think Dan did it today, and it was discussed here *many* times, but
it was probably lost in "my shell is better than yours" arguments...
So-called "global" variables are in fact per-thread, so there is no
problem here. And $digit variables have no storage associated with
them (they are *calculated* each time they are accessed), so there
is no problem with them either.
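To illustrate "calculated each time they are accessed", here is a
minimal sketch in C. It is not the real Perl source: the struct and
function names (saved_offsets, fetch_group) are hypothetical, and it
assumes the "saved data" is just a pointer to the subject string plus
the start and end offsets of one group.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical "saved data" for one capture group: the subject string
   and the offsets of the group within it. No copy of the text is kept. */
struct saved_offsets {
    const char *subject;
    size_t start;   /* offset where the group begins */
    size_t end;     /* offset just past the end of the group */
};

/* Compute the group's text on demand, the way a $digit read would.
   Copies the substring into buf, NUL-terminates it, returns its length. */
size_t fetch_group(const struct saved_offsets *m, char *buf, size_t buflen)
{
    assert(m->end >= m->start);
    size_t len = m->end - m->start;
    assert(len < buflen);               /* leave room for the NUL */
    memcpy(buf, m->subject + m->start, len);
    buf[len] = '\0';
    return len;
}
```

The variable itself owns nothing here; everything depends on which
saved_offsets record the read is pointed at.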
The problem is with the semantics of $digit variables and with the
"saved data" from which these variables are calculated. Remember the
auto-localization clause? This means that at each moment of the life
of a program there may be hundreds of "saved data" items stored, which
come into and go out of scope depending on the control flow of the
program.
Each successful match stores its "saved data" somewhere. Where is it
stored? Inside the match-opnode of the compiled code tree for
the program. Additionally, a "global" PL_curpm is set which points to
this opnode. (This global is reset at the end of each block.) Why is
this reasonable? Because while control moves around inside a running
subroutine, there is at most one enclosing scope which contains a
given node.
Does this lead to problems? Yes, since the code tree may be shared by
several subroutines (or subroutine invocations). If two of these
subroutines are "active" simultaneously (say, one calls another), then
a successful match in the inner one will overwrite the "saved
data" of the outer one. This leads to the long-standing "match data of
recursive subroutines" problem.
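The overwrite can be modelled with a toy sketch in C. This is not the
Perl source: node_with_data, match_into_node and recurse_shared are
hypothetical names, and the "match" only records the recursion depth
as the start offset of group 1.

```c
#include <assert.h>

/* Hypothetical stand-in for a match opnode that keeps the "saved data"
   (here: the offsets of group 1) inside the node itself. */
struct node_with_data {
    int start;   /* offset where $1 begins, -1 before any match */
    int end;     /* offset just past $1 */
};

/* One node in the code tree, shared by every invocation. */
static struct node_with_data shared_node = { -1, -1 };

/* Pretend match: record that $1 covers [depth, depth + 1). */
static void match_into_node(struct node_with_data *node, int depth)
{
    node->start = depth;
    node->end = depth + 1;
}

/* Recursive "subroutine": match, recurse, then read the saved data back.
   Returns the start offset that this invocation sees after the inner
   invocations have returned. */
int recurse_shared(int depth, int max_depth)
{
    assert(depth <= max_depth);
    match_into_node(&shared_node, depth);
    if (depth < max_depth)
        recurse_shared(depth + 1, max_depth);
    return shared_node.start;
}
```

In this model recurse_shared(0, 2) returns 2, not 0: the outer
invocation reads back what the innermost match wrote, because all
invocations write into the same node.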
What might a possible solution be? Each invocation of a subroutine has
some storage which is associated with *it only*. They all share a code
tree, but there is some storage which distinguishes them. These are
the scratchpad arrays. Nodes of the compiled tree may mention *offsets*
inside scratchpads, which are used for bookkeeping etc. ("targets").
So the solution may be to allocate an entry in the scratchpad,
associated with the match-node, and move the "saved data" there. At
the same time, this would solve the only remaining threading problem
as well. To find all the places which need to be changed (and many of
those which should not be), it is enough to grep for PL_curpm and PMOP.
As simple as that. One hour of work, excluding testing.
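The scratchpad idea can be sketched in C in the same toy style. This is
an illustration under assumptions, not a patch: node_with_offset,
match_into_pad, recurse_with_pad and PAD_SIZE are hypothetical names,
and the pad is modelled as a local array, whereas real scratchpads are
Perl arrays.

```c
#include <assert.h>

#define PAD_SIZE 4   /* hypothetical number of scratchpad slots */

/* "Saved data" for one match: the offsets of group 1. */
struct saved_data {
    int start;
    int end;
};

/* The shared node now holds only an offset into the scratchpad. */
struct node_with_offset {
    int pad_offset;
};

/* One node in the code tree, shared by every invocation, never written. */
static const struct node_with_offset pad_node = { 2 };

/* Pretend match: store the saved data in the caller's own pad. */
static void match_into_pad(const struct node_with_offset *node,
                           struct saved_data *pad, int depth)
{
    assert(node->pad_offset >= 0 && node->pad_offset < PAD_SIZE);
    pad[node->pad_offset].start = depth;
    pad[node->pad_offset].end = depth + 1;
}

/* Recursive "subroutine": each invocation gets a fresh pad, so the
   inner matches cannot touch the outer invocation's saved data. */
int recurse_with_pad(int depth, int max_depth)
{
    struct saved_data pad[PAD_SIZE] = {{0}};
    assert(depth <= max_depth);
    match_into_pad(&pad_node, pad, depth);
    if (depth < max_depth)
        recurse_with_pad(depth + 1, max_depth);
    return pad[pad_node.pad_offset].start;
}
```

Here recurse_with_pad(0, 2) returns 0: the shared node is read-only,
and each invocation reads back only what its own match stored.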