develooper Front page | perl.perl5.porters | Postings from March 2006

recursion now removed from the regex engine

Thread Next
From:
Dave Mitchell
Date:
March 24, 2006 15:07
Subject:
recursion now removed from the regex engine
Message ID:
20060324230928.GA20717@iabyn.com
I've just applied change 27598, which stops the regex engine from being
recursive, thus fixing a whole bunch of stack-busting bugs.

Essentially, after a bit of tidying up with changes 27526 and 27534, the
only recursion left was S_regmatch() calling itself. My change causes
S_regmatch() to save its current local variables on the heap and then
re-enter the main loop, rather than recursing. Note that this is a
brute-force approach: I haven't done anything clever with the regex engine
to reduce the need for recursion - it's just that now when recursion is
needed, it fakes it.

This first patch basically does the minimum restructuring required to get it
working - basically adding gotos, and renaming and shuffling local
variables into bigger or smaller scopes as appropriate. In particular it
currently just mallocs the storage required each time to save the current
state.  Consequently this code runs the t/op/regex*.t tests about 3%
slower. I'm presuming that this will go away once I replace the mallocs
with arenas or savestack thinggies or something TBA.

There's only one point of recursion left, and that's in /(??{$re})/.
I may worry about that some other time.

Here's a section of comments from the code that tries to explain roughly
how the code's been restructured.

Dave

/*
 * This function used to be heavily recursive, but since this had the
 * effect of blowing the CPU stack on complex regexes, it has been
 * restructured to be iterative, and to save state onto the heap rather
 * than the stack. Essentially whereever regmatch() used to be called, it
 * pushes the current state, notes where to return, then jumps back into
 * the main loop.
 *
 * Originally the structure of this function used to look something like

    S_regmatch() {
	int a = 1, b = 2;
	...
	while (scan != NULL) {
	    ...
    	    switch (OP(scan)) {
		case FOO: {
		    int local = 3;
		    ...
		    if (regmatch(...))  // recurse
			goto yes;
		}
		...
	    }
	}
	yes:
	return 1;
    }

 * Now it looks something like this:

    struct regmatch_state {
	int a, b, local;
	int resume_state;
    };

    S_regmatch() {
	int a = 1, b = 2;
	int local;
	...
	while (scan != NULL) {
	    ...
    	    switch (OP(scan)) {
		case FOO: {
		    local = 3;
		    ...
		    resume_state = resume_FOO;
		    new_scan = ...;
		    goto start_recurse;
		    resume_point_FOO:

		    if (result)  // recurse
			goto yes;
		}
		...
	    }
	    start_recurse:
	    ...push a, b, local onto the heap
	    a = 1; b = 2;
	    scan = new_scan;
	}
	yes:
	result = 1;
	if (states pushed on heap) {
	    ... restore a, b, local from heap
	    switch (resume_state) {
	    case resume_FOO:
		goto resume_point_FOO;
	    ...
	    }
	}
	return result
    }
	    

-- 
The perl5 internals are a complete mess. It's like Jenga - to get the
perl5 tower taller and do something new you select a block somewhere in
the middle, with trepidation pull it out slowly, and then carefully
balance it somewhere new, hoping the whole edifice won't collapse as a
result.
	    - Nicholas Clark, based on an original by Simon Cozens.

Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About