Re: [perl #22182] regular expression bug (design limitation?)

From: Edward Peschko
Date: May 14, 2003 18:18
Subject: Re: [perl #22182] regular expression bug (design limitation?)
Message ID: 20030514180921.A1680@mdssirds.comp.pge.com

On Wed, May 14, 2003 at 10:47:06PM -0000, Ilya Zakharevich wrote:
> On Wed, May 14, 2003 at 03:07:05PM -0700, Edward Peschko wrote:
> 
> Sorry, I could not guess what you have in mind...
> 
> Yours,

Suppose you had the following code:

my $string = 'abcdefg';

sub _position_increment 
{ 
	my ($string,$pos) = @_;
	return($pos+3);
}

my $_coderef = \&_position_increment;

if ($string =~ m"($_coderef)")
{
	print $1;   # prints 'abc';
}

The moment the regular expression engine reaches $_coderef, I'd want perl to run the
function behind the coderef, passing it the string and the position the match had
reached at that point.

So, stepping through:

my $string = 'abcdefg'; # pos($string) == 0;

sub _position_increment 
{ 
	my ($string, $pos) = @_;            # when the engine calls this function, advance pos by 3
	return($pos+3);                     # return the new position, because we are
}                                       # automatically successful (if we wanted an
                                        # unsuccessful match, we would return(-1))

my $_coderef = \&_position_increment;
if ($string =~ m"($_coderef)")
{
	print STDERR "$1";                  # the engine knows we went in with a pos of 0
	                                    # and came out with a pos of 3, hence it saves
	                                    # 'abc' in $1
}
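
To make the intended semantics concrete, here is a minimal sketch that emulates the
hook outside the regex engine, in plain Perl. The helper capture_via_hook is
hypothetical (it is not part of perl or of the proposal); it only assumes the
convention described above: the coderef gets the string and the current position,
and returns the new position, or -1 for failure.

```perl
use strict;
use warnings;

# Hypothetical helper emulating the proposed hook outside the regex engine.
# Calls $hook with the string and the starting position; the hook returns
# the new position, or -1 to signal a failed match.
sub capture_via_hook {
    my ($string, $hook, $start) = @_;
    my $end = $hook->($string, $start);
    return if $end < 0 || $end > length($string);   # failure, or ran off the end
    return substr($string, $start, $end - $start);  # what $1 would hold
}

sub _position_increment {
    my ($string, $pos) = @_;
    return $pos + 3;
}

my $captured = capture_via_hook('abcdefg', \&_position_increment, 0);
print "$captured\n" if defined $captured;   # prints 'abc'
```

A real implementation would live inside the engine and take part in backtracking,
which this sketch does not attempt.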

There's a hook into @INC that does a similar thing:

push(@INC, \&function_modifying_inc_at_point_of_use);

which runs the coderef whenever a require or use searches @INC and reaches that entry.
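
For reference, a small runnable sketch of the @INC hook mechanism. The module name
No::Such::Module::For::Demo is made up for illustration and is assumed not to exist
on the system; the hook here only records what it was asked for and declines.

```perl
use strict;
use warnings;

my @requested;   # file names that reached the hook

# An @INC hook receives its own reference and the file name being looked up.
# Returning an empty list tells perl to carry on with the next @INC entry.
push @INC, sub {
    my ($self, $filename) = @_;
    push @requested, $filename;
    return;
};

# Hypothetical module name, chosen so that no directory in @INC provides it;
# the require fails, but only after the hook has been consulted.
eval { require No::Such::Module::For::Demo; };

print "hook saw: @requested\n";
```

Because the hook is pushed onto the end of @INC, it is only consulted after every
directory ahead of it has been searched.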

Anyway, the point is that the proposed syntax is a lot more intuitive than (?...): the
syntax is cleaner, it's backwards compatible with the current regex engine, and it
integrates better with C code. Ultimately, I'd like to be able to write:

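#include <string.h>

int _parens(char *str, int pos)
{
	int len = strlen(str);

	while (pos < len && str[pos] != '(') pos++;   /* skip to the first '(' */
	while (pos < len && str[pos] != ')') pos++;   /* then to the first ')' after it */

	if (pos >= len) return(-1);                   /* no ')' seen: fail */
	return(pos + 1);                              /* position just past the ')' */
}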

my $_parens = \&_parens;

if ($string =~ m"($_parens)stuff")
{
}

and have the regular expression engine:

	a) recognize that $_parens is a code reference
	b) run the code behind that coderef
	c) increment pos to the position where the first ')' is seen (as per the C code)
	d) auto-fail (with a -1) if no ')' is seen
	e) continue on to match the string 'stuff' after the pos returned by the
	   function _parens.
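
Steps a) through e) can be approximated today in plain Perl, outside the engine. The
sketch below is only an emulation under assumptions: _parens is a Perl port of the C
function, and match_hook_then_literal is a hypothetical driver, not an existing perl
facility. It uses pos() and \G to resume matching where the hook left off.

```perl
use strict;
use warnings;

# Perl port of the C sketch: returns the position just past the first ')'
# that follows the first '(', or -1 if there is no such ')'.
sub _parens {
    my ($str, $pos) = @_;
    my $open = index($str, '(', $pos);
    return -1 if $open < 0;
    my $close = index($str, ')', $open);
    return -1 if $close < 0;
    return $close + 1;
}

# Hypothetical driver emulating steps a) through e).
sub match_hook_then_literal {
    my ($string, $hook, $literal, $start) = @_;
    return unless ref($hook) eq 'CODE';            # a) is it a code reference?
    my $end = $hook->($string, $start);            # b), c) run it, get the new pos
    return if $end < 0;                            # d) auto-fail on -1
    pos($string) = $end;
    return unless $string =~ /\G\Q$literal\E/gc;   # e) literal must follow at pos
    return substr($string, $start, $end - $start); # what the capture would hold
}

my $got = match_hook_then_literal('f(x, y)stuff', \&_parens, 'stuff', 0);
print "$got\n" if defined $got;   # prints 'f(x, y)'
```

Unlike a hook inside the engine, this driver cannot retry the hook at a later
starting position when the literal fails to match.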

In other words, I want the ability to put my own C hooks inside the regex engine.

Ed
