[PATCH] Add recursive regexes similar to PCRE

From:
demerphq
Date:
September 30, 2006 16:43
Subject:
[PATCH] Add recursive regexes similar to PCRE
Message ID:
9b18b3110609301643y5ea07b90v75e792a30b3b6b91@mail.gmail.com
With this patch we can support recursive patterns without using
compiled code in the same way that PCRE does.

The patch adds a new construct, (?PARNO), where PARNO is the number of a
capturing paren in the pattern, so that we can write things like

  /^(<(?:[^<>]+|(?1))*>)$/

to match a string that consists of balanced, possibly nested, '<>' pairs.
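
For anyone who wants to try this out, here is a small sketch of how the
pattern above behaves on a few inputs. It assumes a perl that includes
this feature; the helper name is_balanced is made up for the example and
is not part of the patch.

```perl
use strict;
use warnings;

# The pattern from this post: group 1 is a '<', then any mix of
# non-bracket text or a recursion into group 1 itself, then a '>'.
my $balanced = qr/^(<(?:[^<>]+|(?1))*>)$/;

# Hypothetical helper, named only for this illustration.
sub is_balanced {
    my ($str) = @_;
    return $str =~ $balanced ? 1 : 0;
}

for my $str ('<>', '<a<b>c>', '<<x><y>>', '<a<b>c', '<a>>', 'a<b>') {
    printf "%-10s %s\n", $str,
        is_balanced($str) ? 'balanced' : 'not balanced';
}
```

The first three strings should be reported as balanced and the last three
as not balanced: each (?1) re-enters group 1, so every '<' has to be closed
by its own '>' before the outer group can finish.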

In an earlier post (Message-ID:
<9b18b3110609100519t555fee1foe316e46f4f680d3d@mail.gmail.com>) I
proposed a special bracket to define the extent of the subpattern,
but the attached implementation requires that the subpattern be
defined by capturing parens. This matches the behaviour of
PCRE (and, I believe, Python), but IMO it is a little less than ideal: I
can imagine situations where it would be useful to do this without
needing capturing parens. Maybe we could do both: the PCRE form for
compatibility, plus a more perlish flavour (ie TMTOWTDI)?

This sits on top of the EVAL/recursion work that Dave Mitchell has
done, which, I have to say, made this patch a lot easier to do than I
thought it would be when I started out. Thanks Dave.

My changes required adding a field to the eval union to store the
paren number that determines the end of the recursion. When the
CLOSE regop is entered, if its paren number matches the current
cur_eval->u.eval.close_paren then it pretends it's actually an END. In
the course of this I added some code to handle situations like this:

  $qr=qr/(??{$qr})/;

These now throw a fatal error if an eval/recursion goes 50 frames
without consuming input; otherwise the match would loop forever. The
same applies to /((?1))/.
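
A rough way to see the guard from the outside is to wrap the match in an
eval and check whether it died. This is only a sketch: the helper name
try_match is made up for the example, and it deliberately does not rely
on the exact wording of the error or on whether perl reports it while
compiling or while running the pattern.

```perl
use strict;
use warnings;

# Hypothetical helper: try a match and report whether it died.
# The pattern is compiled at run time inside the eval, so an error
# raised during either compilation or matching is caught here.
sub try_match {
    my ($pat, $str) = @_;
    my $matched = eval { $str =~ /$pat/ ? 1 : 0 };
    return defined $matched ? ('ok', $matched) : ('died', $@);
}

# Group 1 contains nothing but a recursion into group 1, so it can
# never consume any input.
my ($status, $detail) = try_match('((?1))', 'abc');
print "status: $status\n";
print "detail: $detail\n";
```

The expectation is a status of "died" with the regex engine's error text
as the detail, rather than a match that never returns.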

Also, I noticed that there is a problem with /(??{\"})/ which results
in a panic:

  D:\dev\perl\ver\28902_\win32>..\perl -le"/(??{\"})/;"
  panic: top_env

I haven't tried to fix that one.

The patch also includes some new diagnostic output, and I added some
comments to regcomp.pl so that they end up in regnodes.h, as well as
some other minor fixes/improvements.

Requires regen.

cheers,
Yves



-- 
perl -Mre=debug -e "/just|another|perl|hacker/"