
improved 2-part regex syntax improvement proposal (for matching balanced strings)

From:
David Nicol
Date:
February 13, 2008 13:46
Subject:
improved 2-part regex syntax improvement proposal (for matching balanced strings)
Message ID:
934f64a20802131345h21ab1e3ftc4da6ff735a9e50a@mail.gmail.com
Or, better yet, leave the grouping/choice options out of (?[ code ]) and
use the existing mechanisms for that.  If you abut something against a
(?[ code ]) and get a string like thisARRAY(0xDEADBEEF)that, you'll find out
soon enough, especially under strict.  A non-match yields the empty string as
usual; a match yields an arrayref of whatever is captured therein.  Because
that is an arrayref and not a string, it might raise a warning or die if it is
forced to interpolate rather than being dereferenced.  I don't think there is
currently a way to directly capture anything besides a string, so that would be
a paradigm shift, or at least a paradigm blur.
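
To make the abutting failure concrete, here is a minimal sketch in today's Perl. The (?[ code ]) syntax is only proposed, so the arrayref below is built by hand as a stand-in for what such a group might return; the names $captured and $abutted are illustrative only.

```perl
use strict;
use warnings;

# Stand-in for what a successful (?[ ... ]) group might hand back; the
# proposed syntax does not exist, so this arrayref is built by hand.
my $captured = [ 'a', 'href="x"' ];

# Abutting the reference against literal text stringifies it.
my $abutted = "this${captured}that";
print "$abutted\n";    # something like thisARRAY(0x...)that

# Under strict refs, treating that string as an array dies at run time.
my $ok = eval { my @parts = @{$abutted}; 1 };
print "strict refs caught it: $@" unless $ok;

# Dereferencing the real reference is the intended use.
my ($tag, $attrs) = @{$captured};
print "tag=$tag attrs=$attrs\n";
```

The stringified form carries no usable data, which is why the mistake tends to surface quickly once the result is dereferenced under strict.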


    my @EltList = $elt_doc =~ m{(
          (?[<\s*(\S+)\s*([^>]*)\s*>\R</\s*\1\s*>])
          |
          (?[<\s*(\S+)\s*([^>]*)\s*/>])
          |
          [^<]+
     )}gx;

\R means: match this regex recursively if possible.  "This" means
the largest piece under regex compilation at this time; it can be
replaced with (??{ code }) if tighter control is needed.  \R could
be used in the example in perldoc perlre to replace
       $re = qr{
                  \(
                  (?:
                     (?> [^()]+ )    # Non-parens without backtracking
                   |
                     (??{ $re })     # Group with matching parens
                  )*
                  \)
               }x;
with
       $re = qr{
                  \(
                  (?:
                     (?> [^()]+ )    # Non-parens without backtracking
                   |
                     \R     # Group with matching parens
                  )*
                  \)
               }x;

and later, if $re were used as a component of something else, the \R
within it would still refer to (??{$re}) rather than to the new, larger pattern.
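
For comparison, the existing (??{ $re }) form from perlre already has this property, and it can be tried as-is. The sketch below is only an illustration of the current construct, not of the proposed \R; the test strings and the larger pattern $call are made up for the example.

```perl
use strict;
use warnings;

# Declared before assignment so the (??{ ... }) block can refer to it.
my $re;
$re = qr{
    \(
    (?:
        (?> [^()]+ )    # Non-parens without backtracking
      |
        (??{ $re })     # Group with matching parens
    )*
    \)
}x;

# Whole-string check: only fully balanced groups pass.
my @balanced = grep { /\A$re\z/ } ('(a(b)c)', '(a(b)c', '()', 'x');

# Hypothetical larger pattern that embeds $re as a component.
my $call = qr{ \b (\w+) \s* ($re) }x;
my ($name, $args) = 'foo(bar(baz), qux) + 1' =~ $call;

print "balanced: @balanced\n";
print "name=$name args=$args\n";
```

Inside $call the recursion still goes through $re alone, so the nested group is matched as balanced parentheses and not as another "name plus parens" pair.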

-- 
"The ultraleft underground armed revolutionary groups that burned down
our offices and forced our staff to leave the villages at gunpoint now
disappeared.  We could finally concentrate on the production of fish."
-- Mohammad Yunus


