
syntax proposal for matching balanced strings

From:
David Nicol
Date:
February 13, 2008 12:14
Subject:
syntax proposal for matching balanced strings
Message ID:
934f64a20802131214y122aa32dh695db9e3cb750a5d@mail.gmail.com
On Feb 13, 2008 1:51 PM, David Nicol <davidnicol@gmail.com> wrote:

>
>
>    my @EltList = $elt_doc =~
>    qr{
>         (?[]:'<\s*(\S+)\s*([^>]*)\s*>'\R'</\s*\1\s*>')
>         |
>         (?[]:'<\s*(\S+)\s*([^>]*)\s*/>')
>    }gx;
>

Sorry, that was based on a draft idea from before I separated ?[]: and \R.
This might be more correct:


   my @EltList = $elt_doc =~
   qr{
        (?[]:<\s*(\S+)\s*([^>]*)\s*>\R</\s*\1\s*>)
         |
        (?[]:<\s*(\S+)\s*([^>]*)\s*/>)
        |
        ([^<]+)
   }gx;


There are still open questions about how | interacts with capturing.
Would it be possible to make (?[]:) aware that it is one of a group
of alternatives, so that it suppresses the captures of the alternatives
that did not match?  Or allow only the form (?[ your regex here ]),
with the alternation inside it.  Hmm, that's even better:


   my @EltList = $elt_doc =~
   qr{(?[
             <\s*(\S+)\s*([^>]*)\s*>\R</\s*\1\s*>
              |
             <\s*(\S+)\s*([^>]*)\s*/>
             |
             ([^<]+)
       ]x)}g;

(?[ your regex here ]) would capture to exactly one array-ref
containing the captures from the enclosed alternative that matched,
and would take modifiers after the closing square bracket.  On match
failure, for instance when it is used as an alternative that fails,
the result would still be an empty arrayref rather than the empty
string (debatable).
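
For comparison, a rough approximation of the one-arrayref-per-match
behaviour is already possible in perl 5.10 with the branch-reset group
(?|...), which numbers the captures of every alternative starting from 1.
This is only a sketch and not the proposed syntax: the sample document is
made up, the element-name pattern is tightened to [^\s/>]+ as an
assumption, and only the self-closing and plain-text alternatives are
shown.  It also does not produce the empty arrayref on failure.

```perl
use 5.010;
use strict;
use warnings;

# Hypothetical sample input, not taken from the original post.
my $elt_doc = 'intro <br/> middle <img src="a.png" /> end';

my @EltList;
while ($elt_doc =~ m{
        (?|                                   # branch reset: groups restart at 1 in each alternative
            <\s*([^\s/>]+)\s*([^>]*?)\s*/>    # self-closing element: name, then attributes
          |
            ([^<]+)                           # run of plain text
        )
    }gx) {
    # Groups that did not take part in this match are undef; drop them,
    # so each match contributes exactly one arrayref.
    push @EltList, [ grep { defined } ($1, $2) ];
}

# Print each arrayref on its own line, fields joined with " | ".
say join ' | ', @$_ for @EltList;
```

Each text run ends up as a one-element arrayref and each self-closing
element as a two-element arrayref (name, attributes).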

\R means: match this regex recursively, if possible.
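
Two notes on existing behaviour, offered as context rather than as part
of the proposal.  In perl 5.10, \R inside a regex already means "generic
linebreak", so the escape would need a different spelling.  Recursion
itself is available in 5.10 through (?R), (?1) and (?&name).  The sketch
below uses (?&element) to check that opening and closing tags nest
properly; the group names, the is_balanced helper and the test strings
are illustrative assumptions, and it ignores self-closing tags, comments
and everything else a real parser must handle.

```perl
use 5.010;
use strict;
use warnings;

my $balanced = qr{
    (?<element>
        < (?<name> \w++ ) (?: \s [^>]* )? >     # opening tag, name captured
        (?: [^<]++ | (?&element) )*             # plain text or a nested element
        </ \k<name> \s* >                       # closing tag must repeat the name
    )
}x;

# Hypothetical helper: true if the whole string is one well-nested element.
sub is_balanced {
    my ($doc) = @_;
    return $doc =~ /\A $balanced \z/x ? 1 : 0;
}

say is_balanced('<a><b>x</b><b>y</b></a>') ? 'balanced' : 'not balanced';
say is_balanced('<a><b>x</a></b>')         ? 'balanced' : 'not balanced';
```

This relies on perl restoring the outer capture values when a recursive
call returns, so that \k<name> in the outer frame still refers to the
outer tag name.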


