
Re: Perl 5's "non-greedy" matching can be TOO greedy!

From: Nathan Wiger
Date: December 14, 2000 15:46
Subject: Re: Perl 5's "non-greedy" matching can be TOO greedy!
Message ID: 200012142344.PAA27223@postoffice.West.Sun.COM

>The crux of the problem is that non-greedy qualifiers don't affect the
>"earliest match" behavior, which makes the matches more greedy than they
>really ought to be.
>
>Here is a simple example: (tested with perl 5.005_03)
>
>     $_ = "aaaabbbbccccddddeeee";
>     ($greedy) = /(b.*d)/;              # "bbbbccccdddd" (correct)
>     ($non_greedy) = /(b.*?d)/;         # "bbbbccccd" (should be "bccccd"!)
>
>I'm sure this will complicate the NFA for the regexp, but it seems like
>this really ought to be fixed, at least in Perl 6.  (There's a good case to
>be made for fixing it in Perl 5, but people have ignored/missed it for this
>long already...)
>
>Does anyone disagree with the premise, and believe that "bbbbccccd" is the
>CORRECT match for the non-greedy regexp above?

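FWIW, I do. I agree with others - the above is exactly how I'd expect it
to behave. When you're working with regexps, the engine tries start
positions from left to right and takes the first one that can match; the
quantifiers only decide how far the match extends from there. This is
not really unique to Perl.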

Perhaps what you want is something like this:

    ($non_greedy) = /(b[^bd]*d)/;     # will be "bccccd"

I believe Tom also mentioned this one in his reply:

    ($non_greedy) = /.*(b.*?d)/;      # will be "bccccd"
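
For anyone who wants to try these side by side, here is a small sketch.
The second string ($two) is a made-up example, not from the original
post, and the script assumes a perl new enough to have the @- array
(which, as far as I know, means newer than the 5.005_03 mentioned
above). It suggests that the two workarounds are not interchangeable:
with more than one candidate group, the character class finds the first
group and the leading .* finds the last.

```perl
use strict;
use warnings;

my $text = "aaaabbbbccccddddeeee";

my ($plain)   = $text =~ /(b.*?d)/;      # leftmost 'b', then shortest tail
my ($negated) = $text =~ /(b[^bd]*d)/;   # no further 'b' allowed inside
my ($prefix)  = $text =~ /.*(b.*?d)/;    # greedy .* pushes the start right

print "plain:   $plain\n";     # bbbbccccd
print "negated: $negated\n";   # bccccd
print "prefix:  $prefix\n";    # bccccd

# Hypothetical string with two b...d groups, to show where each
# workaround anchors its capture ($-[1] is the start offset of group 1).
my $two = "abbcdxbbbcd";
my ($first_at, $last_at);
$first_at = $-[1] if $two =~ /(b[^bd]*d)/;
$last_at  = $-[1] if $two =~ /.*(b.*?d)/;

print "negated class starts at $first_at, prefix form at $last_at\n";  # 2 and 8
```

Both captures are "bcd" in the second string, but they come from
different places, so which workaround fits depends on which occurrence
you are after.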

But I don't see how this could be called a "design flaw". Maybe it differs
from a behavior you might find useful, and maybe some other .* variant
could be considered to do what you like, but when I see the regexp above:

    ($non_greedy) = /(b.*?d)/;         # "bbbbccccd"

I read it something like this:

    "match the first 'b' you find, followed by absolutely anything
    (except a 'd'), up to the first 'd' you find"

Again, .*? is only non-greedy going forward, not both ways. Perhaps this
could be "changed", but I don't think "fixed" is the correct terminology.
You're talking about changing .*? from working forward to working outward.
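
One way to see the "forward only" point is to print where each match
starts and ends. This is only a sketch, and it assumes a perl that has
the @- and @+ offset arrays. Both forms start at the same offset (the
first 'b'); only the end moves.

```perl
use strict;
use warnings;

my $text = "aaaabbbbccccddddeeee";

# $-[0] is the offset where the whole match starts,
# $+[0] is the offset just past where it ends.
my (@greedy, @lazy);
@greedy = ($-[0], $+[0]) if $text =~ /b.*d/;
@lazy   = ($-[0], $+[0]) if $text =~ /b.*?d/;

printf "greedy: start %d, end %d\n", @greedy;   # start 4, end 16
printf "lazy:   start %d, end %d\n", @lazy;     # start 4, end 13
```

The non-greedy quantifier shortens the match from the right, but the
left edge is decided before the quantifier ever gets a say.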

-Nate
