Re: Perl 5's "non-greedy" matching can be TOO greedy!

From: Jeff Pinyan
Date: December 14, 2000 13:27
Subject: Re: Perl 5's "non-greedy" matching can be TOO greedy!
Message ID: Pine.GSO.4.21.0012141611170.4213-100000@crusoe.crusoe.net

On Dec 14, Deven T. Corzine said:

>The crux of the problem is that non-greedy qualifiers don't affect the
>"earliest match" behavior, which makes the matches more greedy than they
>really ought to be.

That's because "greediness" is just a measure of crawl vs. backtrack.  The
regex /a.*b/ will match an 'a', then as many non-\n characters as possible,
and then look for a 'b'.  Upon failing, it will back up one character and
try again.  On the other hand, /a.*?b/ matches an 'a', and then 0
characters, and then tries to match a 'b', and upon failing matches one
more character and tries again, etc.
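
To make the crawl vs. backtrack difference concrete, here is a minimal
sketch.  The sample string is made up for illustration and is not from the
original thread; it just has two 'b's after the first 'a'.

```perl
use strict;
use warnings;

# Hypothetical sample string: one 'a' followed by two 'b's further along.
my $str = "xaQQbQQbx";

# Greedy: .* grabs everything to the end of the line, then gives back
# one character at a time until a 'b' can match.
my ($greedy) = $str =~ /(a.*b)/;

# Non-greedy: .*? starts with zero characters and takes one more
# character at a time until a 'b' can match.
my ($lazy) = $str =~ /(a.*?b)/;

print "greedy: $greedy\n";   # aQQbQQb
print "lazy:   $lazy\n";     # aQQb
```

Both matches begin at the same 'a'; only the end of the match differs.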

>     $_ = "aaaabbbbccccddddeeee";
>     ($greedy) = /(b.*d)/;              # "bbbbccccdddd" (correct)
>     ($non_greedy) = /(b.*?d)/;         # "bbbbccccd" (should be "bccccd"!)
>
>Does anyone disagree with the premise, and believe that "bbbbccccd" is the
>CORRECT match for the non-greedy regexp above?

>     match as many times as possible (given a particular starting
>     location) while still allowing the rest of the pattern to match.

The starting location is the first 'b' it matches.  Greediness has nothing
to do with the 'b' in your regex -- it has to do with the '.'.  The engine
matches a 'b', and then starts working on 0 or more of anything.
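
A small sketch of that point, using the string from the quoted example.
The %result hash is just a hypothetical helper for recording what
happened; $-[0] is Perl's offset of the start of the last successful
match.

```perl
use strict;
use warnings;

my $str = "aaaabbbbccccddddeeee";

# Record [start offset, matched text] for each pattern.
my %result;
for my $pat ('b.*d', 'b.*?d') {
    if ($str =~ /($pat)/) {
        $result{$pat} = [ $-[0], $1 ];
        printf "/%s/ starts at offset %d and matches '%s'\n",
            $pat, $-[0], $1;
    }
}
# /b.*d/ starts at offset 4 and matches 'bbbbccccdddd'
# /b.*?d/ starts at offset 4 and matches 'bbbbccccd'
```

Both patterns start at offset 4, the first 'b' in the string.  The
quantifier only changes where the match ends.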

You're asking for something like

  /(b(?!b).*?d)/

which is an "optimization" you'll have to incorporate on your own.
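
A quick way to check any pattern of this kind is to run it against the
sample string.  The sketch below compares three candidates: a lookbehind
on the 'b', a lookahead on the 'b', and a negated character class.  The
candidate list and the %got hash are illustrative assumptions, not part of
the original thread, and the negated-class form only works because the
prefix here is a single character.

```perl
use strict;
use warnings;

my $str = "aaaabbbbccccddddeeee";

# Candidate patterns (illustrative; names are hypothetical labels).
my @candidates = (
    [ 'lookbehind'    => qr/(?<!b)(b.*?d)/ ],  # 'b' not preceded by 'b'
    [ 'lookahead'     => qr/(b(?!b).*?d)/  ],  # 'b' not followed by 'b'
    [ 'negated class' => qr/(b[^b]*?d)/    ],  # no further 'b' before 'd'
);

my %got;
for my $c (@candidates) {
    my ($name, $re) = @$c;
    ($got{$name}) = $str =~ $re;               # first capture group, if any
    printf "%-14s %s\n", $name, $got{$name} // '(no match)';
}
# lookbehind     bbbbccccd
# lookahead      bccccd
# negated class  bccccd
```

The lookbehind form still anchors on the first 'b' of the run, so it
returns the same text as plain /(b.*?d)/; the other two skip ahead to the
last 'b' of the run for this particular string.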

-- 
Jeff "japhy" Pinyan     japhy@pobox.com    http://www.pobox.com/~japhy/
CPAN - #1 Perl Resource  (my id:  PINYAN)       http://search.cpan.org/
PerlMonks - An Online Perl Community          http://www.perlmonks.com/
The Perl Archive - Articles, Forums, etc.   http://www.perlarchive.com/



