perl.perl6.language.regex | Postings from December 2000

Re: Perl 5's "non-greedy" matching can be TOO greedy!

Jeff Pinyan
December 14, 2000 13:27
On Dec 14, Deven T. Corzine said:

>The crux of the problem is that non-greedy qualifiers don't affect the
>"earliest match" behavior, which makes the matches more greedy than they
>really ought to be.

That's because "greediness" only decides whether a quantifier crawls forward
or backtracks; it doesn't decide where the match starts.  The regex /a.*b/
matches an 'a', then as many non-\n characters as possible, and then looks
for a 'b'.  Each time that fails, it backs up one character and tries again.
On the other hand, /a.*?b/ matches an 'a', then 0 characters, and then tries
to match a 'b'; each time that fails, it consumes one more character and
tries again.
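A minimal Perl sketch of the crawl vs. backtrack difference described above. The input string "a12b34b" is a hypothetical example chosen to contain two 'b's after the 'a'; it is not from the thread.

```perl
use strict;
use warnings;

my $str = "a12b34b";   # hypothetical input: two b's follow the a

# Greedy: .* first grabs "12b34b", then gives characters back
# one at a time until the following 'b' can match (the last b).
my ($greedy) = $str =~ /(a.*b)/;

# Non-greedy: .*? starts with zero characters and takes one more
# each time the following 'b' fails to match (stops at the first b).
my ($lazy) = $str =~ /(a.*?b)/;

print "greedy:     $greedy\n";   # a12b34b
print "non-greedy: $lazy\n";     # a12b
```

Both matches start at the same 'a'; only the amount the quantifier consumes differs.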

>     $_ = "aaaabbbbccccddddeeee";
>     ($greedy) = /(b.*d)/;              # "bbbbccccdddd" (correct)
>     ($non_greedy) = /(b.*?d)/;         # "bbbbccccd" (should be "bccccd"!)
>Does anyone disagree with the premise, and believe that "bbbbccccd" is the
>CORRECT match for the non-greedy regexp above?

>     match as many times as possible (given a particular starting
>     location) while still allowing the rest of the pattern to match.

The starting location is the first 'b' it matches.  Greediness has nothing
to do with the 'b' in your regex -- it has to do with the '.'.  The engine
matches a 'b', and then starts working on 0 or more of any non-\n character.
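A small Perl sketch, using Deven's own string, that prints where each match starts via the @- array (the offset of capture group 1). It shows that both patterns begin at offset 4, the first 'b'. Non-greedy matching only shortens the end of the match; it never moves the start to a later 'b'.

```perl
use strict;
use warnings;

$_ = "aaaabbbbccccddddeeee";

# The engine tries start positions left to right; the first position
# where the whole pattern can match wins, before greediness matters.
/(b.*d)/ or die "no match";
my ($g_start, $g_text) = ($-[1], $1);   # save before the next match

/(b.*?d)/ or die "no match";
my ($n_start, $n_text) = ($-[1], $1);

print "greedy:     starts at $g_start, matched '$g_text'\n";   # 4, bbbbccccdddd
print "non-greedy: starts at $n_start, matched '$n_text'\n";   # 4, bbbbccccd
```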

You're asking for something like


which is an "optimization" you'll have to incorporate on your own.

Jeff "japhy" Pinyan
CPAN - #1 Perl Resource  (my id:  PINYAN)
PerlMonks - An Online Perl Community
The Perl Archive - Articles, Forums, etc.