perl.perl6.language.regex | Postings from December 2000

Re: Perl 5's "non-greedy" matching can be TOO greedy!

Tom Christiansen
December 15, 2000 12:27
>I made a mistake in phrasing it this way, because it seemed to suggest that
>I thought it was an implementation bug that it returns "bbbbccccd" instead
>of "bccccd".  I didn't make it clear that I was trying to approach this as
>a purely SEMANTIC question, considered in isolation from the implementation
>of the system.  

You keep using "semantic".  However, I do not think that that word
means what you think it means.

>The question is, "what interpretation makes the most sense,
>at a high level", not "why does the current behavior make sense".

Those are all three of them different things.

>It's not that there aren't justifications for the current behavior.  It's a
>question of perspective -- from one perspective (mine), "bccccd" makes more
>sense semantically.  

No, sir.  You cannot use the S word for that.

Here are the *SEMANTICS* of pattern matching in Perl:

When there's more than one match, the first match found (that is,
the leftmost) is the winner, with ties being resolved in favor of
the longer string for maximal matches and the shorter string for
minimal matches.
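
A minimal sketch of that rule, assuming a sample string and patterns
chosen to reproduce the "bbbbccccd" result quoted above.  The original
test case does not appear in this message, so $str and all three
patterns are assumptions for illustration only.

```perl
use strict;
use warnings;

# Assumed sample input; not taken from this message.
my $str = "aaaabbbbccccdddd";

# Maximal match: starts at the leftmost "b", then runs to the last "d".
my ($maximal) = $str =~ /(b.*d)/;

# Minimal match: STILL starts at the leftmost "b"; only the length is
# minimized, so it stops at the first "d" it can reach from there.
my ($minimal) = $str =~ /(b.*?d)/;

# Getting "bccccd" takes a different pattern, one that forbids any
# further "b" (or "d") between the starting "b" and the closing "d".
my ($shortest) = $str =~ /(b[^bd]*d)/;

print "$maximal\n";   # bbbbccccdddd
print "$minimal\n";   # bbbbccccd
print "$shortest\n";  # bccccd
```

The minimal quantifier never moves the starting point to the right.
Leftmost is decided first, and "shorter" only breaks ties among
matches that begin at that same position.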

This is *not* an "implementational detail".  These *are* the
semantics.  You are asking for *different* semantics.

What you are doing is simply an attempt to impose a sloppy
English-language description on the behavior of the code.  Just
because you should happen to understand the English does not mean
that this describes the code.

It's like people thinking /<.*?>/ will find a tag because they are
thinking in English, not Perl.  Of course it won't.  
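
A short sketch of two ways that pattern goes wrong.  Both input
strings are made up for illustration and are not from this thread.

```perl
use strict;
use warnings;

# Assumed input 1: a bare "<" appears in the text before any real tag.
my $text = 'x < y and <b>bold</b>';
my ($first) = $text =~ /(<.*?>)/;

# Assumed input 2: a ">" appears inside a quoted attribute value.
my $img = '<img alt="a > b" src="x.png">';
my ($second) = $img =~ /(<.*?>)/;

print "$first\n";   # < y and <b>
print "$second\n";  # <img alt="a >
```

In both cases the pattern does exactly what it says: start at the
leftmost "<" and stop at the first ">" reachable from there.  Nothing
in it says "tag".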

>I believe it is more intuitive, at the highest level.

"Intuitive" is another one of those words frequently bandied 
about that is nearly always misapplied.

    The frobnitz interface is more intuitive.

    The nipple is the only intuitive human interface.

    From my own historical experiences and resulting biases, 
    the frobnitz interface would have been more what
    I personally without regard to anyone else would have
    been expecting.

>From a different (more implementation-oriented) perspective, the current

No, this is not "implementation-oriented".  It is merely the semantics.

>Hopefully, we can have a rational discussion about whether this semantic
>anomaly is real or imagined, what impact "fixing" it would have on the
>implementation (if it's deemed real), and whether it's worth "fixing".

I do not expect you to be rational, because I do not think we can
agree to your terms.  There is no semantic anomaly, any more than
thinking that <.*> or <.*?> finds an HTML tag is some sort of
"semantic anomaly".   It is the result of your mistranslating between
English and code.  

>Here's where I see the disconnect happening.  I'm approaching this from a
>semantic perspective, asking myself "what should this match (ideally)?"

No, you're not.  Please stop abusing the S word.  It places you 
on no moral high ground whatsoever.

--tom