Tom Christiansen wrote:
>For /(?-m:$)/, we have simply:
>
>    while (<>) {
>        last if /^END$/;
This is one of the cases that I classified as "using $ correctly although
it doesn't actually match the line-ending semantics". Applying the
formulation you used on /^/m:
$ perl -le 'print scalar(()=("abc\n" =~ /$/g))'
2
$
This line has two ends? The string at least has the right number of
beginnings (as judged by /^/). Consider the pattern /c\s$/, which
matches "abc ", "abc \n", and, er, "abc\n". (Perl6's "horizontal
whitespace" concept would help with this one.)
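To check the /c\s$/ claim by hand, here is a minimal sketch. The helper name matches_c_ws_end is made up for illustration, and the fourth string ("abc" with nothing after it) is added only as a negative control.

```perl
use strict;
use warnings;

# Hypothetical helper: does /c\s$/ accept this string?
sub matches_c_ws_end {
    my ($s) = @_;
    return $s =~ /c\s$/ ? 1 : 0;
}

for my $s ("abc ", "abc \n", "abc\n", "abc") {
    (my $shown = $s) =~ s/\n/\\n/g;    # make the newline visible
    printf "%-8s %s\n", qq{"$shown"},
        matches_c_ws_end($s) ? "matches" : "no match";
}
```

In the third string the "\n" is consumed by \s and $ then matches at the very end, so the newline is treated as content rather than as a terminator.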
>That connection is that they are "newline-tolerant", not "newline-
>sensitive".
Point taken. That's a case that I missed from my table of sane
line-delimiting rules. So the semantic you're aiming for in the
single-line case is that the line ends before the final "\n" if there is a
"\n" at the end, and otherwise ends at the end of the string. In this
case the appropriate meaning for /$/ would be /(?=\n\z)|(?<!\n)\z/.
This matches exactly once per string.
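A rough way to check the "exactly once" property is to count matches the same way as the one-liner above. This is only a sketch: count_matches is a hypothetical helper, and the sample strings are chosen for illustration.

```perl
use strict;
use warnings;

# The proposed newline-tolerant line end, copied from the text above.
my $tolerant_end = qr/(?=\n\z)|(?<!\n)\z/;

# Hypothetical helper: count the matches of a pattern in a string.
sub count_matches {
    my ($s, $re) = @_;
    my $n = () = $s =~ /$re/g;    # list assignment in scalar context counts matches
    return $n;
}

for my $s ("", "abc", "abc\n", "\n", "abc\n\n") {
    (my $shown = $s) =~ s/\n/\\n/g;    # make the newlines visible
    printf "%-10s plain=%d proposed=%d\n", qq{"$shown"},
        count_matches($s, qr/$/), count_matches($s, $tolerant_end);
}
```

The "plain" column is Perl's existing /$/, which finds two ends whenever the string finishes with "\n"; the "proposed" column should read 1 for every string.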
This still leads to accepting unintended characters when programmers use
the newline-tolerant anchor on input where contextually a trailing \n is
not an (insignificant) line terminator. Unperlish this thought may be,
but I think Perl would have been a better language if it provided separate
line-ending anchors for the newline-terminated and undelimited-line cases
and encouraged their use. It would have avoided this large class of bugs.
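As an illustration of that class of bug, consider a field that is supposed to be digits and nothing else, where the value is not a line read from a file. The validator names below are hypothetical. The strict version uses \z, which is existing Perl rather than anything proposed in this thread.

```perl
use strict;
use warnings;

# Hypothetical validators for a field that must contain only digits.
sub is_digits_tolerant { return $_[0] =~ /^\d+$/  ? 1 : 0 }  # $ tolerates one trailing \n
sub is_digits_strict   { return $_[0] =~ /^\d+\z/ ? 1 : 0 }  # \z is the true end of string

for my $s ("123", "123\n") {
    (my $shown = $s) =~ s/\n/\\n/g;    # make the newline visible
    printf "%-8s tolerant=%d strict=%d\n", qq{"$shown"},
        is_digits_tolerant($s), is_digits_strict($s);
}
```

The tolerant check accepts "123\n", so the stray newline survives validation and travels on into whatever consumes the value.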
>You want the traditional behavior of /$/ because /$/ in ed, sed, awk, vi,
Yes, I well see the concept of "line end" that is being aimed for.
I'm saying that Perl doesn't actually achieve it.
>Undocumented? Of that I'm not sure, and I couldn't see a unidiff that
>showed a doc change.
perlre in 5.8.8:
# Embedded newlines will not be matched by "^" or "$". You may,
# however, wish to treat a string as a multi-line buffer, such that
# the "^" will match after any newline within the string, and "$"
# will match before any newline.
perlre in 5.10.0:
# Embedded newlines will not be matched by "^" or "$". You may,
# however, wish to treat a string as a multi-line buffer, such that
# the "^" will match after any newline within the string (except
# if the newline is the last character in the string), and "$"
# will match before any newline.
> Perl is merely being newline-tolerant again.
>It's trying to follow what people are expecting to happen if they pulled
>that string into their editor and set line numbers on.
Let's see... /^/m's concept of lines seems to be: (a) there's generally
a sequence of \n-terminated lines; (b) however, if the last character
isn't a \n then there's a non-empty last line that doesn't have a \n
terminator; and (c) if there are no characters at all then this counts
as one empty line, rather than no lines. Smells like vi. I can't say
I've ever wanted to split up a string into lines this way, but I'll
stipulate that it's sane.
But /$/m doesn't agree. It has a different concept of lines, and so
(as in the single-line case) a lot of the strings you demonstrate on
have a different number of line ends from line beginnings (as judged by
/$/m and /^/m). In neither /^...$/ nor /^...$/m do ^ and $ form a
matched pair.
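The mismatch can be seen by counting both anchors over a few strings. A minimal sketch, with a hypothetical helper begins_and_ends and sample strings picked for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: how many line beginnings (/^/m) and line ends (/$/m)
# Perl finds in a string.
sub begins_and_ends {
    my ($s) = @_;
    my $begins = () = $s =~ /^/mg;
    my $ends   = () = $s =~ /$/mg;
    return ($begins, $ends);
}

for my $s ("", "abc", "abc\n", "a\nb", "a\nb\n") {
    (my $shown = $s) =~ s/\n/\\n/g;    # make the newlines visible
    my ($begins, $ends) = begins_and_ends($s);
    printf "%-8s begins=%d ends=%d%s\n", qq{"$shown"}, $begins, $ends,
        $begins == $ends ? "" : "  <- mismatch";
}
```

The strings that end in "\n" are the ones where the two counts differ: /$/m matches both before the final newline and after it, while /^/m does not start a new line there.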
>No, they won't; you're being absurd and alarmist. But as I've had quite
>enough of your ranting for the night, that means you get to wait until my
>morrow, or more, to learn why you're wrong--and how.
I look forward to it.
-zefram