Re: Alarums and Excursions (was [perl #2783] Security of ARGV using 2-argument open)

From:
Zefram
Date:
July 28, 2008 00:27
Subject:
Re: Alarums and Excursions (was [perl #2783] Security of ARGV using 2-argument open)
Message ID:
20080728072742.GB6303@fysh.org

Tom Christiansen wrote:
>For /(?-m:$)/, we have simply:
>
>    while (<>) {
>	last if /^END$/;

This is one of the cases that I classified as "using $ correctly although
it doesn't actually match the line-ending semantics".  Applying the
formulation you used on /^/m:

$ perl -le 'print scalar(()=("abc\n" =~ /$/g))'
2
$

This line has two ends?  Consider running the pattern /c\s$/, which
matches "abc ", "abc \n", and, er, "abc\n".  (Perl6's "horizontal
whitespace" concept would help with this one.)  The string at least has
the right number of beginnings (as judged by /^/).
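
To make that concrete, here is a minimal sketch that counts the positions
at which /$/ and /^/ match in a few strings, and tries /c\s$/ on each.
The helper names count_matches and show are mine, purely for illustration.

```perl
use strict;
use warnings;

# Number of positions at which $re matches in $str.
# (Assigning the //g match list to an empty list yields its length.)
sub count_matches {
    my ($str, $re) = @_;
    my $n = () = $str =~ /$re/g;
    return $n;
}

# Render newlines visibly so the output stays on one line per string.
sub show {
    my ($s) = @_;
    $s =~ s/\n/\\n/g;
    return qq{"$s"};
}

for my $str ("abc", "abc ", "abc \n", "abc\n") {
    printf "%-9s ends=%d begins=%d  /c\\s\$/ %s\n",
        show($str),
        count_matches($str, qr/$/),
        count_matches($str, qr/^/),
        ($str =~ /c\s$/ ? "matches" : "does not match");
}
```

Only the strings ending in "\n" should report two ends, and all three of
the whitespace-terminated strings should match /c\s$/.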

>That connection is that they are "newline-tolerant", not "newline-
>sensitive".

Point taken.  That's a case that I missed from my table of sane
line-delimiting rules.  So the semantic you're aiming for in the
single-line case is that the line ends before the final "\n" if there is a
"\n" at the end, and otherwise ends at the end of the string.  In this
case the appropriate meaning for /$/ would be /(?=\n\z)|(?<!\n)\z/.
This matches exactly once per string.
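
A quick way to check the "exactly once" claim is to count match positions
for the pattern above next to plain /$/.  This is only a sketch; the
variable $line_end and the helper count_matches are names I made up for
the example.

```perl
use strict;
use warnings;

# The newline-tolerant line end spelled out explicitly: just before a
# final "\n", or at the very end when the string does not end in "\n".
my $line_end = qr/(?=\n\z)|(?<!\n)\z/;

# Number of positions at which $re matches in $str.
sub count_matches {
    my ($str, $re) = @_;
    my $n = () = $str =~ /$re/g;
    return $n;
}

for my $str ("", "abc", "abc\n", "\n", "abc\n\n", "a\nb") {
    (my $shown = $str) =~ s/\n/\\n/g;    # make newlines visible
    printf "%-10s  /\$/: %d   explicit pattern: %d\n",
        qq{"$shown"},
        count_matches($str, qr/$/),
        count_matches($str, $line_end);
}
```

The explicit pattern should report 1 for every string, while /$/ reports
2 whenever the string ends in "\n".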

This still leads to accepting unintended characters when programmers use
the newline-tolerant anchor on input where contextually a trailing \n is
not an (insignificant) line terminator.  Unperlish this thought may be,
but I think Perl would have been a better language if it provided separate
line-ending anchors for the newline-terminated and undelimited-line cases
and encouraged their use.  It would have avoided this large class of bugs.

>You want the traditional behavior of /$/ because /$/ in ed, sed, awk, vi,

Yes, I well see the concept of "line end" that is being aimed for.
I'm saying that Perl doesn't actually achieve it.

>Undocumented?  Of that I'm not sure, and I couldn't see a unidiff that
>showed a doc change.

perlre in 5.8.8:

#      Embedded newlines will not be matched by "^" or "$".  You may,
#      however, wish to treat a string as a multi-line buffer, such that
#      the "^" will match after any newline within the string, and "$"
#      will match before any newline.  At the cost

perlre in 5.10.0:

#      Embedded newlines will not be matched by "^" or "$".  You may,
#      however, wish to treat a string as a multi-line buffer, such that
#      the "^" will match after any newline within the string (except
#      if the newline is the last character in the string), and "$"
#      will match before any newline.

>                           Perl is merely being newline-tolerant again.
>It's trying to follow what people are expecting to happen if they pulled
>that string into their editor and set line numbers on.

Let's see... /^/m's concept of lines seems to be: (a) there's generally
a sequence of \n-terminated lines; (b) however, if the last character
isn't a \n then there's a non-empty last line that doesn't have a \n
terminator; and (c) if there are no characters at all then this counts
as one empty line, rather than no lines.  Smells like vi.  I can't say
I've ever wanted to split up a string into lines this way, but I'll
stipulate that it's sane.
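
For anyone who wants to check rules (a) to (c) against the regex engine,
here is a sketch that implements them by hand and compares the result
with the number of /^/m matches.  The function names lines_by_rule and
count_matches are hypothetical, and I'm assuming a perl that behaves as
the 5.10.0 perlre text describes.

```perl
use strict;
use warnings;

# Line count according to rules (a)-(c), computed without /^/m.
sub lines_by_rule {
    my ($str) = @_;
    return 1 if $str eq "";                # (c) empty string is one empty line
    my $n = () = $str =~ /\n/g;            # (a) one line per "\n" terminator
    $n++ if substr($str, -1) ne "\n";      # (b) unterminated non-empty last line
    return $n;
}

# Number of positions at which $re matches in $str.
sub count_matches {
    my ($str, $re) = @_;
    my $n = () = $str =~ /$re/g;
    return $n;
}

for my $str ("", "abc", "abc\n", "abc\ndef", "abc\ndef\n", "\n", "\n\n") {
    (my $shown = $str) =~ s/\n/\\n/g;      # make newlines visible
    printf "%-12s  rule: %d   /^/m: %d\n",
        qq{"$shown"}, lines_by_rule($str), count_matches($str, qr/^/m);
}
```

The two columns should agree on every row.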

But /$/m doesn't agree.  It has a different concept of lines, and so
(as in the single-line case) a lot of the strings you demonstrate on
have a different number of line ends from line beginnings (as judged by
/$/m and /^/m).  /^...$/ and /^...$/m are not matched pairs.
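
The mismatch is easy to see by counting both anchors on the same strings.
A sketch, with count_matches again being a helper name of my own:

```perl
use strict;
use warnings;

# Number of positions at which $re matches in $str.
sub count_matches {
    my ($str, $re) = @_;
    my $n = () = $str =~ /$re/g;
    return $n;
}

for my $str ("", "abc", "abc\n", "abc\ndef", "abc\ndef\n", "\n") {
    (my $shown = $str) =~ s/\n/\\n/g;      # make newlines visible
    my $begins = count_matches($str, qr/^/m);
    my $ends   = count_matches($str, qr/$/m);
    printf "%-12s  begins: %d   ends: %d   %s\n",
        qq{"$shown"}, $begins, $ends,
        ($begins == $ends ? "paired" : "MISMATCH");
}
```

Strings that end in "\n" should show one more line end than line
beginning; the others should pair up.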

>No, they won't; you're being absurd and alarmist.  But as I've had quite
>enough of your ranting for the night, that means you get to wait until my
>morrow, or more, to learn why you're wrong--and how.

I look forward to it.

-zefram
