develooper Front page | perl.perl5.porters | Postings from August 2012

Re: [perl.git] branch blead, updated. v5.17.3-33-gbe4362b

From:
Jerry D. Hedden
Date:
August 27, 2012 09:00
Subject:
Re: [perl.git] branch blead, updated. v5.17.3-33-gbe4362b
Message ID:
CABcAEYzXM6Yjc-jjxTEdhYcR6XfJYS_yFodsiQRRr_wdhS+WDw@mail.gmail.com
Commit 99bd9d90ba5640856c6c421a174ea6e2743a4b3a added a SAVEIV()
command in toke.c (currently line 2490).  However, Perl_save_iv() is
in mathoms.c.  The result is that builds now fail in blead with
-DNO_MATHOMS.

On Tue, Aug 21, 2012 at 5:13 PM, Father Chrysostomos <sprout@cpan.org> wrote:
> In perl.git, the branch blead has been updated
>
> <http://perl5.git.perl.org/perl.git/commitdiff/be4362b774f7b2ce381426b3398934325635a510?hp=e07d7e14bdcbce2e3fdcfb70ac0e637a7e39e2dd>
>
> - Log -----------------------------------------------------------------
> commit be4362b774f7b2ce381426b3398934325635a510
> Merge: e07d7e1 1bf4876
> Author: Father Chrysostomos <sprout@cpan.org>
> Date:   Tue Aug 21 14:13:02 2012 -0700
>
>     [Merge] Here-doc parsing
>
>     I was waiting for 5.17.3 to be released, before merging my work on
>     padlists (which is blocking lexical subs), since I thought it would be
>     mean to inflict it on blead at the last minute before a release.
>
>     So, in the mean time, I decided to fix a small here-doc parsing bug,
>     that prevented them from occurring inside regexp code blocks.
>
>     As often happens, it turned out to be more involved than that....
>
>     I ended up writing a history of here-doc parsing, which you can find
>     in the commit message for 5097bf9b8d, which shows that the way they
>     have interacted with other quote-like operators (or other here-docs)
>     has changed over time in interesting ways.
>
>     While I was fixing those, I started to find other bugs.  Since I was
>     modifying the code, I decided to try applying David Nicol’s patch that
>     allows a here-doc terminator with no newline after it, to avoid creat-
>     ing more conflicts through my changes.  The patch didn’t work.  And
>     while I was resolving what conflicts there were, I figured out a sim-
>     pler approach.  So, instead of trying to investigate into why the
>     patch didn’t work, I just wrote my own version, which used less code.
>     Instead of working back on error to try to see whether we could have
>     accepted a terminator without a newline, we can just tack a newline on
>     the string buffer at EOF and let the rest of the code handle it the
>     usual way.
>
>     I continued to find more bugs as I went, till my ‘Yay, another bug!’
>     started to become ‘What? *Another* bug?’.
>
>     In the end:
>
>     • I fixed here-doc parsing, such that the body starts on the line fol-
>       lowing the <<foo marker, regardless of whether it is inside quotes,
>       string evals, or what have you (but see remaining bugs below).  This
>       was contrary to the documentation, but the documentation was actu-
>       ally wrong half the time, so I corrected it.
>     • Here-doc terminators no longer require a final newline at EOF.
>     • You no longer get crashes with edge cases.
>     • Nulls in comments no longer confuse the here-doc parser.
>
>     And, finally, one bug that I fixed was not related to here-docs per
>     se, but got in the way.  It deserves its own JAPH:
>
>     s/${s|||, \""}Just another Perl hacker,
>     /anything/;
>     print
>
>     There are still two bugs remaining:
>     • Here-docs whose markers occur in single-line s/// patterns where the
>       replacement part is multi-line or starts on a subsequent line are
>       still screwed.
>     • CR and CR LF line terminators are treated inconsistently inside and
>       outside of string evals.
>
>     I’ve decided to set those aside for later and merge what I’ve
>     done so far.
>
> commit 1bf4876033f9a2a0170bb97bc2e862a0b3fead35
> Author: Father Chrysostomos <sprout@cpan.org>
> Date:   Tue Aug 21 14:09:51 2012 -0700
>
>     perlop.pod: Update here-doc-in-quotes parsing rules
>
> M       pod/perlop.pod
>
> commit 372a31d8f53707bcfa9c233ce02a93f778b7bb4b
> Author: Father Chrysostomos <sprout@cpan.org>
> Date:   Tue Aug 21 01:11:34 2012 -0700
>
>     smoke-me diag
>
>     nt,hun
>
> M       configpm
>
> commit 5bd13da38ce1da800b0c3011ce3201c3b88541c3
> Author: Father Chrysostomos <sprout@cpan.org>
> Date:   Tue Aug 21 01:45:15 2012 -0700
>
>     toke.c:scan_heredoc: Use PL_tokenbuf less
>
>     When scanning for a heredoc terminator in a string eval or quote-like
>     operator, the first character we are looking for is always a newline.
>     So instead of setting term to *PL_tokenbuf in those two code paths,
>     we can just hard-code '\n'.
>
> M       toke.c
>
> commit 7cc341114c4476436b593500ef63fa0925f746ca
> Author: Father Chrysostomos <sprout@cpan.org>
> Date:   Mon Aug 20 23:58:59 2012 -0700
>
>     Fix substitution in substitution pattern
>
>     Guess what this prints:
>
>     s/${s|||, \""}Just another Perl hacker,
>     /anything/;
>     print
>
>     And look at this:
>
>     $ perl5.6.2 -e 's/${s|||;\""}/foo\n/; print;'
>     $ perl5.16.0 -e 's/${s|||;\""}/foo\n/; print;'
>     $ perl5.17.2 -e 's/${s|||;\""}/foo\n/; print;'
>     Bus error
>     $ ./miniperl -e 's/${s|||;\""}/foo\n/; print;'
>     Bus error
>
>     The first two gave no output, though they should have shown "foo".
>     And bleadperl now crashes.
>
>     When the lexer parses a quote-like operator, it begins by extracting
>     what is between the quotes.  It puts it in an SV stored in the varia-
>     ble PL_lex_stuff.  Then, if it is y/// or s///, it scans the replace-
>     ment part and puts it in an SV in PL_lex_repl.  When it finishes with
>     it, it sets PL_lex_repl to NULL.
>
>     Now, if you put s/// in the pattern part of s/// (or y in s), the
>     inner s/// will clobber PL_lex_repl with its own replacement string.
>     So, when the outer s/// finish parsing its pattern and wants its
>     replacement string.  If it is not there, it assumes it has already
>     parsed it (whether PL_lex_repl is set is how it remembers which half
>     of s/// it is parsing), and proceeds to feed bad code to the parser,
>     resulting in a bad op tree.
>
>     PL_lex_repl needs to be localised when a quote-like operator is
>     parsed.  Since localisation for quote-like operators happens in a sep-
>     arate yylex call (yylex calls sublex_push, which does it) after the
>     string delimiters are found, at which point PL_lex_repl has already
>     been set (clobbering the previous value), we change the delim-
>     iter-scanning code (scan_{str,trans,subst}) to use the new
>     PL_sublex_info.repl, which sublex_push now copies into PL_lex_repl
>     after localising the latter.
>
> M       perl.h
> M       t/base/lex.t
> M       toke.c
>
> commit 99bd9d90ba5640856c6c421a174ea6e2743a4b3a
> Author: Father Chrysostomos <sprout@cpan.org>
> Date:   Mon Aug 20 19:08:57 2012 -0700
>
>     Fix here-docs in nested quote-like operators
>
>     When the lexer encounters a quote-like operator, it extracts the con-
>     tents of the quotes and starts an inner lexing scope.
>
>     To handle eval "s//<<FOO/e\n...", the here-doc parser peeks into the
>     outer lexing scope’s PL_linestr (current line buffer, which inside an
>     eval contains the entire string of code being parsed; for quote-like
>     operators, that is where the contents of the quote are stored).  It
>     only does this inside a string eval.  When parsing a file, the input
>     comes in one line at a time.  So the here-doc parser steals lines from
>     the input stream for s//<<FOO/e outside an eval.
>
>     This approach fails in this case, as the peekee is the linestr for
>     s///, not for the eval:
>
>     eval ' s//"${\<<END}"/e; print
>     Just another Perl hacker,
>     END
>     'or die $@
>     __END__
>     Can't find string terminator "END" anywhere before EOF at (eval 1) line 1.
>
>     We also need to do this peeking stuff outside of a string eval, to
>     solve this:
>
>     s//"${\<<END}"
>     Just another Perl hacker,
>     END
>     /e; print
>     __END__
>     Can't find string terminator "END" anywhere before EOF at - line 1.
>
>     In the first example above, we need to look not in the parent lexing
>     scope’s linestr, but in that of the grandparent.
>
>     To solve the second example, we need to check whether the outer lexing
>     scope is a quote-like operator when we are not in an eval.
>
>     For parsing here-docs in quotes in eval, we currently store two
>     things, the former buffer pointer and the former linestr, in
>     PL_sublex_info.super_{bufp,lines}tr.  The values for upper scopes are
>     stashed away on the savestack somewhere.
>
>     We need to be able to iterate through the outer lexer scopes till we
>     find one with multiple lines.  Retrieving the information from the
>     savestack would be too complex and error-prone.
>
>     Since PL_linestr is an SV, we can abuse a couple of fields in it.
>     Upgrading it to PVNV gives it both IVX and NVX fields, which are big
>     enough to store pointers.
>
>     IVX is already used to hold an op number.  So for the innermost quoted
>     scope we still need to use PL_sublex_info.super_bufptr.  When entering
>     a new lexing scope (in sublex_push), we can localise the IVX field of
>     the outer PL_linestr SV and set it to what PL_sublex_info.super_bufptr
>     was in that scope.  SvIVX(linestr) is only used for an op number when
>     that linestr’s lexing scope is the innermost one.
>
>     PL_sublex_info.super_linestr can be eliminated and replaced with
>     SvNVX(PL_linestr).
>
> M       perl.h
> M       t/base/lex.t
> M       toke.c
>
> commit 62abd0d789549020431af511c02492ac374cf355
> Author: Father Chrysostomos <sprout@cpan.org>
> Date:   Mon Aug 20 18:06:41 2012 -0700
>
>     Don’t use strchr when scanning for newline after <<foo
>
>     The code that uses this is specifically for parsing <<foo inside a
>     quote-like operator inside a string eval.
>
>     This prints bar:
>
>     eval "s//<<foo/e
>     bar
>     foo
>     ";
>     print $_ || $@;
>
>     This prints Can't find string terminator blah blah blah:
>
>     eval "s//<<foo/e #\0
>     bar
>     foo
>     ";
>     print $_ || $@;
>
>     Nulls in comments are allowed elsewhere.  This prints bar:
>
>     eval "\$_ = <<foo #\0
>     bar
>     foo
>     ";
>     print $_ || $@;
>
>     The problem with strchr is that it is specifically for scanning null-
>     terminated strings.  If embedded nulls are permitted (and should be in
>     this case), memchr should be used.
>
>     This code was added by 0244c3a403.
>
> M       t/base/lex.t
> M       toke.c
>
> commit f35fca86375876704f26fde951b763c2bb533608
> Author: David Nicol <davidnicol@gmail.com>
> Date:   Mon Aug 20 16:22:15 2012 -0700
>
>     [perl #65838] perlop: remove caveat here-doc without newline
>
> M       pod/perlop.pod
>
> commit 043cc6c601c03ac1644cea26a20b39eb34957445
> Author: Father Chrysostomos <sprout@cpan.org>
> Date:   Mon Aug 20 14:55:09 2012 -0700
>
>     here-doc in quotes in multiline s//.../e in eval
>
>     When <<END occurs on the last line of a quote-like operator inside a
>     string eval ("${\<<END}"), it peeks into the linestr buffer of the
>     parent lexing scope (quote-like operators start a new lexing scope
>     with the linestr buffer containing what is between the quotes) to find
>     the body of the here-doc.  It modifies that buffer, stealing however
>     much it needs.
>
>     It was not leaving things in the consistent state that s///e checks
>     for when it finishes parsing the replacement (to make sure s//}+{/
>     doesn’t ‘work’).  Specifically, it was not shrinking the parent buf-
>     fer, so when PL_bufend was reset in sublex_done to the end of the par-
>     ent buffer, it was pointing to the wrong spot.
>
> M       perl.h
> M       t/base/lex.t
> M       toke.c
>
> commit 565b52dfca4375468541e36d53e8a2aba372c056
> Author: Father Chrysostomos <sprout@cpan.org>
> Date:   Mon Aug 20 12:57:29 2012 -0700
>
>     heredoc after "" in s/// in eval
>
>     This works fine:
>
>     eval ' s//<<END.""/e; print
>     Just another Perl hacker,
>     END
>     'or die $@
>     __END__
>     Just another Perl hacker,
>
>     But this doesn’t:
>
>     eval ' s//"$1".<<END/e; print
>     Just another Perl hacker,
>     END
>     'or die $@
>     __END__
>     Can't find string terminator "END" anywhere before EOF at (eval 1) line 1.
>
>     It fails because PL_sublex_info.super_buf*, added by commit
>     0244c3a403, are not localised, so, after the "", s/// sees its own
>     buffer pointers in those variables, instead of its parent string eval.
>
>     This used to happen only with s///e inside s///e, but that was because
>     here-docs would peek inside the parent linestr buffer only inside
>     s///e, and not other quote-like operators.  That was fixed in
>     recent commits.
>
>     Simply moving the assignment of super_buf* into sublex_push does solve
>     the bug for a simple "", as "" does sublex_start, but not sublex_push.
>     We do need to localise those variables for "${\''}", however.
>
> M       t/base/lex.t
> M       toke.c
>
> commit 458391bde9932f85da1dbbd0c2a3fdc3229bb00d
> Author: David Nicol <davidnicol@gmail.com>
> Date:   Sun Aug 19 23:05:40 2012 -0700
>
>     toke.c:S_scan_heredoc: Add comment about <<\FOO
>
> M       toke.c
>
> commit 112d12841320651481950c9079db85afcb9fd48f
> Author: Father Chrysostomos <sprout@cpan.org>
> Date:   Sun Aug 19 23:05:06 2012 -0700
>
>     [perl #65838] Allow here-doc with no final newline
>
>     When reading a line of input while scanning a here-doc, if the line
>     does not end in \n, then we know we have reached the end of input.  By
>     simply tacking a \n on to the buffer, we can meet the expectations of
>     the rest of the here-doc parsing code.  If it turns out the delimiter
>     is not found on that line, it does not matter that we modified it, as
>     we will croak anyway.
>
>     I had to add a new flag to lex_next_chunk.  Before commit f0e67a1d2,
>     S_scan_heredoc would read from the stream itself, without closing any
>     handles.  So the next time through yylex, the eof code would supply
>     the final implicit semicolon.
>
>     Since f0e67a1d2, S_scan_heredoc has been calling lex_next_chunk, which
>     takes care of reading from the stream an supply any final ; at eof.
>     The here-doc parser will just get confused as a result (<<';' would
>     work without any terminator).  The new flag tells lex_next_chunk not
>     to do anything at eof (not even closing handles and resetting the
>     parser state), but to return false and leave everything as it was.
>
> M       t/op/heredoc.t
> M       toke.c
>
> commit 3f29db7f4d1529dc55c67fc4dd36d1a5e9e85b0f
> Author: Father Chrysostomos <sprout@cpan.org>
> Date:   Sun Aug 19 22:41:08 2012 -0700
>
>     heredoc.t: Suppress deprecation warnings
>
> M       t/op/heredoc.t
>
> commit c49688b01872b2b0030cfe2c0d5cd22aa4e70b6b
> Author: Michael G. Schwern <schwern@pobox.com>
> Date:   Fri Jun 12 15:35:00 2009 -0700
>
>     Clean up heredoc.t
>
>     * Made the tests more independent, mostly by decoupling the use of
>       a single $string.  This will make it easier to expand on the test file
>       later.
>
>     * Replace ok( $foo eq $bar ) with is() for better diagnostics
>
>     * Remove unnecessary STDERR redirection.  fresh_perl does that for you.
>
>     * fix fresh_perl to honor progfile and stderr arguments passed in
>       rather than just blowing over them
>
> M       t/op/heredoc.t
> M       t/test.pl
>
> commit c8e9f72fa069d0087a99c77584ea59b938b08604
> Author: David Nicol <davidnicol@gmail.com>
> Date:   Sun Aug 19 22:16:13 2012 -0700
>
>     [perl #65838] Tests for here-docs without final newlines
>
>     and a few error cases
>
> M       MANIFEST
> A       t/op/heredoc.t
>
> commit 5097bf9b8df114433b321066b622851359bb857e
> Author: Father Chrysostomos <sprout@cpan.org>
> Date:   Sun Aug 19 02:45:38 2012 -0700
>
>     [perl #114040] Parse here-docs correctly in quoted constructs
>
>     When parsing code outside a string eval or quoted construct, the lexer
>     reads one line at a time into PL_linestr.
>
>     To parse a here-doc (hereinafter ‘deer hock’, because I spike lunar-
>     isms), the lexer has to pull extra lines out of the input stream ahead
>     of the current line, the value of PL_linestr remaining the same.
>
>     In a string eval, the entire piece of code being parsed is in
>     PL_linestr.
>
>     To parse a deer hock inside a string eval, the lexer has to fiddle
>     with the contents of PL_linestr, scanning for newline characters.
>
>     Originally, S_scan_heredoc just followed those two approaches.
>
>     When the lexer encounters a quoted construct, it looks for the end-
>     ing delimiter (reading from the input stream if necessary), puts the
>     entire quoted thing (minus quotes) in PL_linestr, and then starts an
>     inner lexing scope.
>
>     This means that deer hocks would not nest properly outside of a string
>     eval, because the body of the inner deer hock would be pulled out of
>     the input stream *after* the outer deer hock.
>
>     Larry Wall fixed that in commit fd2d095329 (Jan. 1997), so that this
>     would work:
>
>     <<foo
>     ${\<<bar}
>     ber
>     bar
>     foo
>
>     He did so by following the string eval approach (looking for the deer
>     hock body in PL_linestr) if the deer hock was inside another quoted
>     construct.
>
>     Later, commit a2c066523a (Mar. 1998) fixed this:
>
>     s/^not /substr(<<EOF, 0, 0)/e;
>       Ignored
>     EOF
>
>     by following the string eval approach only if the deer hock was inside
>     another non-backtick deer hock, not just any quoted construct.
>
>     The problem with the string eval approach inside a substitu-
>     tion is that it only looks in PL_linestr, which only contains
>     ‘substr(<<EOF, 0, 0)’ when the lexer is handling the second part of
>     the s/// operator.
>
>     But that unfortunately broke this:
>
>     s/^not /substr(<<EOF, 0, 0)
>       Ignored
>     EOF
>      /e;
>
>     and this:
>
>     print <<`EOF`;
>     ${\<<EOG}
>     echo stuff
>     EOG
>     EOF
>
>     reverting it to the pre-fd2d095329 behaviour, because the outer quoted
>     construct was treated as one line.
>
>     Later on, commit 0244c3a403 (Mar. 1999) fixed this:
>
>     eval 's/.../<<FOO/e
>       stuff
>     FOO
>     ';
>
>     which required a new approach not used before.  When the replacement
>     part of the s/// is being parsed, PL_linestr contains ‘<<FOO’.  The
>     body of the deer hock is not in the input stream (there isn’t one),
>     but in what was the previous value of PL_linestr before the lexer
>     encountered s///.
>
>     So 0244c3a403 fixed that by recording pointers into the outer string
>     and using them in S_scan_heredoc.  That commit, for some reason, was
>     written such that it applied only to substitutions, and not to other
>     quoted constructs.
>
>     It also failed to take interpolation into account, and did not record
>     the outer buffer position, but then tried to use it anyway, resulting
>     in crashes in both these cases:
>
>     eval 's/${ <<END }//';
>     eval 's//${ <<END }//';
>
>     It also failed to take multiline s///’s into account, resulting in
>     neither of these working, because it lost track of the current cursor,
>     leaving it at 'D' instead of the line break following it:
>
>     eval '
>     s//<<END
>     /e;
>     blah blah blah
>     END
>     ;1' or die $@;
>
>     eval '
>     s//<<END
>     blah blah blah
>     END
>     /e;
>     ;1' or die $@;
>
>     S_scan_heredoc currently positions the cursor (s) at the last charac-
>     ter of <<END if there is a line break on the same line.  There is an
>     s++ later on to account, but the code added by 0244c3a403 bypassed it.
>
>     So, in the end, deer hocks could only be nested in other quoted con-
>     structs if the outer construct was in a string eval and was not s///,
>     or was a non-backtick deer hock.
>
>     This commit hopefully fixes most of the problems. :-)
>
>     The s///-in-eval case is a little tricky.  We have to see whether the
>     deer hock label is on the last line of the s///.  If it is, we have
>     to peek into the outer buffer.  Otherwise, we have to treat it like a
>     string eval.
>
>     This commit does not deal with <<END inside the pattern of a multi-
>     line s/// or in nested quotes.
>
> M       t/comp/parser.t
> M       toke.c
>
> commit 5af08aedbe30651caf3374bc93f1aa7385b9531f
> Author: Father Chrysostomos <sprout@cpan.org>
> Date:   Sat Aug 18 23:54:02 2012 -0700
>
>     [perl #70836] Fix err msg for unterminated here-doc in eval
>
>     $ perl -e '<<foo'
>     Can't find string terminator "foo" anywhere before EOF at -e line 1.
>
>     $ perl -e 'eval "<<foo"; die $@'
>     Can't find string terminator "
>     foo" anywhere before EOF at (eval 1) line 1.
>
>     An internal implementation detail is leaking out.
>
>     When the lexer happens to have a multiline string in its line buffer
>     (in a string eval or quoted construct), it looks for "\nfoo" instead
>     of "foo".  It was passing that same string to the error-reporting code
>     (S_missingterm), resulting in that extraneous newline.
>
> M       t/lib/croak/toke
> M       toke.c
> -----------------------------------------------------------------------
>
> Summary of changes:
>  MANIFEST         |    1 +
>  configpm         |    5 ++-
>  perl.h           |    2 +-
>  pod/perlop.pod   |   12 ++--
>  t/base/lex.t     |   42 ++++++++++++++-
>  t/comp/parser.t  |   16 +++++-
>  t/lib/croak/toke |   10 ++++
>  t/op/heredoc.t   |   73 +++++++++++++++++++++++++
>  t/test.pl        |    4 +-
>  toke.c           |  160 +++++++++++++++++++++++++++++++++++++++++-------------
>  10 files changed, 276 insertions(+), 49 deletions(-)
>  create mode 100644 t/op/heredoc.t
>
> diff --git a/MANIFEST b/MANIFEST
> index 71e26a0..7484cd6 100644
> --- a/MANIFEST
> +++ b/MANIFEST
> @@ -5281,6 +5281,7 @@ t/op/hashassign.t         See if hash assignments work
>  t/op/hash-rt85026.t            See if hash iteration/deletion works
>  t/op/hash.t                    See if the complexity attackers are repelled
>  t/op/hashwarn.t                        See if warnings for bad hash assignments work
> +t/op/heredoc.t                 See if heredoc edge and corner cases work
>  t/op/inccode.t                 See if coderefs work in @INC
>  t/op/inccode-tie.t             See if tie to @INC works
>  t/op/incfilter.t               See if the source filters in coderef-in-@INC work
> diff --git a/configpm b/configpm
> index 6ea7419..250f5ce 100755
> --- a/configpm
> +++ b/configpm
> @@ -1135,7 +1135,10 @@ EOS
>  unshift(@INC,'lib');
>  unshift(@INC,'xlib/symbian') if $Opts{cross};
>  require $Config_PM;
> -require $Config_heavy;
> +    if (!eval {require $Config_heavy}) {
> +       open my $fh, "<", $Config_heavy;
> +        print STDERR while <$fh>;
> +    }
>  import Config;
>
>  die "$0: $Config_PM not valid"
> diff --git a/perl.h b/perl.h
> index 47f642f..14f9083 100644
> --- a/perl.h
> +++ b/perl.h
> @@ -3450,8 +3450,8 @@ struct _sublex_info {
>      U16 sub_inwhat;    /* "lex_inwhat" to use */
>      OP *sub_op;                /* "lex_op" to use */
>      char *super_bufptr;        /* PL_parser->bufptr that was */
> -    char *super_bufend;        /* PL_parser->bufend that was */
>      char *re_eval_start;/* start of "(?{..." text */
> +    SV *repl;          /* replacement of s/// or y/// */
>  };
>
>  #include "parser.h"
> diff --git a/pod/perlop.pod b/pod/perlop.pod
> index 983e141..d0cfd85 100644
> --- a/pod/perlop.pod
> +++ b/pod/perlop.pod
> @@ -2474,24 +2474,24 @@ you'll need to remove leading whitespace from each line manually:
>      FINIS
>
>  If you use a here-doc within a delimited construct, such as in C<s///eg>,
> -the quoted material must come on the lines following the final delimiter.
> -So instead of
> +the quoted material must still come on the line following the
> +C<<< <<FOO >>> marker, which means it may be inside the delimited
> +construct:
>
>      s/this/<<E . 'that'
>      the other
>      E
>       . 'more '/eg;
>
> -you have to write
> +It works this way as of Perl 5.18.  Historically, it was inconsistent, and
> +you would have to write
>
>      s/this/<<E . 'that'
>       . 'more '/eg;
>      the other
>      E
>
> -If the terminating identifier is on the last line of the program, you
> -must be sure there is a newline after it; otherwise, Perl will give the
> -warning B<Can't find string terminator "END" anywhere before EOF...>.
> +outside of string evals.
>
>  Additionally, quoting rules for the end-of-string identifier are
>  unrelated to Perl's quoting rules. C<q()>, C<qq()>, and the like are not
> diff --git a/t/base/lex.t b/t/base/lex.t
> index ce16ef1..aaa2aeb 100644
> --- a/t/base/lex.t
> +++ b/t/base/lex.t
> @@ -1,6 +1,6 @@
>  #!./perl
>
> -print "1..57\n";
> +print "1..63\n";
>
>  $x = 'x';
>
> @@ -273,3 +273,43 @@ $test++;
>  @a = (1,2,3);
>  print "not " unless($a[~~2] == 3);
>  print "ok 57\n";
> +
> +$_ = "";
> +eval 's/(?:)/"${\q||}".<<\END/e;
> +ok 58 - heredoc after "" in s/// in eval
> +END
> +';
> +print $_ || "not ok 58\n";
> +
> +$_ = "";
> +eval 's|(?:)|"${\<<\END}"
> +ok 59 - heredoc in "" in multiline s///e in eval
> +END
> +|e
> +';
> +print $_ || "not ok 59\n";
> +
> +$_ = "";
> +eval "s/(?:)/<<foo/e #\0
> +ok 60 - null on same line as heredoc in s/// in eval
> +foo
> +";
> +print $_ || "not ok 60\n";
> +
> +$_ = "";
> +eval ' s/(?:)/"${\<<END}"/e;
> +ok 61 - heredoc in "" in single-line s///e in eval
> +END
> +';
> +print $_ || "not ok 61\n";
> +
> +$_ = "";
> +s|(?:)|"${\<<END}"
> +ok 62 - heredoc in "" in multiline s///e outside eval
> +END
> +|e;
> +print $_ || "not ok 62\n";
> +
> +$_ = "not ok 63 - s/// in s/// pattern\n";
> +s/${s|||;\""}not //;
> +print;
> diff --git a/t/comp/parser.t b/t/comp/parser.t
> index 53b7afa..44fc982 100644
> --- a/t/comp/parser.t
> +++ b/t/comp/parser.t
> @@ -3,7 +3,7 @@
>  # Checks if the parser behaves correctly in edge cases
>  # (including weird syntax errors)
>
> -print "1..141\n";
> +print "1..144\n";
>
>  sub failed {
>      my ($got, $expected, $name) = @_;
> @@ -416,6 +416,20 @@ is $@, "", 'pod inside string in string eval';
>  }";
>  print "ok ", ++$test, " - pod inside string outside of string eval\n";
>
> +like "blah blah blah\n", qr/${\ <<END
> +blah blah blah
> +END
> + }/, 'here docs in multiline quoted construct';
> +like "blah blah blah\n", eval q|qr/${\ <<END
> +blah blah blah
> +END
> + }/|, 'here docs in multiline quoted construct in string eval';
> +
> +# Unterminated here-docs in subst in eval; used to crash
> +eval 's/${<<END}//';
> +eval 's//${<<END}/';
> +print "ok ", ++$test, " - unterminated here-docs in s/// in string eval\n";
> +
>  sub 'Hello'_he_said (_);
>  is prototype "Hello::_he_said", '_', 'initial tick in sub declaration';
>
> diff --git a/t/lib/croak/toke b/t/lib/croak/toke
> index 7ab5a4d..ddfaaeb 100644
> --- a/t/lib/croak/toke
> +++ b/t/lib/croak/toke
> @@ -1,4 +1,14 @@
>  __END__
> +# NAME Unterminated here-doc in string eval
> +eval "<<foo"; die $@
> +EXPECT
> +Can't find string terminator "foo" anywhere before EOF at (eval 1) line 1.
> +########
> +# NAME Unterminated here-doc in s/// string eval
> +eval "s//<<foo/e"; die $@
> +EXPECT
> +Can't find string terminator "foo" anywhere before EOF at (eval 1) line 1.
> +########
>  # NAME Missing name in "my sub"
>  my sub;
>  EXPECT
> diff --git a/t/op/heredoc.t b/t/op/heredoc.t
> new file mode 100644
> index 0000000..5f48828
> --- /dev/null
> +++ b/t/op/heredoc.t
> @@ -0,0 +1,73 @@
> +# tests for heredocs besides what is tested in base/lex.t
> +
> +BEGIN {
> +   chdir 't' if -d 't';
> +   @INC = '../lib';
> +   require './test.pl';
> +}
> +
> +use strict;
> +plan(tests => 7);
> +
> +
> +# heredoc without newline (#65838)
> +{
> +    my $string = <<'HEREDOC';
> +testing for 65838
> +HEREDOC
> +
> +    my $code = "<<'HEREDOC';\n${string}HEREDOC";  # HD w/o newline, in eval-string
> +    my $hd = eval $code or warn "$@ ---";
> +    is($hd, $string, "no terminating newline in string-eval");
> +}
> +
> +
> +# here-doc edge cases
> +{
> +    my $string = "testing for 65838";
> +
> +    fresh_perl_is(
> +        "print <<'HEREDOC';\n${string}\nHEREDOC",
> +        $string,
> +        {},
> +        "heredoc at EOF without trailing newline"
> +    );
> +
> +    fresh_perl_is(
> +        "print <<;\n$string\n",
> +        $string,
> +        { switches => ['-X'] },
> +        "blank-terminated heredoc at EOF"
> +    );
> +    fresh_perl_is(
> +        "print <<\n$string\n",
> +        $string,
> +        { switches => ['-X'] },
> +        "blank-terminated heredoc at EOF and no semicolon"
> +    );
> +}
> +
> +
> +# here-doc parse failures
> +{
> +    fresh_perl_like(
> +        "print <<HEREDOC;\nwibble\n HEREDOC",
> +        qr/find string terminator/,
> +        {},
> +        "string terminator must start at newline"
> +    );
> +
> +    fresh_perl_like(
> +        "print <<;\nno more newlines",
> +        qr/find string terminator/,
> +        { switches => ['-X'] },
> +        "empty string terminator still needs a newline"
> +    );
> +
> +    fresh_perl_like(
> +        "print <<ThisTerminatorIsLongerThanTheData;\nno more newlines",
> +        qr/find string terminator/,
> +        {},
> +        "long terminator fails correctly"
> +    );
> +}
> diff --git a/t/test.pl b/t/test.pl
> index 6d45076..bd5ff3b 100644
> --- a/t/test.pl
> +++ b/t/test.pl
> @@ -793,8 +793,8 @@ sub _fresh_perl {
>      # it feels like the least-worse thing is to assume that auto-vivification
>      # works. At least, this is only going to be a run-time failure, so won't
>      # affect tests using this file but not this function.
> -    $runperl_args->{progfile} = $tmpfile;
> -    $runperl_args->{stderr} = 1;
> +    $runperl_args->{progfile} ||= $tmpfile;
> +    $runperl_args->{stderr}     = 1 unless exists $runperl_args->{stderr};
>
>      open TEST, ">$tmpfile" or die "Cannot open $tmpfile: $!";
>
> diff --git a/toke.c b/toke.c
> index 787a3b6..86b8c7f 100644
> --- a/toke.c
> +++ b/toke.c
> @@ -1249,6 +1249,7 @@ buffer has reached the end of the input text.
>  */
>
>  #define LEX_FAKE_EOF 0x80000000
> +#define LEX_NO_TERM  0x40000000
>
>  bool
>  Perl_lex_next_chunk(pTHX_ U32 flags)
> @@ -1260,7 +1261,7 @@ Perl_lex_next_chunk(pTHX_ U32 flags)
>      STRLEN linestart_pos, last_uni_pos, last_lop_pos;
>      bool got_some_for_debugger = 0;
>      bool got_some;
> -    if (flags & ~(LEX_KEEP_PREVIOUS|LEX_FAKE_EOF))
> +    if (flags & ~(LEX_KEEP_PREVIOUS|LEX_FAKE_EOF|LEX_NO_TERM))
>         Perl_croak(aTHX_ "Lexing code internal error (%s)", "lex_next_chunk");
>      linestr = PL_parser->linestr;
>      buf = SvPVX(linestr);
> @@ -1291,6 +1292,8 @@ Perl_lex_next_chunk(pTHX_ U32 flags)
>      } else if (filter_gets(linestr, old_bufend_pos)) {
>         got_some = 1;
>         got_some_for_debugger = 1;
> +    } else if (flags & LEX_NO_TERM) {
> +       got_some = 0;
>      } else {
>         if (!SvPOK(linestr))   /* can get undefined by filter_gets */
>             sv_setpvs(linestr, "");
> @@ -2456,7 +2459,9 @@ S_sublex_push(pTHX)
>      SAVEI32(PL_lex_casemods);
>      SAVEI32(PL_lex_starts);
>      SAVEI8(PL_lex_state);
> +    SAVESPTR(PL_lex_repl);
>      SAVEPPTR(PL_sublex_info.re_eval_start);
> +    SAVEPPTR(PL_sublex_info.super_bufptr);
>      SAVEVPTR(PL_lex_inpat);
>      SAVEI16(PL_lex_inwhat);
>      SAVECOPLINE(PL_curcop);
> @@ -2471,8 +2476,30 @@ S_sublex_push(pTHX)
>      SAVEGENERICPV(PL_lex_brackstack);
>      SAVEGENERICPV(PL_lex_casestack);
>
> +    /* The here-doc parser needs to be able to peek into outer lexing
> +       scopes to find the body of the here-doc.  We use SvIVX(PL_linestr)
> +       to store the outer PL_bufptr and SvNVX to store the outer
> +       PL_linestr.  Since SvIVX already means something else, we use
> +       PL_sublex_info.super_bufptr for the innermost scope (the one we are
> +       now entering), and a localised SvIVX for outer scopes.
> +     */
> +    SvUPGRADE(PL_linestr, SVt_PVIV);
> +    /* A null super_bufptr means the outer lexing scope is not peekable,
> +       because it is a single line from an input stream. */
> +    SAVEIV(SvIVX(PL_linestr));
> +    SvIVX(PL_linestr) = PTR2IV(PL_sublex_info.super_bufptr);
> +    PL_sublex_info.super_bufptr =
> +       (SvTYPE(PL_linestr) < SVt_PVNV || !SvNVX(PL_linestr))
> +        && (PL_rsfp || PL_parser->filtered)
> +        ? NULL
> +        : PL_bufptr;
> +    SvUPGRADE(PL_lex_stuff, SVt_PVNV);
> +    SvNVX(PL_lex_stuff) = PTR2NV(PL_linestr);
> +
>      PL_linestr = PL_lex_stuff;
> +    PL_lex_repl = PL_sublex_info.repl;
>      PL_lex_stuff = NULL;
> +    PL_sublex_info.repl = NULL;
>      PL_sublex_info.re_eval_start = NULL;
>
>      PL_bufend = PL_bufptr = PL_oldbufptr = PL_oldoldbufptr = PL_linestart
> @@ -2529,6 +2556,8 @@ S_sublex_done(pTHX)
>      /* Is there a right-hand side to take care of? (s//RHS/ or tr//RHS/) */
>      assert(PL_lex_inwhat != OP_TRANSR);
>      if (PL_lex_repl && (PL_lex_inwhat == OP_SUBST || PL_lex_inwhat == OP_TRANS)) {
> +       SvUPGRADE(PL_lex_repl, SVt_PVNV);
> +       SvNVX(PL_lex_repl) = SvNVX(PL_linestr);
>         PL_linestr = PL_lex_repl;
>         PL_lex_inpat = 0;
>         PL_bufend = PL_bufptr = PL_oldbufptr = PL_oldoldbufptr = PL_linestart = SvPVX(PL_linestr);
> @@ -9315,8 +9344,6 @@ S_scan_subst(pTHX_ char *start)
>      if (es) {
>         SV * const repl = newSVpvs("");
>
> -       PL_sublex_info.super_bufptr = s;
> -       PL_sublex_info.super_bufend = PL_bufend;
>         PL_multi_end = 0;
>         pm->op_pmflags |= PMf_EVAL;
>         while (es-- > 0) {
> @@ -9326,13 +9353,13 @@ S_scan_subst(pTHX_ char *start)
>                 sv_catpvs(repl, "do ");
>         }
>         sv_catpvs(repl, "{");
> -       sv_catsv(repl, PL_lex_repl);
> -       if (strchr(SvPVX(PL_lex_repl), '#'))
> +       sv_catsv(repl, PL_sublex_info.repl);
> +       if (strchr(SvPVX(PL_sublex_info.repl), '#'))
>             sv_catpvs(repl, "\n");
>         sv_catpvs(repl, "}");
>         SvEVALED_on(repl);
> -       SvREFCNT_dec(PL_lex_repl);
> -       PL_lex_repl = repl;
> +       SvREFCNT_dec(PL_sublex_info.repl);
> +       PL_sublex_info.repl = repl;
>      }
>
>      PL_lex_op = (OP*)pm;
> @@ -9417,7 +9444,7 @@ S_scan_trans(pTHX_ char *start)
>      o->op_private &= ~OPpTRANS_ALL;
>      o->op_private |= del|squash|complement|
>        (DO_UTF8(PL_lex_stuff)? OPpTRANS_FROM_UTF : 0)|
> -      (DO_UTF8(PL_lex_repl) ? OPpTRANS_TO_UTF   : 0);
> +      (DO_UTF8(PL_sublex_info.repl) ? OPpTRANS_TO_UTF   : 0);
>
>      PL_lex_op = o;
>      pl_yylval.ival = nondestruct ? OP_TRANSR : OP_TRANS;
> @@ -9434,6 +9461,36 @@ S_scan_trans(pTHX_ char *start)
>      return s;
>  }
>
> +/* scan_heredoc
> +   Takes a pointer to the first < in <<FOO.
> +   Returns a pointer to the byte following <<FOO.
> +
> +   This function scans a heredoc, which involves different methods
> +   depending on whether we are in a string eval, quoted construct, etc.
> +   This is because PL_linestr could containing a single line of input, or
> +   a whole string being evalled, or the contents of the current quote-
> +   like operator.
> +
> +   The three methods are:
> +    - Steal lines from the input stream (stream)
> +    - Scan the heredoc in PL_linestr and remove it therefrom (linestr)
> +    - Peek at the PL_linestr of outer lexing scopes (peek)
> +
> +   They are used in these cases:
> +     file scope or filtered eval                       stream
> +     string eval                                       linestr
> +     multiline quoted construct                                linestr
> +     single-line quoted construct in file              stream
> +     single-line quoted construct in eval or quote     peek
> +
> +   Single-line also applies to heredocs that begin on the last line of a
> +   quote-like operator.
> +
> +   Peeking within a quote also involves falling back to the stream method,
> +   if the outer quote-like operators are all on one line (or the heredoc
> +   marker is on the last line).
> +*/
> +
>  STATIC char *
>  S_scan_heredoc(pTHX_ register char *s)
>  {
> @@ -9443,12 +9500,11 @@ S_scan_heredoc(pTHX_ register char *s)
>      I32 len;
>      SV *tmpstr;
>      char term;
> -    const char *found_newline;
> +    const char *found_newline = 0;
>      char *d;
>      char *e;
>      char *peek;
> -    const int outer = (PL_rsfp || PL_parser->filtered)
> -                  && !(PL_lex_inwhat == OP_SCALAR);
> +    const bool infile = PL_rsfp || PL_parser->filtered;
>  #ifdef PERL_MAD
>      I32 stuffstart = s - SvPVX(PL_linestr);
>      char *tstart;
> @@ -9459,10 +9515,9 @@ S_scan_heredoc(pTHX_ register char *s)
>      PERL_ARGS_ASSERT_SCAN_HEREDOC;
>
>      s += 2;
> -    d = PL_tokenbuf;
> +    d = PL_tokenbuf + 1;
>      e = PL_tokenbuf + sizeof PL_tokenbuf - 1;
> -    if (!outer)
> -       *d++ = '\n';
> +    *PL_tokenbuf = '\n';
>      peek = s;
>      while (SPACE_OR_TAB(*peek))
>         peek++;
> @@ -9477,6 +9532,7 @@ S_scan_heredoc(pTHX_ register char *s)
>      }
>      else {
>         if (*s == '\\')
> +            /* <<\FOO is equivalent to <<'FOO' */
>             s++, term = '\'';
>         else
>             term = '"';
> @@ -9495,8 +9551,8 @@ S_scan_heredoc(pTHX_ register char *s)
>
>  #ifdef PERL_MAD
>      if (PL_madskills) {
> -       tstart = PL_tokenbuf + !outer;
> -       PL_thisclose = newSVpvn(tstart, len - !outer);
> +       tstart = PL_tokenbuf + 1;
> +       PL_thisclose = newSVpvn(tstart, len - 1);
>         tstart = SvPVX(PL_linestr) + stuffstart;
>         PL_thisopen = newSVpvn(tstart, s - tstart);
>         stuffstart = s - SvPVX(PL_linestr);
> @@ -9526,10 +9582,8 @@ S_scan_heredoc(pTHX_ register char *s)
>         s = olds;
>      }
>  #endif
> -#ifdef PERL_MAD
> -    found_newline = 0;
> -#endif
> -    if ( outer || !(found_newline = (char*)memchr((void*)s, '\n', PL_bufend - s)) ) {
> +    if ((infile && !PL_lex_inwhat)
> +     || !(found_newline = (char*)memchr((void*)s, '\n', PL_bufend - s))) {
>          herewas = newSVpvn(s,PL_bufend-s);
>      }
>      else {
> @@ -9572,44 +9626,68 @@ S_scan_heredoc(pTHX_ register char *s)
>      CLINE;
>      PL_multi_start = CopLINE(PL_curcop);
>      PL_multi_open = PL_multi_close = '<';
> -    term = *PL_tokenbuf;
> -    if (PL_lex_inwhat == OP_SUBST && PL_in_eval && !PL_rsfp
> -     && !PL_parser->filtered) {
> -       char * const bufptr = PL_sublex_info.super_bufptr;
> -       char * const bufend = PL_sublex_info.super_bufend;
> +    if (PL_lex_inwhat && !found_newline) {
> +       /* Peek into the line buffer of the parent lexing scope, going up
> +          as many levels as necessary to find one with a newline after
> +          bufptr.  See the comments in sublex_push for how IVX and NVX
> +          are abused.
> +        */
> +       SV *linestr = NUM2PTR(SV *, SvNVX(PL_linestr));
> +       char *bufptr = PL_sublex_info.super_bufptr;
> +       char *bufend = SvEND(linestr);
>         char * const olds = s - SvCUR(herewas);
> -       s = strchr(bufptr, '\n');
> -       if (!s)
> -           s = bufend;
> +       char * const real_olds = s;
> +       if (!bufptr) {
> +           s = real_olds;
> +           goto streaming;
> +       }
> +       while (!(s = (char *)memchr((void *)bufptr, '\n', bufend-bufptr))){
> +           if (SvIVX(linestr)) {
> +               bufptr = INT2PTR(char *, SvIVX(linestr));
> +               linestr = NUM2PTR(SV *, SvNVX(linestr));
> +               bufend = SvEND(linestr);
> +           }
> +           else if (infile) {
> +               s = real_olds;
> +               goto streaming;
> +           }
> +           else {
> +               s = bufend;
> +               break;
> +           }
> +       }
>         d = s;
>         while (s < bufend &&
> -         (*s != term || memNE(s,PL_tokenbuf,len)) ) {
> +         (*s != '\n' || memNE(s,PL_tokenbuf,len)) ) {
>             if (*s++ == '\n')
>                 CopLINE_inc(PL_curcop);
>         }
>         if (s >= bufend) {
>             CopLINE_set(PL_curcop, (line_t)PL_multi_start);
> -           missingterm(PL_tokenbuf);
> +           missingterm(PL_tokenbuf + 1);
>         }
>         sv_setpvn(herewas,bufptr,d-bufptr+1);
>         sv_setpvn(tmpstr,d+1,s-d);
>         s += len - 1;
>         sv_catpvn(herewas,s,bufend-s);
>         Copy(SvPVX_const(herewas),bufptr,SvCUR(herewas) + 1,char);
> +       SvCUR_set(linestr,
> +                 bufptr-SvPVX_const(linestr)
> +                  + SvCUR(herewas));
>
>         s = olds;
>         goto retval;
>      }
> -    else if (!outer) {
> +    else if (!infile || found_newline) {
>         d = s;
>         while (s < PL_bufend &&
> -         (*s != term || memNE(s,PL_tokenbuf,len)) ) {
> +         (*s != '\n' || memNE(s,PL_tokenbuf,len)) ) {
>             if (*s++ == '\n')
>                 CopLINE_inc(PL_curcop);
>         }
>         if (s >= PL_bufend) {
>             CopLINE_set(PL_curcop, (line_t)PL_multi_start);
> -           missingterm(PL_tokenbuf);
> +           missingterm(PL_tokenbuf + 1);
>         }
>         sv_setpvn(tmpstr,d+1,s-d);
>  #ifdef PERL_MAD
> @@ -9632,6 +9710,9 @@ S_scan_heredoc(pTHX_ register char *s)
>      }
>      else
>         sv_setpvs(tmpstr,"");   /* avoid "uninitialized" warning */
> +  streaming:
> +    term = PL_tokenbuf[1];
> +    len--;
>      while (s >= PL_bufend) {   /* multiple line string? */
>  #ifdef PERL_MAD
>         if (PL_madskills) {
> @@ -9644,11 +9725,16 @@ S_scan_heredoc(pTHX_ register char *s)
>  #endif
>         PL_bufptr = s;
>         CopLINE_inc(PL_curcop);
> -       if (!outer || !lex_next_chunk(0)) {
> +       if (!lex_next_chunk(LEX_NO_TERM)
> +        && (!SvCUR(tmpstr) || SvEND(tmpstr)[-1] != '\n')) {
>             CopLINE_set(PL_curcop, (line_t)PL_multi_start);
> -           missingterm(PL_tokenbuf);
> +           missingterm(PL_tokenbuf + 1);
>         }
>         CopLINE_dec(PL_curcop);
> +       if (!SvCUR(PL_linestr) || PL_bufend[-1] != '\n') {
> +           lex_grow_linestr(SvCUR(PL_linestr) + 2);
> +           sv_catpvs(PL_linestr, "\n\0");
> +       }
>         s = PL_bufptr;
>  #ifdef PERL_MAD
>         stuffstart = s - SvPVX(PL_linestr);
> @@ -9671,7 +9757,7 @@ S_scan_heredoc(pTHX_ register char *s)
>         else if (PL_bufend - PL_linestart == 1 && PL_bufend[-1] == '\r')
>             PL_bufend[-1] = '\n';
>  #endif
> -       if (*s == term && memEQ(s,PL_tokenbuf,len)) {
> +       if (*s == term && memEQ(s,PL_tokenbuf + 1,len)) {
>             STRLEN off = PL_bufend - 1 - SvPVX_const(PL_linestr);
>             *(SvPVX(PL_linestr) + off ) = ' ';
>             lex_grow_linestr(SvCUR(PL_linestr) + SvCUR(herewas) + 1);
> @@ -10224,7 +10310,7 @@ S_scan_str(pTHX_ char *start, int keep_quoted, int keep_delims, int re_reparse)
>      */
>
>      if (PL_lex_stuff)
> -       PL_lex_repl = sv;
> +       PL_sublex_info.repl = sv;
>      else
>         PL_lex_stuff = sv;
>      return s;
>
> --
> Perl5 Master Repository



nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About