
Re: [perl.git] branch blead, updated. v5.22.0-RC1-11-g3b50e65

From: demerphq
Date: May 20, 2015 05:37
Subject: Re: [perl.git] branch blead, updated. v5.22.0-RC1-11-g3b50e65
Message ID: CANgJU+XhJLsxMA6ZPQJ-5MjSvm-mrAu_kpF5tuSX4M25cYo4vA@mail.gmail.com

On 20 May 2015 at 06:03, Karl Williamson <public@khwilliamson.com> wrote:
> In perl.git, the branch blead has been updated
> commit 6acea139a4492dc2f272bfc6de52ec8b6510da2c
> Author: Karl Williamson <khw@cpan.org>
> Date:   Tue May 19 14:20:20 2015 -0600
>
>     perldelta: Rmv reference to internal flag
>
>     SCF_DO_SUBSTR is a flag internal to the current implementation of the
>     regular expression optimizer.  There is no need to proclaim its
>     existence to the outside world, and is just extraneous noise.
>
>     I myself do not understand this flag, and I've spent more time looking
>     at this code than all but a few people likely to be reading this
>     perldelta.  If someone who does understand it could explain it to me, I
>     would add comments to the code (after the freeze) to aid future readers.
>
....
>  =item *
>
> -During the optimization phase of a regexp compilation, we no longer
> -recurse into C<GOSUB>/C<GOSTART> when the internal C<SCF_DO_SUBSTR> flag
> -is false. This prevents the optimizer from running "forever" and
> -exhausting all memory.
> +The optimization phase of a regexp compilation could run "forever" and
> +exhaust all memory under certain circumstances; now fixed.
>  L<[perl #122283]|https://rt.perl.org/Ticket/Display.html?id=122283>.

SCF_DO_SUBSTR is the flag that tells the regexp analyzer to track the
longest substring in the pattern. When it is not set, the optimiser
keeps track of position, but does not keep track of the actual strings
seen.

So for instance /foo/ will be parsed with SCF_DO_SUBSTR being true,
but /foo/i will not.

Similarly, /foo.*(blah|erm|huh).*fnorble/ will have "foo" and
"fnorble" parsed with SCF_DO_SUBSTR on, but while processing the
(...) the flag will be turned off because of the alternation (BRANCH).
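
A rough way to see the result of this tracking from Perl code is
re::regmust(), which reports the anchored and floating substrings the
optimizer recorded for a compiled pattern. It does not expose
SCF_DO_SUBSTR itself, so this is only a sketch of the visible effect,
and what is reported may differ between perl versions. The helper name
show_must is made up for illustration.

```perl
use strict;
use warnings;
use re qw(regmust);

# Hypothetical helper: return the (anchored, floating) substrings the
# optimizer recorded for a compiled pattern, or '(none)' where it
# recorded nothing.
sub show_must {
    my ($qr) = @_;
    my ($anchored, $floating) = regmust($qr);
    return map { defined($_) && length($_) ? $_ : '(none)' }
           ($anchored, $floating);
}

# The three patterns discussed above.
for my $qr (qr/foo/, qr/foo/i, qr/foo.*(blah|erm|huh).*fnorble/) {
    my ($anchored, $floating) = show_must($qr);
    print "$qr  anchored: $anchored  floating: $floating\n";
}
```

Running a pattern under "use re 'debug'" prints similar information at
compile time, along with the rest of the optimizer's findings.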

At first I was not sure that removing this from the docs makes
sense, as I thought it was visible when using re debug under certain
circumstances, but I can't remember what they are, so maybe it's fine
to remove it.

I do, however, object to removing GOSUB/GOSTART from this change
description. IMO, as rewritten, the only person who has any chance of
understanding what was fixed is the person who filed 122283.

Also, I am not sure you are being consistent with your rationale. Why
would a user care where in the compilation process it hung any more
than which flags were involved?

I think perhaps, assuming perldelta is intended primarily for end
users, it would be better to reword it as:

Use of GOSUB or GOSTART constructs in regexes ((?0), (?&FOO), or
similar constructs) could lead to exponential processing time and
space, manifesting as out of memory errors and/or very long
compilation times for a pattern. This has been fixed. You should now
be able to use these constructs freely without performance penalties.

Or something along those lines. On the other hand, if the perldelta is
a note to future perl maintainers then IMO this patch should be
reverted outright.
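
For anyone unfamiliar with the constructs named in the proposed wording
above, here is a minimal sketch of the user-visible syntax. As far as I
understand it, (?R) and (?0) recurse into the whole pattern (the
GOSTART case), while (?1) and (?&NAME) recurse into a specific group
(the GOSUB case). The variable names, the PAREN group name and the
matches_whole helper are made up for illustration.

```perl
use strict;
use warnings;

# (?R), equivalently (?0), recurses into the whole pattern.
my $whole = qr/ \( (?: [^()]++ | (?R) )* \) /x;

# (?1) recurses into capture group 1.
my $numbered = qr/ ( \( (?: [^()]++ | (?1) )* \) ) /x;

# (?&NAME) recurses into a named group, defined here in a DEFINE block.
my $named = qr/ (?&PAREN)
                (?(DEFINE) (?<PAREN> \( (?: [^()]++ | (?&PAREN) )* \) ) )
              /x;

# Hypothetical helper: true if the pattern matches the entire string.
sub matches_whole {
    my ($re, $s) = @_;
    return ($s =~ $re && $-[0] == 0 && $+[0] == length($s)) ? 1 : 0;
}

for my $s ('(a(b)c)', '(a(b') {
    printf "%-8s whole=%d numbered=%d named=%d\n", $s,
        map { matches_whole($_, $s) } $whole, $numbered, $named;
}
```

Each pattern accepts a balanced parenthesised string and rejects an
unbalanced one; they differ only in how the recursion is written.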

Yves
