develooper Front page | perl.perl5.porters | Postings from February 2020

Re: 5.30.2 soon

Nicholas Clark
February 6, 2020 17:47
On Thu, Feb 06, 2020 at 01:41:21PM +0000, Steve Hay via perl5-porters wrote:

> (This renders nicely when viewed in Firefox and some other browsers, but is
> also readable as a text file.)
> If there are any more changes you think should be included that match the
> criteria for back-porting set out in perlpolicy.pod then please let me know.

At work we upgraded to 5.30.1 and hit the bug that was fixed by

commit 3b2e5620ed4a6b341f97ffd1d4b6466cc2c4bc5b
Author: Karl Williamson <>
Date:   Fri Aug 23 12:40:24 2019 -0600

    PATCH: [perl #134329] Use after free in regcomp.c

    A compiled regex is composed of nodes, forming a linked list, with
    normally a maximum of 16 bits used to specify the offset of the next
    link.  For patterns that require more space than this, the nodes that
    jump around are replaced with ones that have wider offsets.  Most nodes
    are unaffected, as they just contain the offset of the next node, and
    that number is always small.  The jump nodes are the ones affected.

    When compiling a pattern, the 16 bit mechanism is used, until it
    overflows, at which point the pattern is recompiled with the long jumps.

    When I rewrote the compiler last year to make it generally one pass, I
    noticed a lot of the cases where a node was added didn't check if the
    result overflowed (the function that does this returns FALSE in that
    case).  I presumed the prior authors knew better, and did not change
    things, except to put in a bogus value in the link (offset) field that
    should cause a crash if it were used.  That's what's happening in this

    But seeing this example, it's clear that the return value should be
    checked every time, because you can reach the limit at any time.  This
    commit changes to do that, and to require the function's return value to
    not be ignored, to guard against future changes.

    My guess is that the reason it generally worked when there were multiple
    passes is that the first pass didn't do anything except count space, and
    that at some point before the end of the pass the return value did get
    checked, so by the time the nodes were allocated for real, it knew
    enough to use the long jumps.

 MANIFEST                 |   1 +
 embed.fnc                |   4 +-
 proto.h                  |   8 +++-
 regcomp.c                | 109 ++++++++++++++++++++++++++++++++++-------------
 t/re/bigfuzzy_not_utf8.t | Bin 0 -> 36399 bytes
 5 files changed, 88 insertions(+), 34 deletions(-)
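
For readers who don't know regcomp.c, here is a toy model in C of the mechanism the commit message describes. It is a sketch under stated assumptions, not perl's code: every name in it (Program, Node, set_next, compile_once, compile, MUST_CHECK) is hypothetical. The real compiler swaps only the jump nodes for wider variants, while this model flips a single long_jumps flag for brevity. It shows three points from the commit. The link-setting function reports overflow instead of truncating. Every call to it is checked, because any node can be the one that overflows. And a failed short-offset compile is thrown away and redone with long jumps. The "must not be ignored" requirement is expressed here with the GCC/Clang warn_unused_result attribute. Perl's own macro for this is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical names throughout; this is NOT perl's regcomp.c API.
 * Ask GCC/Clang to warn if a caller ignores the result. */
#if defined(__GNUC__) || defined(__clang__)
#  define MUST_CHECK __attribute__((warn_unused_result))
#else
#  define MUST_CHECK
#endif

typedef struct {
    uint32_t next;          /* offset from this node to the next one */
} Node;

typedef struct {
    Node  *nodes;
    size_t count;
    bool   long_jumps;      /* false: every offset must fit in 16 bits */
} Program;

/* Link node `from` to node `to`.  Returns false, instead of silently
 * truncating, when the offset does not fit the current encoding. */
static MUST_CHECK bool set_next(Program *p, size_t from, size_t to)
{
    size_t offset = to - from;
    if (!p->long_jumps && offset > UINT16_MAX)
        return false;
    p->nodes[from].next = (uint32_t)offset;
    return true;
}

/* Toy "pattern": a jump node at index 0 that skips `body` ordinary
 * nodes, followed by an end node. */
static MUST_CHECK bool compile_once(Program *p, size_t body)
{
    p->count = body + 2;
    p->nodes = calloc(p->count, sizeof *p->nodes);
    if (p->nodes == NULL) {
        perror("calloc");
        exit(EXIT_FAILURE);
    }
    /* Ordinary nodes: the offset is 1, but every call is checked anyway. */
    for (size_t i = 1; i + 1 < p->count; i++)
        if (!set_next(p, i, i + 1))
            return false;
    /* The jump node is the one that can overflow. */
    return set_next(p, 0, p->count - 1);
}

/* Try 16-bit offsets first; on overflow, discard the result and
 * recompile the whole pattern with long jumps. */
static void compile(Program *p, size_t body)
{
    p->long_jumps = false;
    if (compile_once(p, body))
        return;
    free(p->nodes);
    p->long_jumps = true;
    bool ok = compile_once(p, body);
    assert(ok && "32-bit offsets always suffice in this toy model");
    (void)ok;
}
```

With a 100-node body, compile() stays in short mode and the jump offset is 101. With 70000 nodes, the jump offset (70001) exceeds UINT16_MAX, so set_next() returns false and the pattern is recompiled with long_jumps set. The commit's older multi-pass behaviour would correspond to a separate sizing pass that decides long_jumps before any nodes are allocated. This sketch does not model that.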

I think that that meets the criteria in perlpolicy.pod [without me editing
that file to fit :-)] but I haven't figured out if it really fits (or works).

I've mitigated our problem (a super-large machine-generated regex, now 33%
smaller*) so that we don't *need* this fix (currently), but it might be worth
including before others who don't have such an easy workaround hit it.

Nicholas Clark

* The machine-generated regex is used as part of the search subsystem on and friends, to lovingly massage the output from
  "Elasticschrott" into something more useful. And yes, we're currently hiring,
  but for sysadmins, on site in Vienna, so maybe not for anyone reading this list.
