Re: 5.30.2 soon
From: Nicholas Clark
Date: February 6, 2020 17:47
Subject: Re: 5.30.2 soon
Message ID: 20200206171332.q6otch3gxsks5mwq@ceres.etla.org
On Thu, Feb 06, 2020 at 01:41:21PM +0000, Steve Hay via perl5-porters wrote:
> https://github.com/Perl/perl5/blob/maint-votes/votes-5.30.xml
>
> (This renders nicely when viewed in Firefox and some other browsers, but is
> also readable as a text file.)
>
> If there are any more changes you think should be included that match the
> criteria for back-porting set out in perlpolicy.pod then please let me know.

At work we upgraded to 5.30.1 and hit the bug that was fixed by

commit 3b2e5620ed4a6b341f97ffd1d4b6466cc2c4bc5b
Author: Karl Williamson <khw@cpan.org>
Date: Fri Aug 23 12:40:24 2019 -0600

PATCH: [perl #134329] Use after free in regcomp.c

A compiled regex is composed of nodes, forming a linked list, with
normally a maximum of 16 bits used to specify the offset of the next
link. For patterns that require more space than this, the nodes that
jump around are replaced with ones that have wider offsets. Most nodes
are unaffected, as they just contain the offset of the next node, and
that number is always small. The jump nodes are the ones affected.

When compiling a pattern, the 16 bit mechanism is used, until it
overflows, at which point the pattern is recompiled with the long jumps
instead.

When I rewrote the compiler last year to make it generally one pass, I
noticed a lot of the cases where a node was added didn't check if the
result overflowed (the function that does this returns FALSE in that
case). I presumed the prior authors knew better, and did not change
things, except to put in a bogus value in the link (offset) field that
should cause a crash if it were used. That's what's happening in this
ticket.

But seeing this example, it's clear that the return value should be
checked every time, because you can reach the limit at any time. This
commit changes to do that, and to require the function's return value to
not be ignored, to guard against future changes.

My guess is that the reason it generally worked when there were multiple
passes is that the first pass didn't do anything except count space, and
that at some point before the end of the pass the return value did get
checked, so by the time the nodes were allocated for real, it knew
enough to use the long jumps.

 MANIFEST                 |   1 +
 embed.fnc                |   4 +-
 proto.h                  |   8 +++-
 regcomp.c                | 109 ++++++++++++++++++++++++++++++++++-------------
 t/re/bigfuzzy_not_utf8.t | Bin 0 -> 36399 bytes
 5 files changed, 88 insertions(+), 34 deletions(-)
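
For readers who have not looked at regcomp.c, the pattern the commit message describes can be sketched roughly as follows. This is only an illustration under stated assumptions: `node_t`, `set_short_offset`, `compile_jump`, and `MUST_CHECK` are hypothetical names, not Perl's real structures or functions, and the attribute shown is the GCC/Clang extension rather than whatever macro the Perl source actually uses. The idea is that the link-setting function reports overflow through its return value, the compiler is asked to warn if a caller ignores that value, and the caller falls back to wide offsets on failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* GCC and Clang can warn when a caller discards a return value. */
#if defined(__GNUC__) || defined(__clang__)
#  define MUST_CHECK __attribute__((warn_unused_result))
#else
#  define MUST_CHECK
#endif

/* Hypothetical node layout for illustration; not Perl's struct regnode. */
typedef struct {
    uint8_t  type;
    uint16_t next_off;   /* distance to the next node, at most 16 bits */
} node_t;

typedef enum { SHORT_JUMPS, LONG_JUMPS } jump_mode;

/* Store `distance` in the 16-bit link field.
 * Returns false, leaving the node untouched, if it does not fit. */
MUST_CHECK bool set_short_offset(node_t *n, size_t distance)
{
    assert(n != NULL);
    if (distance > UINT16_MAX)
        return false;
    n->next_off = (uint16_t)distance;
    return true;
}

/* Try the compact encoding first; report that a recompile with wide
 * offsets is needed as soon as a link overflows. */
jump_mode compile_jump(size_t distance)
{
    node_t n = { 0, 0 };
    if (!set_short_offset(&n, distance))
        return LONG_JUMPS;   /* caller would restart with wide jump nodes */
    return SHORT_JUMPS;
}
```

In this sketch a distance of 65535 still fits the short encoding, while 65536 makes `compile_jump` report that long jumps are needed. Dropping the `if` around `set_short_offset` would draw a compiler warning on GCC or Clang, which is the kind of guard against future changes the commit message mentions.
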
I think that that meets the criteria in perlpolicy.pod [without me editing
that file to fit :-)] but I haven't figured out if it really fits (or works).
I've mitigated our problem (super-large machine-generated regex, now 33%
smaller*) so that we don't *need* this (currently), but it might be useful
for it to go in before others hit it who don't have such an easy workaround.
Nicholas Clark
* machine-generated regex used as part of the search subsystem on
https://geizhals.eu/ and friends, to lovingly massage the output from
"Elasticschrott" into something more useful. And yes, currently we're hiring.
But sysadmins, on site in Vienna, so maybe not for anyone reading this list.