[perl #114356] REGEXPs have massive reference counts
From:
Nicholas Clark
Date:
August 1, 2012 02:49
Subject:
[perl #114356] REGEXPs have massive reference counts
Message ID:
rt-3.6.HEAD-11172-1343814578-509.114356-75-0@perl.org
# New Ticket Created by Nicholas Clark
# Please include the string: [perl #114356]
# in the subject line of all future correspondence about this issue.
# <URL: https://rt.perl.org:443/rt3/Ticket/Display.html?id=114356 >
At some point since perl 5.16.0, if one builds blead with -DDEBUGGING
and sets PERL_DESTRUCT_LEVEL=2 in the environment, mktables has started
taking "forever"* to run. Moreover, it seems that it "hangs" in global
destruction.
I've added a --timeout option to the bisect runner to make it easy to work
out when. It finds that it's this commit, merged as part of Dave's fix
of code blocks:

commit 9f141731d83a1ac6294a5580a5b11ff41490309a
Author: David Mitchell <davem@iabyn.com>
Date:   Fri Nov 4 10:12:20 2011 +0000

    Move bulk of pp_regcomp() into re_op_compile()

    When called, pp_regcomp() is presented with a list of SVs on the stack.
    Previously, it would perform (amongst other things):

    * overloading those SVs;
    * concatenating them;
    * detection of bare /$qr/;
    * detection of unchanged pattern;

    optionally followed by a call to the built-in or an external regexp
    compiler.

    Since we want to avoid premature concatenation (so that we can handle
    /$runtime(?{...})/), move all these activities from pp_regcomp() into
    re_op_compile().

    This makes re_op_compile() a bit cumbersome, with a large arg list,
    but I haven't found any way of only moving only a subset of the above.

    Note that a side-effect of this is that qr-overloading now works for all
    regex compilations, not just those reached via pp_regcomp(); in particular
    this now invokes the qr method rather than the "" method if available:

        /(??{ $overloaded_object })/

which seems crazy, but I checked, and it's true. It seems that after this
commit some SVs of type SVt_REGEXP have massively inflated reference counts,
and this results in Perl_sv_clean_all() being called tens of thousands of
times. Running mktables under gdb in 9f141731d83a1ac6^ I see this:
Creating Perl synonyms
Writing tables
Making pod file
Making test script
Updating 'mktables.lst'
Breakpoint 4, Perl_sv_clean_all () at sv.c:628
628 PL_in_clean_all = TRUE;
(gdb) finish
Run till exit from #0 Perl_sv_clean_all () at sv.c:628
0x0000000000407d78 in perl_destruct (my_perl=0x992010) at perl.c:1072
1072 while (sv_clean_all() > 2)
Value returned is $8 = 59215
(gdb) c
Continuing.
Breakpoint 4, Perl_sv_clean_all () at sv.c:628
628 PL_in_clean_all = TRUE;
(gdb) call S_visit(&Perl_sv_dump, SVt_REGEXP, 255)
$9 = 0
(gdb)
Running it at 9f141731d83a1ac6 I get 472 lines of output (attached), which
start like this:
Creating Perl synonyms
Writing tables
Making pod file
Making test script
Updating 'mktables.lst'
Breakpoint 4, Perl_sv_clean_all () at sv.c:628
628 PL_in_clean_all = TRUE;
(gdb) finish
Run till exit from #0 Perl_sv_clean_all () at sv.c:628
0x0000000000407d78 in perl_destruct (my_perl=0x992010) at perl.c:1072
1072 while (sv_clean_all() > 2)
Value returned is $24 = 59353
(gdb) c
Continuing.
Breakpoint 4, Perl_sv_clean_all () at sv.c:628
628 PL_in_clean_all = TRUE;
(gdb) call S_visit(&Perl_sv_dump, SVt_REGEXP, 255)
SV = REGEXP(0x5678df0) at 0x57f3428
REFCNT = 20
FLAGS = (POK,FAKE,BREAK,pPOK)
PV = 0x55f0580 "(?^aax:^ ( .{27} # Don't look before the\n # indent.\n \\ * # Don't look in leading\n # blanks past the indent\n [^ ] .* # Find the right-most\n (?: # acceptable break:\n [ \\s = ] # space or equal\n | - (?! [.0-9] ) # or non-unary minus.\n ) # $1 includes the character\n ))"\0
CUR = 604
LEN = 608
EXTFLAGS = 0x2000288 (PMf_EXTENDED,ANCH_BOL,COPY_DONE)
INTFLAGS = 0x4
NPARENS = 1
LASTPAREN = 1
LASTCLOSEPAREN = 1
MINLEN = 29
MINLENRET = 29
GOFS = 0
PRE_PREFIX = 7
SEEN_EVALS = 0
SUBLEN = 68
SUBBEG = 0x56277a0 " XPerlSpace (Perl extension). \\s, including beyond AS"
ENGINE = 0x6f6a00
MOTHER_RE = 0x0
PAREN_NAMES = 0x0
SUBSTRS = 0x5634930
PPRIVATE = 0x563e4c0
OFFS = 0x55db2f0
QR_ANONCV = 0x0
Note that that regular expression seems to correspond to this code:
    # Otherwise fold at an acceptable break char closest to
    # the max length. Look at just the maximal initial
    # segment of the line
    my $segment = substr($line[$i], 0, $max - 1);
    if ($segment =~
        /^ ( .{$hanging_indent}   # Don't look before the
                                  # indent.
            \ *                   # Don't look in leading
                                  # blanks past the indent
            [^ ] .*               # Find the right-most
            (?:                   # acceptable break:
                [ \s = ]          # space or equal
                | - (?! [.0-9] )  # or non-unary minus.
            )                     # $1 includes the character
        )/x)
which is reached 8011 times, and matches 8001 times.
However, two very similar patterns seem to be present later, differing only
in having .{29} and .{0} in place of .{27}, and in their SUBLEN, SUBBEG,
SUBSTRS, PPRIVATE and OFFS values, with reference counts of 1186 and 6799.
20 + 1186 + 6799 is 8005. Is that suspicious?
I assume that a reference count is no longer being dropped when it should
be, but it's not obvious to me how the logic works, and hence whether
anything I might suggest adds more bugs than it solves.
Nicholas Clark
* seems actually to be a factor of 12 longer