perl.perl5.porters | Postings from June 2013

Re: Perl 5.18 and Regexp::Grammars

Dave Mitchell
June 28, 2013 00:22
On Fri, Jun 28, 2013 at 12:05:13AM +0100, Dave Mitchell wrote:
> On Fri, Jun 28, 2013 at 08:46:45AM +1000, Damian Conway wrote:
> > None of the tests in the current release segfault, because I turned
> > off the tests that did. If you run the previous version of the test
> > t/seplist_countedhash_M_.t (attached) under the current version (1.030)
> > of, it segfaults under this version of 5.18:
> Thanks, I can reproduce it now.

[ cutting Damian out of the CC for this subthread as he's probably got
better things to worry about ]

I can reduce the failure to the following code:


which SEGVs during compilation. 'git bisect' blames Karl.
(Which is not to say that there aren't other SEGVing issues, which might
involve code blocks, but this first SEGV is innocent of them.)

Karl, is this something you can deal with?


3018b823898645e44b8c37c70ac5c6302b031381 is the first bad commit
commit 3018b823898645e44b8c37c70ac5c6302b031381
Author: Karl Williamson <>
Date:   Mon Dec 17 21:37:40 2012 -0700

    Consolidate some regex OPS

    The regular expression operation POSIXA works on any of the (currently)
    16 posix classes (like \w and [:graph:]) under the regex modifier /a.
    This commit creates similar operations for the other modifiers: POSIXL
    (for /l), POSIXD (for /d), POSIXU (for /u), plus their complements.
    It causes these ops to be generated instead of the ALNUM, DIGIT,
    HORIZWS, SPACE, and VERTWS ops, as well as all their variants.  The net
    saving is 22 regnode types.

    The reason to do this is for maintenance.  As of this commit, there are
    now 22 fewer node types for which code has to be maintained.  The code
    for each variant was essentially the same logic, but on different
    operands.  It would be easy to make a change to one copy and forget to
    make the corresponding change in the others.  Indeed, this patch fixes
    [perl #114272] in which one copy was out of sync with others.

    This patch actually reduces the number of separate code paths to 5:
    POSIXA, NPOSIXA, POSIXL, POSIXD, and POSIXU.  The complements of the
    last 3 use the same code path as their non-complemented version, except
    that a variable is initialized differently.  The code then XORs this
    variable with its result to do the complementing or not.  Further, the
    POSIXD branch now just checks if the target string being matched is
    UTF-8 or not, and then jumps to either the POSIXU or POSIXA code
    respectively.  So, there are effectively only 4 cases that are coded:
    POSIXA, NPOSIXA, POSIXL, and POSIXU.  (POSIXA doesn't have to worry
    about UTF-8, while NPOSIXA does, hence these for efficiency are coded
    separately.)

    Removing all this code saves memory.  The output of the Linux size
    command shows that the perl executable was shrunk by 33K bytes on my
    platform compiled under -O0 (0.7%) and by 18K bytes (1.3%) under -O2.

    The reason this patch was doable was previous work in numbering the
    POSIX classes, so that they could be indexed in arrays and bit
    positions.  This is a large patch; I didn't see how to break it into
    smaller components.

    I chose to make this code more efficient as opposed to saving even more
    memory.  Thus there is a separate loop that is jumped to after we know
    we have to load a swash; this just saves having to test if the swash is
    loaded each time through the loop.  I avoid loading the swash until
    absolutely necessary.  In places in the previous version of this code,
    the swash was loaded when the input was UTF-8, even if it wasn't yet
    needed (and might never be if the input didn't contain anything above
    Latin1), apparently to avoid the extra test per iteration.

    The Perl test suite runs slightly faster on my platform with this patch
    under -O0, and the speeds are indistinguishable under -O2.  This is in
    spite of these new POSIX regops being unknown to the regex optimizer
    (this will be addressed in future commits), and extra machine
    instructions being required for each character (the xor, and some
    shifting and masking).  I expect this is a result of better caching, and
    not loading swashes unless absolutely necessary.

[davem@robin bleed]$ valgrind ./miniperl ~/tmp/p2
==11524== Memcheck, a memory error detector
==11524== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==11524== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==11524== Command: ./miniperl /home/davem/tmp/p2
==11524== Invalid read of size 4
==11524==    at 0x4EC7E1: Perl_re_op_compile (regcomp.c:6461)
==11524==    by 0x42933B: Perl_pmruntime (op.c:4657)
==11524==    by 0x4BECE1: Perl_yyparse (perly.y:1330)
==11524==    by 0x40A56F: S_parse_body (perl.c:2313)
==11524==    by 0x408D2C: perl_parse (perl.c:1625)
==11524==    by 0x45059C: main (miniperlmain.c:111)
==11524==  Address 0x4 is not stack'd, malloc'd or (recently) free'd
==11524== Process terminating with default action of signal 11 (SIGSEGV)
==11524==  Access not within mapped region at address 0x4
==11524==    at 0x4EC7E1: Perl_re_op_compile (regcomp.c:6461)
==11524==    by 0x42933B: Perl_pmruntime (op.c:4657)
==11524==    by 0x4BECE1: Perl_yyparse (perly.y:1330)
==11524==    by 0x40A56F: S_parse_body (perl.c:2313)
==11524==    by 0x408D2C: perl_parse (perl.c:1625)
==11524==    by 0x45059C: main (miniperlmain.c:111)
==11524==  If you believe this happened as a result of a stack
==11524==  overflow in your program's main thread (unlikely but
==11524==  possible), you can try to increase the size of the
==11524==  main thread stack using the --main-stacksize= flag.
==11524==  The main thread stack size used in this run was 8388608.
==11524== HEAP SUMMARY:
==11524==     in use at exit: 199,574 bytes in 722 blocks
==11524==   total heap usage: 810 allocs, 88 frees, 209,456 bytes allocated
==11524== LEAK SUMMARY:
==11524==    definitely lost: 0 bytes in 0 blocks
==11524==    indirectly lost: 0 bytes in 0 blocks
==11524==      possibly lost: 0 bytes in 0 blocks
==11524==    still reachable: 199,574 bytes in 722 blocks
==11524==         suppressed: 0 bytes in 0 blocks
==11524== Rerun with --leak-check=full to see details of leaked memory
==11524== For counts of detected and suppressed errors, rerun with: -v
==11524== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 2 from 2)
Segmentation fault (core dumped)

A power surge on the Bridge is rapidly and correctly diagnosed as a faulty
capacitor by the highly-trained and competent engineering staff.
    -- Things That Never Happen in "Star Trek" #9
