On Thu Sep 11 06:07:03 2014, mmartinec wrote:
> Got it down to this small test program:
>
> #!/usr/bin/perl
>
> use strict;
> use re 'taint';
>
> my(@body) = (
>   "<mailto:xxxx.xxxx\@outlook.com>",
>   "A\x{B9}ker\x{E8}eva xxxx.xxxx\@outlook.com \x{201D}",
> );
>
> for (@body) {
>   s{ <? (?<!mailto:) \b ( [a-z0-9.]+ \@ \S+ ) \b
>      (?: > | \s{1,10} (?!phone) [a-z]{2,11} : ) }{ }xgi;
> }
>
>
> perl 5.20.{0,1} :
> Assertion failed: ((STRLEN)rx->sublen >= (STRLEN)((s - rx->subbeg) +
> i)), function Perl_reg_numbered_buff_fetch, file regcomp.c, line 7455.
> Abort trap
>

I think what’s happening is that the kludge to localise $1, etc. is
executed when the regexp is in an inconsistent state.

rx->subbeg is referring to the string from the previous match
('<mailto:xxxx.xxxx@outlook.com>'), but the offsets for $1 extend beyond
the end of the 30-character string:

(gdb) p rx->offs[1]
$8 = {
  start = 12,
  end = 33,
  start_tmp = 12
}

A watchpoint on rx->offs shows that it gets swapped out here in
regexec.c:

2706        swap = prog->offs;
2707        /* do we need a save destructor here for eval dies? */
2708        Newxz(prog->offs, (prog->nparens + 1), regexp_paren_pair);
2709        DEBUG_BUFFERS_r(PerlIO_printf(Perl_debug_log,
2710            "rex=0x%"UVxf" saving offs: orig=0x%"UVxf" new=0x%"UVxf"\n"

when the backtrace is like this:

#0  Perl_regexec_flags (my_perl=0x100803200, rx=0x10082fdf8,
    stringarg=0x10060b658 "A¹kerèeva xxxx.xxxx@outlook.com ”",
    strend=0x10060b67d "",
    strbeg=0x10060b658 "A¹kerèeva xxxx.xxxx@outlook.com ”",
    minend=0, sv=0x1008063e8, data=0x0, flags=1) at regexec.c:2709
#1  0x0000000100247f3f in Perl_pp_subst (my_perl=0x100803200) at pp_hot.c:2120
#2  0x00000001001b847c in Perl_runops_debug (my_perl=0x100803200) at dump.c:2231
#3  0x000000010000a8ea in S_run_body (my_perl=0x100803200, oldscope=1) at perl.c:2416
#4  0x0000000100009905 in perl_run (my_perl=0x100803200) at perl.c:2339
#5  0x0000000100072698 in main (argc=3, argv=0x7fff5fbffa78, env=0x7fff5fbffa98) at miniperlmain.c:120

So the ordering of some of this stuff needs to be rethought.

A git bisect points me to this commit:

commit 44a2ac759eaf811ea851bdf9177a51bf9b95b5ce
Author: Yves Orton <demerphq@gmail.com>
Date:   Fri Dec 29 22:45:51 2006 +0100

    Re: [PATCH] Change implementation of %+ to use a proper tied hash interface and add support for %-
    Message-ID: <9b18b3110612291245q792fe91cu69422d2b81bb4f0b@mail.gmail.com>

But I think it’s a false positive.

--

Father Chrysostomos

---
via perlbug:  queue: perl5 status: open
https://rt.perl.org/Ticket/Display.html?id=122747