perl.perl5.porters | Postings from July 2009
Re: Regex inside regex
From: Craig A. Berry
Date: July 26, 2009 15:08
Subject: Re: Regex inside regex
Message ID: c9ab31fc0907261508r40fab8a6s5110a8e238704ab0@mail.gmail.com
On Sun, Jul 26, 2009 at 5:19 AM, Bram <p5p@perl.wizbit.be> wrote:
> Quoting "Craig A. Berry" <craig.a.berry@gmail.com>:
>
>> Initial testing is showing that my suggestion of saving and restoring
>> PL_regeol successfully prevents memory corruption, so that may be an
>> appropriate finger in the dike pending full reentrancy in the regex
>> engine. It could also be adapted to do an assert if the length has
>> changed during the sub-op. Not sure what's best.
>
> It prevents some memory corruption but not all.
>
> Summary of my perl5 (revision 5 version 11 subversion 0) configuration:
> Commit id: fc46f0f6acf947702bfc9ac32e05b1f4e8f4c720
>
> (which includes your change):
>
> #!/usr/bin/perl -l
>
> print $];
> *c = sub { "foo" =~ m/bar/ };
> sub parse_re {
>     $_[0] =~ m{(?{ c() })};
> }
> parse_re();
> __END__
>
> Running it:
> 5.011000
>
> (no segfault => good)
>
>
> The second example:
>
> #!/usr/bin/perl -l
>
> *c = sub {
>     "foo" =~ m/(f)(o)(o)/;
> };
>
> if ("abcdefghi" =~ m/(abc)(def)(?{ c() })(ghi)/) {
>     print "match: $1, $2, $3 & $&";
> }
> else {
>     print "no match";
> }
>
>
> *d = sub {
>     "foo" =~ m/(b)(a)(r)/;
> };
>
> if ("abcdefghi" =~ m/(abc)(def)(?{ d() })(ghi)/) {
>     print "match: $1, $2, $3 & $&";
> }
> else {
>     print "no match";
> }
> __END__
>
> Running it with | cat -v
>
> match: abc, def, & abcdef
> match: abc, def, ^@^@^@ &
> abcdefghi^@^@^@^Q^@^@^@^@^@^@^@M-^PM-^XM-^?M-7^@^@^@^@^Q^@^@^@M-^@M-^XM-^?M-7M-^@M-^XM-^?[.....]
>
> In the first re: $3 is empty and $& is incorrectly set.
> In the second re: $3 and $& contain junk.
>
> I do not know whether that junk can, in some cases, result in
> segfaults/out-of-memory errors/panics/...

Thanks for pushing this further. Your tests show more specifically
what we already knew in general: other aspects of regex state (in
addition to the end-of-string markers) are likely to get hijacked when
one regex runs inside another. To see the details, do a -DDEBUGGING
build and run with -Dr; the trace shows what's going on, including an
assert() failure showing that it's PL_reglastparen that's been whacked.
Here's the first part of your second example:
$ perl -"Dr" baz.pl
Compiling REx "(f)(o)(o)"
rarest char f at 0
Final program:
1: OPEN1 (3)
3: EXACT <f> (5)
5: CLOSE1 (7)
7: OPEN2 (9)
9: EXACT <o> (11)
11: CLOSE2 (13)
13: OPEN3 (15)
15: EXACT <o> (17)
17: CLOSE3 (19)
19: END (0)
anchored "foo" at 0 (checking anchored) minlen 3
Compiling REx "(abc)(def)(?{ c() })(ghi)"
rarest char b at 1
Final program:
1: OPEN1 (3)
3: EXACT <abc> (5)
5: CLOSE1 (7)
7: OPEN2 (9)
9: EXACT <def> (11)
11: CLOSE2 (13)
13: EVAL (15)
15: OPEN3 (17)
17: EXACT <ghi> (19)
19: CLOSE3 (21)
21: END (0)
anchored "abcdefghi" at 0 (checking anchored) minlen 9 with eval
Enabling $` $& $' support.
EXECUTING...
Guessing start of match in sv for REx "(abc)(def)(?{ c() })(ghi)"
against "abcdefghi"
Found anchored substr "abcdefghi" at offset 0...
Guessed: match at offset 0
Matching REx "(abc)(def)(?{ c() })(ghi)" against "abcdefghi"
0 <> <abcdefghi> | 1:OPEN1(3)
0 <> <abcdefghi> | 3:EXACT <abc>(5)
3 <abc> <defghi> | 5:CLOSE1(7)
3 <abc> <defghi> | 7:OPEN2(9)
3 <abc> <defghi> | 9:EXACT <def>(11)
6 <abcdef> <ghi> | 11:CLOSE2(13)
6 <abcdef> <ghi> | 13:EVAL(15)
Guessing start of match in sv for REx "(f)(o)(o)" against "foo"
Found anchored substr "foo" at offset 0...
Guessed: match at offset 0
Matching REx "(f)(o)(o)" against "foo"
0 <> <foo> | 1:OPEN1(3)
0 <> <foo> | 3:EXACT <f>(5)
1 <f> <oo> | 5:CLOSE1(7)
1 <f> <oo> | 7:OPEN2(9)
1 <f> <oo> | 9:EXACT <o>(11)
2 <fo> <o> | 11:CLOSE2(13)
2 <fo> <o> | 13:OPEN3(15)
2 <fo> <o> | 15:EXACT <o>(17)
3 <foo> <> | 17:CLOSE3(19)
3 <foo> <> | 19:END(0)
Match successful!
12038 <foo%0%0%0%0%0%0%0%0%0%0%0%0%0000%0%0%0%0003%r%360%0%0%0%0%0>
<>| 15:OPEN3(17)
assert error: expression = PL_reglastparen == &rex->lastparen, in file
D0:[craig.blead]regexec.c;1 at line 2845
%SYSTEM-F-OPCCUS, opcode reserved to customer fault at
PC=FFFFFFFF84AB6A50, PS=0000001B
%TRACE-F-TRACEBACK, symbolic stack dump follows
image module routine line rel PC abs PC
DECC$SHR C$SIGNAL gsignal 28009 0000000000001180 FFFFFFFF84AB6A50
DECC$SHR C$ABORT abort 2832 0000000000000022 FFFFFFFF84662162
DECC$SHR C$ASSERT __assert 6478 0000000000000072 FFFFFFFF84EA8B02
DBGPERLSHR REGEXEC S_regmatch 87772 000000000001FB42 00000000004BFBB2
DBGPERLSHR REGEXEC S_regtry 87285 000000000001CDF0 00000000004BCE60
DBGPERLSHR REGEXEC Perl_regexec_flags 86981 000000000001AA50 00000000004BAAC0
DBGPERLSHR PP_HOT Perl_pp_match 84318 0000000000010A50 00000000003484E0
DBGPERLSHR DUMP Perl_runops_debug 93969 0000000000014181 00000000002BB121
DBGPERLSHR PERL S_run_body 85400 000000000000B4E0 0000000000073080
DBGPERLSHR PERL perl_run 85326 000000000000A960 0000000000072500
NDBGPERL PERLMAIN main 83080 0000000000000401 0000000000010401
NDBGPERL PERLMAIN __main 83029 0000000000000120 0000000000010120
PTHREAD$RTL THD_THREAD thdBase 244744 0000000000005BE2 FFFFFFFF84543282
PTHREAD$RTL THD_INIT pthread_main 244538 00000000000006B2 FFFFFFFF844FA6B2
0 FFFFFFFF80B9EE92 FFFFFFFF80B9EE92
DCL 0 000000000006BD22 000000007AE27D22
%TRACE-I-END, end of TRACE stack dump
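To make the failure mode concrete, here is a minimal sketch under stated assumptions: a toy model in which two package globals, the hypothetical $eol and $lastparen, stand in for interpreter-wide match state such as PL_regeol and PL_reglastparen. It is not how regexec.c is written; it only illustrates why saving and restoring one piece of shared state protects that piece but leaves the rest exposed when a match re-enters the engine from an embedded code block.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical toy model, not Perl's regexec.c: these package globals
# stand in for interpreter-wide match state (cf. PL_regeol, PL_reglastparen).
our ($eol, $lastparen);

# A "match" records its state in the globals, runs an embedded code
# block (like (?{ c() })), then reads the shared state back.
sub toy_match {
    my ($str, $nparens, $code_block) = @_;
    $eol       = length $str;   # end-of-string marker
    $lastparen = $nparens;      # highest capture group closed so far
    $code_block->() if $code_block;
    return ($eol, $lastparen);
}

# 1. No protection: the inner match overwrites all shared state.
my @none = toy_match("abcdefghi", 2, sub { toy_match("foo", 3) });

# 2. Save/restore only the end-of-string marker (cf. the PL_regeol change).
my @eol_only = toy_match("abcdefghi", 2, sub {
    local $eol;                 # restored when this sub returns
    toy_match("foo", 3);        # $lastparen is still clobbered
});

# 3. Save/restore all of the shared state around the nested match.
my @all = toy_match("abcdefghi", 2, sub {
    local ($eol, $lastparen);
    toy_match("foo", 3);
});

print "no save:   eol=$none[0] lastparen=$none[1]\n";      # eol=3 lastparen=3
print "eol only:  eol=$eol_only[0] lastparen=$eol_only[1]\n"; # eol=9 lastparen=3
print "all saved: eol=$all[0] lastparen=$all[1]\n";        # eol=9 lastparen=2
```

Case 2 is the toy analogue of the partial fix: the end-of-string marker survives, but the group bookkeeping now describes the inner match rather than the outer one. Case 3 only works because the toy's state is two scalars; in the real engine the full set of state that needs protecting is larger, which is why full reentrancy is the real cure.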