Re: Regex inside regex

From: Craig A. Berry
Date: July 26, 2009 15:08
Subject: Re: Regex inside regex
Message ID: c9ab31fc0907261508r40fab8a6s5110a8e238704ab0@mail.gmail.com

On Sun, Jul 26, 2009 at 5:19 AM, Bram <p5p@perl.wizbit.be> wrote:
> Quoting "Craig A. Berry" <craig.a.berry@gmail.com>:
>
>> Initial testing is showing that my suggestion of saving and restoring
>> PL_regeol successfully prevents memory corruption, so that may be an
>> appropriate finger in the dike pending full reentrancy in the regex
>> engine.  It could also be adapted to do an assert if the length has
>> changed during the sub-op.  Not sure what's best.
>
> It prevents some memory corruption but not all.
>
> Summary of my perl5 (revision 5 version 11 subversion 0) configuration:
>  Commit id: fc46f0f6acf947702bfc9ac32e05b1f4e8f4c720
>
> (which includes your change):
>
> #!/usr/bin/perl -l
>
> print $];
> *c = sub { "foo" =~ m/bar/ };
> sub parse_re {
>  $_[0] =~ m{(?{ c() })};
> }
> parse_re();
> __END__
>
> Running it:
> 5.011000
>
> (no segfault => good)
>
>
> The second example:
>
> #!/usr/bin/perl -l
>
> *c = sub {
>  "foo" =~ m/(f)(o)(o)/;
> };
>
> if ("abcdefghi" =~ m/(abc)(def)(?{ c() })(ghi)/) {
>  print "match: $1, $2, $3 & $&";
> }
> else {
>  print "no match";
> }
>
>
> *d = sub {
>  "foo" =~ m/(b)(a)(r)/;
> };
>
> if ("abcdefghi" =~ m/(abc)(def)(?{ d() })(ghi)/) {
>  print "match: $1, $2, $3 & $&";
> }
> else {
>  print "no match";
> }
> __END__
>
> Running it with | cat -v
>
> match: abc, def,  & abcdef
> match: abc, def, ^@^@^@ &
> abcdefghi^@^@^@^Q^@^@^@^@^@^@^@M-^PM-^XM-^?M-7^@^@^@^@^Q^@^@^@M-^@M-^XM-^?M-7M-^@M-^XM-^?[.....]
>
> In the first regex, $3 is empty and $& is set incorrectly.
> In the second regex, $3 and $& contain junk.
>
> I do not know whether that junk can in some cases result in segfaults,
> out-of-memory errors, panics, ...

Thanks for pushing this further.  Your tests show more specifically
what we already knew in general: other aspects of regex state, beyond
the end-of-string markers, are likely to get hijacked when one regex
runs inside another.  For more detail, do a -DDEBUGGING build and run
with -Dr; the output includes an assert() failure showing that it's
PL_reglastparen that's been whacked.  Here's the first part of your
second example:

$ perl -"Dr" baz.pl
Compiling REx "(f)(o)(o)"
rarest char f at 0
Final program:
   1: OPEN1 (3)
   3:   EXACT <f> (5)
   5: CLOSE1 (7)
   7: OPEN2 (9)
   9:   EXACT <o> (11)
  11: CLOSE2 (13)
  13: OPEN3 (15)
  15:   EXACT <o> (17)
  17: CLOSE3 (19)
  19: END (0)
anchored "foo" at 0 (checking anchored) minlen 3
Compiling REx "(abc)(def)(?{ c() })(ghi)"
rarest char b at 1
Final program:
   1: OPEN1 (3)
   3:   EXACT <abc> (5)
   5: CLOSE1 (7)
   7: OPEN2 (9)
   9:   EXACT <def> (11)
  11: CLOSE2 (13)
  13: EVAL (15)
  15: OPEN3 (17)
  17:   EXACT <ghi> (19)
  19: CLOSE3 (21)
  21: END (0)
anchored "abcdefghi" at 0 (checking anchored) minlen 9 with eval
Enabling $` $& $' support.

EXECUTING...

Guessing start of match in sv for REx "(abc)(def)(?{ c() })(ghi)" against "abcdefghi"
Found anchored substr "abcdefghi" at offset 0...
Guessed: match at offset 0
Matching REx "(abc)(def)(?{ c() })(ghi)" against "abcdefghi"
   0 <> <abcdefghi>          |  1:OPEN1(3)
   0 <> <abcdefghi>          |  3:EXACT <abc>(5)
   3 <abc> <defghi>          |  5:CLOSE1(7)
   3 <abc> <defghi>          |  7:OPEN2(9)
   3 <abc> <defghi>          |  9:EXACT <def>(11)
   6 <abcdef> <ghi>          | 11:CLOSE2(13)
   6 <abcdef> <ghi>          | 13:EVAL(15)
Guessing start of match in sv for REx "(f)(o)(o)" against "foo"
Found anchored substr "foo" at offset 0...
Guessed: match at offset 0
Matching REx "(f)(o)(o)" against "foo"
   0 <> <foo>                |  1:OPEN1(3)
   0 <> <foo>                |  3:EXACT <f>(5)
   1 <f> <oo>                |  5:CLOSE1(7)
   1 <f> <oo>                |  7:OPEN2(9)
   1 <f> <oo>                |  9:EXACT <o>(11)
   2 <fo> <o>                | 11:CLOSE2(13)
   2 <fo> <o>                | 13:OPEN3(15)
   2 <fo> <o>                | 15:EXACT <o>(17)
   3 <foo> <>                | 17:CLOSE3(19)
   3 <foo> <>                | 19:END(0)
Match successful!
12038 <foo%0%0%0%0%0%0%0%0%0%0%0%0%0000%0%0%0%0003%r%360%0%0%0%0%0> <>| 15:OPEN3(17)
assert error: expression = PL_reglastparen == &rex->lastparen, in file D0:[craig.blead]regexec.c;1 at line 2845
%SYSTEM-F-OPCCUS, opcode reserved to customer fault at PC=FFFFFFFF84AB6A50, PS=0000001B
%TRACE-F-TRACEBACK, symbolic stack dump follows
image     module    routine               line      rel PC           abs PC
DECC$SHR  C$SIGNAL  gsignal              28009 0000000000001180 FFFFFFFF84AB6A50
DECC$SHR  C$ABORT  abort                  2832 0000000000000022 FFFFFFFF84662162
DECC$SHR  C$ASSERT  __assert              6478 0000000000000072 FFFFFFFF84EA8B02
DBGPERLSHR  REGEXEC  S_regmatch          87772 000000000001FB42 00000000004BFBB2
DBGPERLSHR  REGEXEC  S_regtry            87285 000000000001CDF0 00000000004BCE60
DBGPERLSHR  REGEXEC  Perl_regexec_flags  86981 000000000001AA50 00000000004BAAC0
DBGPERLSHR  PP_HOT  Perl_pp_match        84318 0000000000010A50 00000000003484E0
DBGPERLSHR  DUMP  Perl_runops_debug      93969 0000000000014181 00000000002BB121
DBGPERLSHR  PERL  S_run_body             85400 000000000000B4E0 0000000000073080
DBGPERLSHR  PERL  perl_run               85326 000000000000A960 0000000000072500
NDBGPERL  PERLMAIN  main                 83080 0000000000000401 0000000000010401
NDBGPERL  PERLMAIN  __main               83029 0000000000000120 0000000000010120
PTHREAD$RTL  THD_THREAD  thdBase        244744 0000000000005BE2 FFFFFFFF84543282
PTHREAD$RTL  THD_INIT  pthread_main     244538 00000000000006B2 FFFFFFFF844FA6B2
                                             0 FFFFFFFF80B9EE92 FFFFFFFF80B9EE92
DCL                                          0 000000000006BD22 000000007AE27D22
%TRACE-I-END, end of TRACE stack dump
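
For reference, the second example can be turned into a self-checking
script.  This is only a sketch: the helper outer_captures() and the
package variable $Inner are names made up here, not anything from the
thread or from perl's own test suite.  The expected values assume that
the outer match should be unaffected by whatever the inner match does.
On a build that shows the problem above, the checks may report "not ok",
or perl may abort before reaching them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: run an outer match whose (?{ }) block calls the
# code ref in $Inner, then return what the outer match captured.
# A package variable is used so the code block does not have to close
# over a lexical.
our $Inner;

sub outer_captures {
    local $Inner = shift;
    if ("abcdefghi" =~ m/(abc)(def)(?{ $Inner->() })(ghi)/) {
        return ($1, $2, $3, $&);
    }
    return;    # empty list when the outer match fails
}

# The two inner matches from the example: one succeeds, one fails.
my %inner = (
    'inner match succeeds' => sub { "foo" =~ m/(f)(o)(o)/ },
    'inner match fails'    => sub { "foo" =~ m/(b)(a)(r)/ },
);

for my $name (sort keys %inner) {
    my @got = outer_captures($inner{$name});
    my $got = @got
        ? join(", ", map { defined $_ ? $_ : "undef" } @got)
        : "no match";
    # Assumed correct result: the outer captures are untouched.
    my $ok = $got eq "abc, def, ghi, abcdefghi";
    print(($ok ? "ok" : "not ok"), " - $name: $got\n");
}
```

Printing the captures next to the verdict keeps the junk visible when a
check fails, so piping the output through cat -v still works as before.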
