On Thu, Jun 30, 2011 at 04:31:54AM -0700, Nicholas Clark wrote:
> # New Ticket Created by Nicholas Clark
> # Please include the string: [perl #93824]
> # in the subject line of all future correspondence about this issue.
> # <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=93824 >
>
>
>
> This is a bug report for perl from nick@ccl4.org,
> generated with the help of perlbug 1.39 running under perl 5.15.0.
>
>
> -----------------------------------------------------------------
> [Please describe your issue here]
>
> The regex engine assumes that the scalar it's matching over can't change.
> (In at least some cases)
>
> If you use a (?{}) code block inside a regex to undefine the target scalar,
> um:
>
> $ valgrind ./perl -Ilib -le '$a = "ydydydyd"; warn $_ foreach $a =~ /[^x]d(?{undef $a})[^x]d/g'
> ==46337== Memcheck, a memory error detector
> ==46337== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
> ==46337== Using Valgrind-3.6.1 and LibVEX; rerun with -h for copyright info
> ==46337== Command: ./perl -Ilib -le $a\ =\ "ydydydyd";\ warn\ $_\ foreach\ $a\ =~\ /[^x]d(?{undef\ $a})[^x]d/g
> ==46337==
> --46337-- ./perl:
> --46337-- dSYM directory is missing; consider using --dsymutil=yes
> ==46337== Invalid read of size 1
> ==46337== at 0x1001AB360: S_reginclass (in ./perl)
> ==46337== by 0x1001A174F: S_regmatch (in ./perl)
> ==46337== by 0x10019EADA: S_regtry (in ./perl)
> ==46337== by 0x1001935AA: Perl_regexec_flags (in ./perl)
> ==46337== by 0x1000EE85E: Perl_pp_match (in ./perl)
> ==46337== by 0x1000E5EE7: Perl_runops_standard (in ./perl)
> ==46337== by 0x10002B4BC: S_run_body (in ./perl)
> ==46337== by 0x10002ADEE: perl_run (in ./perl)
> ==46337== by 0x1000014D4: main (in ./perl)
> ==46337== Address 0x1006019b2 is 2 bytes inside a block of size 10 free'd
> ==46337== at 0x100280C7C: free (vg_replace_malloc.c:366)
> ==46337== by 0x1000AE8B5: Perl_safesysfree (in ./perl)
> ==46337== by 0x1001240F9: Perl_pp_undef (in ./perl)
> ==46337== by 0x1000E5EE7: Perl_runops_standard (in ./perl)
> ==46337== by 0x1001A4793: S_regmatch (in ./perl)
> ==46337== by 0x10019EADA: S_regtry (in ./perl)
> ==46337== by 0x1001935AA: Perl_regexec_flags (in ./perl)
> ==46337== by 0x1000EE85E: Perl_pp_match (in ./perl)
> ==46337== by 0x1000E5EE7: Perl_runops_standard (in ./perl)
> ==46337== by 0x10002B4BC: S_run_body (in ./perl)
> ==46337== by 0x10002ADEE: perl_run (in ./perl)
> ==46337== by 0x1000014D4: main (in ./perl)
> ==46337==
> ==46337== Invalid read of size 1
> ==46337== at 0x1001A17CB: S_regmatch (in ./perl)
> ==46337== by 0x10019EADA: S_regtry (in ./perl)
> ==46337== by 0x1001935AA: Perl_regexec_flags (in ./perl)
> ==46337== by 0x1000EE85E: Perl_pp_match (in ./perl)
> ==46337== by 0x1000E5EE7: Perl_runops_standard (in ./perl)
> ==46337== by 0x10002B4BC: S_run_body (in ./perl)
> ==46337== by 0x10002ADEE: perl_run (in ./perl)
> ==46337== by 0x1000014D4: main (in ./perl)
> ==46337== Address 0x1006019b3 is 3 bytes inside a block of size 10 free'd
> ==46337== at 0x100280C7C: free (vg_replace_malloc.c:366)
> ==46337== by 0x1000AE8B5: Perl_safesysfree (in ./perl)
> ==46337== by 0x1001240F9: Perl_pp_undef (in ./perl)
> ==46337== by 0x1000E5EE7: Perl_runops_standard (in ./perl)
> ==46337== by 0x1001A4793: S_regmatch (in ./perl)
> ==46337== by 0x10019EADA: S_regtry (in ./perl)
> ==46337== by 0x1001935AA: Perl_regexec_flags (in ./perl)
> ==46337== by 0x1000EE85E: Perl_pp_match (in ./perl)
> ==46337== by 0x1000E5EE7: Perl_runops_standard (in ./perl)
> ==46337== by 0x10002B4BC: S_run_body (in ./perl)
> ==46337== by 0x10002ADEE: perl_run (in ./perl)
> ==46337== by 0x1000014D4: main (in ./perl)
>
>
> That would be a bad thing :-(
>
> It's not a terrible thing, given that:
>
> For reasons of security, this construct is forbidden if the regular
> expression involves run-time interpolation of variables, unless the
> perilous C<use re 'eval'> pragma has been used (see L<re>), or the
> variables contain results of the C<qr//> operator (see
> L<perlop/"qr/STRINGE<sol>msixpodual">).
>
> My vague understanding of the engine is that there are mechanisms in place to
> copy the target string. Should these also be triggered if the pattern contains
> any code blocks? [or anything else that could have side effects *during* the
> match, if anything else exists]
>
I'd say that, given that (?{ }) is still marked experimental and
there's enough support to keep it that way, we should leave it as is.
(?{ }) has been around for more than a decade, and I don't think anyone
has reported this before, so it doesn't seem to be a huge issue. Copying
the string for each pattern containing (?{ }) (or (??{ })) seems like a
large price. Perhaps document it as "don't do it".
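
For illustration only, here is a minimal sketch of how a caller could
sidestep the problem today by matching against a private copy, so that
the code block never touches the buffer the engine is scanning. The
names $target and $copy are made up for this example, and this is a
user-level workaround under that assumption, not a change to the engine:

```perl
use strict;
use warnings;

# Hypothetical names: $target is the scalar the code block clobbers,
# $copy is the string the regex engine actually scans.
our $target = "ydydydyd";
my $copy = $target;

# The code block still undefines $target, but the match runs over $copy,
# so the engine keeps reading from a buffer that is still alive.
my @matches = $copy =~ /[^x]d(?{ undef $target })[^x]d/g;

print "$_\n" for @matches;
```

The cost is one explicit copy, paid only by code that actually modifies
its target from inside a code block, rather than by every pattern that
contains (?{ }).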
Abigail