
Re: [perl #93824] regex code blocks manipulating regex target can cause undefined behaviour

From: Abigail
Date: July 29, 2011 09:09
Subject: Re: [perl #93824] regex code blocks manipulating regex target can cause undefined behaviour
Message ID: 20110729160907.GD1411@almanda

On Thu, Jun 30, 2011 at 04:31:54AM -0700, Nicholas Clark wrote:
> # New Ticket Created by  Nicholas Clark 
> # Please include the string:  [perl #93824]
> # in the subject line of all future correspondence about this issue. 
> # <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=93824 >
> 
> 
> 
> This is a bug report for perl from nick@ccl4.org,
> generated with the help of perlbug 1.39 running under perl 5.15.0.
> 
> 
> -----------------------------------------------------------------
> [Please describe your issue here]
> 
> The regex engine assumes that the scalar it's matching over can't change.
> (In at least some cases)
> 
> If you use a (?{}) code block inside a regex to undefine the target scalar,
> um:
> 
> $ valgrind ./perl -Ilib -le '$a = "ydydydyd"; warn $_ foreach $a =~ /[^x]d(?{undef $a})[^x]d/g'
> ==46337== Memcheck, a memory error detector
> ==46337== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
> ==46337== Using Valgrind-3.6.1 and LibVEX; rerun with -h for copyright info
> ==46337== Command: ./perl -Ilib -le $a\ =\ "ydydydyd";\ warn\ $_\ foreach\ $a\ =~\ /[^x]d(?{undef\ $a})[^x]d/g
> ==46337== 
> --46337-- ./perl:
> --46337-- dSYM directory is missing; consider using --dsymutil=yes
> ==46337== Invalid read of size 1
> ==46337==    at 0x1001AB360: S_reginclass (in ./perl)
> ==46337==    by 0x1001A174F: S_regmatch (in ./perl)
> ==46337==    by 0x10019EADA: S_regtry (in ./perl)
> ==46337==    by 0x1001935AA: Perl_regexec_flags (in ./perl)
> ==46337==    by 0x1000EE85E: Perl_pp_match (in ./perl)
> ==46337==    by 0x1000E5EE7: Perl_runops_standard (in ./perl)
> ==46337==    by 0x10002B4BC: S_run_body (in ./perl)
> ==46337==    by 0x10002ADEE: perl_run (in ./perl)
> ==46337==    by 0x1000014D4: main (in ./perl)
> ==46337==  Address 0x1006019b2 is 2 bytes inside a block of size 10 free'd
> ==46337==    at 0x100280C7C: free (vg_replace_malloc.c:366)
> ==46337==    by 0x1000AE8B5: Perl_safesysfree (in ./perl)
> ==46337==    by 0x1001240F9: Perl_pp_undef (in ./perl)
> ==46337==    by 0x1000E5EE7: Perl_runops_standard (in ./perl)
> ==46337==    by 0x1001A4793: S_regmatch (in ./perl)
> ==46337==    by 0x10019EADA: S_regtry (in ./perl)
> ==46337==    by 0x1001935AA: Perl_regexec_flags (in ./perl)
> ==46337==    by 0x1000EE85E: Perl_pp_match (in ./perl)
> ==46337==    by 0x1000E5EE7: Perl_runops_standard (in ./perl)
> ==46337==    by 0x10002B4BC: S_run_body (in ./perl)
> ==46337==    by 0x10002ADEE: perl_run (in ./perl)
> ==46337==    by 0x1000014D4: main (in ./perl)
> ==46337== 
> ==46337== Invalid read of size 1
> ==46337==    at 0x1001A17CB: S_regmatch (in ./perl)
> ==46337==    by 0x10019EADA: S_regtry (in ./perl)
> ==46337==    by 0x1001935AA: Perl_regexec_flags (in ./perl)
> ==46337==    by 0x1000EE85E: Perl_pp_match (in ./perl)
> ==46337==    by 0x1000E5EE7: Perl_runops_standard (in ./perl)
> ==46337==    by 0x10002B4BC: S_run_body (in ./perl)
> ==46337==    by 0x10002ADEE: perl_run (in ./perl)
> ==46337==    by 0x1000014D4: main (in ./perl)
> ==46337==  Address 0x1006019b3 is 3 bytes inside a block of size 10 free'd
> ==46337==    at 0x100280C7C: free (vg_replace_malloc.c:366)
> ==46337==    by 0x1000AE8B5: Perl_safesysfree (in ./perl)
> ==46337==    by 0x1001240F9: Perl_pp_undef (in ./perl)
> ==46337==    by 0x1000E5EE7: Perl_runops_standard (in ./perl)
> ==46337==    by 0x1001A4793: S_regmatch (in ./perl)
> ==46337==    by 0x10019EADA: S_regtry (in ./perl)
> ==46337==    by 0x1001935AA: Perl_regexec_flags (in ./perl)
> ==46337==    by 0x1000EE85E: Perl_pp_match (in ./perl)
> ==46337==    by 0x1000E5EE7: Perl_runops_standard (in ./perl)
> ==46337==    by 0x10002B4BC: S_run_body (in ./perl)
> ==46337==    by 0x10002ADEE: perl_run (in ./perl)
> ==46337==    by 0x1000014D4: main (in ./perl)
> 
> 
> That would be a bad thing :-(
> 
> It's not a terrible thing, given that:
> 
>     For reasons of security, this construct is forbidden if the regular
>     expression involves run-time interpolation of variables, unless the
>     perilous C<use re 'eval'> pragma has been used (see L<re>), or the
>     variables contain results of the C<qr//> operator (see
>     L<perlop/"qr/STRINGE<sol>msixpodual">).
> 
> My vague understanding of the engine is that there are mechanisms in place to
> copy the target string. Should these also be triggered if the pattern contains
> any code blocks? [or anything else that could have side effects *during* the
> match, if anything else exists]
> 


Given that (?{ }) is still marked experimental, and there's enough
support to keep it that way, I'd say leave it as is. (?{ }) has been
around for more than a decade, and I don't think anyone has reported
this before, so it doesn't seem to be a huge issue. Copying the target
string for every pattern containing (?{ }) (or (??{ })) seems like a
large price to pay. Perhaps just document it as "don't do it".
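
For what it's worth, a minimal sketch of what "don't do it" could look
like in user code: if a code block really has to clobber the variable,
match against a private copy so the engine never scans a buffer that has
been freed. The names $target and $copy are made up for illustration,
and this is only a user-level workaround, not a statement about what the
engine does internally.

```perl
use strict;
use warnings;

my $target = "ydydydyd";

# Take a private copy and match against that. The code block below
# undefines $target, but the engine is scanning $copy's buffer, which
# nothing touches during the match.
my $copy = $target;

# In list context with /g and no capture groups, each element is the
# text of one complete match.
my @matches = $copy =~ /[^x]d(?{ undef $target })[^x]d/g;

print "matched: $_\n" for @matches;
print "target is ", (defined $target ? "still defined" : "undef"), "\n";
```

The cost is one string copy per match, paid only by code that actually
needs to modify the target from inside the pattern, rather than by every
pattern containing (?{ }).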



Abigail
