develooper Front page | perl.perl5.porters | Postings from March 2007

[perl #41831] [PATCH] Bug & fix: hang when using study + taint mode (perl 5.6.1, 5.8.x)

Thread Next
From:
quarl @ cs . berkeley . edu
Date:
March 15, 2007 02:01
Subject:
[perl #41831] [PATCH] Bug & fix: hang when using study + taint mode (perl 5.6.1, 5.8.x)
Message ID:
rt-3.6.HEAD-30201-1173934915-19.41831-75-0@perl.org
# New Ticket Created by  quarl@cs.berkeley.edu 
# Please include the string:  [perl #41831]
# in the subject line of all future correspondence about this issue. 
# <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=41831 >



This is a bug report for perl from quarl@cs.berkeley.edu,
generated with the help of perlbug 1.35 running under perl v5.8.8.


-----------------------------------------------------------------
[Please enter your report here]

Dear Perl hackers,

Perl 5.6.1+ hangs when:
(1) Taint mode (-T) is used
(2) 'study' is used
(3) Regular expression match with an expression starting
    with "^", containing ".*" (or ".*?") matched by a
    newline, using the /m flag, but not the /s flag.

Test case:

# ---------- cut ----------

#!/usr/bin/perl -T

my $DATA = <<'END'
line1 is here
line2 is here
line3 is here
line4 is here

END
    ;

sub read_some_tainted_data() {
    return substr($ENV{HOME},0,1);
}

warn "tainting data";
$DATA .= read_some_tainted_data();

warn "studying data";
study $DATA;

warn "trying to match...";

## don't set $SIG{ALRM}, since we'd never get to a user-level handler as perl
## is stuck in a regexp infinite loop!

alarm(1);

if ($DATA =~ /^line2.*line4/m) {
    print "match\n";
} else {
    print "no match\n";
}

warn "match didn't hang!";


# ---------- cut ----------


The output of 'perl -T -Drv testcase' may be informative:
     
    Compiling REx `^line2.*line4'
    size 10 Got 84 bytes for offset annotations.
    first at 2
    rarest char 4 at 4
    rarest char 2 at 4
       1: MBOL(2)
       2: EXACT <line2>(5)
       5: STAR(7)
       6:   REG_ANY(0)
       7: EXACT <line4>(10)
      10: END(0)
    anchored "line2" at 0 floating "line4" at 5..2147483647 (checking floating) anchored(MBOL) minlen 10
    Offsets: [10]
            1[1] 2[5] 0[0] 0[0] 8[1] 7[1] 9[5] 0[0] 0[0] 14[0]
    Omitting $` $& $' support.
     
    EXECUTING...
     
    tainting data at a line 28.
    studying data at a line 33.
    trying to match... at a line 37.
    Guessing start of match, REx "^line2.*line4" against "line1 is here
    line2 is here
    line3 is here
    line4 is here
     
    /"...
    Found floating substr "line4" at offset 42...
    Found anchored substr "line2" at offset 14...
    Starting position does not contradict /^/m...
    Guessed: match at offset 14
    Matching REx "^line2.*line4" against "line2 is here
    line3 is here
    line4 is here
     
    /"
      Setting an EVAL scope, savestack=12
      14 <here
    > <line2 i>    |  1:  MBOL
      14 <here
    > <line2 i>    |  2:  EXACT <line2>
      19 <line2> < is her>    |  5:  STAR
                               REG_ANY can match 8 times out of 2147483647...
      Setting an EVAL scope, savestack=12
                                failed...
    Guessing start of match, REx "^line2.*line4" against "ine2 is here
    line3 is here
    line4 is here
     
    /"...
    Found floating substr "line4" at offset 27...
    Contradicts anchored substr "line2", trying floating at offset 28...
    Found floating substr "line4" at offset 27...
    Contradicts anchored substr "line2", trying floating at offset 28...
    Found floating substr "line4" at offset 27...
    Contradicts anchored substr "line2", trying floating at offset 28...
    Found floating substr "line4" at offset 27...
    Contradicts anchored substr "line2", trying floating at offset 28...

... continues infinitely until killed.


Gdb backtrace:
     
    #0  0x08151d5b in PerlIO_printf (f=0x8191d6c, fmt=0x8170151 "%s %s substr \"%s%.*s%s\"%s%s") at perlio.c:4918
    #1  0x081321f8 in Perl_re_intuit_start (prog=0x81a4f50, sv=0x819edb0, strpos=0x81a2677 "ine2 is here\nline3 is here\nline4 is here\n\n/",
        strend=0x81a26a2 "", flags=6, data=0x0) at regexec.c:581
    #2  0x08136aed in Perl_regexec_flags (prog=0x81a4f50, stringarg=0x81a2676 "line2 is here\nline3 is here\nline4 is here\n\n/", strend=0x81a26a2 "",
        strbeg=0x81a2668 "line1 is here\nline2 is here\nline3 is here\nline4 is here\n\n/", minend=0, sv=0x819edb0, data=0x0, flags=6) at regexec.c:1778
    #3  0x080d5941 in Perl_pp_match () at pp_hot.c:1340
    #4  0x080bc6b6 in Perl_runops_debug () at dump.c:1459
    #5  0x0806295e in S_run_body (oldscope=1) at perl.c:2366
    #6  0x080624e7 in perl_run (my_perl=0x8182008) at perl.c:2283
    #7  0x0805e8b1 in main (argc=4, argv=0xbfd44f04, env=0xbfd44f18) at perlmain.c:99


This bug appears in Perl versions 5.6.1 and 5.8.x,
OS-agnostic, arch-agnostic (I tested on 5.6.1, 5.8.4, 5.8.4,
5.8.5, 5.8.6, 5.8.8 on Linux, Solaris; IA-32, Sparc; but I'm
pretty sure OS/arch are irrelevant.)

The bug doesn't appear in Perl 5.6.0, 5.005, 5.004.

Specifically, the bug causes an unexpected DoS in
applications such as SpamAssassin, so I believe this is a
high-importance bug, but please change the severity if I
didn't set it appropriately.


I believe the bug was introduced in change 7407 by jhi on
2000-10-23 (while fixing another bug); not sure why it
lasted so long:

    http://public.activestate.com/cgi-bin/perlbrowse/p/7407
     
    Change 7407 by jhi@chaos on 2000/10/23 03:43:12
     
            Subject: Re: [ID 20001021.005] SEGV with regex match 
            From: Hugo <hv@crypt.compulink.co.uk>
            Date: Mon, 23 Oct 2000 00:47:22 +0100
            Message-Id: <200010222347.AAA09697@crypt.compulink.co.uk>
     
    Affected files ...
     
    ... //depot/perl/regexec.c#127 edit
    ... //depot/perl/t/op/pat.t#59 edit

Below is a patch to regexec.c that fixes the problem for me
while passing all existing test cases on my platform, plus a
new regression test t/op/taintstudy.t that now passes but
did not before.  (Two versions of regexec.c patch against
5.6.1 and 5.8.8; I imagine all intermediate versions are
similar.)


--- perl-5.6.1/regexec.c.orig	2007-03-14 20:03:38.000000000 -0700
+++ perl-5.6.1/regexec.c	2007-03-14 20:57:18.000000000 -0700
@@ -378,7 +378,12 @@
 	DEBUG_r(PerlIO_printf(Perl_debug_log, "String too short...\n"));
 	goto fail;
     }
-    strbeg = (sv && SvPOK(sv)) ? strend - SvCUR(sv) : strpos;
+    /* quarl 2007-03-14
+     *     Need to check SvPOKp rather than SvPOK in case of taint mode +
+     *     studied regexp.  Reference: 615e0643-ac86-4c31-9cd3-3526b2fc883c */
+    strbeg = (sv && (SvPOK(sv) || SvPOKp(sv))) ? strend - SvCUR(sv) : strpos;
+    /* DEBUG_r( PerlIO_printf(Perl_debug_log, "## sv_flags=%p, SvPOK=%d, SvPOKp=%d, strbeg = %p\n", sv ? sv->sv_flags : 0, (sv && SvPOK(sv) ? 1:0), (sv && SvPOKp(sv) ? 1:0), strbeg) ); */
+
     check = prog->check_substr;
     if (prog->reganch & ROPT_ANCH) {	/* Match at beg-of-str or after \n */
 	ml_anch = !( (prog->reganch & ROPT_ANCH_SINGLE)

--- perl-5.8.8/regexec.c.orig	2007-03-14 20:02:20.000000000 -0700
+++ perl-5.8.8/regexec.c	2007-03-14 20:57:24.000000000 -0700
@@ -454,7 +454,11 @@
 			      "String too short... [re_intuit_start]\n"));
 	goto fail;
     }
-    strbeg = (sv && SvPOK(sv)) ? strend - SvCUR(sv) : strpos;
+    /* quarl 2007-03-14
+     *     Need to check SvPOKp rather than SvPOK in case of taint mode +
+     *     studied regexp.  Reference: 615e0643-ac86-4c31-9cd3-3526b2fc883c */
+    strbeg = (sv && (SvPOK(sv) || SvPOKp(sv))) ? strend - SvCUR(sv) : strpos;
+    /* DEBUG_r( PerlIO_printf(Perl_debug_log, "## sv_flags=%p, SvPOK=%d, SvPOKp=%d, strbeg = %p\n", sv ? sv->sv_flags : 0, (sv && SvPOK(sv) ? 1:0), (sv && SvPOKp(sv) ? 1:0), strbeg) ); */
     PL_regeol = strend;
     if (do_utf8) {
 	if (!prog->check_utf8 && prog->check_substr)


--- /dev/null	2006-06-06 21:11:25.651033888 -0700
+++ perl-5.8.8/t/op/taintstudy.t	2007-03-14 20:51:25.000000000 -0700
@@ -0,0 +1,77 @@
+#!./perl -T
+
+BEGIN {
+    chdir 't' if -d 't';
+    @INC = '../lib';
+}
+
+$Ok_Level = 0;
+my $test = 1;
+sub ok ($;$) {
+    my($ok, $name) = @_;
+
+    local $_;
+
+    # You have to do it this way or VMS will get confused.
+    printf "%s $test%s\n", $ok   ? 'ok' : 'not ok',
+                           $name ? " - $name" : '';
+
+    printf "# Failed test at line %d\n", (caller($Ok_Level))[2] unless $ok;
+
+    $test++;
+    return $ok;
+}
+
+sub nok ($;$) {
+    my($nok, $name) = @_;
+    local $Ok_Level = 1;
+    ok( !$nok, $name );
+}
+
+use Config;
+my $have_alarm = $Config{d_alarm};
+sub alarm_ok (&) {
+    my $test = shift;
+
+    # quarl 2007-03-14
+    #     ***Don't*** set $SIG{ALRM}, because if we do, we'll never get to the
+    #     user-code handler, as the perl interpreter is stuck in an infinite loop
+    #     inside the regexp engine.  If we don't set it, the default is for
+    #     the OS to kill the process.
+
+    # local $SIG{ALRM} = sub { die "timeout\n" };
+
+    my $match;
+    eval {
+        alarm(2) if $have_alarm;
+        $match = $test->();
+        alarm(0) if $have_alarm;
+    };
+
+    local $Ok_Level = 1;
+    ok( !$match && !$@, 'testing studys that used to hang' );
+}
+
+print "1..1\n";
+
+
+my $DATA = <<'END'
+line1 is here
+line2 is here
+line3 is here
+line4 is here
+
+END
+    ;
+
+sub read_some_tainted_data() {
+    return substr($ENV{HOME},0,1);
+}
+
+$DATA .= read_some_tainted_data();
+
+study $DATA;
+
+# reference: 615e0643-ac86-4c31-9cd3-3526b2fc883c
+alarm_ok { $DATA =~ /^line2.*line4/m };
+

-- 
Regards, Karl



[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
    category=core
    severity=high
---
Site configuration information for perl v5.8.8:

Configured by Debian Project at Wed Dec  6 23:17:41 UTC 2006.

Summary of my perl5 (revision 5 version 8 subversion 8) configuration:
  Platform:
    osname=linux, osvers=2.6.18.3, archname=i486-linux-gnu-thread-multi
    uname='linux saens 2.6.18.3 #1 smp sat nov 25 13:39:52 est 2006 i686 gnulinux '
    config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -Dcccdlflags=-fPIC -Darchname=i486-linux-gnu -Dprefix=/usr -Dprivlib=/usr/share/perl/5.8 -Darchlib=/usr/lib/perl/5.8 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.8.8 -Dsitearch=/usr/local/lib/perl/5.8.8 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Uusesfio -Uusenm -Duseshrplib -Dlibperl=libperl.so.5.8.8 -Dd_dosuid -des'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN -fno-strict-aliasing -pipe -I/usr/local/include'
    ccversion='', gccversion='4.1.2 20061115 (prerelease) (Debian 4.1.1-20)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
    perllibs=-ldl -lm -lpthread -lc -lcrypt
    libc=/lib/libc-2.3.6.so, so=so, useshrplib=true, libperl=libperl.so.5.8.8
    gnulibc_version='2.3.6'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:
    

---
@INC for perl v5.8.8:
    /home/quarl/lib/perl
    /etc/perl
    /usr/local/lib/perl/5.8.8
    /usr/local/share/perl/5.8.8
    /usr/lib/perl5
    /usr/share/perl5
    /usr/lib/perl/5.8
    /usr/share/perl/5.8
    /usr/local/lib/site_perl
    /usr/local/lib/perl/5.8.7
    /usr/local/share/perl/5.8.7
    .

---
Environment for perl v5.8.8:
    HOME=/home/quarl
    LANG=en_US.UTF-8
    LANGUAGE=en_US:en_GB:en
    LC_TIME=en_DK
    LD_LIBRARY_PATH=/usr/local/lib
    LOGDIR (unset)
    PATH=/home/quarl/proj/vulnivore/bin:/home/quarl/local/bin:/home/quarl/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/bin:/sbin:/usr/games
    PERLLIB=/home/quarl/lib/perl
    PERL_BADLANG (unset)
    SHELL=/bin/zsh


Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About