develooper Front page | perl.perl5.porters | Postings from July 2012

[perl #114272] \V and \H skip too far and match incorrectly

From:
Nicholas Clark
Date:
July 26, 2012 05:34
Subject:
[perl #114272] \V and \H skip too far and match incorrectly
Message ID:
rt-3.6.HEAD-11172-1343306025-88.114272-75-0@perl.org
# New Ticket Created by  Nicholas Clark 
# Please include the string:  [perl #114272]
# in the subject line of all future correspondence about this issue. 
# <URL: https://rt.perl.org:443/rt3/Ticket/Display.html?id=114272 >


Building with -faddress-sanitizer (on OS X) happens to reveal a bug in perl
which is concealed on most other platforms due to how memory allocation is
rounded up.

The bug itself is twofold. Firstly a wrong result:

$ ./miniperl -le '$_ = "\xFFn"; print /\Vn/ ? "y" : "n"' 
n
$ ./miniperl -le '$_ = "\xFFn"; utf8::upgrade($_); print /\Vn/ ? "y" : "n"'
y


(it should be the same, independent of internal representation)

and secondly reading from unallocated memory:

$ valgrind ./miniperl -le '$_ = "\xFFn"; print /\Vn/ ? "y" : "n"'
==17838== Memcheck, a memory error detector
==17838== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==17838== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==17838== Command: ./miniperl -le $_\ =\ "\\xFFn";\ print\ /\\Vn/\ ?\ "y"\ :\ "n"
==17838== 
==17838== Invalid read of size 1
==17838==    at 0x68EADA: S_regmatch (regexec.c:5852)
==17838==    by 0x67FFEC: S_regtry (regexec.c:2729)
==17838==    by 0x67DF72: Perl_regexec_flags (regexec.c:2393)
==17838==    by 0x5632B5: Perl_pp_match (pp_hot.c:1378)
==17838==    by 0x508817: Perl_runops_debug (dump.c:2129)
==17838==    by 0x40B3FA: S_run_body (perl.c:2409)
==17838==    by 0x40A754: perl_run (perl.c:2326)
==17838==    by 0x4481BB: main (miniperlmain.c:119)
==17838==  Address 0x5c8d20d is 5 bytes after a block of size 8 alloc'd
==17838==    at 0x4C24B4A: malloc (vg_replace_malloc.c:263)
==17838==    by 0x509110: Perl_safesysmalloc (util.c:100)
==17838==    by 0x576CC0: Perl_sv_grow (sv.c:1520)
==17838==    by 0x594C39: Perl_sv_setsv_flags (sv.c:4239)
==17838==    by 0x55A8D6: Perl_pp_sassign (pp_hot.c:212)
==17838==    by 0x508817: Perl_runops_debug (dump.c:2129)
==17838==    by 0x40B3FA: S_run_body (perl.c:2409)
==17838==    by 0x40A754: perl_run (perl.c:2326)
==17838==    by 0x4481BB: main (miniperlmain.c:119)
==17838== 
n
==17838== 
==17838== HEAP SUMMARY:
==17838==     in use at exit: 107,704 bytes in 509 blocks
==17838==   total heap usage: 625 allocs, 116 frees, 133,452 bytes allocated
==17838== 
==17838== LEAK SUMMARY:
==17838==    definitely lost: 0 bytes in 0 blocks
==17838==    indirectly lost: 0 bytes in 0 blocks
==17838==      possibly lost: 0 bytes in 0 blocks
==17838==    still reachable: 107,704 bytes in 509 blocks
==17838==         suppressed: 0 bytes in 0 blocks
==17838== Rerun with --leak-check=full to see details of leaked memory
==17838== 
==17838== For counts of detected and suppressed errors, rerun with: -v
==17838== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 4 from 4)


however, to reveal that, I have to hack sv.h and handy.h to disable initial
over-allocation:

$ git diff handy.h sv.c
diff --git a/handy.h b/handy.h
index 2205742..629b540 100644
--- a/handy.h
+++ b/handy.h
@@ -1143,7 +1143,11 @@ PoisonWith(0xEF) for catching access to freed memory.
        (void)(sizeof(t) > 1 && ((MEM_SIZE)(n)+0.0) > MEM_SIZE_MAX/sizeof(t) && 
 #define MEM_WRAP_CHECK_(n,t) MEM_WRAP_CHECK(n,t),
 
+#if 1
 #define PERL_STRLEN_ROUNDUP(n) ((void)(((n) > MEM_SIZE_MAX - 2 * PERL_STRLEN_RO
+#else
+#define PERL_STRLEN_ROUNDUP(n) (n)
+#endif
 
 #else
 
diff --git a/sv.c b/sv.c
index 2d5444e..3261894 100644
--- a/sv.c
+++ b/sv.c
@@ -1506,10 +1506,10 @@ Perl_sv_grow(pTHX_ register SV *const sv, register STRLE
        s = SvPVX_mutable(sv);
 
     if (newlen > SvLEN(sv)) {          /* need more room? */
-       STRLEN minlen = SvCUR(sv);
+        /*     STRLEN minlen = SvCUR(sv);
        minlen += (minlen >> PERL_STRLEN_EXPAND_SHIFT) + 10;
        if (newlen < minlen)
-           newlen = minlen;
+        newlen = minlen;*/
 #ifndef Perl_safesysmalloc_size
        newlen = PERL_STRLEN_ROUNDUP(newlen);
 #endif


(changing the #if 1 to #if 0 demonstrates that the read beyond end bug is
also present for chr 254)

The error is in this code in S_regmatch:

#define CASE_CLASS(nAmE)                              \
        case nAmE:                                    \
	    if (locinput >= PL_regeol)                \
		sayNO;                                \
            if ((n=is_##nAmE(locinput,utf8_target))) {    \
                locinput += n;                        \
                nextchr = UCHARAT(locinput);          \
            } else                                    \
                sayNO;                                \
            break;                                    \
        case N##nAmE:                                 \
	    if (locinput >= PL_regeol)                \
		sayNO;                                \
            if ((n=is_##nAmE(locinput,utf8_target))) {    \
                sayNO;                                \
            } else {                                  \
                locinput += UTF8SKIP(locinput);       \
                nextchr = UCHARAT(locinput);          \
            }                                         \
            break

        CASE_CLASS(VERTWS);
        CASE_CLASS(HORIZWS);
#undef CASE_CLASS


which is using UTF8SKIP unconditionally, when (if I understand it correctly)
it *should* only be using them if utf8_target is true.

The code in question was added by commit e1d1eefb8c88e0dc in April 2007, so
it's been in everything since 5.010

Nicholas Clark




nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About