develooper Front page | perl.perl5.porters | Postings from December 2004

[perl #33185] UTF-8 string substitution corrupts memory

sroy @ search-box . com
December 26, 2004 02:10
# New Ticket Created by 
# Please include the string:  [perl #33185]
# in the subject line of all future correspondence about this issue. 
# <URL: >

This is a bug report for perl from,
generated with the help of perlbug 1.35 running under perl v5.8.4.


The following test program corrupts perl's memory:

# Demonstrates memory corruption bug in 5.8.4.  The 'use Benchmark'
# line does not affect the corruption, but on my system it
# moves memory allocations around enough to ensure that 'perl -d'
# hangs, and that under 'gdb debugperl' it produces a segmentation fault.
# The exact string is also unimportant as long as it has characters
# that match the character class in the s///.  The one chosen here simply
# reliably reproduces the problem on my system.

use Encode;
use Benchmark;

$_ = decode_utf8('title: Ã¿Â€Ã¿Ã¿Â‚Â’Ã¿ÂÂÃ¿Ã¿ÂˆÃ¿ÂÃ¿Ã¿Â‚', 1);
$_ =~ s/[^[:print:]]/ /g;

Since it's a memory corruption, it may or may not crash when running
under perl, perl -d, or gdb.  To reliably observe the problem under gdb:

1. gdb debugperl (a perl compiled with debugging on)
2. b regexec.c:4373
3. r -d
4. c (from within the perl debugger)
5. c
6. c
7. crash (on my system)

The breakpoint stops the first time perl needs to check whether a
UTF-8 character is part of a character class.  At this point (step #5) everything
is ok.  By step #6 the value of PL_bostr (my_perl->Tbostr) is corrupted.
To see more details, instead of c at step #6 do:

6. fin
7. s 4

Now the debugger is sitting at the line that corrupts prog->startp.
Ultimately, this corruption leads to a seg fault at pp_hot.c:2151 when perl
tries to copy characters as part of the s/// operation.


In the middle of processing the regular expression, the regex library
demand-loads a bunch of stuff to create the swashes for the [:print:]
expression.  At the end of all that, PL_bostr has a completely new value.
I have no idea whether the right fix is to move away from using PL_bostr
in the regex library in favor of some local variable, or to try and
save PL_bostr and restore it before any line that might change it.
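
To make the two options above concrete, here is a minimal standalone sketch
in C.  It is not perl's actual code: g_bostr is a hypothetical stand-in for
PL_bostr, and demand_load() is a hypothetical stand-in for whatever gets
loaded to build the swashes.  It only models the pattern of a global being
re-pointed by a nested call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a global like PL_bostr: the start of the
 * string currently being matched. */
static const char *g_bostr;

/* Hypothetical stand-in for the demand-load.  It does its own matching
 * internally and so leaves the global pointing at an unrelated buffer. */
static void demand_load(void)
{
    static const char inner[] = "string matched while loading";
    g_bostr = inner;            /* clobbers the caller's value */
}

/* The pattern described in the report: the global is read back after a
 * call that may have changed it. */
static const char *start_after_load_unsafe(const char *s)
{
    assert(s != NULL);
    g_bostr = s;
    demand_load();
    return g_bostr;             /* no longer the caller's string */
}

/* Option 1: save the global before the risky call, restore it after. */
static const char *start_after_load_saved(const char *s)
{
    assert(s != NULL);
    g_bostr = s;
    const char *saved = g_bostr;
    demand_load();
    g_bostr = saved;
    return g_bostr;
}

/* Option 2: keep the start in a local variable and never read the
 * global back after the call. */
static const char *start_after_load_local(const char *s)
{
    assert(s != NULL);
    const char *bostr = s;
    demand_load();
    return bostr;
}
```

For any string s, start_after_load_unsafe(s) returns a pointer that is no
longer s, while the other two return s unchanged.  Option 1 has to be
repeated around every call that might trigger a load; option 2 avoids the
global entirely but would mean touching more of the matcher.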


Adding a 'use utf8' pragma at the top of the program seems to load everything
ahead of time and avoid the problem with the demand-load.  I have no real
confidence that it avoids other bugs of this sort, though.  Note that
if you add 'use utf8' to the test program, you'll want to get rid of the
decode_utf8 call since now perl interprets the string directly as utf8.


Site configuration information for perl v5.8.4:

Configured by Debian Project at Mon Oct 25 01:52:37 EST 2004.

Summary of my perl5 (revision 5 version 8 subversion 4) configuration:
    osname=linux, osvers=2.4.27-ti1211, archname=i386-linux-thread-multi
    uname='linux kosh 2.4.27-ti1211 #1 sun sep 19 18:17:45 est 2004 i686 gnulinux '
    config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -Dcccdlflags=-fPIC -Darchname=i386-linux -Dprefix=/usr -Dprivlib=/usr/share/perl/5.8 -Darchlib=/usr/lib/perl/5.8 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.8.4 -Dsitearch=/usr/local/lib/perl/5.8.4 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Uusesfio -Uusenm -Duseshrplib -Dd_dosuid -des'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN -fno-strict-aliasing -I/usr/local/include'
    ccversion='', gccversion='3.3.5 (Debian 1:3.3.5-1)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
    perllibs=-ldl -lm -lpthread -lc -lcrypt
    libc=/lib/, so=so, useshrplib=true,
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:

@INC for perl v5.8.4:

Environment for perl v5.8.4:
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PERL_BADLANG (unset)
