develooper Front page | perl.perl5.porters | Postings from August 2019

Re: [perl #134390] Matching fancy Unicode regex against an ASCII string leaks memory

From: demerphq
Date: August 30, 2019 15:42
Subject: Re: [perl #134390] Matching fancy Unicode regex against an ASCII string leaks memory
Message ID: CANgJU+VDG5oT9u6nS-EBjGb9++=s-+fkHX-puMWixcYnRRahEg@mail.gmail.com

I can easily imagine that SVs constructed during compilation aren't
cleaned up in this scenario.

Yves

On Fri, 30 Aug 2019 at 17:35, Karl Williamson via RT
<perlbug-followup@perl.org> wrote:
>
> On Fri, 30 Aug 2019 04:52:16 -0700, choroba@matfyz.cz wrote:
> > This is a bug report for perl from choroba@matfyz.cz,
> > generated with the help of perlbug 1.41 running under perl 5.31.4.
> >
> >
> > -----------------------------------------------------------------
> > [Please describe your issue here]
> >
> > If a regex contains a fancy Unicode character and the string being
> > matched doesn't have the UTF8 flag, matching leaks memory.
> >
> > # Don't forget to kill the script before it eats all the memory!
> > "a" =~ /\N{U+2129}/ while 1;
> >
> > Using an upgraded string doesn't leak at all:
> >
> > utf8::upgrade(my $x = 'a');
> > $x =~ /\N{U+2129}/ while 1;
> >
> > See https://www.perlmonks.org/?node_id=11105281 for the original
> > report (with slightly longer examples) and discussion.
> >
> > Ch.
> >
> > [Please do not change anything below this line]
> > -----------------------------------------------------------------
> > ---
> > Flags:
> >      category=core
> >      severity=high
> > ---
> > Site configuration information for perl 5.31.4:
> >
> > Configured by choroba at Mon Aug 26 16:15:05 CEST 2019.
> >
> > Summary of my perl5 (revision 5 version 31 subversion 4)
> > configuration:
> >    Commit id: 6e404ab585deadc1c32d50513f13b50ae395c00d
> >    Platform:
> >      osname=linux
> >      osvers=4.12.14-lp151.28.13-default
> >      archname=x86_64-linux-thread-multi
> >      uname='linux lenonovo 4.12.14-lp151.28.13-default #1 smp wed aug
> > 7 07:20:16 utc 2019 (0c09ad2) x86_64 x86_64 x86_64 gnulinux '
> >      config_args='-rdes -Dusethreads -Dpthread -Dprefix=~/blead
> > -Dusedevel'
> >      hint=recommended
> >      useposix=true
> >      d_sigaction=define
> >      useithreads=define
> >      usemultiplicity=define
> >      use64bitint=define
> >      use64bitall=define
> >      uselongdouble=undef
> >      usemymalloc=n
> >      default_inc_excludes_dot=define
> >      bincompat5005=undef
> >    Compiler:
> >      cc='cc'
> >      ccflags ='-D_REENTRANT -D_GNU_SOURCE -fwrapv -fno-strict-aliasing
> > -pipe -fstack-protector-strong -I/usr/local/include
> > -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2'
> >      optimize='-O2'
> >      cppflags='-D_REENTRANT -D_GNU_SOURCE -fwrapv -fno-strict-aliasing
> > -pipe -fstack-protector-strong -I/usr/local/include'
> >      ccversion=''
> >      gccversion='7.4.1 20190424 [gcc-7-branch revision 270538]'
> >      gccosandvers=''
> >      intsize=4
> >      longsize=8
> >      ptrsize=8
> >      doublesize=8
> >      byteorder=12345678
> >      doublekind=3
> >      d_longlong=define
> >      longlongsize=8
> >      d_longdbl=define
> >      longdblsize=16
> >      longdblkind=3
> >      ivtype='long'
> >      ivsize=8
> >      nvtype='double'
> >      nvsize=8
> >      Off_t='off_t'
> >      lseeksize=8
> >      alignbytes=8
> >      prototype=define
> >    Linker and Libraries:
> >      ld='cc'
> >      ldflags =' -fstack-protector-strong -L/usr/local/lib'
> >      libpth=/usr/local/lib /usr/lib64/gcc/x86_64-suse-linux/7/include-
> > fixed /usr/lib64/gcc/x86_64-suse-linux/7/../../../../x86_64-suse-
> > linux/lib /usr/lib /lib/../lib64 /usr/lib/../lib64 /lib /lib64
> > /usr/lib64 /usr/local/lib64
> >      libs=-lpthread -lgdbm -ldl -lm -lcrypt -lutil -lc -lgdbm_compat
> >      perllibs=-lpthread -ldl -lm -lcrypt -lutil -lc
> >      libc=libc-2.26.so
> >      so=so
> >      useshrplib=false
> >      libperl=libperl.a
> >      gnulibc_version='2.26'
> >    Dynamic Linking:
> >      dlsrc=dl_dlopen.xs
> >      dlext=so
> >      d_dlsymun=undef
> >      ccdlflags='-Wl,-E'
> >      cccdlflags='-fPIC'
> >      lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector-strong'
> >
> >
> > ---
> > @INC for perl 5.31.4:
> >      /home/choroba/blead/lib/perl5/site_perl/5.31.4/x86_64-linux-
> > thread-multi
> >      /home/choroba/blead/lib/perl5/site_perl/5.31.4
> >      /home/choroba/blead/lib/perl5/5.31.4/x86_64-linux-thread-multi
> >      /home/choroba/blead/lib/perl5/5.31.4
> >
> > ---
> > Environment for perl 5.31.4:
> >      HOME=/home/choroba
> >      LANG=en_US.utf8
> >      LANGUAGE (unset)
> >      LC_CTYPE=en_US.UTF-8
> >      LD_LIBRARY_PATH (unset)
> >      LOGDIR (unset)
> >      PATH=/home/choroba/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/home/choroba/perl5/bin:/home/choroba/opensource/worktime/bin:.
> >      PERL_BADLANG (unset)
> >      SHELL=/bin/bash
>
> What is happening here is that re_intuit_start(), at line 922 of regexec.c, determines that no match is possible, because the target string would need to be in UTF-8 to match the character in the pattern.  But something is not freeing memory when re_intuit_start() returns failure.  There are other instances of this failure return in re_intuit_start(), and I suspect they leak as well.
>
> I think someone who knows the regex memory allocation can answer this without much effort, so I'm deferring to someone like that to step forward.
> --
> Karl Williamson
>
> ---
> via perlbug:  queue: perl5 status: new
> https://rt.perl.org/Ticket/Display.html?id=134390
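
The two one-liners in the report loop forever, which makes the growth hard to quantify. A bounded variant along the following lines may make the difference between the two cases measurable. This is only a sketch: it assumes Linux (it reads VmRSS from /proc/self/status), the helpers rss_kb() and growth_kb() are hypothetical names introduced here, and the iteration count is arbitrary. Resident-set size is a coarse proxy, so small differences should not be read as evidence either way.

```perl
#!/usr/bin/perl
# Sketch only: compares resident-memory growth for the two cases in the report.
# Assumes Linux (/proc/self/status). rss_kb() and growth_kb() are illustrative
# helpers, not part of perl or of the original report.
use strict;
use warnings;

# Return the current resident set size in kB, or undef if it cannot be read.
sub rss_kb {
    open my $fh, '<', '/proc/self/status' or return;
    while (my $line = <$fh>) {
        return $1 if $line =~ /^VmRSS:\s+(\d+)\s+kB/;
    }
    return;
}

# Run $code $n times and return how much the RSS grew, in kB (undef if unknown).
sub growth_kb {
    my ($code, $n) = @_;
    my $before = rss_kb();
    $code->() for 1 .. $n;
    my $after = rss_kb();
    return defined $before && defined $after ? $after - $before : undef;
}

my $n = 200_000;                      # arbitrary, but bounded

my $plain = 'a';                      # target without the UTF8 flag
utf8::upgrade(my $upgraded = 'a');    # same content, UTF8 flag set

my $plain_growth    = growth_kb(sub { $plain    =~ /\N{U+2129}/ }, $n);
my $upgraded_growth = growth_kb(sub { $upgraded =~ /\N{U+2129}/ }, $n);

printf "non-UTF8 target: %s kB\n", $plain_growth    // 'n/a';
printf "upgraded target: %s kB\n", $upgraded_growth // 'n/a';
```

On a perl affected by the leak, the first figure would be expected to grow with $n while the second stays roughly flat. On a perl that is not affected, both should stay small.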



-- 
perl -Mre=debug -e "/just|another|perl|hacker/"
