develooper Front page | perl.perl5.porters | Postings from August 2019

[perl #134390] Matching fancy Unicode regex against an ASCII string leaks memory

From: Karl Williamson via RT
Date: August 30, 2019 15:35
Subject: [perl #134390] Matching fancy Unicode regex against an ASCII string leaks memory
Message ID: rt-4.0.24-16054-1567179346-517.134390-15-0@perl.org
On Fri, 30 Aug 2019 04:52:16 -0700, choroba@matfyz.cz wrote:
> This is a bug report for perl from choroba@matfyz.cz,
> generated with the help of perlbug 1.41 running under perl 5.31.4.
> 
> 
> -----------------------------------------------------------------
> [Please describe your issue here]
> 
> If a regex contains a fancy Unicode character and the string being
> matched doesn't have the UTF8 flag, matching leaks memory.
> 
> "a" =~ /\N{U+2129}/ while 1; # Don't forget to kill the script before it eats all the memory!
> 
> Using an upgraded string doesn't leak at all:
> 
> utf8::upgrade(my $x = 'a');
> $x =~ /\N{U+2129}/ while 1;
> 
> See https://www.perlmonks.org/?node_id=11105281 for the original report (with a bit longer examples) and discussion.
> 
> Ch.
> 
> [Please do not change anything below this line]
> -----------------------------------------------------------------
> ---
> Flags:
>      category=core
>      severity=high
> ---
> Site configuration information for perl 5.31.4:
> 
> Configured by choroba at Mon Aug 26 16:15:05 CEST 2019.
> 
> Summary of my perl5 (revision 5 version 31 subversion 4) configuration:
>    Commit id: 6e404ab585deadc1c32d50513f13b50ae395c00d
>    Platform:
>      osname=linux
>      osvers=4.12.14-lp151.28.13-default
>      archname=x86_64-linux-thread-multi
>      uname='linux lenonovo 4.12.14-lp151.28.13-default #1 smp wed aug 7 07:20:16 utc 2019 (0c09ad2) x86_64 x86_64 x86_64 gnulinux '
>      config_args='-rdes -Dusethreads -Dpthread -Dprefix=~/blead -Dusedevel'
>      hint=recommended
>      useposix=true
>      d_sigaction=define
>      useithreads=define
>      usemultiplicity=define
>      use64bitint=define
>      use64bitall=define
>      uselongdouble=undef
>      usemymalloc=n
>      default_inc_excludes_dot=define
>      bincompat5005=undef
>    Compiler:
>      cc='cc'
>      ccflags ='-D_REENTRANT -D_GNU_SOURCE -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2'
>      optimize='-O2'
>      cppflags='-D_REENTRANT -D_GNU_SOURCE -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include'
>      ccversion=''
>      gccversion='7.4.1 20190424 [gcc-7-branch revision 270538]'
>      gccosandvers=''
>      intsize=4
>      longsize=8
>      ptrsize=8
>      doublesize=8
>      byteorder=12345678
>      doublekind=3
>      d_longlong=define
>      longlongsize=8
>      d_longdbl=define
>      longdblsize=16
>      longdblkind=3
>      ivtype='long'
>      ivsize=8
>      nvtype='double'
>      nvsize=8
>      Off_t='off_t'
>      lseeksize=8
>      alignbytes=8
>      prototype=define
>    Linker and Libraries:
>      ld='cc'
>      ldflags =' -fstack-protector-strong -L/usr/local/lib'
>      libpth=/usr/local/lib /usr/lib64/gcc/x86_64-suse-linux/7/include-fixed /usr/lib64/gcc/x86_64-suse-linux/7/../../../../x86_64-suse-linux/lib /usr/lib /lib/../lib64 /usr/lib/../lib64 /lib /lib64 /usr/lib64 /usr/local/lib64
>      libs=-lpthread -lgdbm -ldl -lm -lcrypt -lutil -lc -lgdbm_compat
>      perllibs=-lpthread -ldl -lm -lcrypt -lutil -lc
>      libc=libc-2.26.so
>      so=so
>      useshrplib=false
>      libperl=libperl.a
>      gnulibc_version='2.26'
>    Dynamic Linking:
>      dlsrc=dl_dlopen.xs
>      dlext=so
>      d_dlsymun=undef
>      ccdlflags='-Wl,-E'
>      cccdlflags='-fPIC'
>      lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector-strong'
> 
> 
> ---
> @INC for perl 5.31.4:
>      /home/choroba/blead/lib/perl5/site_perl/5.31.4/x86_64-linux-thread-multi
>      /home/choroba/blead/lib/perl5/site_perl/5.31.4
>      /home/choroba/blead/lib/perl5/5.31.4/x86_64-linux-thread-multi
>      /home/choroba/blead/lib/perl5/5.31.4
> 
> ---
> Environment for perl 5.31.4:
>      HOME=/home/choroba
>      LANG=en_US.utf8
>      LANGUAGE (unset)
>      LC_CTYPE=en_US.UTF-8
>      LD_LIBRARY_PATH (unset)
>      LOGDIR (unset)
>      PATH=/home/choroba/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/home/choroba/perl5/bin:/home/choroba/opensource/worktime/bin:.
>      PERL_BADLANG (unset)
>      SHELL=/bin/bash

What is happening here is that re_intuit_start(), at line 922 of regexec.c, determines that no match is possible, because the target string would have to be in UTF-8 to match the character in the pattern. But something is not freeing memory when re_intuit_start() returns failure. There are other failure returns in re_intuit_start(), and I suspect they leak as well.
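For checking a given build without an infinite loop, a bounded variant of the reproducer can compare memory growth for a downgraded and an upgraded target. This is only a sketch: it assumes Linux (it reads the resident set size from /proc/self/statm), and the helpers rss_pages() and growth(), as well as the iteration count, are hypothetical choices, not part of perl or of the original report. On an affected build the downgraded target would be expected to grow noticeably more than the upgraded one; the exact numbers depend on the build, so none are given here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (Linux-only assumption): resident set size in pages,
# taken from the second field of /proc/self/statm. Returns undef if the
# file cannot be opened.
sub rss_pages {
    open my $fh, '<', '/proc/self/statm' or return;
    my (undef, $resident) = split ' ', scalar <$fh>;
    close $fh;
    return $resident;
}

# Hypothetical helper: run $n matches of a pattern containing U+2129
# against $target and return the RSS change in pages (undef if unknown).
sub growth {
    my ($target, $n) = @_;
    my $before = rss_pages();
    for (1 .. $n) {
        my $matched = $target =~ /\N{U+2129}/;
    }
    my $after = rss_pages();
    return (defined $before && defined $after) ? $after - $before : undef;
}

my $n = 200_000;    # bounded iteration count (arbitrary choice)

my $bytes = 'a';            # UTF8 flag off, as in the leaking case
my $chars = 'a';
utf8::upgrade($chars);      # same content, UTF8 flag on

my $g_bytes = growth($bytes, $n);
my $g_chars = growth($chars, $n);

printf "downgraded target: %s pages\n", $g_bytes // 'n/a';
printf "upgraded target:   %s pages\n", $g_chars // 'n/a';
```

Running both measurements in one process keeps the comparison on the same build; RSS only approximates leaked memory, so a large difference between the two lines is the signal to look for, not any particular value.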

I think someone familiar with the regex engine's memory allocation could answer this without much effort, so I'm deferring to such a person to step forward.
-- 
Karl Williamson

---
via perlbug:  queue: perl5 status: new
https://rt.perl.org/Ticket/Display.html?id=134390


