
[perl #118297] Mixing up- and down-graded strings in regex broken in 5.18.0

D. Ilmari Mannsåker
June 4, 2013 18:59
# New Ticket Created by  "D. Ilmari Mannsåker" 
# Please include the string:  [perl #118297]
# in the subject line of all future correspondence about this issue. 
# <URL: >

This is a bug report for perl from,
generated with the help of perlbug 1.39 running under perl 5.18.0.

[Please describe your issue here]

$ perl -e 'utf8::upgrade(my $u = "\x{e5}"); utf8::downgrade(my $d = "\x{e5}"); qr{$u $d}'
Malformed UTF-8 character (1 byte, need 3, after start byte 0xe5) in regexp compilation at -e line 1.
Malformed UTF-8 character (1 byte, need 3, after start byte 0xe5) in regexp compilation at -e line 1.
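
For anyone who wants to try this outside a one-liner, the reproduction above can be sketched as a small script. It uses only the core utf8:: functions; the variable names and the warning-collecting handler are illustrative choices, and whether any warnings are reported depends on the perl it is run under.

```perl
use strict;
use warnings;

# Two copies of the same one-character string (U+00E5) that differ only in
# their internal representation.
my $u = "\x{e5}";
my $d = "\x{e5}";
utf8::upgrade($u);      # stored internally as UTF-8 (two bytes)
utf8::downgrade($d);    # stored internally as a single byte

printf "upgraded:   is_utf8=%d length=%d\n", (utf8::is_utf8($u) ? 1 : 0), length $u;
printf "downgraded: is_utf8=%d length=%d\n", (utf8::is_utf8($d) ? 1 : 0), length $d;
print  "strings compare equal: ", ($u eq $d ? "yes" : "no"), "\n";

# Collect any warnings raised while the combined pattern is compiled,
# instead of letting them go straight to STDERR.
my @warnings;
{
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };
    my $re = qr{$u $d};
}
print "warnings during regex compilation: ", scalar(@warnings), "\n";
print $_ for @warnings;
```

The two strings are equal as far as Perl code can tell, so interpolating them into one pattern should not be able to produce a malformed-UTF-8 warning.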

$ ../perl/Porting/ -j6 --target=miniperl --start=v5.17.11 --end=v5.18.0-RC1 -e '$u = "\x{666}"; $d = "\x{e5}"; $SIG{__WARN__} = sub { die $_[0] }; qr{$u $d}'

35738543f95c2bc8c0545f370c642a84a0fb4b69 is the first bad commit
commit 35738543f95c2bc8c0545f370c642a84a0fb4b69
Author: David Mitchell <>
Date:   Mon Apr 15 17:18:30 2013 +0100

     Perl_re_op_compile(): handle utf8 concating better

     When concatting the list of arguments together to form a final pattern
     string, the code formerly did a quick scan of all the args first, and
     if any of them were SvUTF8, it set the (empty) destination string to UTF8
     before concatting all the individual args. This avoided the pattern
     getting upgraded to utf8 halfway through, and thus the indices for code
     blocks becoming invalid.

     However this was not 100% reliable because, as an "XXX" code comment of
     mine pointed out, when overloading is involved it is possible for an arg
     to appear initially not to be utf8, but to be utf8 when its value is
     finally accessed. This results an obscure bug (as shown in the test 
     for this commit), where literal /(?{code})/ still required 'use re

     The fix for this is to instead adjust the code block indices on the fly
     if the pattern string happens to get upgraded to utf8. This is easy(er)
     now that we have the new S_pat_upgrade_to_utf8() function.

     As well as fixing the bug, this also simplifies the main concat loop in
     the code, which will make it easier to handle interpolating arrays 
     /@foo/) when we move the interpolation from the join op into the regex
     engine itself shortly.

:100644 100644 f29284632e54afb24df68ec2d0ebfacd8eac5497 f7f309b281a6683815efa0f6d06b5661ffa41b84 M	regcomp.c
:040000 040000 27e6c237516a8f9cb3caf0745da433604ab15764 e627a5a459c0bc59d1e0cd8d8f4d837e306d983f M	t
bisect run success
That took 321 seconds
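
The index adjustment described in the commit message above can be illustrated in pure Perl. This is only a sketch of the idea, not the code in regcomp.c: offset_after_upgrade is a hypothetical helper, and it assumes the pattern starts out as a downgraded (byte) string, in which every character at or above 0x80 grows from one byte to two when the string is upgraded.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only. Given a downgraded string and
# a byte offset into it, return the matching byte offset once the string
# has been upgraded to UTF-8. Each character in the range 0x80-0xFF before
# the offset takes one extra byte after the upgrade.
sub offset_after_upgrade {
    my ($bytes, $offset) = @_;
    my $prefix = substr($bytes, 0, $offset);
    my $high   = () = $prefix =~ /[\x80-\xff]/g;   # count the high characters
    return $offset + $high;
}

# A pattern string with two non-ASCII characters before a code block.
my $pat = "\x{e5}\x{e5}(?{ 1 })";
utf8::downgrade($pat);

my $before = index($pat, "(?{");
my $after  = offset_after_upgrade($pat, $before);

# Cross-check against the real UTF-8 byte sequence. utf8::encode turns the
# copy into the bytes that an upgraded string holds internally.
my $encoded = $pat;
utf8::encode($encoded);
my $check = index($encoded, "(?{");

printf "offset before upgrade: %d, adjusted: %d, cross-check: %d\n",
    $before, $after, $check;
```

A recorded code-block offset that is not adjusted this way ends up pointing into the middle of the upgraded string, which is the kind of invalid index the commit message says the old pre-scan was meant to avoid.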

[Please do not change anything below this line]
Site configuration information for perl 5.18.0:

Configured by ilmari at Mon May 20 10:43:21 BST 2013.

Summary of my perl5 (revision 5 version 18 subversion 0) configuration:

     osname=linux, osvers=3.2.0-41-generic, archname=x86_64-linux
     uname='linux zarquon 3.2.0-41-generic #66-ubuntu smp thu apr 25 03:27:11 utc 2013 x86_64 x86_64 x86_64 gnulinux '
     hint=recommended, useposix=true, d_sigaction=define
     useithreads=undef, usemultiplicity=undef
     useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
     use64bitint=define, use64bitall=define, uselongdouble=undef
     usemymalloc=n, bincompat5005=undef
     cc='cc', ccflags ='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
     cppflags='-fno-strict-aliasing -pipe -fstack-protector 
     ccversion='', gccversion='4.6.3', gccosandvers=''
     intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
     d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
     ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', 
     alignbytes=8, prototype=define
   Linker and Libraries:
     ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
     libpth=/usr/local/lib /lib/x86_64-linux-gnu /lib/../lib /usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib /usr/lib
     libs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc -lgdbm_compat
     perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
     libc=, so=so, useshrplib=false, libperl=libperl.a
   Dynamic Linking:
     dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
     cccdlflags='-fPIC', lddlflags='-shared -O2 -L/usr/local/lib 

Locally applied patches:

@INC for perl 5.18.0:

Environment for perl 5.18.0:
     LD_LIBRARY_PATH (unset)
     LOGDIR (unset)
     PERL_BADLANG (unset)
     PERL_CPANM_OPT=--mirror= --mirror-only
