
[perl #133535] B API for aux_list/OP_MULTICONCAT does not return the last segment when plain & utf8 representations are different

From:
Atoomic
Date:
September 20, 2018 16:57
Subject:
[perl #133535] B API for aux_list/OP_MULTICONCAT does not return the last segment when plain & utf8 representations are different
Message ID:
rt-4.0.24-6225-1537462672-571.133535-75-0@perl.org
# New Ticket Created by  Atoomic 
# Please include the string:  [perl #133535]
# in the subject line of all future correspondence about this issue. 
# <URL: https://rt.perl.org/Ticket/Display.html?id=133535 >


This is a bug report for perl from atoomic@cpan.org,
generated with the help of perlbug 1.40 running under perl 5.28.0.


-----------------------------------------------------------------
[Please describe your issue here]

I noticed this while using the B API to compile op/substr.t with B::C
under Perl 5.28.0.

From the comment in pp_hot.c, we can read that in some cases there are two
sets of segment lengths:

     * * If the string has different plain and utf8 representations
     * (e.g. "\x80"), then then aux[PERL_MULTICONCAT_IX_PLAIN_PV/LEN]]
     * holds the plain rep, while aux[PERL_MULTICONCAT_IX_UTF8_PV/LEN]
     * holds the utf8 rep, and there are 2 sets of segment lengths,
     * with the utf8 set following after the plain set.

I have the feeling that the B API's aux_list for multiconcat fails to read
the last segment in that scenario.

This simplified version of op/substr.t is easier to debug, as it has a
single multiconcat op:
________________________________________________________________________________
#!./perl

print "1..1\n";

use utf8;
my $refee = bless [], "\x{100}a";
my $string = $refee;
$string = "$string";
substr $refee, 0, 0, "\xff";
my $expect = "\xff$string"; # <---- multiconcat
print "$refee" eq $expect ? "ok 1\n" : "not ok 1\n";
________________________________________________________________________________


While running the program we go through this code with nargs=1, so we are
clearly using the second set of segment lengths rather than the first.

Perl_pp_multiconcat
    const_lens = aux + PERL_MULTICONCAT_IX_LENGTHS;

    if (dst_utf8) {
        const_pv = aux[PERL_MULTICONCAT_IX_UTF8_PV].pv;
        if (   aux[PERL_MULTICONCAT_IX_PLAIN_PV].pv
            && const_pv != aux[PERL_MULTICONCAT_IX_PLAIN_PV].pv)
            /* separate sets of lengths for plain and utf8 */
            const_lens += nargs + 1;    /* <---- current line in the debugger */

Here is a look at aux

# ----- dump of aux from Perl_pp_multiconcat
# header
aux[0] = 1
aux[1] = "\377"
aux[2] = 1
aux[3] = "ÿ"
aux[4] = 2

# first segment
aux[5] = 1    # <---- const_lens
aux[6] = -1
# second segment, which was not returned by the B API
aux[7] = 2
aux[8] = -1
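
To make the dump above easier to follow, here is a minimal standalone C sketch of the same layout. It is not perl's code: the aux_item union, the IX_* names, first_len_index() and fill_dump() are hypothetical stand-ins. The slot numbers are taken from the dump, and the selection test follows the Perl_pp_multiconcat excerpt.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for perl's aux slot type: each slot holds either
 * a string pointer or a signed size. */
typedef union {
    const char *pv;
    ptrdiff_t   ssize;
} aux_item;

/* Slot indices mirroring the PERL_MULTICONCAT_IX_* names; the numeric
 * values are assumed from the dump (segment lengths start at aux[5]). */
enum {
    IX_NARGS = 0, IX_PLAIN_PV = 1, IX_PLAIN_LEN = 2,
    IX_UTF8_PV = 3, IX_UTF8_LEN = 4, IX_LENGTHS = 5
};

/* Return the index of the first segment length that applies, using the
 * same test as the excerpt: skip the plain set only when the destination
 * is utf8 and the plain and utf8 strings are distinct. */
static size_t first_len_index(const aux_item *aux, int dst_utf8)
{
    ptrdiff_t nargs = aux[IX_NARGS].ssize;
    size_t ix = IX_LENGTHS;

    assert(nargs >= 0);
    if (dst_utf8
        && aux[IX_PLAIN_PV].pv
        && aux[IX_UTF8_PV].pv != aux[IX_PLAIN_PV].pv)
        ix += (size_t)nargs + 1;    /* step over the plain set */
    return ix;
}

/* Fill a 9-slot array with the values shown in the dump. */
static void fill_dump(aux_item aux[9])
{
    static const char plain[] = "\377";      /* 1 byte */
    static const char utf8[]  = "\303\277";  /* 2 bytes, U+00FF in UTF-8 */

    aux[IX_NARGS].ssize     = 1;
    aux[IX_PLAIN_PV].pv     = plain;
    aux[IX_PLAIN_LEN].ssize = 1;
    aux[IX_UTF8_PV].pv      = utf8;
    aux[IX_UTF8_LEN].ssize  = 2;
    aux[5].ssize = 1;  aux[6].ssize = -1;    /* plain set */
    aux[7].ssize = 2;  aux[8].ssize = -1;    /* utf8 set */
}
```

With the dump's values, first_len_index() gives 5 for a non-utf8 destination and 7 for a utf8 one, so aux[7] and aux[8] are the entries a reader that stops after the first set never sees.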


I'm not exactly sure that adding such a rule is good enough, but it fixes
the cases where we previously read only the first segment.

# Suggested patch to B API for aux_list/OP_MULTICONCAT
if (   aux[PERL_MULTICONCAT_IX_PLAIN_PV].pv
    && aux[PERL_MULTICONCAT_IX_UTF8_PV].pv
    && aux[PERL_MULTICONCAT_IX_UTF8_PV].pv != aux[PERL_MULTICONCAT_IX_PLAIN_PV].pv)
{
    /* read the additional segment */
    nargs += 2;
}
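
As a cross-check on the count, here is a small standalone sketch of how many segment-length slots follow the header. It assumes each set holds nargs + 1 lengths, as the `const_lens += nargs + 1` step in Perl_pp_multiconcat suggests. The function name segment_len_count() is hypothetical and not part of B or perl.

```c
#include <assert.h>
#include <stddef.h>

/* Number of segment-length slots stored after the header.
 * Each set is assumed to hold nargs + 1 lengths; a second (utf8) set is
 * counted only when both strings exist and are distinct, which is the
 * same condition as in the suggested patch. */
static size_t segment_len_count(ptrdiff_t nargs,
                                const char *plain_pv,
                                const char *utf8_pv)
{
    size_t per_set;

    assert(nargs >= 0);
    per_set = (size_t)nargs + 1;
    if (plain_pv && utf8_pv && utf8_pv != plain_pv)
        return 2 * per_set;     /* plain set followed by utf8 set */
    return per_set;
}
```

For the dump in this report (nargs = 1, distinct strings) this gives 4 slots, aux[5] through aux[8]. That is 2 more than a single set, which is what `nargs += 2` reads. Under the same assumption the extra count for other values of nargs would be nargs + 1 rather than a fixed 2, which may be worth checking before applying the patch as is.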


[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
    category=library
    severity=low
    module=B
---
Site configuration information for perl 5.28.0:

Configured by nicolas at Wed Nov 29 10:26:27 MST 2017.

Summary of my perl5 (revision 5 version 26 subversion 1) configuration:

  Platform:
    osname=darwin
    osvers=15.6.0
    archname=darwin-2level
    uname='darwin nicolas-r.local 15.6.0 darwin kernel version 15.6.0: mon
oct 2 22:20:08 pdt 2017; root:xnu-3248.71.4~1release_x86_64 x86_64 '
    config_args='-de -Dprefix=/usr/local/perl/perls/perl-5.28.0
-Aeval:scriptdir=/usr/local/perl/perls/perl-5.28.0/bin'
    hint=recommended
    useposix=true
    d_sigaction=define
    useithreads=undef
    usemultiplicity=undef
    use64bitint=define
    use64bitall=define
    uselongdouble=undef
    usemymalloc=n
    default_inc_excludes_dot=define
    bincompat5005=undef
  Compiler:
    cc='cc'
    ccflags ='-fno-common -DPERL_DARWIN -fno-strict-aliasing -pipe
-fstack-protector-strong -I/usr/local/include'
    optimize='-O3'
    cppflags='-fno-common -DPERL_DARWIN -fno-strict-aliasing -pipe
-fstack-protector-strong -I/usr/local/include'
    ccversion=''
    gccversion='4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.42.1)'
    gccosandvers=''
    intsize=4
    longsize=8
    ptrsize=8
    doublesize=8
    byteorder=12345678
    doublekind=3
    d_longlong=define
    longlongsize=8
    d_longdbl=define
    longdblsize=16
    longdblkind=3
    ivtype='long'
    ivsize=8
    nvtype='double'
    nvsize=8
    Off_t='off_t'
    lseeksize=8
    alignbytes=8
    prototype=define
  Linker and Libraries:
    ld='env MACOSX_DEPLOYMENT_TARGET=10.3 cc'
    ldflags =' -fstack-protector-strong -L/usr/local/lib'
    libpth=/usr/local/lib
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/8.0.0/lib
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib
/usr/lib
    libs=-lpthread -lgdbm -ldbm -ldl -lm -lutil -lc
    perllibs=-lpthread -ldl -lm -lutil -lc
    libc=
    so=dylib
    useshrplib=false
    libperl=libperl.a
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_dlopen.xs
    dlext=bundle
    d_dlsymun=undef
    ccdlflags=' '
    cccdlflags=' '
    lddlflags=' -bundle -undefined dynamic_lookup -L/usr/local/lib
-fstack-protector-strong'


---
@INC for perl 5.28.0:
    /Users/nicolas/.dotfiles/perl-must-have/lib
    /Users/nicolas/perl5/lib/perl5/
    /usr/local/perl/perls/perl-5.28.0/lib/site_perl/5.28.0/darwin-2level
    /usr/local/perl/perls/perl-5.28.0/lib/site_perl/5.28.0
    /usr/local/perl/perls/perl-5.28.0/lib/5.28.0/darwin-2level
    /usr/local/perl/perls/perl-5.28.0/lib/5.28.0

---
Environment for perl 5.28.0:
    DYLD_LIBRARY_PATH (unset)
    HOME=/Users/nicolas
    LANG=en_US.UTF-8
    LANGUAGE (unset)
    LC_CTYPE=en_US.UTF-8
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)

PATH=/usr/local/perl/bin:/usr/local/perl/perls/perl-5.28.0/bin:/usr/local/opt/ccache/libexec:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/opt/X11/bin:/usr/local/git/bin:/usr/local/MacGPG2/bin:/Users/nicolas/.dotfiles/bin:/Users/nicolas/perl5/bin
    PERL5DB=use Devel::NYTProf

PERL5LIB=/Users/nicolas/.dotfiles/perl-must-have/lib:/Users/nicolas/perl5/lib/perl5/
    PERLBREW_BASHRC_VERSION=0.80
    PERLBREW_HOME=/Users/nicolas/.perlbrew
    PERLBREW_MANPATH=/usr/local/perl/perls/perl-5.28.0/man
    PERLBREW_PATH=/usr/local/perl/bin:/usr/local/perl/perls/perl-5.28.0/bin
    PERLBREW_PERL=perl-5.28.0
    PERLBREW_ROOT=/usr/local/perl
    PERLBREW_VERSION=0.84
    PERL_BADLANG (unset)
    PERL_CPANM_OPT=--quiet
    SHELL=/usr/local/bin/zsh



