[perl #114808] split output empty when PATTERN and EXPR have "wide" characters

From: Marty O'Brien
Date: September 9, 2012 21:04
Subject: [perl #114808] split output empty when PATTERN and EXPR have "wide" characters
Message ID: rt-3.6.HEAD-11172-1347217236-819.114808-75-0@perl.org
# New Ticket Created by  "Marty O'Brien" 
# Please include the string:  [perl #114808]
# in the subject line of all future correspondence about this issue. 
# <URL: https://rt.perl.org:443/rt3/Ticket/Display.html?id=114808 >



This is a bug report for perl from mobrule@gmail.com,
generated with the help of perlbug 1.39 running under perl 5.12.4.

=head1 BUG REPORT

split output empty when PATTERN and EXPR have "wide" characters

=head1 DESCRIPTION

In a call to the builtin C<split PATTERN, EXPR> function, if
C<PATTERN> contains one or more "wide" characters (that is,
characters above C<chr(255)>) and C<EXPR> contains a different
set of wide characters, then in some cases the C<split> call
returns an empty list.

In addition, when C<warnings> are enabled, Perl produces
one, and sometimes several, spurious C<Use of uninitialized value
in split at ...> messages.

This bug has been observed on many different versions of Perl,
from v5.8 to v5.15, on Linux, Cygwin, and Windows.

=head2 ADDITIONAL INFORMATION

A given C<split> statement exhibits this problem only on its second
and later executions. The first execution returns the correct result.
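
A minimal sketch of that behaviour, assuming the conditions described
above: one C<split> statement with an interpolated wide-character
pattern is executed twice in a loop, and warnings are captured rather
than printed. The variable names and sample strings are illustrative
only. On an unaffected perl each pass should yield one field and no
warnings. On an affected perl the second pass is where the empty result
and the warning would be expected.

```perl
use strict;
use warnings;

# Collect warnings instead of printing them, so they can be counted.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

my $pattern = chr(0xabc);                  # wide character, never matches below
my @texts   = ("\x{444}", "ab\x{ccc}de");  # both contain wide characters

my @counts;
for my $text (@texts) {
    # The same split statement runs on every pass through the loop.
    my @fields = split /$pattern/, $text;
    push @counts, scalar @fields;
}

# Count only the warnings that mention split.
my $split_warnings = grep { /uninitialized value.* in split/ } @warnings;
print "fields per pass: @counts; split warnings: $split_warnings\n";
```

Only counts are printed, so no output encoding layer is needed.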

=head2 DEMONSTRATION

The script below demonstrates the issue. The cases are designed
so that no C<PATTERN> matches any C<EXPR>, so each C<split> call
should return a one-element list containing the original C<EXPR>.
The tests check that C<split> returned at least one element and
that the first element equals C<EXPR>.

Tests pass when all characters in C<EXPR> are below C<chr(256)>,
or when a C<split> statement is executed for the first time.
Tests fail when C<EXPR> contains a "wide" character and the
C<split> statement has already been executed at least once.

=cut

use strict;
use warnings;
use Test::More;
use Encode;
binmode *STDOUT, ":encoding(UTF-8)";
binmode *STDERR, ":encoding(UTF-8)";
sub toUTF8 ($) { Encode::encode("utf-8",$_[0]) }; # for output to Test::Builder

my $text0 = "normal\x{ee}";             # 8-bit string
my $text1 = "\x{444}";                  # single wide char
my $text2 = "ab\x{ccc}de\x{999}gh";     # string containing wide char

my $pattern1 = chr(0xabc);              # single wide char
my $pattern2 = "\x{abc}\x{def}ghi";     # more than one wide char

for ( $text1,      # the first call to each split function is ok
      $text0,      # ok
      $text1,      # tests fail, one warning
      $text2) {    # tests fail, more than one warning

    print STDERR "--------------------\ntext is $_\n";

    print STDERR "pattern is /$pattern1/\n";
    my @list1 = split /$pattern1/, $_;
    ok(@list1 > 0, toUTF8 "split had results for text $_, pattern $pattern1");
    ok($list1[0] eq $_, toUTF8 "correct result for text $_, pattern $pattern1");

    print STDERR "pattern is /$pattern2/\n";
    my @list2 = split /$pattern2/, $_;
    ok(@list2 > 0, toUTF8 "split had results for text $_, pattern $pattern2");
    ok($list2[0] eq $_, toUTF8 "correct result for text $_, pattern $pattern2");
}

done_testing;

=head2 WORKAROUND

A workaround is to call C<split> indirectly in a way that ensures
the C<PATTERN> is recompiled before each call. For the tests above,
we could define an indirect function

    sub SPLIT { my ($PATTERN, $EXPR) = @_; return split /$PATTERN/,$EXPR }

and call

    my @list1 = SPLIT $pattern1, $_;
    ...
    my @list2 = SPLIT $pattern2, $_;

instead.
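
As a self-contained sketch, the wrapper can be exercised against the
same texts and patterns as the demonstration script. The C<$failures>
counter and the loop structure are illustrative additions, not part of
the original tests. The check mirrors the tests above: each call should
return a one-element list holding the original text.

```perl
use strict;
use warnings;

# Wrapper from the workaround: split is called with whatever pattern is passed in.
sub SPLIT {
    my ($PATTERN, $EXPR) = @_;
    return split /$PATTERN/, $EXPR;
}

# Same patterns and texts as the demonstration script.
my @patterns = (chr(0xabc), "\x{abc}\x{def}ghi");
my @texts    = ("\x{444}", "normal\x{ee}", "\x{444}", "ab\x{ccc}de\x{999}gh");

my $failures = 0;
for my $text (@texts) {
    for my $pattern (@patterns) {
        my @list = SPLIT($pattern, $text);
        # No pattern matches any text, so expect the text back unchanged.
        $failures++ unless @list == 1 && $list[0] eq $text;
    }
}
print "failures: $failures\n";
```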

=head1 SUBMITTED BY

Marty O'Brien, E<lt>mob@cpan.orgE<gt>

=cut
---
Flags:
    category=core
    severity=medium
---
Site configuration information for perl 5.12.4:

Configured by Debian Project at Tue Sep  6 08:07:52 UTC 2011.

Summary of my perl5 (revision 5 version 12 subversion 4) configuration:
   
  Platform:
    osname=linux, osvers=2.6.24-28-server, archname=i686-linux-gnu-thread-multi-64int
    uname='linux roseapple 2.6.24-28-server #1 smp wed aug 18 21:17:51 utc 2010 i686 i686 i386 gnulinux '
    config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -Dcccdlflags=-fPIC -Darchname=i686-linux-gnu -Dprefix=/usr -Dprivlib=/usr/share/perl/5.12 -Darchlib=/usr/lib/perl/5.12 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.12.4 -Dsitearch=/usr/local/lib/perl/5.12.4 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Duse64bitint -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Ud_ualarm -Uusesfio -Uusenm -Ui_libutil -DDEBUGGING=-g -Doptimize=-O2 -Duseshrplib -Dlibperl=libperl.so.5.12.4 -des'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=undef, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2 -g',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
    ccversion='', gccversion='4.6.1', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
    libpth=/usr/local/lib /lib/i386-linux-gnu /lib/../lib /usr/lib/i386-linux-gnu /usr/lib/../lib /lib /usr/lib
    libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
    perllibs=-ldl -lm -lpthread -lc -lcrypt
    libc=, so=so, useshrplib=true, libperl=libperl.so.5.12.4
    gnulibc_version='2.13'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -O2 -g -L/usr/local/lib -fstack-protector'

Locally applied patches:
    

---
@INC for perl 5.12.4:
    /etc/perl
    /usr/local/lib/perl/5.12.4
    /usr/local/share/perl/5.12.4
    /usr/lib/perl5
    /usr/share/perl5
    /usr/lib/perl/5.12
    /usr/share/perl/5.12
    /usr/local/lib/site_perl
    .

---
Environment for perl 5.12.4:
    HOME=/root
    LANG=en_US.UTF-8
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/root/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
    PERL_BADLANG (unset)
    SHELL=/bin/bash

