develooper Front page | perl.perl5.porters | Postings from January 2012

[perl #108386] pp_ctl.c:PP(pp_formline) bugs with wide chars in caret fields

From:
Antons Suspans
Date:
January 16, 2012 13:43
Subject:
[perl #108386] pp_ctl.c:PP(pp_formline) bugs with wide chars in caret fields
Message ID:
rt-3.6.HEAD-14510-1326749879-1361.108386-75-0@perl.org
# New Ticket Created by  Antons Suspans 
# Please include the string:  [perl #108386]
# in the subject line of all future correspondence about this issue. 
# <URL: https://rt.perl.org:443/rt3/Ticket/Display.html?id=108386 >



This is a bug report for perl from Antons Suspans <antox@ml.lv>,
generated with the help of perlbug 1.39 running under perl 5.14.2.


-----------------------------------------------------------------
[Please describe your issue here]

Perl `formline' function (and thus `write') misbehaves when
caret fields are filled with Unicode input. Namely when data
has wide characters, but is not longer than the field width,
then printed data is chopped off incorrectly and also CR symbol
may not have the desired effect.

Below are:
- samples of such behavior,
- proposed fix,
- script for reproducing.


* SAMPLES *

Some samples follow, where
- `in' is input (to `formline'),
- `fmt' is applied format,
- `out' is output (from $^A),
- `lft' is what is left in input variable.

in:  `0Аあ^Mあ^Mあ0' (8) / [30;d0,90;e3,81,82;0d;e3,81,82;0d;e3,81,82;30] (15)
fmt: `^<<<'
out: `0Аあ' (3) / [30;d0,90;e3,81,82] (6)
lft: `あ^Mあ0' (4) / [e3,81,82;0d;e3,81,82;30] (8)

in:  `あ^Mあ0' (4) / [e3,81,82;0d;e3,81,82;30] (8)
fmt: `^<<<'
out: `あ あ' (3) / [e3,81,82;20;e3,81,82] (7)      [wrong]
lft: `あ0' (2) / [e3,81,82;30] (4)

in:  `あ0' (2) / [e3,81,82;30] (4)
fmt: `^<<<'
out: `あ0' (2) / [e3,81,82;30] (4)
lft: `\x820' (2) / [82;30] (2)                     [wrong]

Two lines marked with [wrong] expected to be, respectively:
out: `あ' (1) / [e3,81,82] (3)
lft: `' (0) / [] (0)

(See SCRIPT below for Perl code used to obtain these data.)


* PROPOSED FIX *

This wrong behavior is related to
- file `pp_ctl.c',
- function `PP(pp_formline)',
- `case FF_CHECKCHOP:',
- Unicode branch with `itemsize <= fieldsize',
where no chars<->bytes transition is performed.

The following patch can be applied:

--- old/perl-5.14.2/pp_ctl.c
+++ new/perl-5.14.2/pp_ctl.c
@@ -705,10 +705,11 @@
 		    if (itemsize != (I32)len) {
 			I32 itembytes;
 			if (itemsize <= fieldsize) {
-			    const char *send = chophere = s + itemsize;
+			    const char *send = chophere = s + len;
 			    while (s < send) {
 				if (*s == '\r') {
 				    itemsize = s - item;
+				    sv_pos_b2u(sv, &itemsize);
 				    chophere = s;
 				    break;
 				}

With this patch `perl' compiled from sources (using default
options) produces output as expected. Existing `op/write.t'
tests are still passed.

Without applying the patch, as a workaround, users might append
sufficient amount of space characters to their input.

It turns out that this problem was already touched by Nicolas
Clark: https://rt.perl.org/rt3/Public/Bug/Display.html?id=56834

This still does not make `$:' Unicode-aware:
`strchr(PL_chopset, *s)'
persists and chopping is ruled by separate bytes.


* SCRIPT *

Script used to produce data in SAMPLES above:

#!/usr/bin/perl

use strict;
use warnings;

binmode STDOUT, ":encoding(UTF-8)";

test_string('^<<<', "0\x{410}\x{3042}\r\x{3042}\r\x{3042}0");
test_string('^<<<',                   "\x{3042}\r\x{3042}0");
test_string('^<<<',                             "\x{3042}0");

sub test_string {
  my ($format, $string) = @_;
  local $^A = '';
  print 'in:  ', examine_string($string), "\n";
  print "fmt: `$format'\n";
  formline $format, $string;
  print 'out: ', examine_string($^A), "\n";
  print 'lft: ', examine_string($string), "\n";
  print "\n";
}

sub examine_string {
  my $chars = shift;
  my $chars_count = length($chars),
  my @bytes = ();
  my $bytes_count = 0;
  for (split '', $chars) {
    my @b = unpack 'U0C*';
    push @bytes, sprintf('%*v02x', ',', pack 'C*', @b);
    $bytes_count += @b;
  }
  local $" = ';';
  "`$chars' ($chars_count) / [@bytes] ($bytes_count)";
}


[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
    category=core
    severity=low
---
Site configuration information for perl 5.14.2:

Configured by nobody at Tue Jan  3 21:04:36 UTC 2012.

Summary of my perl5 (revision 5 version 14 subversion 2) configuration:
   
  Platform:
    osname=linux, osvers=3.1.6-1-arch, archname=i686-linux-thread-multi
    uname='linux marin 3.1.6-1-arch #1 smp preempt thu dec 22 09:11:48 cet 2011 i686 intel(r) core(tm) i7 cpu 920 @ 2.67ghz genuineintel gnulinux '
    config_args='-des -Dusethreads -Duseshrplib -Doptimize=-march=i686 -mtune=generic -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -D_FORTIFY_SOURCE=2 -Dprefix=/usr -Dinstallprefix=/usr -Dvendorprefix=/usr -Dprivlib=/usr/share/perl5/core_perl -Darchlib=/usr/lib/perl5/core_perl -Dsitelib=/usr/share/perl5/site_perl -Dsitearch=/usr/lib/perl5/site_perl -Dvendorlib=/usr/share/perl5/vendor_perl -Dvendorarch=/usr/lib/perl5/vendor_perl -Dscriptdir=/usr/bin/core_perl -Dsitescript=/usr/bin/site_perl -Dvendorscript=/usr/bin/vendor_perl -Dinc_version_list=none -Dman1ext=1perl -Dman3ext=3perl -Dlddlflags=-shared -Wl,-O1,--sort-common,--as-needed,-z,relro,--hash-style=gnu -Dldflags=-Wl,-O1,--sort-common,--as-needed,-z,relro,--hash-style=gnu'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=undef, use64bitall=undef, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-march=i686 -mtune=generic -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -D_FORTIFY_SOURCE=2',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
    ccversion='', gccversion='4.6.2 20111223 (prerelease)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags ='-Wl,-O1,--sort-common,--as-needed,-z,relro,--hash-style=gnu -fstack-protector -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc -lgdbm_compat
    perllibs=-lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
    libc=/lib/libc-2.15.so, so=so, useshrplib=true, libperl=libperl.so
    gnulibc_version='2.15'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E -Wl,-rpath,/usr/lib/perl5/core_perl/CORE'
    cccdlflags='-fPIC', lddlflags='-shared -Wl,-O1,--sort-common,--as-needed,-z,relro,--hash-style=gnu -L/usr/local/lib -fstack-protector'

Locally applied patches:
    

---
@INC for perl 5.14.2:
    /usr/lib/perl5/site_perl
    /usr/share/perl5/site_perl
    /usr/lib/perl5/vendor_perl
    /usr/share/perl5/vendor_perl
    /usr/lib/perl5/core_perl
    /usr/share/perl5/core_perl
    .

---
Environment for perl 5.14.2:
    HOME=/home/antox
    LANG=en_US.UTF-8
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/antox/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin:/opt/bin:/usr/share/java/apache-ant/bin:/opt/java/bin:/opt/java/db/bin:/opt/java/jre/bin:/usr/bin/vendor_perl:/usr/bin/core_perl
    PERL_BADLANG (unset)
    SHELL=/bin/zsh




nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About