[perl #24888] chomp ignores utf8
From: Nicholas Clark
Date: January 12, 2004 20:11
Subject: [perl #24888] chomp ignores utf8
Message ID: rt-3.0.8-24888-69959.12.2317179856103@perl.org
# New Ticket Created by Nicholas Clark
# Please include the string: [perl #24888]
# in the subject line of all future correspondence about this issue.
# <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=24888 >
This is a bug report for perl from nick@ccl4.org,
generated with the help of perlbug 1.34 running under perl v5.8.3.
-----------------------------------------------------------------
[Please enter your report here]
While working my way down doop.c, I discovered that chomp completely ignores
the utf8 flags on both the string being chomped and $/.
With the following patch to t/op/chop.t there are many test failures.
I'm not sure of the most efficient way to patch Perl_do_chomp to cure them.
My guess is to use the existing byte comparison code when the utf8 flag is the
same on both the target and $/, and to convert otherwise, but I'm not going to
look further until after 5.8.3 is released.
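The strategy guessed at above can be modelled in pure Perl. This is only a sketch under stated assumptions: `model_chomp` is a hypothetical helper, not anything in perl; it does not model the special values of $/ (undef, or the empty string for paragraph mode); and a real fix would have to be written in C inside Perl_do_chomp. The `use bytes` block stands in for the existing byte comparison code.

```perl
use strict;
use warnings;

# Hypothetical model only; not perl's actual implementation.
# Returns a copy of $target with one trailing $sep removed, if present.
sub model_chomp {
    my ($target, $sep) = @_;    # copies, so the caller's strings are untouched
    return $target if !defined $sep || $sep eq '';    # special modes not modelled

    # utf8 flags differ: convert both to the same internal representation.
    if (utf8::is_utf8($target) xor utf8::is_utf8($sep)) {
        utf8::upgrade($target);
        utf8::upgrade($sep);
    }

    # Flags now agree, so a plain byte comparison of the buffers is valid.
    my $matches = do {
        use bytes;
        my $n = length $sep;
        length($target) >= $n && substr($target, -$n) eq $sep;
    };

    # Remove the separator using character semantics.
    substr($target, -length($sep)) = '' if $matches;
    return $target;
}

my $flagged = "N\xa3";
utf8::upgrade($flagged);        # same characters, utf8 flag on
my $encoded = "\xa3";
utf8::encode($encoded);         # now the two bytes "\xc2\xa3"

print model_chomp($flagged, "\xa3") eq "N"        ? "chomped\n" : "kept\n";
print model_chomp($flagged, $encoded) eq $flagged ? "kept\n"    : "chomped\n";
```

In this model a string and a separator that hold the same characters chomp identically whichever of them carries the utf8 flag, while a separator that has been through utf8::encode no longer matches, which is the behaviour the new tests ask for.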
ok 52 - start=78 end=78
ok 53 - start=78 end=163
not ok 54 - start=78 end=163 (end as bytes)
# Failed at t/op/chop.t line 203
# got 'NÂ'
# expected 'N£'
ok 55 - start=78 end=163 ($/ as bytes)
ok 56 - start=78 end=164
not ok 57 - start=78 end=164 (end as bytes)
# Failed at t/op/chop.t line 203
# got 'N'
# expected 'N¤'
not ok 58 - start=78 end=164 ($/ as bytes)
# Failed at t/op/chop.t line 209
# got 'N'
# expected 'N¤'
ok 59 - start=78 end=1296
not ok 60 - start=78 end=1296 (end as bytes)
# Failed at t/op/chop.t line 203
# got 'N'
# expected 'NÔ
not ok 61 - start=78 end=1296 ($/ as bytes)
# Failed at t/op/chop.t line 209
# got 'N'
Wide character in print at ./test.pl line 38.
# expected 'NÔ
ok 62 - start=163 end=78
ok 63 - start=163 end=163
not ok 64 - start=163 end=163 (end as bytes)
# Failed at t/op/chop.t line 203
# got '£Â'
# expected '£Â£'
ok 65 - start=163 end=163 ($/ as bytes)
ok 66 - start=163 end=164
not ok 67 - start=163 end=164 (end as bytes)
# Failed at t/op/chop.t line 203
# got '£'
# expected '£Â¤'
not ok 68 - start=163 end=164 ($/ as bytes)
# Failed at t/op/chop.t line 209
# got '£'
# expected '£¤'
ok 69 - start=163 end=1296
not ok 70 - start=163 end=1296 (end as bytes)
# Failed at t/op/chop.t line 203
# got '£'
# expected '£Ô
not ok 71 - start=163 end=1296 ($/ as bytes)
# Failed at t/op/chop.t line 209
# got '£'
Wide character in print at ./test.pl line 38.
# expected '£Ô
ok 72 - start=164 end=78
Malformed UTF-8 character (unexpected end of string) at ./test.pl line 94.
Malformed UTF-8 character (unexpected end of string) at ./test.pl line 95.
not ok 73 - start=164 end=163
# Failed at t/op/chop.t line 193
Wide character in print at ./test.pl line 38.
# got '¤Â'
# expected '¤'
Malformed UTF-8 character (unexpected end of string) at ./test.pl line 94.
Malformed UTF-8 character (unexpected end of string) at ./test.pl line 95.
not ok 74 - start=164 end=163 (end as bytes)
# Failed at t/op/chop.t line 203
Wide character in print at ./test.pl line 38.
# got '¤ÃÂ'
# expected '¤Â£'
not ok 75 - start=164 end=163 ($/ as bytes)
# Failed at t/op/chop.t line 209
# got '¤'
# expected '¤£'
ok 76 - start=164 end=164
not ok 77 - start=164 end=164 (end as bytes)
# Failed at t/op/chop.t line 203
# got '¤Â'
# expected '¤Â¤'
not ok 78 - start=164 end=164 ($/ as bytes)
# Failed at t/op/chop.t line 209
# got '¤'
# expected '¤¤'
ok 79 - start=164 end=1296
ok 80 - start=164 end=1296 (end as bytes)
not ok 81 - start=164 end=1296 ($/ as bytes)
# Failed at t/op/chop.t line 209
# got '¤'
Wide character in print at ./test.pl line 38.
# expected '¤Ô
ok 82 - start=1296 end=78
Malformed UTF-8 character (unexpected end of string) at ./test.pl line 94.
Malformed UTF-8 character (unexpected end of string) at ./test.pl line 95.
not ok 83 - start=1296 end=163
# Failed at t/op/chop.t line 193
Wide character in print at ./test.pl line 38.
# got 'Ô
Wide character in print at ./test.pl line 38.
# expected 'Ô
Malformed UTF-8 character (unexpected end of string) at ./test.pl line 94.
Malformed UTF-8 character (unexpected end of string) at ./test.pl line 95.
not ok 84 - start=1296 end=163 (end as bytes)
# Failed at t/op/chop.t line 203
Wide character in print at ./test.pl line 38.
# got 'Ô
Wide character in print at ./test.pl line 38.
# expected 'Ô
not ok 85 - start=1296 end=163 ($/ as bytes)
# Failed at t/op/chop.t line 209
Wide character in print at ./test.pl line 38.
# got 'Ô
Wide character in print at ./test.pl line 38.
# expected 'Ô
ok 86 - start=1296 end=164
not ok 87 - start=1296 end=164 (end as bytes)
# Failed at t/op/chop.t line 203
Wide character in print at ./test.pl line 38.
# got 'Ô
Wide character in print at ./test.pl line 38.
# expected 'Ô
not ok 88 - start=1296 end=164 ($/ as bytes)
# Failed at t/op/chop.t line 209
Wide character in print at ./test.pl line 38.
# got 'Ô
Wide character in print at ./test.pl line 38.
# expected 'Ô
ok 89 - start=1296 end=1296
ok 90 - start=1296 end=1296 (end as bytes)
not ok 91 - start=1296 end=1296 ($/ as bytes)
# Failed at t/op/chop.t line 209
Wide character in print at ./test.pl line 38.
# got 'Ô
Wide character in print at ./test.pl line 38.
# expected 'Ô
This is not a new utf8 bug.
--- t/op/chop.t.orig Mon Nov 4 06:34:41 2002
+++ t/op/chop.t Mon Jan 12 20:56:02 2004
@@ -6,7 +6,7 @@ BEGIN {
     require './test.pl';
 }
 
-plan tests => 51;
+plan tests => 91;
 
 $_ = 'abc';
 $c = do foo();
@@ -183,3 +183,29 @@ ok($@ =~ /Can\'t modify.*chop.*in.*assig
 eval 'chomp($x, $y) = (1, 2);';
 ok($@ =~ /Can\'t modify.*chom?p.*in.*assignment/);
 
+my @chars = ("N", "\xa3", substr ("\xa4\x{100}", 0, 1), chr 1296);
+foreach my $start (@chars) {
+  foreach my $end (@chars) {
+    local $/ = $end;
+    my $message = "start=" . ord ($start) . " end=" . ord $end;
+    my $string = $start . $end;
+    chomp $string;
+    is ($string, $start, $message);
+
+    my $end_utf8 = $end;
+    utf8::encode ($end_utf8);
+    next if $end_utf8 eq $end;
+
+    # $end ne $end_utf8, so these should not chomp.
+    $string = $start . $end_utf8;
+    my $chomped = $string;
+    chomp $chomped;
+    is ($chomped, $string, "$message (end as bytes)");
+
+    $/ = $end_utf8;
+    $string = $start . $end;
+    $chomped = $string;
+    chomp $chomped;
+    is ($chomped, $string, "$message (\$/ as bytes)");
+  }
+}
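The test depends on strings that hold the same characters but differ in internal representation, and on utf8::encode producing a string that is genuinely different. A small standalone sketch of those three cases (the variable names are illustrative and not taken from the patch):

```perl
use strict;
use warnings;

my $plain   = "\xa4";                        # one character, utf8 flag off
my $flagged = substr("\xa4\x{100}", 0, 1);   # the same character, utf8 flag on
my $encoded = $flagged;
utf8::encode($encoded);                      # two characters "\xc2\xa4", flag off

for my $case ([plain => $plain], [flagged => $flagged], [encoded => $encoded]) {
    my ($name, $str) = @$case;
    printf "%-8s length=%d utf8 flag=%s\n",
        $name, length($str), utf8::is_utf8($str) ? "on" : "off";
}

print "plain eq flagged\n"   if $plain eq $flagged;
print "encoded ne flagged\n" if $encoded ne $flagged;
```

Since $plain and $flagged compare equal, chomp should treat them identically as $/ or as the end of the target, whereas $encoded is a different two-character string and should never be chomped in their place.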
[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
category=core
severity=medium
---
Site configuration information for perl v5.8.3:
Configured by nick at Fri Jan 9 10:31:25 GMT 2004.
Summary of my perl5 (revision 5.0 version 8 subversion 3) configuration:
Platform:
osname=linux, osvers=2.4.19-rmk4, archname=armv4l-linux
uname='linux bagpuss.unfortu.net 2.4.19-rmk4 #3 fri oct 25 21:57:55 bst 2002 armv4l unknown '
config_args='-Dusedevel=y -Dcc=ccache gcc-3.0 -Dld=gcc -Ubincompat5005 -Uinstallusrbinperl -Dcf_email=nick@ccl4.org -Dperladmin=nick@ccl4.org -Dinc_version_list= -Dinc_version_list_init=0 -Doptimize=-O1 -Dusethreads=n -Dprefix=/usr/local/perl5.8.3/ -Dinstallman1dir=none -Dinstallman3dir=none -de'
hint=recommended, useposix=true, d_sigaction=define
usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
use64bitint=undef use64bitall=undef uselongdouble=undef
usemymalloc=n, bincompat5005=undef
Compiler:
cc='ccache gcc-3.0', ccflags ='-fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
optimize='-O1',
cppflags='-fno-strict-aliasing -I/usr/local/include'
ccversion='', gccversion='3.0.4', gccosandvers=''
intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=8
ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
alignbytes=4, prototype=define
Linker and Libraries:
ld='gcc', ldflags =' -L/usr/local/lib'
libpth=/usr/local/lib /lib /usr/lib
libs=-lnsl -lgdbm -ldbm -ldb -ldl -lm -lcrypt -lutil -lc
perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
libc=/lib/libc-2.2.5.so, so=so, useshrplib=false, libperl=libperl.a
gnulibc_version='2.2.5'
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'
Locally applied patches:
MAINT22085
---
@INC for perl v5.8.3:
lib
/usr/local/perl5.8.3/lib/5.8.3/armv4l-linux
/usr/local/perl5.8.3/lib/5.8.3
/usr/local/perl5.8.3/lib/site_perl/5.8.3/armv4l-linux
/usr/local/perl5.8.3/lib/site_perl/5.8.3
/usr/local/perl5.8.3/lib/site_perl
.
---
Environment for perl v5.8.3:
HOME=/home/nick
LANG (unset)
LANGUAGE (unset)
LC_CTYPE=en_GB.ISO-8859-1
LD_LIBRARY_PATH (unset)
LOGDIR (unset)
PATH=/home/nick/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/games:/sbin:/usr/sbin:/usr/local/sbin
PERL_BADLANG (unset)
SHELL=/bin/bash