perl.perl5.porters | Postings from March 2008

[perl #51918] UTF-8 (strict) Encode and Decode detect only 1/66 non-characters

From: Chris Hall
Date: March 20, 2008 06:49
Subject: [perl #51918] UTF-8 (strict) Encode and Decode detect only 1/66 non-characters
Message ID: rt-3.6.HEAD-25460-1206006102-479.51918-75-0@perl.org
# New Ticket Created by  Chris Hall 
# Please include the string:  [perl #51918]
# in the subject line of all future correspondence about this issue. 
# <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=51918 >



This is a bug report for perl from chris.hall@highwayman.com,
generated with the help of perlbug 1.35 running under perl v5.8.8.


-----------------------------------------------------------------
[Please enter your report here]


Encode::encode('UTF-8', $foo) and Encode::decode('UTF-8', $bar) detect the
Unicode 'non-character' U+FFFF and treat it as an error.

There are 65 other Unicode non-characters:

   U+FFFE
   U+01FFFE, U+02FFFE, U+03FFFE, ... U+10FFFE
   U+01FFFF, U+02FFFF, U+03FFFF, ... U+10FFFF
   U+FDD0..U+FDEF

which one would expect to be treated the same as U+FFFF.

They aren't.  They are accepted as normal characters.

This appears to be a bug.

It's the same under Perl 5.10.0.
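As a rough illustration, not part of the original report: the 66 non-characters (U+FFFF plus the 65 listed above) follow two simple rules, and a single constant-time test covers all of them. This is a Python 3 sketch rather than Perl and does not call Encode at all. The function names are hypothetical, and the final lines only model the reported behaviour of a check that recognises U+FFFF alone.

```python
# Sketch only: enumerate the Unicode non-characters from their two defining
# rules and cross-check them against a single predicate. Names are hypothetical.

def noncharacters():
    """All non-character code points, built from the two defining rules."""
    # Rule 1: the contiguous range U+FDD0..U+FDEF (32 code points).
    cps = list(range(0xFDD0, 0xFDEF + 1))
    # Rule 2: the last two code points of each of the 17 planes (U+xxFFFE, U+xxFFFF).
    for plane in range(0x11):
        cps += [(plane << 16) | 0xFFFE, (plane << 16) | 0xFFFF]
    return sorted(cps)

def is_noncharacter(cp: int) -> bool:
    """True if cp is a non-character; one test handles every case the same way."""
    if not 0 <= cp <= 0x10FFFF:
        return False
    # Masking with 0xFFFE keeps bits 1-15 of the low 16 bits, so the result is
    # 0xFFFE exactly when the low 16 bits are FFFE or FFFF, in any plane.
    if (cp & 0xFFFE) == 0xFFFE:
        return True
    return 0xFDD0 <= cp <= 0xFDEF

listed = noncharacters()
scanned = [cp for cp in range(0x110000) if is_noncharacter(cp)]
assert listed == scanned  # both approaches agree on the same 66 code points

# Model of the reported behaviour: a check that only looks for U+FFFF.
caught = [cp for cp in scanned if cp == 0xFFFF]
print(f"{len(caught)}/{len(scanned)}")  # prints "1/66"
```

The brute-force scan over all 0x110000 code points is there only to confirm that the bitmask test matches the rule-based list; in practice the predicate alone is enough.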

(Alternatively, one could argue that treating the U+FFFF non-character as
an error is less than useful: it is a perfectly good code point, and has
legitimate internal uses.  Perhaps Encode should have an option to allow
non-characters?  Whichever way you cut it, all non-characters should be
handled the same way.)
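One way to picture the suggested option is a pair of strict wrappers with an opt-in flag. This is a minimal Python 3 sketch under stated assumptions, not Encode's API: `encode_utf8_strict`, `decode_utf8_strict`, `allow_noncharacters` and `NonCharacterError` are all hypothetical names. It relies on Python's built-in "utf-8" codec, which rejects malformed input and lone surrogates but accepts non-characters, so the extra scan does the non-character checking.

```python
# Hypothetical strict UTF-8 helpers with an opt-in for non-characters.
# None of these names come from Encode; they are illustrative only.

class NonCharacterError(ValueError):
    """Raised when text contains a Unicode non-character."""

def _is_noncharacter(cp: int) -> bool:
    # U+xxFFFE / U+xxFFFF in any plane, plus U+FDD0..U+FDEF.
    # ord() of a Python str character never exceeds 0x10FFFF, so no range check.
    return (cp & 0xFFFE) == 0xFFFE or 0xFDD0 <= cp <= 0xFDEF

def _check(text: str) -> None:
    # Treat all 66 non-characters identically, U+FFFF included.
    # The reported index is a character index, not a byte offset.
    for i, ch in enumerate(text):
        if _is_noncharacter(ord(ch)):
            raise NonCharacterError(f"non-character U+{ord(ch):04X} at index {i}")

def encode_utf8_strict(text: str, allow_noncharacters: bool = False) -> bytes:
    if not allow_noncharacters:
        _check(text)
    # The built-in codec rejects lone surrogates but not non-characters.
    return text.encode("utf-8")

def decode_utf8_strict(data: bytes, allow_noncharacters: bool = False) -> str:
    # Malformed input raises UnicodeDecodeError here; non-characters do not.
    text = data.decode("utf-8")
    if not allow_noncharacters:
        _check(text)
    return text

# Usage: U+FFFF passes only when explicitly allowed; U+10FFFE is rejected by default.
print(decode_utf8_strict(b"\xef\xbf\xbf", allow_noncharacters=True) == "\uffff")  # True
try:
    encode_utf8_strict("ok \U0010FFFE")
except NonCharacterError as err:
    print(err)  # non-character U+10FFFE at index 3
```

Because one predicate is applied to every character, U+FFFF gets no special treatment either way. With the flag off, all 66 non-characters are rejected. With it on, all 66 pass.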

[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
     category=library
     severity=low
---
This perlbug was built using Perl v5.8.8 in the Red Hat build system.
It is being executed now by Perl v5.8.8 - Mon Nov 26 14:25:50 EST 2007.

Site configuration information for perl v5.8.8:

Configured by Red Hat, Inc. at Mon Nov 26 14:25:50 EST 2007.

Summary of my perl5 (revision 5 version 8 subversion 8) configuration:
   Platform:
     osname=linux, osvers=2.6.20-1.3001.fc6xen, archname=x86_64-linux-thread-multi
     uname='linux xenbuilder4.fedora.phx.redhat.com 2.6.20-1.3001.fc6xen #1 smp thu aug 9 16:18:42 edt 2007 x86_64 x86_64 x86_64 gnulinux '
     config_args='-des -Doptimize=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic
       -Dversion=5.8.8 -Dmyhostname=localhost -Dperladmin=root@localhost -Dcc=gcc -Dcf_by=Red Hat, Inc. -Dinstallprefix=/usr -Dprefix=/usr
       -Dlibpth=/usr/local/lib64 /lib64 /usr/lib64 -Dprivlib=/usr/lib/perl5/5.8.8 -Dsitelib=/usr/lib/perl5/site_perl/5.8.8
       -Dvendorlib=/usr/lib/perl5/vendor_perl/5.8.8 -Darchlib=/usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi
       -Dsitearch=/usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi -Dvendorarch=/usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi
       -Darchname=x86_64-linux -Dvendorprefix=/usr -Dsiteprefix=/usr -Duseshrplib -Dusethreads -Duseithreads -Duselargefiles
       -Dd_dosuid -Dd_semctl_semun -Di_db -Ui_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio -Dinstallusrbinperl=n
       -Ubincompat5005 -Uversiononly -Dpager=/usr/bin/less -isr -Dd_gethostent_r_proto -Ud_endhostent_r_proto -Ud_sethostent_r_proto
       -Ud_endprotoent_r_proto -Ud_setprotoent_r_proto -Ud_endservent_r_proto -Ud_setservent_r_proto
       -Dinc_version_list=5.8.7 5.8.6 5.8.5 -Dscriptdir=/usr/bin'
     hint=recommended, useposix=true, d_sigaction=define
     usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
     useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
     use64bitint=define use64bitall=define uselongdouble=undef
     usemymalloc=n, bincompat5005=undef
   Compiler:
     cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -pipe -Wdeclaration-after-statement -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm',
     optimize='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic',
     cppflags='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -pipe -Wdeclaration-after-statement -I/usr/local/include -I/usr/include/gdbm'
     ccversion='', gccversion='4.1.2 20070925 (Red Hat 4.1.2-33)', gccosandvers=''
     intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
     d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
     ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
     alignbytes=8, prototype=define
   Linker and Libraries:
     ld='gcc', ldflags =''
     libpth=/usr/local/lib64 /lib64 /usr/lib64
     libs=-lresolv -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc
     perllibs=-lresolv -lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
     libc=, so=so, useshrplib=true, libperl=libperl.so
     gnulibc_version='2.7'
   Dynamic Linking:
     dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E -Wl,-rpath,/usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi/CORE'
     cccdlflags='-fPIC', lddlflags='-shared -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic'

Locally applied patches:


---
@INC for perl v5.8.8:
     /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi
     /usr/lib64/perl5/site_perl/5.8.7/x86_64-linux-thread-multi
     /usr/lib64/perl5/site_perl/5.8.6/x86_64-linux-thread-multi
     /usr/lib64/perl5/site_perl/5.8.5/x86_64-linux-thread-multi
     /usr/lib/perl5/site_perl/5.8.8
     /usr/lib/perl5/site_perl/5.8.7
     /usr/lib/perl5/site_perl/5.8.6
     /usr/lib/perl5/site_perl/5.8.5
     /usr/lib/perl5/site_perl
     /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi
     /usr/lib64/perl5/vendor_perl/5.8.7/x86_64-linux-thread-multi
     /usr/lib64/perl5/vendor_perl/5.8.6/x86_64-linux-thread-multi
     /usr/lib64/perl5/vendor_perl/5.8.5/x86_64-linux-thread-multi
     /usr/lib/perl5/vendor_perl/5.8.8
     /usr/lib/perl5/vendor_perl/5.8.7
     /usr/lib/perl5/vendor_perl/5.8.6
     /usr/lib/perl5/vendor_perl/5.8.5
     /usr/lib/perl5/vendor_perl
     /usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi
     /usr/lib/perl5/5.8.8
     .

---
Environment for perl v5.8.8:
     HOME=/home/GMCH
     LANG=en_GB.UTF-8
     LANGUAGE (unset)
     LD_LIBRARY_PATH (unset)
     LOGDIR (unset)
     PATH=/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin
     PERL_BADLANG (unset)
     SHELL=/bin/bash

-- 
Chris Hall               highwayman.com
