perl.perl5.porters
Postings from August 2012
[perl #114602] utf8 problems (still, )
From: Linda Walsh
Date: August 26, 2012 18:32
Subject: [perl #114602] utf8 problems (still, )
Message ID: rt-3.6.HEAD-11172-1346031144-1739.114602-75-0@perl.org
# New Ticket Created by Linda Walsh
# Please include the string: [perl #114602]
# in the subject line of all future correspondence about this issue.
# <URL: https://rt.perl.org:443/rt3/Ticket/Display.html?id=114602 >
This is a bug report for perl from perl-diddler@tlinx.org,
generated with the help of perlbug 1.39 running under perl 5.14.2.
-----------------------------------------------------------------
[Please describe your issue here]
I have PERL5OPT=-CSA in my environment,
so STDIO should be UTF-8. Right?
I have "use utf8" in my source, so my source is UTF-8.
I am using a function identifier prefixed with the
function prefix 'ƒ' (U+0192)... but in an earlier bug
I complained that characters in the 128-255 range should be interpreted
as UTF-8, because doing otherwise prevents "escaping" the output
to get wide characters.
'ƒ' (U+0192) is encoded in UTF-8 as \xc6\x92.
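Those two bytes come from UTF-8's two-byte form, which covers code points U+0080 to U+07FF. The top five of the eleven payload bits go into a `110xxxxx` lead byte, and the low six go into a `10xxxxxx` continuation byte. The sketch below is a minimal Python version of that arithmetic. Python is used purely for illustration, and `utf8_two_byte` is a hypothetical helper, not anything in Perl. The result is checked against Python's built-in codec.

```python
def utf8_two_byte(cp: int) -> bytes:
    """Encode a code point in U+0080..U+07FF using UTF-8's two-byte form."""
    if not 0x80 <= cp <= 0x7FF:
        raise ValueError("code point needs a different UTF-8 length")
    lead = 0xC0 | (cp >> 6)    # 110xxxxx: top 5 of the 11 payload bits
    cont = 0x80 | (cp & 0x3F)  # 10xxxxxx: low 6 payload bits
    return bytes([lead, cont])

# U+0192 LATIN SMALL LETTER F WITH HOOK
encoded = utf8_two_byte(0x192)
print(encoded.hex(" "))  # c6 92
assert encoded == "\u0192".encode("utf-8")
```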
I have a debug routine that prints out the function it was called from.
Instead of "ƒRegister_FStype", I get "ÆRegister_FStype" (which vim
displays as the capital Latin AE ligature, followed by a hex
unprintable for 0x92). In hex (echoed to hexdump), it's:
\xc3 \x86, \xc2 \x92.
It's as if it ignored the utf8 flag in my code, read the UTF-8-encoded bytes
in as Latin-1, then transcoded them to UTF-8 again on output.
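The reported byte sequence is exactly what double encoding produces. The sketch below reproduces the suspected mechanism in Python. It illustrates the byte arithmetic only and is not a model of Perl's internals. The UTF-8 bytes of the identifier are decoded as Latin-1, so each byte becomes a separate character, and the result is then encoded as UTF-8 again for output.

```python
name = "\u0192Register_FStype"            # "ƒRegister_FStype", 16 characters

source_bytes = name.encode("utf-8")       # on disk: c6 92 52 65 ...
misread = source_bytes.decode("latin-1")  # each byte becomes one character
output = misread.encode("utf-8")          # re-encoded for a UTF-8 stream

print(ascii(misread[:2]))                 # '\xc6\x92' (U+00C6 is "Æ")
print(output[:4].hex(" "))                # c3 86 c2 92
assert len(misread) == len(name) + 1      # one extra character per extra byte
```

The first two output bytes match the hexdump in the report: U+00C6 encodes as c3 86, and U+0092 encodes as c2 92.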
I'd like to vote for this rule: "unless you are under 'use bytes', values between
128 and 255 are interpreted as UTF-8-encoded data."
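Read as a rule, the proposal is about choosing how raw bytes become characters. The sketch below is a hypothetical Python model of it. `interpret` and its `use_bytes` switch are illustrative names only, not a Perl API. With the switch on, every byte maps to one character. Otherwise, the bytes are decoded as UTF-8, and in this sketch malformed input raises `UnicodeDecodeError`.

```python
def interpret(raw: bytes, use_bytes: bool = False) -> str:
    """Hypothetical model of the proposed rule for bytes 128-255."""
    if use_bytes:
        # One character per byte; bytes 128-255 map to U+0080..U+00FF.
        return raw.decode("latin-1")
    # Otherwise treat high bytes as UTF-8 sequences (strict: malformed input raises).
    return raw.decode("utf-8")

raw = b"\xc6\x92Register_FStype"
print(len(interpret(raw)))                  # 16: "ƒ" is one character
print(len(interpret(raw, use_bytes=True)))  # 17: "Æ" plus U+0092
```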
I thought that's what I'd get by putting "use utf8" in my code and having perl read
the code, but UTF-8 compatibility is internally broken/inconsistent, as evidenced by
code that is tagged as utf8 still getting re-translated when it goes to a UTF-8
output stream.
I hope no one will try to justify why this isn't a bug, i.e., why UTF-8
source is not compatible with UTF-8 output, as that would just be depressing
no matter what the rationalization.
I just got done realizing that vim's RE engine is not UTF-8 compatible, so no
wonder it has problems parsing a UTF-8 language. People then tried to tell
me that it really was UTF-8 compatible, even though the RE engine is ASCII-only
and didn't work. I have suggested (now, as well as years ago) that they use the Perl RE
engine, as trying to duplicate all the work that's gone into Perl's
Unicode support seems like a waste, not to mention nearly impossible to get right.
[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
category=core
severity=medium
---
This perlbug was built using Perl 5.14.2 - Wed Feb 8 15:59:25 UTC 2012
It is being executed now by Perl 5.14.2 - Wed Feb 8 15:55:36 UTC 2012.
Site configuration information for perl 5.14.2:
Configured by abuild at Wed Feb 8 15:55:36 UTC 2012.
Summary of my perl5 (revision 5 version 14 subversion 2) configuration:
Platform:
osname=linux, osvers=3.1.0-1.2-default, archname=x86_64-linux-thread-multi
uname='linux build09 3.1.0-1.2-default #1 smp thu nov 3 14:45:45 utc 2011 (187dde0) x86_64 x86_64 x86_64 gnulinux '
config_args='-ds -e -Dprefix=/usr -Dvendorprefix=/usr -Dinstallusrbinperl -Dusethreads -Di_db -Di_dbm -Di_ndbm -Di_gdbm -Dd_dbm_open -Duseshrplib=true -Doptimize=-fmessage-length=0 -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -g -Wall -pipe -Accflags=-DPERL_USE_SAFE_PUTENV -Dotherlibdirs=/usr/lib/perl5/site_perl'
hint=recommended, useposix=true, d_sigaction=define
useithreads=define, usemultiplicity=define
useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
use64bitint=define, use64bitall=define, uselongdouble=undef
usemymalloc=n, bincompat5005=undef
Compiler:
cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DPERL_USE_SAFE_PUTENV -fno-strict-aliasing -pipe -fstack-protector -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
optimize='-fmessage-length=0 -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -g -Wall -pipe',
cppflags='-D_REENTRANT -D_GNU_SOURCE -DPERL_USE_SAFE_PUTENV -fno-strict-aliasing -pipe -fstack-protector'
ccversion='', gccversion='4.6.2', gccosandvers=''
intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
alignbytes=8, prototype=define
Linker and Libraries:
ld='cc', ldflags =' -L/usr/local/lib64 -fstack-protector'
libpth=/lib64 /usr/lib64 /usr/local/lib64
libs=-lm -ldl -lcrypt -lpthread
perllibs=-lm -ldl -lcrypt -lpthread
libc=/lib64/libc-2.14.1.so, so=so, useshrplib=true, libperl=libperl.so
gnulibc_version='2.14.1'
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E -Wl,-rpath,/usr/lib/perl5/5.14.2/x86_64-linux-thread-multi/CORE'
cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib64 -fstack-protector'
Locally applied patches:
---
@INC for perl 5.14.2:
/usr/lib/perl5/site_perl/5.14.2/x86_64-linux-thread-multi
/usr/lib/perl5/site_perl/5.14.2
/usr/lib/perl5/vendor_perl/5.14.2/x86_64-linux-thread-multi
/usr/lib/perl5/vendor_perl/5.14.2
/usr/lib/perl5/5.14.2/x86_64-linux-thread-multi
/usr/lib/perl5/5.14.2
/usr/lib/perl5/site_perl/5.14.2/x86_64-linux-thread-multi
/usr/lib/perl5/site_perl/5.14.2
/usr/lib/perl5/site_perl
.
---
Environment for perl 5.14.2:
HOME=/home/law
LANG=en_US.UTF-8
LANGUAGE (unset)
LC_COLLATE=C
LC_CTYPE=en_US.UTF-8
LD_LIBRARY_PATH (unset)
LOGDIR (unset)
PATH=.:/sbin:/usr/local/sbin:/home/law/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/X11R6/bin:/usr/games:/opt/kde3/bin:/usr/lib/mit/bin:/usr/lib/mit/sbin:/usr/lib/qt3/bin:/usr/sbin:/etc/local/func_lib:/home/law/lib:/home/law/bin/lib
PERL5OPT=-CSA
PERL_BADLANG (unset)
SHELL=/bin/bash