
[perl #114602] utf8 problems (still, )

From: Linda Walsh
Date: August 26, 2012 18:32
Subject: [perl #114602] utf8 problems (still, )
Message ID: rt-3.6.HEAD-11172-1346031144-1739.114602-75-0@perl.org
# New Ticket Created by  Linda Walsh 
# Please include the string:  [perl #114602]
# in the subject line of all future correspondence about this issue. 
# <URL: https://rt.perl.org:443/rt3/Ticket/Display.html?id=114602 >



This is a bug report for perl from perl-diddler@tlinx.org,
generated with the help of perlbug 1.39 running under perl 5.14.2.


-----------------------------------------------------------------
[Please describe your issue here]
I have PERL5OPT=-CSA in my environment.

So STDIN, STDOUT and STDERR should be UTF-8. Right?

I have "use utf8" in my source, so my source is UTF-8.

I am using a function identifier prefixed with the function prefix
'ƒ' (U+0192). In an earlier bug I complained that characters in the
128-255 range should be interpreted as UTF-8, because doing otherwise
prevents "escaping" the output to get wide characters.

'ƒ' (U+0192) is encoded in UTF-8 as \xc6\x92.

I have a debug routine that prints out the function it was called from.

Instead of "ƒRegister_FStype", I get a mangled name: vim displays the
prefix as the Latin capital AE ligature (Æ) followed by an unprintable
0x92, then "Register_FStype". Piped to a hex dump, the prefix bytes are
\xc3\x86 \xc2\x92.

It's as if perl ignored the "use utf8" in my code, read the UTF-8-encoded
bytes in as Latin-1, and then transcoded them to UTF-8 again on output.
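The byte pattern above matches classic "double encoding", which can be reproduced outside Perl. The following is a minimal Python sketch. Python is used only for illustration, and it does not model Perl's internals or its utf8 flag. The sketch encodes 'ƒ' as UTF-8, misreads those bytes as Latin-1, and encodes the result to UTF-8 a second time. The helper name double_encode is hypothetical.

```python
# Illustration only: reproduce the suspected double encoding at the byte level.
# double_encode is a hypothetical helper, not part of Perl or any library.

def double_encode(text: str) -> bytes:
    """Encode as UTF-8, misread the bytes as Latin-1, then encode as UTF-8 again."""
    utf8_bytes = text.encode("utf-8")        # 'ƒ' -> b'\xc6\x92'
    misread = utf8_bytes.decode("latin-1")   # each byte becomes one code point: U+00C6, U+0092
    return misread.encode("utf-8")           # each code point >= 0x80 becomes two bytes

name = "\u0192Register_FStype"               # 'ƒ' (U+0192) prefix, as in the report

print(name.encode("utf-8")[:2].hex(" "))     # c6 92  (correct UTF-8 for 'ƒ')
print(double_encode(name)[:4].hex(" "))      # c3 86 c2 92  (what the hex dump shows)
print(double_encode(name)[4:])               # b'Register_FStype'
```

Each original byte in the 0x80-0xFF range turns into a two-byte sequence, so one extra decode/encode round trip is enough to produce \xc3\x86 \xc2\x92.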

I'd like to vote for this rule: unless you are under "use bytes", values
between 128 and 255 are interpreted as UTF-8-encoded data.

I thought that's what I'd get if I put "use utf8" in my code and had perl
read it. But UTF-8 compatibility is internally broken or inconsistent, as
evidenced by code that is tagged as UTF-8 still getting re-translated when
it goes to a UTF-8 output stream.

I hope no one will try to justify why this isn't a bug, i.e., why UTF-8
source is not compatible with UTF-8 output, as that would just be
depressing no matter what the rationalization.

I just got done realizing that vim's RE engine is not UTF-8 compatible, so
no wonder it has problems parsing a UTF-8 language. People then tried to
tell me that vim really was UTF-8 compatible, even though its RE engine is
ASCII-only and didn't work. I have suggested (now, as well as years ago)
that they use the Perl RE engine, since duplicating all the work that has
gone into Perl's Unicode support seems like a waste, not to mention nearly
impossible to get right.





[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
    category=core
    severity=medium
---
This perlbug was built using Perl 5.14.2 - Wed Feb  8 15:59:25 UTC 2012
It is being executed now by  Perl 5.14.2 - Wed Feb  8 15:55:36 UTC 2012.

Site configuration information for perl 5.14.2:

Configured by abuild at Wed Feb  8 15:55:36 UTC 2012.

Summary of my perl5 (revision 5 version 14 subversion 2) configuration:
   
  Platform:
    osname=linux, osvers=3.1.0-1.2-default, archname=x86_64-linux-thread-multi
    uname='linux build09 3.1.0-1.2-default #1 smp thu nov 3 14:45:45 utc 2011 (187dde0) x86_64 x86_64 x86_64 gnulinux '
    config_args='-ds -e -Dprefix=/usr -Dvendorprefix=/usr -Dinstallusrbinperl -Dusethreads -Di_db -Di_dbm -Di_ndbm -Di_gdbm -Dd_dbm_open -Duseshrplib=true -Doptimize=-fmessage-length=0 -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -g -Wall -pipe -Accflags=-DPERL_USE_SAFE_PUTENV -Dotherlibdirs=/usr/lib/perl5/site_perl'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DPERL_USE_SAFE_PUTENV -fno-strict-aliasing -pipe -fstack-protector -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-fmessage-length=0 -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -g -Wall -pipe',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DPERL_USE_SAFE_PUTENV -fno-strict-aliasing -pipe -fstack-protector'
    ccversion='', gccversion='4.6.2', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib64 -fstack-protector'
    libpth=/lib64 /usr/lib64 /usr/local/lib64
    libs=-lm -ldl -lcrypt -lpthread
    perllibs=-lm -ldl -lcrypt -lpthread
    libc=/lib64/libc-2.14.1.so, so=so, useshrplib=true, libperl=libperl.so
    gnulibc_version='2.14.1'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E -Wl,-rpath,/usr/lib/perl5/5.14.2/x86_64-linux-thread-multi/CORE'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib64 -fstack-protector'

Locally applied patches:
    

---
@INC for perl 5.14.2:
    /usr/lib/perl5/site_perl/5.14.2/x86_64-linux-thread-multi
    /usr/lib/perl5/site_perl/5.14.2
    /usr/lib/perl5/vendor_perl/5.14.2/x86_64-linux-thread-multi
    /usr/lib/perl5/vendor_perl/5.14.2
    /usr/lib/perl5/5.14.2/x86_64-linux-thread-multi
    /usr/lib/perl5/5.14.2
    /usr/lib/perl5/site_perl/5.14.2/x86_64-linux-thread-multi
    /usr/lib/perl5/site_perl/5.14.2
    /usr/lib/perl5/site_perl
    .

---
Environment for perl 5.14.2:
    HOME=/home/law
    LANG=en_US.UTF-8
    LANGUAGE (unset)
    LC_COLLATE=C
    LC_CTYPE=en_US.UTF-8
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=.:/sbin:/usr/local/sbin:/home/law/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/X11R6/bin:/usr/games:/opt/kde3/bin:/usr/lib/mit/bin:/usr/lib/mit/sbin:/usr/lib/qt3/bin:/usr/sbin:/etc/local/func_lib:/home/law/lib:/home/law/bin/lib
    PERL5OPT=-CSA
    PERL_BADLANG (unset)
    SHELL=/bin/bash

