develooper Front page | perl.perl5.porters | Postings from August 2012

[perl #114602] utf8 problems (still, )

Linda Walsh
August 26, 2012 18:32
Message ID:
# New Ticket Created by  Linda Walsh 
# Please include the string:  [perl #114602]
# in the subject line of all future correspondence about this issue. 
# <URL: >

This is a bug report for perl from,
generated with the help of perlbug 1.39 running under perl 5.14.2.

[Please describe your issue here]
I have PERL5OPT=CSA in my env.

So STDIN, STDOUT, and STDERR should all be utf-8.  Right?

I have "use utf8" in my source.  So my source is utf8.

I am using a function identifier prefixed with the
function prefix, 'ƒ' (U+0192).  In an earlier bug
I complained that characters in the range 128-255 should be interpreted
as UTF-8, because to do otherwise prevents "escaping" the output
to get wide characters.

'ƒ' (U+0192) is encoded in UTF-8 as \xc6\x92.

I have a debug routine that prints out the function it was called from.

Instead of "ƒRegister_FStype", I get "Æ\x92Register_FStype" (which I
see in vim displayed as the capital Latin AE ligature, followed by a
hex unprintable for 0x92).  In hex (echoed to a hex dump) the prefix is:
\xc3 \x86, \xc2 \x92.

It's like it ignored the utf8 flag in my code and read utf8-encoded bytes
in as latin1, then transcoded them to utf8 again on output.
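
The bytes \xc3 \x86, \xc2 \x92 from the hex dump are consistent with that
guess.  A minimal sketch of the byte arithmetic follows, written in Python
rather than Perl only so that it stays independent of perl's own flag
handling.  The variable names are made up for illustration, and this shows
just the encoding math, not what perl actually does internally:

```python
# Illustration only: reproduces the bytes seen in the hex dump by
# double-encoding U+0192. It does not model perl's internals.
name = "\u0192Register_FStype"            # 'ƒ' followed by the function name

utf8_bytes = name.encode("utf-8")         # correct UTF-8 encoding of the name
misread = utf8_bytes.decode("latin-1")    # each byte taken as one Latin-1 char
double_encoded = misread.encode("utf-8")  # encoded again on the way out

print(utf8_bytes[:2].hex(" "))            # c6 92
print(double_encoded[:4].hex(" "))        # c3 86 c2 92
```

The first two bytes of the correct encoding are c6 92; after the Latin-1
misread and second encode they become c3 86 c2 92, which is U+00C6 (the AE
ligature) followed by U+0092.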

I'd like to vote for: unless you are in "use bytes", values between
128-255 are interpreted as UTF-8 encoded data.

I thought that's what I'd get if I did a "use utf8" in my code and perl read
the code, but utf8 compatibility is internally broken/inconsistent, as
evidenced by code that is tagged as utf8 still getting re-translated when
going to a utf-8 output stream.

I hope no one will try to justify why this isn't a bug -- i.e. -- why UTF-8
source is not compatible with UTF-8... as that would just be depressing no matter
what the rationalization.

I just got done realizing that vim's not utf-8 compatible in its RE engine -- so no
wonder it has problems parsing a utf-8 language -- but then had people try to tell
me that it really was utf-8 compatible, even though the RE engine is ASCII-only
and didn't work.  I have suggested (now, as well as years ago) that they use the
perl RE engine, as trying to duplicate all the work that's gone into perl's
Unicode support seems like a waste, not to mention near impossible to get right.

[Please do not change anything below this line]
This perlbug was built using Perl 5.14.2 - Wed Feb  8 15:59:25 UTC 2012
It is being executed now by  Perl 5.14.2 - Wed Feb  8 15:55:36 UTC 2012.

Site configuration information for perl 5.14.2:

Configured by abuild at Wed Feb  8 15:55:36 UTC 2012.

Summary of my perl5 (revision 5 version 14 subversion 2) configuration:
    osname=linux, osvers=3.1.0-1.2-default, archname=x86_64-linux-thread-multi
    uname='linux build09 3.1.0-1.2-default #1 smp thu nov 3 14:45:45 utc 2011 (187dde0) x86_64 x86_64 x86_64 gnulinux '
    config_args='-ds -e -Dprefix=/usr -Dvendorprefix=/usr -Dinstallusrbinperl -Dusethreads -Di_db -Di_dbm -Di_ndbm -Di_gdbm -Dd_dbm_open -Duseshrplib=true -Doptimize=-fmessage-length=0 -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -g -Wall -pipe -Accflags=-DPERL_USE_SAFE_PUTENV -Dotherlibdirs=/usr/lib/perl5/site_perl'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DPERL_USE_SAFE_PUTENV -fno-strict-aliasing -pipe -fstack-protector -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-fmessage-length=0 -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -g -Wall -pipe',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DPERL_USE_SAFE_PUTENV -fno-strict-aliasing -pipe -fstack-protector'
    ccversion='', gccversion='4.6.2', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib64 -fstack-protector'
    libpth=/lib64 /usr/lib64 /usr/local/lib64
    libs=-lm -ldl -lcrypt -lpthread
    perllibs=-lm -ldl -lcrypt -lpthread
    libc=/lib64/, so=so, useshrplib=true,
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E -Wl,-rpath,/usr/lib/perl5/5.14.2/x86_64-linux-thread-multi/CORE'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib64 -fstack-protector'

Locally applied patches:

@INC for perl 5.14.2:

Environment for perl 5.14.2:
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PERL_BADLANG (unset)
