
[perl #40418] Unicode Command Line Arguments

From: Dale Gerdemann
Date: September 27, 2006 03:31
Subject: [perl #40418] Unicode Command Line Arguments
Message ID: rt-3.5.HEAD-31258-1159352492-417.40418-75-0@perl.org
# New Ticket Created by  Dale Gerdemann 
# Please include the string:  [perl #40418]
# in the subject line of all future correspondence about this issue. 
# <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=40418 >



This is a bug report for perl from dg@tomita.sfs.uni-tuebingen.de,
generated with the help of perlbug 1.35 running under perl v5.8.8.


-----------------------------------------------------------------
[Please enter your report here]

I want to read a command-line argument that contains Unicode
characters. Following perldoc.perl.org/perlunicode.html and
perldoc.perl.org/encoding.html, I did the following:

use utf8;
use encoding 'utf8';

my $word = $ARGV[0];
print "$word\n";

The $word prints correctly, Unicode characters included, and satisfies
Unicode::CheckUTF8::is_utf8. In regular expressions, however, dot
matches a single byte rather than a whole character. After more than
an hour of fiddling, I finally figured out that I needed to do:

utf8::upgrade($word);

Surely, this shouldn't be necessary. I think it's a bug.
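
For comparison, here is a minimal sketch of the byte-versus-character
difference described above. It is not taken from the original script.
It assumes the argument arrives as UTF-8 encoded bytes (plausible given
LANG=en_US.UTF-8 in the environment below) and decodes it explicitly
with the core Encode module. The helper name decode_args and the sample
string are hypothetical; the sample only simulates the bytes a shell
would pass. Another route that may be worth trying is perl's -CA
switch (see perlrun), which asks perl to treat @ARGV as UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not part of the original report): decode raw
# command-line bytes into Perl character strings, assuming UTF-8 input.
sub decode_args {
    my @raw = @_;
    return map { decode('UTF-8', $_) } @raw;
}

# Simulated argument: the UTF-8 bytes for "f", U+00FC, "r".
# In a real script this would be $ARGV[0].
my $raw = "f\xC3\xBCr";
my ($word) = decode_args($raw);

# On the undecoded string, dot matches one byte at a time.
my @raw_chars = $raw =~ /(.)/gs;

# On the decoded string, dot matches one character at a time.
my @decoded_chars = $word =~ /(.)/gs;

printf "raw bytes matched by dot: %d, decoded characters: %d\n",
    scalar @raw_chars, scalar @decoded_chars;
```

With this sample the undecoded string yields four dot matches and the
decoded string three, which is the same symptom: until the argument is
decoded (or upgraded under "use encoding"), the regex engine sees
bytes, not characters.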


[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
    category=core
    severity=medium
---
Site configuration information for perl v5.8.8:

Configured by dg at Wed Sep  6 16:13:32 CEST 2006.

Summary of my perl5 (revision 5 version 8 subversion 8) configuration:
  Platform:
    osname=linux, osvers=2.6.8-2-686-smp, archname=i686-linux
    uname='linux tomita 2.6.8-2-686-smp #1 smp tue aug 16 12:08:30 utc 2005 i686 gnulinux '
    config_args=''
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2',
    cppflags='-fno-strict-aliasing -pipe -I/usr/local/include'
    ccversion='', gccversion='3.3.5 (Debian 1:3.3.5-13)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -ldl -lm -lcrypt -lutil -lc
    perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
    libc=/lib/libc-2.3.2.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.3.2'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:
    

---
@INC for perl v5.8.8:
    /afs/sfs/lehre/dg/myperl/lib/i686-linux
    /afs/sfs/lehre/dg/myperl/lib
    /afs/sfs/lehre/dg/perl-5.8.8/lib/5.8.8/i686-linux
    /afs/sfs/lehre/dg/perl-5.8.8/lib/5.8.8
    /afs/sfs/lehre/dg/perl-5.8.8/lib/site_perl/5.8.8/i686-linux
    /afs/sfs/lehre/dg/perl-5.8.8/lib/site_perl/5.8.8
    /afs/sfs/lehre/dg/perl-5.8.8/lib/site_perl
    .

---
Environment for perl v5.8.8:
    HOME=/home/gerdemann
    LANG=en_US.UTF-8
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/afs/sfs/lehre/dg/perl-5.8.8/bin:/afs/sfs/lehre/dg/perl-5.8.8/scripts:/home/gerdemann/source/MoMo:/home/milca/a4/bin:/afs/sfs/lehre/dg/fsm-4.0/bin:/usr/ucb:/usr/bin:/bin:/afs/sfs/i386_linux24/sicstus312/bin/sicstus:/afs/sfs/i386_linux24/OOo110/OpenOffice.org1.1.0/program:/usr/local/bin:/usr/local/tex/bin://usr/X11R6/bin:/home/sfb/cl_systems/daVinci_V2.0:/home/gerdemann/Office51/bin:/afs/sfs/lehre/dg/bin:/home/gerdemann/scripts:/home/gerdemann/mg/bin:/home/gerdemann/bin:/afs/sfs/lehre/dg/xerox:.
    PERL5LIB=/afs/sfs/lehre/dg/myperl/lib
    PERL_BADLANG (unset)
    SHELL=/bin/bash



