[perl #57040] pos() function doesn't handle unicode well

From: Marcela Maslanova
Date: July 17, 2008 06:49
Subject: [perl #57040] pos() function doesn't handle unicode well
Message ID: rt-3.6.HEAD-8814-1216291352-431.57040-75-0@perl.org
# New Ticket Created by  Marcela Maslanova 
# Please include the string:  [perl #57040]
# in the subject line of all future correspondence about this issue. 
# <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=57040 >


generated with the help of perlbug 1.36 running under perl 5.10.0.


-----------------------------------------------------------------
[Please enter your report here]

The pos() function doesn't return correct values for Unicode strings.
For example:
perl -e '$string = "ěščřžýáíéň";while ($string =~ /š/gi) {printf "Found š at %d\n", pos($string)-1;}';
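
A minimal sketch of the difference, assuming the literal in the one-liner reaches Perl as UTF-8 bytes when 'use utf8' is absent. The helper match_ends and the variable names are hypothetical; the text is built from \x{...} escapes so the result does not depend on the source encoding, and the /i flag is left out to keep the byte-string case simple. It only illustrates that pos() counts in whatever units the string holds, bytes or characters:

```perl
use strict;
use warnings;
use Encode qw(encode);

# The report's example text as a character string (10 code points).
my $chars  = "\x{11b}\x{161}\x{10d}\x{159}\x{17e}\x{fd}\x{e1}\x{ed}\x{e9}\x{148}";
my $needle = "\x{161}";                      # LATIN SMALL LETTER S WITH CARON

# The same text as undecoded UTF-8 bytes, which is roughly what a
# literal holds when the source is UTF-8 and 'use utf8' is not in effect.
my $bytes        = encode('UTF-8', $chars);
my $needle_bytes = encode('UTF-8', $needle);

# Hypothetical helper: collect pos() after every successful match.
sub match_ends {
    my ($haystack, $pattern) = @_;
    my @ends;
    while ($haystack =~ /\Q$pattern\E/g) {
        push @ends, pos($haystack);
    }
    return @ends;
}

my @byte_ends = match_ends($bytes, $needle_bytes);
my @char_ends = match_ends($chars, $needle);

print "byte string:      pos = @byte_ends\n";   # 4 (two bytes per letter)
print "character string: pos = @char_ends\n";   # 2
```

Subtracting 1 from the byte offset gives 3, which is neither the character index (1) nor the byte offset of the start of the match (2).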

In this case the problem can be solved with 'use utf8', but it remains in
other functions that use pos(), for example expand from Text::Tabs:
perl -e'chop($ustr="\taa\t..\t\x{100}");for my $s("\t\x{010a}\x{010a}\t..\t","\taa\t..\t",$ustr){ $_=$s;s/\t/print(pos(),$");"\t"/ge; print "\n"}'
The numbers printed should be the same for all three strings.
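
For comparison, a small sketch of the values one would expect, namely the character offsets of the tabs in each of the three strings. It does not call pos() inside s///e; it reads the match start from $-[0] in a plain m//g loop instead. The helper tab_offsets is hypothetical and is not how Text::Tabs is implemented:

```perl
use strict;
use warnings;

# Same three inputs as the one-liner: a string with wide characters,
# a plain ASCII string, and an ASCII string that was upgraded to
# Perl's internal UTF-8 form by a wide character that is then chopped off.
chop(my $ustr = "\taa\t..\t\x{100}");
my @inputs = ("\t\x{010a}\x{010a}\t..\t", "\taa\t..\t", $ustr);

# Hypothetical helper: character offset of the start of every tab.
sub tab_offsets {
    my ($s) = @_;
    my @offsets;
    push @offsets, $-[0] while $s =~ /\t/g;
    return @offsets;
}

my @results = map { join ' ', tab_offsets($_) } @inputs;
print "$_\n" for @results;    # each line: 0 3 6
```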

[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
    category=core
    severity=medium
---
This perlbug was built using Perl 5.10.0 in the Fedora build system.
It is being executed now by Perl 5.10.0 - Wed Jul  2 05:13:09 EDT 2008.

Site configuration information for perl 5.10.0:

Configured by Red Hat, Inc. at Wed Jul  2 05:13:09 EDT 2008.

Summary of my perl5 (revision 5 version 10 subversion 0) configuration:
  Platform:
    osname=linux, osvers=2.6.18-92.1.6.el5, archname=i386-linux-thread-multi
    uname='linux x86-6 2.6.18-92.1.6.el5 #1 smp fri jun 20 02:36:06 edt 
2008 i686 i686 i386 gnulinux '
    config_args='-des -Doptimize=-O2 -g -pipe -Wall 
-Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector 
--param=ssp-buffer-size=4 -m32 -march=i386 -mtune=generic 
-fasynchronous-unwind-tables -DPERL_USE_SAFE_PUTENV -Dversion=5.10.0 
-Dmyhostname=localhost -Dperladmin=root@localhost -Dcc=gcc -Dcf_by=Red 
Hat, Inc. -Dinstallprefix=/usr -Dprefix=/usr 
-Dprivlib=/usr/lib/perl5/5.10.0 
-Dsitelib=/usr/local/lib/perl5/site_perl/5.10.0 
-Dvendorlib=/usr/lib/perl5/vendor_perl/5.10.0 
-Darchlib=/usr/lib/perl5/5.10.0/i386-linux-thread-multi 
-Dsitearch=/usr/local/lib/perl5/site_perl/5.10.0/i386-linux-thread-multi 
-Dvendorarch=/usr/lib/perl5/vendor_perl/5.10.0/i386-linux-thread-multi 
-Darchname=i386-linux-thread-multi 
-Dotherlibdirs=/usr/lib/perl5/site_perl/5.10.0 -Dvendorprefix=/usr 
-Dsiteprefix=/usr/local -Duseshrplib -Dusethreads -Duseithreads 
-Duselargefiles -Dd_dosuid -Dd_semctl_semun -Di_db -Ui_ndbm -Di_gdbm 
-Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio -Dinstallusrbinperl=n 
-Ubincompat5005 -Uversiononly -Dpager=/usr/bin/less -isr 
-Dd_gethostent_r_proto -Ud_endhostent_r_proto -Ud_sethostent_r_proto 
-Ud_endprotoent_r_proto -Ud_setprotoent_r_proto -Ud_endservent_r_proto 
-Ud_setservent_r_proto -Dscriptdir=/usr/bin'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=undef, use64bitall=undef, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBUGGING 
-fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE 
-D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm',
    optimize='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions 
-fstack-protector --param=ssp-buffer-size=4 -m32 -march=i386 
-mtune=generic -fasynchronous-unwind-tables -DPERL_USE_SAFE_PUTENV',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBUGGING 
-fno-strict-aliasing -pipe -I/usr/local/include -I/usr/include/gdbm'
    ccversion='', gccversion='4.3.0 20080428 (Red Hat 4.3.0-8)', 
gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', 
lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='gcc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lresolv -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc
    perllibs=-lresolv -lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
    libc=/lib/libc-2.8.so, so=so, useshrplib=true, libperl=libperl.so
    gnulibc_version='2.8'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E 
-Wl,-rpath,/usr/lib/perl5/5.10.0/i386-linux-thread-multi/CORE'
    cccdlflags='-fPIC', lddlflags='-shared -O2 -g -pipe -Wall 
-Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector 
--param=ssp-buffer-size=4 -m32 -march=i386 -mtune=generic 
-fasynchronous-unwind-tables -DPERL_USE_SAFE_PUTENV -L/usr/local/lib'

Locally applied patches:
   

---
@INC for perl 5.10.0:
    /usr/lib/perl5/5.10.0/i386-linux-thread-multi
    /usr/lib/perl5/5.10.0
    /usr/local/lib/perl5/site_perl/5.10.0/i386-linux-thread-multi
    /usr/local/lib/perl5/site_perl/5.10.0
    /usr/lib/perl5/vendor_perl/5.10.0/i386-linux-thread-multi
    /usr/lib/perl5/vendor_perl/5.10.0
    /usr/lib/perl5/vendor_perl
    /usr/lib/perl5/site_perl/5.10.0/i386-linux-thread-multi
    /usr/lib/perl5/site_perl/5.10.0
    .

---
Environment for perl 5.10.0:
    HOME=/home/marca
    LANG=en_US.UTF-8
    LANGUAGE=
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    
PATH=/usr/lib/qt-3.3/bin:/usr/kerberos/bin:/usr/local/bin:/usr/bin:/bin:/home/marca/bin
    PERL_BADLANG (unset)
    SHELL=/bin/bash

