perl.perl5.porters | Postings from September 2003

Unicode on perl 5.8.0 seems to be broken on simple example

From: Daniel Shane
Date: September 16, 2003 13:35
Subject: Unicode on perl 5.8.0 seems to be broken on simple example
Message ID: KMEEIFENCMCHHPKOFBKKMEEKCAAA.shane@irosoft.com

Hi,

I filed this bug through perlbug (23769), but I have not received any
feedback yet, so maybe people here can help me understand it.

Here is the bug report I sent:

It seems that Unicode regex matching is broken. Here is a simple program
that doesn't seem to do the right thing:

#!/usr/bin/perl -w

use strict;
use Encode;

my $str = "\302\240\302\240\302\240";   # three U+00A0 NO-BREAK SPACEs, UTF-8 encoded
my $line = decode("utf8", $str);        # decode the bytes into a character string

if ($line =~ /\x{A0}/) {
  print "Found pattern 1\n";
}

if ($line =~ /\x{A0}+/) {
  print "Found pattern 2\n";
}

if ($line =~ /\x{A0}\x{A0}/) {
  print "Found pattern 3\n";
}

The result is:
  Found pattern 1
  Found pattern 3

It should match all three patterns, because setting $str = "\x{A0}\x{A0}"
directly does match all three patterns.
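To check whether a given perl shows this behaviour, here is a minimal sketch, assuming only the core Encode module. It runs the report's three patterns against the decoded string and against the same three characters built directly from `\x{A0}` escapes, and prints each string's UTF-8 flag and length next to the patterns that matched. `matching_patterns` is a hypothetical helper, not part of the report. On a perl without the bug, both lines should list patterns 1, 2 and 3.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: return the numbers (1-3) of the report's
# patterns that match the given string.
sub matching_patterns {
    my ($s) = @_;
    my @hits;
    push @hits, 1 if $s =~ /\x{A0}/;
    push @hits, 2 if $s =~ /\x{A0}+/;
    push @hits, 3 if $s =~ /\x{A0}\x{A0}/;
    return @hits;
}

# The same three characters, built two different ways.
my %strings = (
    decoded => decode("utf8", "\302\240\302\240\302\240"),  # via Encode, as in the report
    literal => "\x{A0}\x{A0}\x{A0}",                         # straight from escapes
);

for my $name (sort keys %strings) {
    my $s = $strings{$name};
    printf "%-7s utf8_flag=%d length=%d patterns=%s\n",
        $name, (Encode::is_utf8($s) ? 1 : 0), length($s),
        join(",", matching_patterns($s));
}
```

Comparing the two rows separates a problem tied to the internal (UTF-8 flagged) representation from one in the patterns themselves, since both strings hold the same characters.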

Note that this bug also happens when you do this:

open(FILE, "<:utf8", "file.txt") or die "Cannot open file.txt: $!";
$line = <FILE>;

.....

where file.txt contains three UTF-8 encoded non-breaking spaces.
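The file-based case can be reproduced without a hand-made file.txt. This is a sketch under stated assumptions: it uses the core File::Temp module to write the three UTF-8 encoded NO-BREAK SPACEs to a temporary file (standing in for file.txt), then reads the first line back through the same `:utf8` layer the report uses. `make_sample_file` and `read_first_line` are hypothetical helper names chosen for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write three UTF-8 encoded NO-BREAK SPACEs
# (six raw bytes) to a temporary file and return its path.
sub make_sample_file {
    my ($fh, $path) = tempfile(UNLINK => 1);
    binmode($fh);                          # raw bytes, no encoding layer
    print {$fh} "\302\240" x 3;
    close($fh) or die "close: $!";
    return $path;
}

# Hypothetical helper: read the first line through the :utf8 layer,
# as in the report.
sub read_first_line {
    my ($path) = @_;
    open(my $in, "<:utf8", $path) or die "open $path: $!";
    my $line = <$in>;
    close($in);
    return $line;
}

my $line = read_first_line(make_sample_file());
printf "length=%d\n", length($line);       # three characters, not six bytes
print "Found pattern 2\n" if $line =~ /\x{A0}+/;
```

The sketch keeps the report's `:utf8` layer so it exercises the same path; on current perls `:encoding(UTF-8)` is usually preferred for input because it validates the bytes it decodes.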

[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
    category=core
    severity=critical
---
Site configuration information for perl v5.8.0:

Configured by Gerrit at Fri Aug 29 11:53:54 CEST 2003.

Summary of my perl5 (revision 5.0 version 8 subversion 0) configuration:
  Platform:
    osname=cygwin, osvers=1.3.22(0.7832), archname=cygwin-multi-64int
    uname='cygwin_nt-5.0 ismene 1.3.22(0.7832) 2003-03-18 09:20 i586 unknown unknown cygwin '
    config_args='-de -Dmksymlinks -Dusemultiplicity -Duse64bitint -Doptimize=-O3 -Dman3ext=3pm'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=define use64bitall=undef uselongdouble=undef
    usemymalloc=y, bincompat5005=undef
  Compiler:
    cc='gcc', ccflags ='-DPERL_USE_SAFE_PUTENV -fno-strict-aliasing',
    optimize='-O3',
    cppflags='-DPERL_USE_SAFE_PUTENV -fno-strict-aliasing'
    ccversion='', gccversion='3.2 20020927 (prerelease)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=4
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='ld2', ldflags =' -s -L/usr/local/lib'
    libpth=/usr/local/lib /usr/lib /lib
    libs=-lgdbm -ldb -lcrypt -lutil -lgdbm_compat
    perllibs=-lcrypt -lutil -lgdbm_compat
    libc=/usr/lib/libc.a, so=dll, useshrplib=true, libperl=libperl.a
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' -s'
    cccdlflags=' ', lddlflags=' -s -L/usr/local/lib'

Locally applied patches:


---
@INC for perl v5.8.0:
    /usr/lib/perl5/5.8.0/cygwin-multi-64int
    /usr/lib/perl5/5.8.0
    /usr/lib/perl5/site_perl/5.8.0/cygwin-multi-64int
    /usr/lib/perl5/site_perl/5.8.0
    /usr/lib/perl5/site_perl
    .

---
Environment for perl v5.8.0:
    CYGWIN_ROOT=C:\cygwin
    HOME=/home/shane
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=.:/opt/kde3/bin:/opt/kde3/lib:/opt/kde3/lib/kde3:/usr/bin:/usr/X11R6/bin:/cygdrive/c/Perl/bin/:/cygdrive/e/ActiveStatePerl-806/bin/:/cygdrive/c/WINDOWS/system32:/cygdrive/c/WINDOWS:/cygdrive/c/WINDOWS/System32/Wbem:/cygdrive/c/Program Files/Microsoft Visual Studio .NET 2003/Vc7/bin:/cygdrive/c/Program Files/Microsoft SQL Server/80/Tools/BINN:/cygdrive/c/PROGRA~1/Borland/Delphi6/Bin:/cygdrive/c/PROGRA~1/Borland/Delphi6/Projects/Bpl
    PERL_BADLANG (unset)
    SHELL=/bin/bash
