
[perl #22814] Non-deterministic problem with unicode regexps

From: Call ID Numbers
Date: June 26, 2003 12:32
Subject: [perl #22814] Non-deterministic problem with unicode regexps
Message ID: rt-22814-59914.10.5107605036297@rt.perl.org

# New Ticket Created by  Call ID Numbers 
# Please include the string:  [perl #22814]
# in the subject line of all future correspondence about this issue. 
# <URL: http://rt.perl.org/rt2/Ticket/Display.html?id=22814 >


This is a bug report for perl from roy.badami@globalgraphics.com,
generated with the help of perlbug 1.34 running under perl v5.8.0.


-----------------------------------------------------------------
[Please enter your report here]

I've been attempting to track down a rather non-deterministic bug in a
perl script that was attempting to use unicode regexps.

The symptom there was that random garbage appeared to be getting
appended to the end of the regexp, resulting in a parse error.  (In
the case where I observed it, a plus was being appended to a regexp
that ended in a star, resulting in a nested quantifier error.)

This is the closest I've been able to come to a simple test case.

----------cut here----------
#!/usr/local/bin/perl5

use Unicode::Normalize;
use Encode;     # This appears to be significant...

$a = 'Hello ++World!';

$a = NFKC($a);  # Has side effect of upgrading $a to utf-8
print "$a\n";   # $a prints correctly
'' =~ /$a/;     # But the regexp error message doesn't
----------cut here----------

This fragment deliberately triggers a regexp parse error; however the
corruption of the regexp is apparent in the error message printed.  In
particular, there are two random characters after 'World!':

Hello ++World!
Nested quantifiers in regex; marked by <-- HERE in m/Hello ++ <-- HERE World!Ñ\/ at ./test line 10.

It is helpful to do this test in an environment where you stand a good
chance of seeing control characters (e.g. in an emacs shell buffer).
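
Where such an environment is not to hand, the error text can be trapped and
escaped before printing. The following is only a minimal sketch: show_bytes is
a hypothetical helper name, not part of the original script, and the pattern
uses '**' rather than '++' on the assumption that it may be run on perl 5.10
or later, where '++' is parsed as a possessive quantifier and no longer raises
an error.

```perl
use strict;
use warnings;

# Hypothetical helper: show every character outside printable ASCII
# (newline excepted) as a \x{..} escape, so stray bytes become visible.
sub show_bytes {
    my ($text) = @_;
    $text =~ s/([^\x20-\x7e\n])/sprintf('\\x{%02x}', ord($1))/ge;
    return $text;
}

# Deliberately invalid pattern: '**' is a nested quantifier.
my $pattern = 'Hello **World!';

# qr// compiles the pattern at run time; a parse error dies inside the eval.
my $re  = eval { qr/$pattern/ };
my $msg = defined $re ? '' : $@;

print show_bytes($msg);
```

Any garbage appended to the regexp would then show up as \x{..} escapes in the
printed message instead of depending on how the terminal renders it.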

If the line

	$a = NFKC($a);

is commented out of the script, avoiding the upgrade to utf8, then
the error message prints correctly.
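
Whether the string has actually been upgraded can be checked directly. The
sketch below assumes perl 5.8.1 or later, where utf8::is_utf8 is available (it
is not in 5.8.0). The downgrade step at the end is only a possible workaround
and has not been verified against the affected build.

```perl
use strict;
use warnings;
use Unicode::Normalize;

my $plain = 'Hello ++World!';
my $norm  = NFKC($plain);

# utf8::is_utf8 reports the internal UTF-8 flag of a scalar.
printf "plain: utf8 flag %s\n", utf8::is_utf8($plain) ? 'on' : 'off';
printf "NFKC : utf8 flag %s\n", utf8::is_utf8($norm)  ? 'on' : 'off';

# Possible workaround (an assumption, untested on the affected perl):
# downgrade a copy before interpolating it into a regexp.  The second
# argument (FAIL_OK) makes utf8::downgrade return false instead of dying
# when the string contains characters above 0xFF.
my $bytes = $norm;
if (utf8::downgrade($bytes, 1)) {
    printf "copy : utf8 flag %s\n", utf8::is_utf8($bytes) ? 'on' : 'off';
}
```

Downgrading is only possible when every character fits in a single byte, so
this would not help for patterns that genuinely need characters above 0xFF.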

This behaviour has also been observed in the prebuilt Debian package
perl 5.8.0-15 for x86.

	

[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
    category=core
    severity=high
---
Site configuration information for perl v5.8.0:

Configured by khalid at Fri Aug 16 14:54:49 BST 2002.

Summary of my perl5 (revision 5.0 version 8 subversion 0) configuration:
  Platform:
    osname=solaris, osvers=2.5.1, archname=sun4-solaris
    uname='sunos chihuahua 5.5.1 generic_103640-29 sun4u sparc sunw,ultra-1 '
    config_args='-d -Dcc=gcc -Uinstallusrbinperl -Dlibpth=/usr/lib /usr/ccs/lib -Dpager=/usr/ucb/more -Ui_gdbm -Ui_db -Dstartperl=#!/usr/local/bin/perl5 -Dprefix=/usr/local/soft/perl-5.8.0/run/default/sparc_sun_solaris2.5.1 -Dsiteprefix=/usr/local/'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='gcc', ccflags ='-fno-strict-aliasing -I/usr/local/include ',
    optimize='-O',
    cppflags='-fno-strict-aliasing -I/usr/local/include'
    ccversion='', gccversion='2.95.2 19991024 (release)', gccosandvers='solaris2.5.1'
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=4321
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=4
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='gcc', ldflags =' '
    libpth=/usr/lib /usr/ccs/lib
    libs=-lsocket -lnsl -ldl -lm -lc
    perllibs=-lsocket -lnsl -ldl -lm -lc
    libc=/lib/libc.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags=' '
    cccdlflags='-fPIC', lddlflags='-G'

Locally applied patches:
    

---
@INC for perl v5.8.0:
    /usr/local/soft/perl-5.8.0/run/default/sparc_sun_solaris2.5.1/lib/5.8.0/sun4-solaris
    /usr/local/soft/perl-5.8.0/run/default/sparc_sun_solaris2.5.1/lib/5.8.0
    /usr/local//lib/site_perl/5.8.0/sun4-solaris
    /usr/local//lib/site_perl/5.8.0
    /usr/local//lib/site_perl
    .

---
Environment for perl v5.8.0:
    HOME=/u/roy
    LANG (unset)
    LANGUAGE (unset)
    LC_COLLATE=en_UK
    LC_CTYPE=en_UK
    LC_MESSAGES=C
    LC_MONETARY=en_UK
    LC_NUMERIC=en_UK
    LC_TIME=en_UK
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=:.:/u/roy/bin:/usr/local/bin:/usr/ucb:/usr/bin/bsd:/bin:/usr/bin:/usr/local/X11R5/bin:/usr/bin/X11:/usr/new:/etc:/usr/etc:/usr/5bin:/usr/local/lib/frame/bin:/usr/ccs/bin
    PERL_BADLANG (unset)
    SHELL=/usr/local/bin/bash



