
[perl #22814] Non-deterministic problem with unicode regexps

Call ID Numbers
June 26, 2003 12:32
# New Ticket Created by  Call ID Numbers 
# Please include the string:  [perl #22814]
# in the subject line of all future correspondence about this issue. 
# <URL: >

This is a bug report for perl from,
generated with the help of perlbug 1.34 running under perl v5.8.0.

[Please enter your report here]

I've been trying to track down a non-deterministic bug in a perl
script that uses unicode regexps.

The symptom there was that random garbage appeared to be getting
appended to the end of the regexp, resulting in a parse error.  (In
the case where I observed it, a plus was being appended to a regexp
that ended in a star, resulting in a nested quantifier error.)

This is the closest I've been able to come to a simple test case.

----------cut here----------

use Unicode::Normalize;
use Encode;     # This appears to be significant...

$a = 'Hello ++World!';

$a = NFKC($a);  # Has side effect of upgrading $a to utf-8
print "$a\n";   # $a prints correctly
'' =~ /$a/;     # But the regexp error message doesn't
----------cut here----------

This fragment deliberately triggers a regexp parse error; however, the
corruption of the regexp is apparent in the error message printed.  In
particular, there are two random characters after 'World!':

Hello ++World!
Nested quantifiers in regex; marked by <-- HERE in m/Hello ++ <-- HERE World!Ñ\/ at ./test line 10.

It is helpful to run this test in an environment where you stand a good
chance of seeing control characters (e.g. in an emacs shell buffer).
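
If no such environment is at hand, one possible alternative is to trap the
error with eval and escape anything outside printable ASCII before printing
it. The following is only a sketch: the helper name visible() and the
variable names are hypothetical and not part of the original test case. Note
also that perl 5.10 and later parse '++' as a possessive quantifier, so on
those versions the pattern may compile without any error at all.

```perl
use strict;
use warnings;
use Unicode::Normalize;
use Encode;     # kept because the report says it appears significant

# Hypothetical helper: replace every character outside printable ASCII
# (other than newline) with a \x{..} escape so that stray bytes show up.
sub visible {
    my ($text) = @_;
    $text =~ s/([^\x20-\x7e\n])/sprintf('\\x{%02x}', ord($1))/ge;
    return $text;
}

my $pattern = NFKC('Hello ++World!');

# Trap the regexp compile error instead of letting it reach the terminal.
my $ok = eval { my $matched = ('' =~ /$pattern/); 1 };

if ($ok) {
    print "pattern compiled without error\n";
}
else {
    print visible($@);
}
```

Under this sketch, any garbage following 'World!' in the message would be
printed as escapes rather than as raw bytes.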

If the line

	$a = NFKC($a);

is commented out of the script, avoiding the upgrade to utf8, then the
error message prints correctly.
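
To check whether the upgrade alone is the trigger, rather than anything
specific to Unicode::Normalize, a variant along the following lines might
help. It is a sketch under the assumption that utf8::upgrade() gives the
string the same internal representation that NFKC() does; that assumption
has not been verified against this bug, and the variable names are
illustrative only.

```perl
use strict;
use warnings;
use Encode;     # provides Encode::is_utf8

my $plain    = 'Hello ++World!';
my $upgraded = $plain;
utf8::upgrade($upgraded);   # same characters, utf8 internal storage

printf "plain:    utf8 flag %s\n", Encode::is_utf8($plain)    ? 'on' : 'off';
printf "upgraded: utf8 flag %s\n", Encode::is_utf8($upgraded) ? 'on' : 'off';

# Compile each variant as a regexp and report what happens.
for my $pattern ($plain, $upgraded) {
    my $ok = eval { my $matched = ('' =~ /$pattern/); 1 };
    print $ok ? "compiled cleanly\n" : "error: $@";
}
```

If only the upgraded copy produces a corrupted message, that would point at
the handling of utf8 patterns rather than at the normalization step.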

This behaviour has also been observed in the prebuilt debian package
perl 5.8.0-15 for x86.


[Please do not change anything below this line]
Site configuration information for perl v5.8.0:

Configured by khalid at Fri Aug 16 14:54:49 BST 2002.

Summary of my perl5 (revision 5.0 version 8 subversion 0) configuration:
    osname=solaris, osvers=2.5.1, archname=sun4-solaris
    uname='sunos chihuahua 5.5.1 generic_103640-29 sun4u sparc sunw,ultra-1 '
    config_args='-d -Dcc=gcc -Uinstallusrbinperl -Dlibpth=/usr/lib /usr/ccs/lib -Dpager=/usr/ucb/more -Ui_gdbm -Ui_db -Dstartperl=#!/usr/local/bin/perl5 -Dprefix=/usr/local/soft/perl-5.8.0/run/default/sparc_sun_solaris2.5.1 -Dsiteprefix=/usr/local/'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
    cc='gcc', ccflags ='-fno-strict-aliasing -I/usr/local/include ',
    cppflags='-fno-strict-aliasing -I/usr/local/include'
    ccversion='', gccversion='2.95.2 19991024 (release)', gccosandvers='solaris2.5.1'
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=4321
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=4
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='gcc', ldflags =' '
    libpth=/usr/lib /usr/ccs/lib
    libs=-lsocket -lnsl -ldl -lm -lc
    perllibs=-lsocket -lnsl -ldl -lm -lc
    libc=/lib/, so=so, useshrplib=false, libperl=libperl.a
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags=' '
    cccdlflags='-fPIC', lddlflags='-G'

Locally applied patches:

@INC for perl v5.8.0:

Environment for perl v5.8.0:
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PERL_BADLANG (unset)
