
utf8+regex bug in 5.8.1-RC2

From: Dave Mitchell
Date: July 30, 2003 16:40
Subject: utf8+regex bug in 5.8.1-RC2
Message ID: 20030730234031.GG6364@fdgroup.com

The following appears to be a bug in MAINT20320 (and in bleadperl).

A UTF8 string fails to match against a regex which includes a string
interpolation of another UTF8 string.  A plain copy of that first string
matches fine against the same regex.

I first noticed this in the test suite of XML::Simple.

Neither UTF8 nor regexen are really my thing, so I'm hoping someone
else can confirm and/or fix.

Dave.

#!/usr/bin/perl -w

use strict;

my $s1 = qq{0123456\x{100}}; chop $s1; # make it UTF8
my $len = length $s1; # make it attach utf8 magic

my $a = "2\x{100}"; chop $a; # make it UTF8
$len = length $a; # make it attach utf8 magic

my $rex = qr{^.*0.*1${a}3456$}sx;

# make a plain (PV) copy of the string.

my $s2 = substr($s1, 0, length($s1));

use Devel::Peek;
print "\n";
Dump($s1);
print "\n";
Dump($s2);
print "\n";
Dump($a);
print "\n";
print "string1 matches\n" if $s1 =~ $rex; # fails
print "string2 matches\n" if $s2 =~ $rex; # succeeds

__END__



SV = PVMG(0x81647c8) at 0x81422ac
  REFCNT = 1
  FLAGS = (PADBUSY,PADMY,SMG,POK,pPOK,UTF8)
  IV = 0
  NV = 0
  PV = 0x813d3a8 "0123456"\0 [UTF8 "0123456"]
  CUR = 7
  LEN = 10
  MAGIC = 0x814c420
    MG_VIRTUAL = &PL_vtbl_utf8
    MG_TYPE = PERL_MAGIC_utf8(w)
    MG_LEN = 7

SV = PV(0x812ce70) at 0x814a770
  REFCNT = 1
  FLAGS = (PADBUSY,PADMY,POK,pPOK)
  PV = 0x813ec08 "0123456"\0
  CUR = 7
  LEN = 8

SV = PVMG(0x81647e8) at 0x81422a0
  REFCNT = 1
  FLAGS = (PADBUSY,PADMY,SMG,POK,pPOK,UTF8)
  IV = 0
  NV = 0
  PV = 0x814c440 "2"\0 [UTF8 "2"]
  CUR = 1
  LEN = 4
  MAGIC = 0x814c460
    MG_VIRTUAL = &PL_vtbl_utf8
    MG_TYPE = PERL_MAGIC_utf8(w)
    MG_LEN = 1

string2 matches


-- 
"But Sidley Park is already a picture, and a most amiable picture too.
The slopes are green and gentle. The trees are companionably grouped at
intervals that show them to advantage. The rill is a serpentine ribbon
unwound from the lake peaceably contained by meadows on which the right
amount of sheep are tastefully arranged." Lady Croom - Arcadia
