
Unicode regex negated case-insensitivity in 5.14.0-RC1

From: George Greer
Date: April 28, 2011 13:48
Subject: Unicode regex negated case-insensitivity in 5.14.0-RC1
Message ID: alpine.LFD.2.02.1104281633230.21770@ein.m-l.org

I am attempting to run $WORK's data filter/ETL, which currently runs on 
5.10.0 in production, on 5.14.0-RC1.  The installed module versions differ 
between the two Perl versions, but the script itself is the same.

(string scrambled to protect the innocent but still tickle behavior)

   DB<73> $x = "X-Xoqp-SDR-FpCqar4-Duooery-Faad-laeC_cCesspfpads:";

   DB<74> x $x =~ /^[^\x00-\x1f\x7f-\xff :]+:/
0  1
   DB<75> x $x =~ /^[^\x00-\x1f\x7f-\xff :]+:/i
0  1
   DB<76> utf8::upgrade($x)

   DB<77> x $x =~ /^[^\x00-\x1f\x7f-\xff :]+:/
0  1
   DB<78> x $x =~ /^[^\x00-\x1f\x7f-\xff :]+:/i
   empty array

Script version:
 	$x = "X-Xoqp-SDR-FpCqar4-Duooery-Faad-laeC_cCesspfpads:";
 	print $x =~ /^[^\x00-\x1f\x7f-\xff :]+:/  ? 1 : 0;
 	print $x =~ /^[^\x00-\x1f\x7f-\xff :]+:/i ? 1 : 0;
 	utf8::upgrade($x);
 	print $x =~ /^[^\x00-\x1f\x7f-\xff :]+:/  ? 1 : 0;
 	print $x =~ /^[^\x00-\x1f\x7f-\xff :]+:/i ? 1 : 0;
 	print "\n";

5.14.0-RC1 (tarball)
 	1110
5.12.3 (Fedora's perl-5.12.3-143.fc14.x86_64)
 	1111
5.10.0 (tarball)
 	1111

The regex is from Mail::Header, where it tests for bad RFC822 field names; 
the script reaches it through MIME-tools.

- - - 8< - - - 8< - - -
our $FIELD_NAME = '[^\x00-\x1f\x7f-\xff :]+:';
...
     defined $ctag && $ctag =~ /^($FIELD_NAME|From )/oi
         or croak "Bad RFC822 field name '$tag'\n";
- - - 8< - - - 8< - - -

Using /aa does seem to fix the regex:

1
1
1 /
0 /i
1 /a
0 /ia
1 /aa
1 /iaa
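
For anyone who wants to repeat the comparison, a minimal sketch of a script 
that prints a table in the same shape follows.  It is not the script that 
produced the numbers above; match_with_modifiers is a hypothetical helper 
name, and I am assuming the labelled rows were run against the upgraded 
string.  The pattern is compiled through string eval so that a perl older 
than 5.14, which does not know /a or /aa, prints "?" instead of failing to 
compile.

```perl
use strict;
use warnings;

# Field-name pattern copied from the report.
my $FIELD_NAME = '[^\x00-\x1f\x7f-\xff :]+:';

# Hypothetical helper: returns a hash ref mapping each modifier string to
# 1 (match), 0 (no match), or undef (modifier unknown to this perl).
sub match_with_modifiers {
    my ($string, @mods) = @_;
    my %result;
    for my $mods (@mods) {
        # String eval, so an unsupported modifier such as /a on a pre-5.14
        # perl fails inside the eval rather than at compile time.
        my $re = eval "qr/^$FIELD_NAME/$mods";
        $result{$mods} = defined $re ? ($string =~ $re ? 1 : 0) : undef;
    }
    return \%result;
}

my $x = "X-Xoqp-SDR-FpCqar4-Duooery-Faad-laeC_cCesspfpads:";
utf8::upgrade($x);    # the failure only showed up on the upgraded string

my @mods = ('', 'i', 'a', 'ia', 'aa', 'iaa');
my $result = match_with_modifiers($x, @mods);
for my $mods (@mods) {
    my $r = $result->{$mods};
    printf "%s /%s\n", (defined $r ? $r : '?'), $mods;
}
```

Each output line is the match result followed by the modifiers used, so the 
output can be compared row by row across perl builds.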

No special 5.14 features used by the script (since it is 5.10 compatible).

-- 
George Greer
