Re: Unicode regex negated case-insensitivity in 5.14.0-RC1

From: Tom Christiansen
Date: April 29, 2011 14:30
Subject: Re: Unicode regex negated case-insensitivity in 5.14.0-RC1
Message ID: 4079.1304112616@chthon

Nicholas Clark <nick@ccl4.org> wrote
   on Fri, 29 Apr 2011 22:14:01 BST: 


> Also, as this matches:

> $ ./perl -Ilib -lwe '$_ = "ss"; utf8::upgrade($_); print /\A[\x80-\xFF]\z/i ? "Y" : "N"'
> Y

> shouldn't this?

> $  ./perl -Ilib -lwe '$_ = "ss"; utf8::upgrade($_); print /\A[\x00-\xFF]\z/i ? "Y" : "N"'
> N

Oh my.

OK, I've tried every range [i-j] with i running over 0 .. 0xDF and
j running over 0xDF .. 0x100, which is 224 x 34 = 7616 patterns in all.

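    $_ = "ss";
    utf8::upgrade($_);                # force the UTF-8 internal representation
    for $i ( 0 .. 0xDF ) {            # start of the range
        for $j ( 0xDF .. 0x100 ) {    # end of the range; every range includes \xDF
            # builds patterns such as \A[\x{00}-\x{DF}]\z
            $pat = sprintf "\\A[\\x{%02X}-\\x{%02X}]\\z", $i, $j;
            printf "%s\t%s\n", $pat, /$pat/i ? "Y" : "N";
        }
    }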

With these results:

    % perl5.12.3 /tmp/range | grep -c 'Y$'
    109
    % perl5.12.3 /tmp/range | grep -c 'N$'
    7507

    % blead /tmp/range | grep -c 'N$'
    3944
    % blead /tmp/range | grep -c 'Y$'
    3672
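
Both pairs of counts above sum to 7616, which is the 224 x 34 patterns the loop generates. A minimal sketch of the same sweep with the tally done in-process instead of through grep follows. The %match and %count hashes and the "anomalies" check are illustrative additions, not part of the original script; the check rests on the assumption that lowering the start of a range should never turn a match into a non-match, which is the point of Nicholas's two one-liners.

```perl
use strict;
use warnings;

# Same target as the original script: "ss", upgraded to UTF-8 internally.
my $str = "ss";
utf8::upgrade($str);

my %match;                      # $match{$i}{$j} is 1 if [\x{i}-\x{j}] matched
my %count = ( Y => 0, N => 0 ); # tallies that replace the grep -c calls

for my $i ( 0 .. 0xDF ) {
    for my $j ( 0xDF .. 0x100 ) {
        my $pat = sprintf '\A[\x{%02X}-\x{%02X}]\z', $i, $j;
        my $hit = $str =~ /$pat/i ? 1 : 0;
        $match{$i}{$j} = $hit;
        $count{ $hit ? 'Y' : 'N' }++;
    }
}

# Assumed consistency rule (not from the original post): for a fixed end $j,
# if [i-j] matches then the wider range [(i-1)-j] should match as well.
my $anomalies = 0;
for my $i ( 1 .. 0xDF ) {
    for my $j ( 0xDF .. 0x100 ) {
        $anomalies++ if $match{$i}{$j} && !$match{ $i - 1 }{$j};
    }
}

my $total = $count{Y} + $count{N};
printf "Y=%d N=%d total=%d anomalies=%d\n",
    $count{Y}, $count{N}, $total, $anomalies;
```

The total is always 7616; the Y, N and anomaly figures depend on which perl runs the script, so none are given here.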

--tom
