On Sat, 11 Dec 2010 11:47:55 +0900 SADAHIRO Tomoyuki <bqw10602@nifty.com> wrote:

> > Could the latter representation (\xc2\x80) appear in a regular-expression character class, too?
>
> Could with perl 5.8.0, 5.8.1, 5.8.3, 5.8.8.
> Cannot with perl 5.8.9, 5.10.0, 5.10.1.
> (I didn't run with other versions.)

Sadly, it has been broken in a character class.
[\xHH\xHH] was not interpreted as a multi-octet character "\xHH\xHH"
under use encoding "a-multi-octet-encoding".

In older perls (up to 5.8.8),
/\xC2\xA0/ matched only U+00A0. (by design)
/\xE1\x80\x80/ matched only U+1000. (by design)
/[\xC2\xA0]/ matched U+00C2 and U+00A0. (broken)
/[\xE1\x80\x80]/ matched U+00E1 and U+0080. (broken)

In newer perls (from 5.8.9 on),
/\xC2\xA0/ matches only "\x{FFFD}"x2. (broken)
/\xE1\x80\x80/ matches only "\x{FFFD}"x3. (broken)
/[\xE1\x80\x80]/ and /[\xC2\xA0]/ match only U+FFFD. (broken)

#!perl
use strict;
use warnings;
use charnames ':full';
use encoding 'UTF-8';

print "perl $]\n";

my $u00e1 = "\N{LATIN SMALL LETTER A WITH ACUTE}"; # U+00E1

print "string-eq: ";
print "a\x{1000}z" eq "a\xE1\x80\x80z" ? "ok\n" : "not ok\n";

print "reg-exact: ";
print "a\x{1000}z" =~ /a\xE1\x80\x80z/ ? "ok\n" : "not ok\n";

print "reg-class: ";
print "a\x{1000}z" =~ /a[\xE1\x80\x80]z/ ? "ok\n" : "not ok\n";

print " vs 00E1: ";
print "a${u00e1}z" !~ /a[\xE1\x80\x80]z/ ? "ok\n" : "not ok\n";

print " vs FFFD: ";
print "a\x{FFFD}z" !~ /a[\xE1\x80\x80]z/ ? "ok\n" : "not ok\n";
__END__

perl 5.008
string-eq: ok
reg-exact: ok
reg-class: not ok
 vs 00E1: not ok
 vs FFFD: ok

perl 5.008001
string-eq: ok
reg-exact: ok
reg-class: not ok
 vs 00E1: not ok
 vs FFFD: ok

perl 5.008003
string-eq: ok
reg-exact: ok
reg-class: not ok
 vs 00E1: not ok
 vs FFFD: ok

perl 5.008008
string-eq: ok
reg-exact: ok
reg-class: not ok
 vs 00E1: not ok
 vs FFFD: ok

perl 5.008009
string-eq: ok
reg-exact: not ok
reg-class: not ok
 vs 00E1: ok
 vs FFFD: not ok

perl 5.010000
string-eq: ok
reg-exact: not ok
reg-class: not ok
 vs 00E1: ok
 vs FFFD: not ok

perl 5.010001
string-eq: ok
reg-exact: not ok
reg-class: not ok
 vs 00E1: ok
 vs FFFD: not ok
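
For comparison, here is a minimal sketch (not part of the test script above) of one way to sidestep the question of how \xHH\xHH is read inside a pattern: skip use encoding, decode the octets explicitly with Encode::decode, and interpolate the resulting character into the class. The variable names ($u1000, $u00a0, $class) are only illustrative, and I am assuming a perl that ships the Encode module.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Decode the UTF-8 octets explicitly rather than relying on
# "use encoding" to reinterpret \xHH escapes in literals and patterns.
my $u1000 = decode('UTF-8', "\xE1\x80\x80");   # one character, U+1000
my $u00a0 = decode('UTF-8', "\xC2\xA0");       # one character, U+00A0

printf "U+%04X\n", ord $u1000;
printf "U+%04X\n", ord $u00a0;

# Build the character class from the decoded character, so the class
# holds a single code point instead of three separate octets.
my $class = qr/a[$u1000]z/;

print "reg-class: ", ("a\x{1000}z" =~ $class ? "ok" : "not ok"), "\n";
print " vs 00E1: ",  ("a\x{E1}z"   !~ $class ? "ok" : "not ok"), "\n";
print " vs FFFD: ",  ("a\x{FFFD}z" !~ $class ? "ok" : "not ok"), "\n";
```

Since the class is built from an already decoded character, all three checks should print ok here, whatever the perl version does with [\xE1\x80\x80] under use encoding.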