On Thu Mar 13 14:18:31 2008, chris_hall wrote:
>
> This is a bug report for perl from chris.hall@highwayman.com,
> generated with the help of perlbug 1.35 running under perl v5.8.8.
>
>
> -----------------------------------------------------------------
>
> It appears that utf8::valid() disagrees with Encode::encode('utf8', ...)
> for characters \x{14_0000} - \x{1F_FFFF}.
>
> I suggest utf8::valid() is broken.
>
> The following:
>
>   use strict ;
>
>   use Encode qw(FB_QUIET LEAVE_SRC) ;
>
>   printf "Perl v%vd & Encode %s\n", $^V, $Encode::VERSION ;
>
>   # Test characters: 0x0000_FFFF, 0x0001_0000, 0x0001_FFFF, 0x0002_0000,
>   # 0x0002_FFFF, 0x0003_0000, ...., 0x7FFF_FFFF.
>
>   my $c = 0xFFFF ;
>   while ($c <= 0x7FFF_FFFF) {
>     my $s = chr($c) ;
>
>     my $v = utf8::valid($s) ? 1 : 0 ;
>     my $o = Encode::encode('utf8', $s, FB_QUIET() | LEAVE_SRC()) ;
>
>     my $r = $o ? 1 : 0 ;
>
>     if ($v != $r) {
>       printf "0x%04X_%04X: utf8::valid=%d but Encode::encode=%d ",
>              ($c >> 16), $c & 0xFFFF, $v, $r ;
>       Encode::_utf8_off($s) ;
>       print map { sprintf '\x%02X', ord($_) } split(//, $s) ;
>       print "\n" ;
>     } ;
>
>     if ($c & 0xFFFF) { $c += 1 ; } else { $c += 0xFFFF ; } ;
>   } ;
>
> Produces:
>
>   Perl v5.8.8 & Encode 2.23
>   0x0014_0000: utf8::valid=0 but Encode::encode=1 \xF5\x80\x80\x80
>   0x0014_FFFF: utf8::valid=0 but Encode::encode=1 \xF5\x8F\xBF\xBF
>   0x0015_0000: utf8::valid=0 but Encode::encode=1 \xF5\x90\x80\x80
>   0x0015_FFFF: utf8::valid=0 but Encode::encode=1 \xF5\x9F\xBF\xBF
>   0x0016_0000: utf8::valid=0 but Encode::encode=1 \xF5\xA0\x80\x80
>   0x0016_FFFF: utf8::valid=0 but Encode::encode=1 \xF5\xAF\xBF\xBF
>   0x0017_0000: utf8::valid=0 but Encode::encode=1 \xF5\xB0\x80\x80
>   0x0017_FFFF: utf8::valid=0 but Encode::encode=1 \xF5\xBF\xBF\xBF
>   0x0018_0000: utf8::valid=0 but Encode::encode=1 \xF6\x80\x80\x80
>   0x0018_FFFF: utf8::valid=0 but Encode::encode=1 \xF6\x8F\xBF\xBF
>   0x0019_0000: utf8::valid=0 but Encode::encode=1 \xF6\x90\x80\x80
>   0x0019_FFFF: utf8::valid=0 but Encode::encode=1 \xF6\x9F\xBF\xBF
>   0x001A_0000: utf8::valid=0 but Encode::encode=1 \xF6\xA0\x80\x80
>   0x001A_FFFF: utf8::valid=0 but Encode::encode=1 \xF6\xAF\xBF\xBF
>   0x001B_0000: utf8::valid=0 but Encode::encode=1 \xF6\xB0\x80\x80
>   0x001B_FFFF: utf8::valid=0 but Encode::encode=1 \xF6\xBF\xBF\xBF
>   0x001C_0000: utf8::valid=0 but Encode::encode=1 \xF7\x80\x80\x80
>   0x001C_FFFF: utf8::valid=0 but Encode::encode=1 \xF7\x8F\xBF\xBF
>   0x001D_0000: utf8::valid=0 but Encode::encode=1 \xF7\x90\x80\x80
>   0x001D_FFFF: utf8::valid=0 but Encode::encode=1 \xF7\x9F\xBF\xBF
>   0x001E_0000: utf8::valid=0 but Encode::encode=1 \xF7\xA0\x80\x80
>   0x001E_FFFF: utf8::valid=0 but Encode::encode=1 \xF7\xAF\xBF\xBF
>   0x001F_0000: utf8::valid=0 but Encode::encode=1 \xF7\xB0\x80\x80
>   0x001F_FFFF: utf8::valid=0 but Encode::encode=1 \xF7\xBF\xBF\xBF
>
> And the same for: Perl v5.10.0 & Encode 2.23
>
> Chris

I'll check to see if the patch included in RT #43294 fixes both problems.

Thanks for the report.

Steve
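The failing range lines up with the lead byte of the 4-byte UTF-8 form: that form carries 21 payload bits, 3 of them in the lead byte (0xF0 | cp >> 18), so 0x14_0000 through 0x1F_FFFF are exactly the code points whose lead byte is 0xF5, 0xF6 or 0xF7. The sketch below recomputes those bytes by hand with a hypothetical helper, utf8_4byte() (not part of Perl or Encode), and cross-checks them against Perl's own utf8::encode(). It only illustrates the byte layout, assuming an ASCII (not EBCDIC) platform, and makes no claim about where utf8::valid() actually goes wrong. Note that RFC 3629 caps standard UTF-8 at U+10FFFF, so every code point from 0x11_0000 up is outside strict UTF-8 regardless.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Perl or Encode): hand-encode a code point
# in 0x1_0000 .. 0x1F_FFFF using the 4-byte UTF-8 layout
#   11110xxx 10xxxxxx 10xxxxxx 10xxxxxx   (21 payload bits)
sub utf8_4byte {
    my ($cp) = @_;
    die sprintf("0x%X is outside the 4-byte range\n", $cp)
        if $cp < 0x1_0000 || $cp > 0x1F_FFFF;
    return (
        0xF0 | ($cp >> 18),            # top 3 bits pick lead byte F0..F7
        0x80 | (($cp >> 12) & 0x3F),   # next 6 bits
        0x80 | (($cp >>  6) & 0x3F),   # next 6 bits
        0x80 | ( $cp        & 0x3F),   # low 6 bits
    );
}

for my $cp (0x10_FFFF, 0x13_FFFF, 0x14_0000, 0x1F_FFFF) {
    my @bytes = utf8_4byte($cp);

    # Cross-check against Perl's own encoder: utf8::encode() converts the
    # character string in place into its UTF-8 byte sequence.
    my $s = chr $cp;
    utf8::encode($s);
    my $same = join(',', unpack 'C*', $s) eq join(',', @bytes);

    printf "0x%04X_%04X: %s lead=0x%02X %s%s\n",
        $cp >> 16, $cp & 0xFFFF,
        join('', map { sprintf '\x%02X', $_ } @bytes),
        $bytes[0],
        ($cp > 0x10_FFFF ? 'above U+10FFFF' : 'within Unicode'),
        ($same ? '' : ' (differs from utf8::encode!)');
}
```

By hand, 0x13_FFFF comes out as \xF4\xBF\xBF\xBF and 0x14_0000 as \xF5\x80\x80\x80, so the first code point in the report's failing list is also the first one that needs lead byte 0xF5.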