develooper Front page | perl.perl5.porters | Postings from February 2003

format, PerlIO and utf8

From:
Dan Kogai
Date:
February 8, 2003 16:37
Subject:
format, PerlIO and utf8
Message ID:
9B48FDF6-3BC6-11D7-A6FC-000393AE4244@dan.co.jp
Nick Ing-XS and jhi,

The recent patch by inaba-san has solved encoding pragma issues -- 
well, almost.  Inaba-san and I are still aware that format does not get 
along with PerlIO very well.  Consider the code below:

#
use strict;
use Encode;
my $str = "\x{99F1}\x{99DD}"; # camel in ideographs

format STDOUT =
Word: @<<<<<<<
$str
.

binmode(STDOUT=>":utf8");
print $str, "\n"; # this one prints fine
write;            # this one does not
__END__

Note the binmode() before print() and write().  Without binmode() the 
script works as expected, but with binmode() it does not.  Note that the 
layer used here is ":utf8", not ":encoding(utf8)".

 > perl t/format.pl | hexdump -C
00000000  e9 a7 b1 e9 a7 9d 0a 57  6f 72 64 3a 20 c3 a9 c2  |.......Word: ...|
00000010  a7 c2 b1 c3 a9 c2 a7 c2  9d 0a                    |..........|

It appears that write() UTF-8 encodes the string twice.

\x{99f1}\x{99dd} =>
\xE9\xA7\xB1\xE9\xA7\x9D =>
\xC3\xA9\xC2\xA7\xC2\xB1\xC3\xA9\xC2\xA7\xC2\x9D
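
The byte sequences above can be reproduced by hand with Encode.  The 
following is only a sketch of the arithmetic, assuming the second pass 
treats each UTF-8 octet as a Latin-1 character; it does not exercise 
format or write(), and it is not a claim about the actual code path 
inside perl.  The helper hex_bytes() is a name made up for this 
illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# The two-character string from the script above.
my $str = "\x{99F1}\x{99DD}";

# First pass: characters -> UTF-8 octets (what print() emits under :utf8).
my $once = encode('UTF-8', $str);

# Second pass: each octet is taken as a Latin-1 character and encoded again.
# ASSUMPTION: this only mimics the suspected double encoding.
my $twice = encode('UTF-8', $once);

# Hypothetical helper: render a byte string as space-separated hex octets.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02x', $_ } unpack('C*', $bytes);
}

print hex_bytes($once),  "\n";  # e9 a7 b1 e9 a7 9d
print hex_bytes($twice), "\n";  # c3 a9 c2 a7 c2 b1 c3 a9 c2 a7 c2 9d
```

The second line matches the octets that follow "Word: " in the hexdump, 
which is consistent with the string being encoded twice.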

Even though the very use of format with Unicode is moot (character width 
problems, BIDI, you name it), Inaba-san suspects this is a perl core 
problem rather than a PerlIO one, and he may submit a patch to fix it in 
the future.

Dan the Man with Too Many Encoded Chunks of Information

