On read and unicode
From: perl5-porters
Date: December 24, 2004 18:24
Subject: On read and unicode
Message ID: cqiiuu$tdn$1@post.home.lunix
Currently if you do:
my $in = "f\xfezz";
utf8::upgrade($in);
print "Old in='$in'\n";
print unpack("H*", $in), "\n";
read(STDIN, $in, 3, 2);
print "Now in='$in'\n";
print unpack("H*", $in), "\n";
print "is utf8:", utf8::is_utf8($in) ? 1 : 0, "\n";
and type "abc" as input, you get:
Old in='fþzz'
66c3be7a7a
abc
Now in='fþabc'
66c3be616263
is utf8:0
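So read counted 2 full utf8 characters forward in $in (3 bytes of the
internal representation), then dropped the utf8 flag and appended the
3 new bytes. The result is a 6-character byte string in which the old
character \xfe has turned into the two bytes \xc3 \xbe.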
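To make the effect of dropping the flag concrete, here is a minimal sketch that does not call read at all. It only compares a flagged string with an unflagged copy of the same internal bytes. The variable names are made up for illustration, and utf8::encode is used here only as a convenient way to get "same bytes, flag off".

```perl
use strict;
use warnings;

# Two characters, stored internally as the three bytes 66 c3 be.
my $flagged = "f\xfe";
utf8::upgrade($flagged);

# Same internal bytes, but without the utf8 flag: now three characters.
my $unflagged = $flagged;
utf8::encode($unflagged);

# Appending the 3 new bytes to each gives strings that differ in content.
my $with_flag    = $flagged   . "abc";   # f, \xfe, a, b, c
my $without_flag = $unflagged . "abc";   # f, \xc3, \xbe, a, b, c

for my $s ($with_flag, $without_flag) {
    printf "is_utf8=%d length=%d chars=%s\n",
        utf8::is_utf8($s) ? 1 : 0,
        length($s),
        join(' ', map { sprintf '%02x', ord($_) } split //, $s);
}
```

The second line of output corresponds to what read left in $in in the example above: the flag is off and the string has grown from 5 to 6 characters.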
I think this is an unnecessary exposure of the internal format of the old
$in string, and that
$rc = read($fh, $buf, $len, $off)
should basically behave like:
# Keep only the first $off characters of the old buffer:
$buf = substr($buf, 0, $off);
# With the current semantics for the raw/unicode-ness of the filehandle:
$rc = read($fh, my $tmp, $len);
$buf .= $tmp if $rc;
In the above case that would produce the same (internal) byte sequence,
but the utf8 flag would stay on, so the second char would keep its value
\xfe instead of being expanded to its utf-8 encoding. That gives $in the
(to my mind) more logical 5-character value "fþabc".
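The proposed behaviour can be tried out without touching perl itself. The following is only a sketch: read_at_offset is a hypothetical helper name, the input comes from an in-memory filehandle instead of STDIN, and offsets that are negative or lie beyond the end of the buffer (which the real read handles) are deliberately ignored.

```perl
use strict;
use warnings;

# Hypothetical helper emulating the proposed read($fh, $buf, $len, $off).
# $_[1] is an alias for the caller's buffer, so it is modified in place.
sub read_at_offset {
    my ($fh, undef, $len, $off) = @_;
    # Keep the first $off characters (not bytes) of the old buffer.
    $_[1] = substr($_[1], 0, $off);
    # Plain read with whatever raw/unicode-ness the filehandle has.
    my $rc = read($fh, my $tmp, $len);
    # Ordinary concatenation upgrades $tmp if the buffer is utf8.
    $_[1] .= $tmp if $rc;
    return $rc;
}

my $in = "f\xfezz";
utf8::upgrade($in);

# In-memory handle standing in for typing "abc" on STDIN.
my $data = "abc";
open(my $fh, '<', \$data) or die "cannot open in-memory handle: $!";
my $rc = read_at_offset($fh, $in, 3, 2);
close($fh);

printf "rc=%d length=%d is_utf8=%d\n",
    $rc, length($in), utf8::is_utf8($in) ? 1 : 0;
printf "chars=%s\n", join(' ', map { sprintf '%02x', ord($_) } split //, $in);
```

With this emulation $in ends up as the 5 characters 66 fe 61 62 63 with the utf8 flag still on, which is the value argued for above.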