On Fri, Mar 2, 2012 at 2:03 PM, Eric Brine <ikegami@adaelis.com> wrote:
> On Fri, Mar 2, 2012 at 9:11 AM, Craig A. Berry <craigberry@mac.com> wrote:
>
>> I was thinking of a situation where something external to Perl limits how
>> much data you can get in one read and thus gives you less than the full
>> amount requested by $/.
>>
>
> That's exactly the situation I described. Here, let me provide the strace
> output.
>
> $ strace perl -e'$/=\40; <>;' < /dev/random
> ...
> read(0, "\5|\200\"\360T0*\325\223\276\322\20S\244\16\341", 8192) = 17
> read(0, "\370\356 \2652\236\27>", 8192) = 8
> read(0, "\0\270\ve\332\223\225\312", 8192) = 8
> read(0, "\316\366\272\311\215.\204\361", 8192) = 8
> ...
>
>
>> I'm pretty sure you'll get mangled UTF-8 if you happen to be
>> mid-character when you hit the end of the device buffer.
>
>
> No, because Perl will just ask for more. You'll get mangled UTF-8 if you
> happen to request a number of bytes that ends you mid-character (which is
> what this ticket is about).
>
> (If we were talking about sysread instead of readline or read, then yes,
> it could happen then. Unlike read and readline, sysread returns as soon as
> bytes are available.)
>
And here's an example where one character is read using two reads:
$ perl -C -e'print "a"x8191, chr(0x2660)' > x
$ ls -l x
-rw------- 1 ikegami group 8194 Mar 2 23:26 x
$ perl -le'use open ":std", ":utf8"; $/=\8194; $_=<>; print $_ eq ("a"x8191).chr(0x2660) ?1:0;' < x
1
strace:
read(0, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 8192) = 8192
read(0, "\231\240", 8192) = 2