Front page | perl.perl5.porters |
Postings from May 2003
Re: Meaning of sysread()
From: Nick Ing-Simmons
Date: May 21, 2003 05:48
Subject: Re: Meaning of sysread()
Message ID: 20030521124819.1922.11@bactrian.elixent.com
Jarkko Hietaniemi <jhi@iki.fi> writes:
>I was responsible for the wording change in perlfunc (change #13146,
>a year and a half ago). So I'm biased, but I still do think what it
>says now for sysread() is "correct" and "fine":
>
>First:
>
>> Attempts to read LENGTH I<characters> of data into variable SCALAR
>> from the specified FILEHANDLE, using the system call read(2)
>
>For non-Unicode-caring people the difference between "bytes" and
>"characters" does not exist.
Depends. They might have a favourite multi-byte charset of their own,
but assuming they realize that perl's characters are Unicode, then correct.
>
>Then:
>
>> Note the I<characters>: depending on the status of the filehandle,
>> either (8-bit) bytes or characters are read. By default all
>> filehandles operate on bytes, but for example if the filehandle has
>> been opened with the C<:utf8> I/O layer (see L</open>, and the C<open>
>> pragma, L<open>), the I/O will operate on characters, not bytes.
>
>So to answer your question: bytes. Always has done, still does....
>EXCEPT when the filehandle has been marked :utf8, in which case
>Unicode characters are read, even with sysread().
It tries, because the "top" of the PerlIO layer stack (which is what
pp_sysread() sees) is marked as :utf8. But suppose that is because
the stack is
    :unix :perlio :encoding(big5)
and it is :encoding that is producing the UTF-8. When sysread
dives down and calls read(2), pp_sysread() is going to get a
Big5 octet stream, and the UTF-8 completion logic in pp_sysread()
is going to barf.
>(I think leaving it
>otherwise would be a mess, people having to parse UTF-8 manually.)
Right now it will not work except for the case where the on-disk
(or in-pipe, or whatever) data is UTF-8.
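
To make the failure mode concrete, here is a minimal sketch of "read LENGTH
characters by completing UTF-8 sequences from a raw octet stream". It is
written in Python purely as an illustration and is not the code in
pp_sysread(); the names read_chars and utf8_seq_len are hypothetical. It
assumes, as the completion logic does, that whatever comes up from the
lowest layer is UTF-8.

```python
import io


def utf8_seq_len(lead: int) -> int:
    """Return the expected length of a UTF-8 sequence given its lead byte."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    # Continuation bytes (0x80-0xBF) and other values cannot start a sequence.
    raise ValueError(f"byte 0x{lead:02x} cannot start a UTF-8 sequence")


def read_chars(raw, length: int) -> str:
    """Read up to `length` characters from a raw byte stream, assuming UTF-8.

    Hypothetical helper: it reads one lead byte, then "completes" the
    character by reading the continuation bytes the lead byte calls for.
    """
    chars = []
    while len(chars) < length:
        lead = raw.read(1)
        if not lead:
            break  # end of stream
        need = utf8_seq_len(lead[0]) - 1
        rest = raw.read(need) if need else b""
        if len(rest) < need:
            raise ValueError("stream ended inside a UTF-8 sequence")
        # Strict decode: malformed input raises UnicodeDecodeError,
        # which is a subclass of ValueError.
        chars.append((lead + rest).decode("utf-8"))
    return "".join(chars)


if __name__ == "__main__":
    text = "\u4e2d\u6587"  # two CJK characters

    # Octets on disk are UTF-8: completion works.
    print(repr(read_chars(io.BytesIO(text.encode("utf-8")), 2)))

    # Octets on disk are Big5: the same logic rejects them.
    try:
        read_chars(io.BytesIO(text.encode("big5")), 2)
    except ValueError as err:
        print("completion failed:", err)
```

The point of the sketch is only that the "is this UTF-8?" decision has to
match the octets actually delivered by the layer being read from, not the
flag on the top of the stack.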
>
>Of the suggestions, I think I like the PerlIO_syslayer() one most.
>There is already enough rope out there :-)
It has the advantage that the utf8-ness flag of the PerlIO it returned
would be correct for the data it was going to provide, so existing
character completion logic would work just fine.
--
Nick Ing-Simmons
http://www.ni-s.u-net.com/