perl.perl5.porters | Postings from May 2003

Re: Meaning of sysread()

From: Nick Ing-Simmons
Date: May 21, 2003 05:48
Subject: Re: Meaning of sysread()
Message ID: 20030521124819.1922.11@bactrian.elixent.com

Jarkko Hietaniemi <jhi@iki.fi> writes:
>I was responsible for the wording change in perlfunc (change #13146,
>a year and a half ago).  So I'm biased, but I still do think what it
>says now for sysread() is "correct" and "fine":
>
>First:
>
>> Attempts to read LENGTH I<characters> of data into variable SCALAR
>> from the specified FILEHANDLE, using the system call read(2)
>
>For non-Unicode-caring people the difference between "bytes" and
>"characters" does not exist.

Depends. They might have a favourite multi-byte charset of their own,
but assuming they realize that perl works in Unicode, then that is correct.

>
>Then:
>
>> Note the I<characters>: depending on the status of the filehandle,
>> either (8-bit) bytes or characters are read.  By default all
>> filehandles operate on bytes, but for example if the filehandle has
>> been opened with the C<:utf8> I/O layer (see L</open>, and the C<open>
>> pragma, L<open>), the I/O will operate on characters, not bytes.
>
>So to answer your question: bytes.  Always has done, still does....
>EXCEPT when the filehandle has been marked :utf8, in which case
>Unicode characters are read, even with sysread().  

It tries, because the "top" of the PerlIO layer stack (which is what
pp_sysread() sees) is marked as :utf8. But suppose that is because
the stack is

:unix :perlio :encoding(big5)

and it is :encoding that is spitting out UTF-8. Then when sysread
dives down and calls read(2), pp_sysread() is going to get a
big5 octet stream, and the UTF-8 completion logic in pp_sysread()
is going to barf.
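
To make the mismatch concrete, here is a minimal sketch, not a statement
of what pp_sysread() does internally. It assumes a perl whose Encode knows
"big5" and which provides PerlIO::get_layers(); the temp file and the
variable names are made up for illustration. It writes one Big5 character,
shows that the layered handle reports a utf8 flag on top of :encoding, and
then checks that the octets read(2) actually hands back are not UTF-8. It
deliberately does not call sysread() on the layered handle itself, since
what happens there depends on the perl version.

```perl
use strict;
use warnings;
use Encode qw(decode encode FB_CROAK LEAVE_SRC);
use File::Temp qw(tempfile);

# One CJK character, stored on disk as Big5 octets (illustrative data).
my $char = "\x{4e2d}";
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} encode('big5', $char);
close $out or die "close failed: $!";

# Buffered reads go through :encoding, so they see the decoded character.
open my $fh, '<:encoding(big5)', $path or die "open failed: $!";
my @layers = PerlIO::get_layers($fh);
my $line   = <$fh>;
close $fh;
print "layers: @layers\n";

# A read straight from the bottom of the stack sees the Big5 octets.
open my $raw, '<:raw', $path or die "open failed: $!";
my $octets = '';
my $got = sysread($raw, $octets, 16);
die "sysread failed: $!" unless defined $got;
close $raw;

# Strict UTF-8 decoding of those octets fails: the flag at the top of
# the stack does not describe the data at the bottom.
my $valid_utf8 = eval { decode('UTF-8', $octets, FB_CROAK | LEAVE_SRC); 1 } ? 1 : 0;
print "raw octets are ", ($valid_utf8 ? "valid" : "not valid"), " UTF-8\n";
```
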

>(I think leaving it
>otherwise would be a mess, people having to parse UTF-8 manually.)

Right now it will not work except in the case where the on-disk
(or in-pipe or whatever) data is UTF-8.
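
For comparison, the "manual" route looks roughly like this: sysread()
octets from a byte handle and let Encode complete the characters. This is
only a sketch of the buffered-decode idiom from the Encode documentation
(FB_QUIET leaves an unfinished trailing character in the buffer); the
helper name slurp_decoded, the sample string and the chunk size are
invented for the example. The same loop would serve for any encoding
Encode supports, not just UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode encode FB_QUIET);
use File::Temp qw(tempfile);

# Hypothetical helper: read a byte handle to EOF with sysread() and
# return the decoded characters.
sub slurp_decoded {
    my ($fh, $encoding, $chunk_size) = @_;
    my ($pending, $text) = ('', '');
    while (1) {
        # Append new octets after whatever was left over last time.
        my $got = sysread($fh, $pending, $chunk_size, length $pending);
        die "sysread failed: $!" unless defined $got;
        last if $got == 0;
        # FB_QUIET decodes what it can and leaves the rest (for example
        # an incomplete trailing character) in $pending.
        $text .= decode($encoding, $pending, FB_QUIET);
    }
    die "undecodable or truncated data in input" if length $pending;
    return $text;
}

# Sample data with multi-byte characters, written as UTF-8 octets.
my $original = "caf\x{e9} \x{4e2d}\x{6587}";
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} encode('UTF-8', $original);
close $out or die "close failed: $!";

# A chunk size of 4 splits characters across reads for this sample.
open my $in, '<:raw', $path or die "open failed: $!";
my $text = slurp_decoded($in, 'UTF-8', 4);
close $in;

print $text eq $original ? "round trip ok\n" : "mismatch\n";
```
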

>
>I think I like of the suggestions most the PerlIO_syslayer() one.
>There is already enough rope out there :-)

It has the advantage that the utf8-ness flag of the PerlIO it returned
would be correct for the data it was going to provide, so existing 
character completion logic would work just fine.



-- 
Nick Ing-Simmons
http://www.ni-s.u-net.com/

