Re: Meaning of sysread()
From: Mark Mielke
May 21, 2003 13:48
Message ID: 20030521205444.GC25674@mark.mielke.cc
On Wed, May 21, 2003 at 05:54:17PM +0100, Nick Ing-Simmons wrote:
> Mark Mielke <firstname.lastname@example.org> writes:
> >sysread() is documented ('man perlfunc') to '[attempt] to read LENGTH
> >characters of data ... [bypassing] buffered IO ...'. If Perl sysread()
> >is actually implemented using C read(), then it would be impossible to
> >read LENGTH characters of data, and only LENGTH characters of data
> >without calling C read() once for each byte. Remember - no buffering.
> You read LENGTH bytes and see how many characters that is.
> You find it is LENGTH-N chars so you read N more bytes and see where
> you are ...
What about swallowing an EOF off a TTY by accident? You read 4096, you
only get 192. You read again, you get 0. EOF is now swallowed, although
the caller has no idea.
Again, this is unlikely to cause problems in practice: if the characters
were multi-byte and sent atomically, or in the same band as the EOF, a
partial multi-byte character could not appear at the end of a stream.
It does show, however, that this approach is clunky.
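
The swallowed EOF can be sketched as follows. This is an illustration only, written in Python rather than Perl or C: read_chars() and scripted_tty() are hypothetical names, and the scripted reader merely stands in for C read() on a TTY, where an EOF is a single zero-length read and more input may follow it.

```python
import codecs


def read_chars(raw_read, length):
    """Hypothetical emulation loop: call raw_read() until `length`
    characters are decoded or a read returns nothing.

    Returns (text, number_of_raw_read_calls).
    """
    decoder = codecs.getincrementaldecoder("utf-8")()
    text = ""
    calls = 0
    while len(text) < length:
        chunk = raw_read(length - len(text))
        calls += 1
        if not chunk:
            # Zero-length read: the TTY's EOF is consumed right here.
            break
        text += decoder.decode(chunk)
    return text, calls


def scripted_tty(events):
    """Fake descriptor: each call returns the next scripted result."""
    it = iter(events)
    return lambda n: next(it)[:n]


# The TTY delivers 192 bytes, then one EOF, then further input.
tty = scripted_tty([b"x" * 192, b"", b"later input\n"])
text, calls = read_chars(tty, 4096)
print(len(text), calls)  # 192 characters returned, but 2 reads were made
```

The caller receives 192 characters and nothing else. Its next read returns the later input, not the EOF, because the emulation loop already consumed the zero-length read.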
What if read() returns 4096 bytes, but only 4092 characters, and the
following read() of 4 bytes ends with EINTR? Should sysread() return the
4092 characters with a return value of 4092? How will the caller know that
EINTR was received? Why does this matter? Event loops may choose to
reset the loop if any system call returns EINTR, as it is probable
that the EINTR was in response to a signal, or some other asynchronous
event. If the event loop were to miss EINTR, it may continue to
process events at the same priority, deferring a high priority event
until later, because it isn't aware that it exists yet.
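
A minimal sketch of the EINTR case, again in Python and again with hypothetical names (sysread_chars(), scripted_fd()). The scripted reader raises InterruptedError where C read() would fail with EINTR. The policy shown, returning the data already decoded, is only one possible choice, and it is the one that hides the interruption from the caller.

```python
import codecs


def sysread_chars(raw_read, length):
    """Hypothetical character-counting layer over a byte-oriented read.

    Returns (text, eintr_hidden); the flag exists only for this demo.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()
    text = ""
    while len(text) < length:
        try:
            chunk = raw_read(length - len(text))
        except InterruptedError:
            if text:
                # Data was already consumed, so return it. The caller
                # gets a normal result and never learns about EINTR.
                return text, True
            raise
        if not chunk:
            break
        text += decoder.decode(chunk)
    return text, False


def scripted_fd(events):
    """Fake descriptor: returns scripted bytes or raises scripted errors."""
    it = iter(events)

    def raw_read(n):
        event = next(it)
        if isinstance(event, BaseException):
            raise event
        return event[:n]

    return raw_read


# 4088 one-byte characters plus 4 two-byte characters: 4096 bytes, 4092 chars.
first_read = b"a" * 4088 + ("\u00e9" * 4).encode("utf-8")
fd = scripted_fd([first_read, InterruptedError()])
text, eintr_hidden = sysread_chars(fd, 4096)
print(len(text), eintr_hidden)
```

Raising instead would surface the EINTR, but then the layer has to stash the 4092 characters somewhere, which is buffering under another name.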
C read() wasn't intended to be used to read multi-byte characters. Any
sort of support for Perl sysread() implemented using C read() would
require an elaborate emulation layer, which should be a strong hint that
it is the wrong approach.
> >For Perl read(), we don't care, because read() is buffered and is not
> >affected by the same performance issues mentioned earlier. Reading one
> >byte at a time is not expensive for buffered PerlIO handles.
> Nor necessary! as PerlIO can "unread" an arbitrary amount of data,
> and :encoding snoops the buffer in a perl-ish manner, and ...
Unless select() is emulated as well to peek at the PerlIO buffers, this
is not true. This is the situation that I described where select() blocks,
but data is available.
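
The effect can be reproduced with any user-space buffer sitting above a descriptor. The sketch below uses Python's buffered reader as a stand-in for a PerlIO layer, and assumes a POSIX system where select() works on pipes. It is not PerlIO code.

```python
import os
import select

r, w = os.pipe()
os.write(w, b"0123456789")            # 10 bytes arrive on the descriptor

buffered = os.fdopen(r, "rb", buffering=4096)
first = buffered.read(1)              # caller asks for 1 byte; the layer drains all 10

# Poll the descriptor with a zero timeout instead of blocking.
readable, _, _ = select.select([r], [], [], 0)

# The other 9 bytes are still available, but only from the user-space buffer.
pending = buffered.peek()

print(first, readable, pending)       # b'0' [] b'123456789'

buffered.close()
os.close(w)
```

With a blocking select() the program would wait for new input on the descriptor while nine unread bytes sit in the buffer.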
> > - sysread() is supposed to be a more direct system read that avoids
> > the intermediate layers of processing. This definitely includes STDIO,
> > and as far as I am concerned, it definitely includes filtering.
> That is certainly my _personal_ view too - but as designer/maintainer
> I started this thread to (A) make sure that I was not alone in that view
> and (B) to try and collect arguments for the other side as well.
I don't want to give you arguments for the other side. :-)
> The ticket was caused by Net::FTP script using sysread() and perl5.6.1
> on Win32 did CRLF translation on that and it worked. 5.8 does not
> so file becomes CRCRLF on the wire and CRLF when de-ASCIIed at other end.
Net::FTP is broken. \x0D\x0A is what it should be expecting. It should
implement the FTP specification (the RFCs, ...) exactly. CRLF may or may
not be the same as \x0D\x0A, as numerous people have explained in the
past.
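
To illustrate the general mechanism (this is not Net::FTP's code): when data that already ends in CR LF passes through a layer that rewrites each newline as CR LF, the result is CR CR LF. The sketch uses Python's text layer as a stand-in for such a translating layer, and compares it with explicit \x0D\x0A bytes that pass through no layer at all.

```python
import io

wire = io.BytesIO()                   # stands in for the data connection
text_layer = io.TextIOWrapper(wire, encoding="ascii", newline="\r\n")

text_layer.write("line\r\n")          # data that already carries CR LF
text_layer.flush()
translated = wire.getvalue()          # the layer rewrote "\n" as "\r\n" again

explicit = b"line\x0d\x0a"            # explicit bytes, no translation layer

print(translated)                     # b'line\r\r\n'
print(explicit)                       # b'line\r\n'
```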
Neighbourhood Coder
Ottawa, Ontario, Canada
One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...