develooper Front page | perl.perl5.porters | Postings from July 2019

[perl #134238] sysread OFFSET is bytes or characters?

Tony Cook via RT
July 10, 2019 00:45
On Sun, 30 Jun 2019 08:42:04 -0700, wrote:
> On Sun, 30 Jun 2019 08:34:00 -0700, wrote:
> > -----------------------------------------------------------------
> > The pod for sysread is ambiguous about OFFSET:
> Never mind, it's not that ambiguous; the pod says it is chars if the
> filehandle is :utf8.
> *Please close this ticket*
> However it is still unclear how to use sysread to read from a pipe
> except in raw mode...
> If the pipe fh has :utf8 decoding, and the data buffered in the pipe
> stops in the middle of a multi-byte character, what happens?
> Does perl internally read again (possibly blocking) until it gets a
> complete character?
> (I doubt it, and anyway that would cause problems when using select or
> non-blocking io)
> In general, reading from a pipe probably is always wrong except in raw
> mode (no decoding).
> In that case, the app has to use decode() with Encode::FB_QUIET to
> remove complete characters and leave behind any partial-character
> octets, and then sysread must be called again with OFFSET set to append
> to those octets to get the rest of the incomplete character.  In that
> case OFFSET must clearly be measured in octets, as it is when the
> filehandle is raw.
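
The raw-mode approach described in the quoted text can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper name slurp_utf8_raw and its chunk-size parameter are made up for this example, and the data is assumed to be UTF-8. It relies on the documented behaviour of Encode's decode() with FB_QUIET, which returns what it could decode and leaves the unprocessed octets in the source buffer.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: read everything from a raw handle and decode it
# incrementally. Returns the decoded text and any octets left undecoded.
sub slurp_utf8_raw {
    my ($fh, $chunk_size) = @_;
    binmode($fh, ':raw');    # no decoding layer, so sysread is allowed
    my $pending = '';        # octets read but not yet decoded
    my $text    = '';        # decoded characters so far
    while (1) {
        # OFFSET = length($pending) appends after the leftover octets.
        # On a raw handle LENGTH and OFFSET are both counted in octets.
        my $n = sysread($fh, $pending, $chunk_size, length($pending));
        die "sysread failed: $!" unless defined $n;
        last if $n == 0;     # end of file
        # FB_QUIET: decode as far as possible and leave the undecoded
        # remainder (e.g. a partial character) in $pending.
        $text .= decode('UTF-8', $pending, Encode::FB_QUIET());
    }
    return ($text, $pending);
}
```

Note that FB_QUIET also stops at genuinely malformed input, not only at a truncated character, so a real program would probably want to check whether $pending keeps growing or is non-empty at end of file.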

From 5.30.0 onward, sysread() on a :utf8 handle throws an exception.

Before that, a sysread() on a :utf8 handle (or an :encoding(UTF-16BE) handle, for example) would read() count bytes; if that didn't end up as count characters in utf8, it would read another count-characterlen bytes, repeating until it had the full count characters.

Since it uses read(), it bypasses any layers, so if the stream is actually UTF-16BE it treats those UTF-16BE bytes as utf8, and no validation is done.

All of that is why it now throws an exception.
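
A small way to observe this on a given perl, as a sketch only: the function name sysread_on_utf8_handle is invented for the example, and the pipe is just a convenient handle to test with. It reports whether sysread() on a :utf8 handle died or returned a count.

```perl
use strict;
use warnings;

# Hypothetical probe: call sysread() on a handle carrying the :utf8
# layer and describe the outcome as a string.
sub sysread_on_utf8_handle {
    pipe(my $r, my $w) or die "pipe failed: $!";
    syswrite($w, "abc");
    close $w;
    binmode($r, ':utf8');
    no warnings 'deprecated';    # older perls only warn here
    my $buf = '';
    my $n = eval { sysread($r, $buf, 3) };
    return "died: $@" if $@;     # expected from 5.30.0 onward
    return defined $n ? "read $n" : "failed: $!";
}

print sysread_on_utf8_handle(), "\n";
```

On perls before 5.30.0 this should report a successful read instead of an exception.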


via perlbug:  queue: perl5 status: new
