On Sun, 30 Jun 2019 08:42:04 -0700, jim.avera@gmail.com wrote:
> On Sun, 30 Jun 2019 08:34:00 -0700, jim.avera@gmail.com wrote:
> > -----------------------------------------------------------------
> > The pod for sysread is ambiguous about OFFSET:
>
> Never mind, it's not that ambiguous; the pod says it is chars if the
> filehandle is :utf8.
> *Please close this ticket*
>
> However it is still unclear how to use sysread to read from a pipe
> except in raw mode...
> If the pipe fh has :utf8 decoding, and the data buffered in the pipe
> stops in the middle of a multi-byte character, what happens?
>
> Does perl internally read again (possibly blocking) until it gets a
> complete character?
> (I doubt it, and anyway that would cause problems when using select or
> non-blocking io)
>
> In general, reading from a pipe probably is always wrong except in raw
> mode (no decoding).
> In that case, the app has to use decode() with Encode::FB_QUIET to
> remove complete characters and leave behind any partial-character
> octets, and then sysread must be called again with OFFSET set to append
> to those octets to get the rest of the incomplete character. In that
> case OFFSET must clearly be measured in octets, as it is when the
> filehandle is raw.

As of 5.30.0, sysread() on a :utf8 handle throws an exception.

Before that, a sysread() on a :utf8 handle (or, for example, an
:encoding(UTF-16BE) handle) would read() count bytes. If that didn't
end up as count characters in utf8, it would read count-characterlen
more bytes, repeating until it had the full count characters.

Since it uses read(), it bypasses any layers, so if the stream is
actually UTF-16BE it treats those UTF-16BE bytes as utf8, and no
validation is done.

All of that is why it now throws an exception.

Tony

---
via perlbug: queue: perl5 status: new
https://rt.perl.org/Ticket/Display.html?id=134238
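
For reference, the raw-mode approach described in the quoted message
(decode() with Encode::FB_QUIET, then sysread again with OFFSET counted
in octets) might look roughly like the sketch below. The helper name
read_utf8_from_raw, the chunk size, and the in-process pipe demo are
illustrative assumptions, not anything taken from the ticket. The sketch
also assumes a blocking handle; with select or non-blocking I/O, an
undef return with EAGAIN would need to be handled instead of dying.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: read everything from a raw (byte-oriented) handle
# and return it as decoded text. $chunk is the number of octets requested
# per sysread() call.
sub read_utf8_from_raw {
    my ($fh, $chunk) = @_;
    my ($octets, $text) = ('', '');
    while (1) {
        # OFFSET is length($octets): measured in octets, since the handle
        # is raw. New data is appended after any leftover partial character.
        my $n = sysread($fh, $octets, $chunk, length $octets);
        die "sysread failed: $!" unless defined $n;
        last if $n == 0;    # EOF
        # With FB_QUIET, decode() returns what it could decode and leaves
        # the undecoded tail (e.g. a partial character) in $octets.
        # Genuinely malformed input would also be left there and would
        # accumulate until EOF.
        $text .= decode('UTF-8', $octets, Encode::FB_QUIET);
    }
    warn "undecodable octets left at EOF\n" if length $octets;
    return $text;
}

# Demo: a pipe inside one process. The tiny chunk size forces reads to
# stop in the middle of multi-byte characters.
pipe(my $reader, my $writer) or die "pipe failed: $!";
binmode($_) for $reader, $writer;    # raw mode, no decoding layers
my $original = "h\x{e9}llo w\x{f6}rld \x{20ac}";
defined syswrite($writer, encode('UTF-8', $original))
    or die "syswrite failed: $!";
close $writer;

my $decoded = read_utf8_from_raw($reader, 2);
close $reader;
print $decoded eq $original ? "round trip ok\n" : "round trip mismatch\n";
```

The same loop, using read() rather than sysread(), appears in the Encode
documentation under FB_QUIET.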