
Re: [perl #79960] Setting $/ to read fixed records can corrupt valid UTF-8 input

From: David Nicol
Date: February 21, 2012 20:33
Subject: Re: [perl #79960] Setting $/ to read fixed records can corrupt valid UTF-8 input
Message ID: CAFwScO-6PrPS0L09rthj-vOTVRZXxjCOmzZNZjizp9x3DxQ5Xw@mail.gmail.com

On Mon, Oct 24, 2011 at 1:54 PM, Eric Brine <ikegami@adaelis.com> wrote:
> ok, so we have:
>
> a: Refusing to read on UTF-8 file handles by croaking.
> b: Treat the record length as bytes. Croak on truncated UTF-8 results.
> c: Treat the record length as characters like read and sysread.
>
>
> - Only (c) can handle both records measured in bytes and records measured in
> chars.
> - Only (c) is consistent with read and sysread.

Why would anyone possibly want fixed-length records measured in chars?
Because they're packing a Twitter archive? If this use case really
exists, there should be a way to turn it on explicitly instead, maybe
by setting $/ to a reference to an array of integers, which would be
interpreted as field lengths in characters and cycled through.
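
A rough sketch of that array-of-lengths idea, done in user code rather
than through $/. The helper name make_field_reader is hypothetical and
not part of perl; it relies only on read(), which counts characters on
a handle that has a UTF-8 decoding layer.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: returns an iterator that reads successive fields
# from $fh, cycling through @$lengths, each length measured in characters.
sub make_field_reader {
    my ($fh, $lengths) = @_;
    my $i = 0;
    return sub {
        my $want = $lengths->[ $i++ % @$lengths ];
        my $got  = read($fh, my $field, $want);
        die "read: $!" unless defined $got;
        return $got ? $field : undef;    # undef once the input is exhausted
    };
}

# Sample input: nine characters, several of them multi-byte in UTF-8.
my $bytes = encode('UTF-8', "\x{e9}\x{e9}abc\x{263a}xyz");
open my $fh, '<:encoding(UTF-8)', \$bytes or die "open: $!";

my $next = make_field_reader($fh, [2, 3]);   # fields of 2, 3, 2, 3, ... chars
my @fields;
while (defined(my $f = $next->())) {
    push @fields, $f;
}
close $fh;
```

The last field can come back short if the input runs out partway
through it; whether that should be an error is left open here.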

> - (a) and (c) are more self-consistent than (b): One either deals with bytes
> or chars, not both at the same time.
>
> But:
>
> - Only (b) is backwards compatible with existing behaviour (although the
> behaviour isn't exactly documented).
>
> - Eric

d: (b), but with a way to turn off the croaking; when it has been
turned off, the invalid segments get downgraded to bytes ("no strict
utf8" perhaps?). The mechanism for making that adjustment is named in
the croak message. Would (d) support the forward-looking case of
migrating a working legacy system that handles packed records to a new
environment where the input streams are all chars instead, or to a
future perl where -C7 is the default, with a minimum of maintenance?
Would it be better in that situation to require byte mode on the file
handle in question? If so, that's

e: a, plus also croak earlier, at compile time if possible, by doing
flow analysis: whenever the parser notices that $/ is going to get set
to a reference, hmm, that isn't practical.
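
For illustration only, the croak-or-downgrade behaviour of (b) and (d)
can be approximated in user code today. The helper decode_record and
its $lenient flag are hypothetical names, standing in for whatever
switch ("no strict utf8" or otherwise) would control this; nothing
below is an existing perl feature beyond $/ = \N and Encode.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: decode one fixed-length byte record strictly.
# If the record is not valid UTF-8 on its own, either die (as in option b)
# or hand back the raw bytes unchanged (the "downgrade" of option d).
sub decode_record {
    my ($record, $lenient) = @_;
    my $copy  = $record;    # decode() with a CHECK value may modify its input
    my $chars = eval { decode('UTF-8', $copy, Encode::FB_CROAK) };
    return $chars if defined $chars;
    die "record is not valid UTF-8; pass a true \$lenient to get bytes\n"
        unless $lenient;
    return $record;
}

# Octets: 61 62 | C3 A9 | 63 C3 | A9, split here into 2-byte records.
my $bytes = encode('UTF-8', "ab\x{e9}c\x{e9}");
open my $fh, '<', \$bytes or die "open: $!";
my @records;
{
    local $/ = \2;          # fixed-length records of 2 bytes
    @records = <$fh>;
}
close $fh;

# The third and fourth records each hold half of a two-byte character,
# so they stay as bytes; the first two decode cleanly.
my @decoded = map { decode_record($_, 1) } @records;
```

Calling decode_record without the lenient flag on the third record
dies, with the way to turn the check off named in the message.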


