[perl #79960] Setting $/ to read fixed records can corrupt valid UTF-8 input

From: Father Chrysostomos via RT
Date: October 23, 2011 13:57
Subject: [perl #79960] Setting $/ to read fixed records can corrupt valid UTF-8 input
Message ID: rt-3.6.HEAD-31297-1319403417-774.79960-15-0@perl.org

On Sun Oct 02 22:24:46 2011, ikegami@adaelis.com wrote:
> On Mon, Nov 29, 2010 at 11:05 AM, Nicholas Clark <perlbug-followup@perl.org> wrote:
> 
> > Or we could try to do what read and sysread do, and treat the length
> > parameter as characters, so that on a UTF-8 flagged handle we loop until
> > we read in sufficient characters. But that blows the idea of "record
> > based" completely on a UTF-8 handle.
> >
> 
> It seems that the implication is that (a) or (b) would somehow allow
> records over UTF-8 handles. Let's name working like read "(c)".
> 
> --
> 
> Scenario 1: Let's say a record consists of two 5 byte fields of UTF-8
> text, and that :utf8 is being used on the handle. Let's say one of the
> records is C3 A9 20 20 20 41 42 43 20 20.
> 
> a) Croaks.
> b) Produces "é   ABC  " which cannot be parsed into two fields.
> c) One byte too many is returned.
> 
> (b) and (c) aren't useful.
> 
> --
> 
> Scenario 2: Let's say a record consists of two 5 character fields, and
> that :utf8 is being used on the handle. Let's say one of the records is
> C3 A9 20 20 20 20 41 42 43 20 20.
> 
> a) Croaks.
> b) One byte too few is returned.
> c) Produces the fields when passed through
> C<< decode "UTF-8", unpack "A5A5" $_ >>.
> 
> (c) is useful, but (b) isn't.
> 
> --
> 
> So:
> (b) is never useful with :utf8 handles

It can be useful if the records are all single fields.
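
As a rough illustration of the single-field case, here is a minimal sketch. It assumes that (b) means "count the record length in bytes, then decode", and it emulates that with a raw in-memory handle plus an explicit decode rather than with a :utf8 layer. The sample data and variable names are made up for the example.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical data: three 5-byte records, each one space-padded field.
#   C3 A9 20 20 20   ->  "é" plus padding
#   41 42 43 20 20   ->  "ABC" plus padding
#   C3 BC C3 B6 20   ->  "üö" plus padding
my $data = "\xC3\xA9   " . "ABC  " . "\xC3\xBC\xC3\xB6 ";

open my $fh, '<:raw', \$data or die "open: $!";

my @fields;
{
    local $/ = \5;                          # fixed-size records of 5 bytes
    while (my $rec = <$fh>) {
        my $text = decode('UTF-8', $rec);   # decode once the boundary is known
        $text =~ s/\s+\z//;                 # strip the padding
        push @fields, $text;
    }
}
close $fh;

printf "%d field(s); lengths in characters: %s\n",
    scalar @fields, join ', ', map { length } @fields;
```

Because each record holds exactly one field, the byte boundary is also the field boundary, so nothing has to be split after decoding.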


> (c) is sometimes useful with :utf8 handles
> 
> Since (b) and (c) behave the same on binary handles, it seems to me that
> (c) is clearly superior to (b).
> 
> Between (a) and (c), I prefer (c) since I don't see enough justification
> to artificially restrict what the user can do by giving them (a). I have
> expected $/ to behave like read in the past.
> 
> - Eric
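
For reference, the two scenarios quoted above can be reproduced with read instead of $/. This is only a sketch under assumptions: read on a raw handle stands in for byte-counted records, read on an :encoding(UTF-8) handle stands in for what (c) would do, and the handles are in-memory. The variable names are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Scenario 1: two 5-byte fields. Count bytes, split on byte
# boundaries, and only then decode each field.
my $rec1 = "\xC3\xA9\x20\x20\x20\x41\x42\x43\x20\x20";        # 10 bytes
open my $raw, '<:raw', \$rec1 or die "open: $!";
read($raw, my $buf1, 10) == 10 or die "short read";
my @s1 = map { decode('UTF-8', $_) } unpack 'a5 a5', $buf1;
s/\s+\z// for @s1;                                            # strip padding
close $raw;

# Scenario 2: two 5-character fields. On a decoding handle, read
# counts characters, so 10 characters consume all 11 bytes.
my $rec2 = "\xC3\xA9\x20\x20\x20\x20\x41\x42\x43\x20\x20";    # 11 bytes
open my $txt, '<:encoding(UTF-8)', \$rec2 or die "open: $!";
read($txt, my $buf2, 10) == 10 or die "short read";
my @s2 = unpack 'A5 A5', $buf2;                               # A strips trailing spaces
close $txt;

printf "scenario 1: %d fields, scenario 2: %d fields\n",
    scalar @s1, scalar @s2;
```

In both cases the fields come out as "é" and "ABC". The difference is only in where the length is counted: in bytes before decoding for scenario 1, in characters after decoding for scenario 2.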



