On Sun Oct 02 22:24:46 2011, ikegami@adaelis.com wrote:
> On Mon, Nov 29, 2010 at 11:05 AM, Nicholas Clark
> <perlbug-followup@perl.org> wrote:
>
> > Or we could try to do what read and sysread do, and treat the length
> > parameter as characters, so that on a UTF-8 flagged handle we loop
> > until we read in sufficient characters. But that blows the idea of
> > "record based" completely on a UTF-8 handle.
>
> It seems that the implication is that (a) or (b) would somehow allow
> records over UTF-8 handles. Let's name working like read "(c)".
>
> --
>
> Scenario 1: Let's say a record consists of two 5 byte fields of UTF-8
> text, and that :utf8 is being used on the handle. Let's say one of the
> records is C3 A9 20 20 20 41 42 43 20 20.
>
> a) Croaks.
> b) Produces "é   ABC  " which cannot be parsed into two fields
> c) One byte too many is returned.
>
> (b) and (c) aren't useful.
>
> --
>
> Scenario 2: Let's say a record consists of two 5 character fields, and
> that :utf8 is being used on the handle. Let's say one of the records is
> C3 A9 20 20 20 20 41 42 43 20 20.
>
> a) Croaks.
> b) One byte too few is returned.
> c) Produces the fields when passed through
>    C<< decode "UTF-8", unpack "A5A5" $_ >>.
>
> (c) is useful, but (b) isn't.
>
> --
>
> So:
> (b) is never useful with :utf8 handles

It can be useful if the records are all single fields.

> (c) is sometimes useful with :utf8 handles
>
> Since (b) and (c) behave the same on binary handles, it seems to me
> that (c) is clearly superior to (b).
>
> Between (a) and (c), I prefer (c) since I don't see enough
> justification to artificially restrict what the user can do by giving
> them (a). I have expected $/ to behave like read in the past.
>
> - Eric
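
For reference, a minimal sketch of Scenario 2 under option (c), assuming (c) would count characters the same way read already does on a decoding handle. It uses an in-memory filehandle and the :encoding(UTF-8) layer in place of :utf8; the variable names are only illustrative, and it shows read, not the $/ record-read behaviour under discussion.

```perl
use strict;
use warnings;

# The 11-byte record from Scenario 2:
# "é" (C3 A9), four spaces, "ABC", two spaces.
my $bytes = "\xC3\xA9\x20\x20\x20\x20\x41\x42\x43\x20\x20";

# In-memory handle over the raw bytes, decoded as UTF-8 on read.
open my $fh, '<:encoding(UTF-8)', \$bytes or die "open: $!";

# On a decoding handle read() counts characters, so asking for 10
# consumes all 11 bytes and yields a 10-character string.
my $got = read($fh, my $record, 10);
die "read: $!" unless defined $got;

# Split into two 5-character fields; "A" strips trailing spaces.
my ($first, $second) = unpack 'A5 A5', $record;

binmode STDOUT, ':encoding(UTF-8)';
printf "chars=%d first=[%s] second=[%s]\n",
    length($record), $first, $second;
```

Because the record is already decoded when it reaches unpack, the two fields are split on character boundaries, which is the property that makes (c) usable for character-sized fields.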