Re: [perl #100058] Perl leaves broken UTF-8 in SVs whose UTF8 is set

From: Karl Williamson
Date: September 28, 2011 18:03
Subject: Re: [perl #100058] Perl leaves broken UTF-8 in SVs whose UTF8 is set
Message ID: 4E83C31A.9010103@khwilliamson.com

On 09/28/2011 05:56 PM, Tom Christiansen wrote:
> Karl Williamson <public@khwilliamson.com> wrote
>     on Wed, 28 Sep 2011 17:35:01 MDT:
>
>> I do think that the buffer length should only be construed as bytes and
>> not characters.
>
> Could you please explain why you think that?
>
> Why not have
>
>      binmode(FH, ":utf8");
>      $/ = \1000;
>      $_ = <FH>;
>
> mean
>
>      binmode(FH, ":utf8");
>      read(FH, $_, 1000);
>
> I vaguely feel like you should never have byte operations
> on an encoded stream.
>
> But maybe I'm wrong.
>
> --tom
>

I found this persuasive (from the original ticket): "Or we could try to
do what read and sysread do, and treat the length parameter as
characters, so that on a UTF-8 flagged handle we loop until we read in
sufficient characters. But that blows the idea of "record based"
completely on a UTF-8 handle."
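
To make the character-counting behaviour of read mentioned in that quote concrete, here is a minimal sketch. The sample string and the in-memory filehandle are assumptions chosen for illustration and do not come from the ticket. It only shows that read on a handle with a decoding layer counts its length argument in characters. It does not show what `$/ = \N` does, which is the open question in this thread.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical sample data: five characters, eleven bytes once encoded
# as UTF-8 (U+263A takes three bytes, the ASCII letters one each).
my $bytes = encode('UTF-8', "\x{263A}a\x{263A}b\x{263A}");

# Open an in-memory handle with a decoding layer so reads yield characters.
open(my $fh, '<:encoding(UTF-8)', \$bytes) or die "open: $!";

# Ask for 3 units; on a decoded handle read() counts characters, so this
# consumes seven bytes of input and never splits a multi-byte sequence.
my $got = read($fh, my $buf, 3);

printf "read returned %d, length is %d\n", $got, length $buf;
close $fh;
```

A byte-counted record read of the same length could stop in the middle of an encoded character, which is the kind of broken UTF-8 the ticket title describes.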

I would also be ok with just croaking when attempting a byte-type 
operation on an encoded string.

