
Re: [perl #115262] PerlIO::encoding produces malformed utf8

From: Leon Timmermans
Date: June 22, 2013 21:33
Subject: Re: [perl #115262] PerlIO::encoding produces malformed utf8
Message ID: CAHhgV8hVcRxmFF25goun-1VYzEkodhpzze65t6DaGi_7ZWKEJA@mail.gmail.com

On Sun, Oct 14, 2012 at 11:50 PM, Father Chrysostomos via RT
<perlbug-comment@perl.org> wrote:
> use Encode::Encoding;
> package footf8 {
>   @ISA = Encode::Encoding;
>   __PACKAGE__->Define('foo-tf8');
>   sub encode($$;$) {
>     my ($self, $buf, $chk) = @_;
>     use Devel::Peek;
>     Dump $buf;
>     undef $_[1] if $chk;
>     utf8::encode $buf;
>     $buf
>   }
> }
> open $fh, ">encoding(foo-tf8)", \$s;
> print $fh "a"x1023 . chr 256;
> __END__
>
> That script dumps two malformed scalars, because the output is split in
> the middle of chr 256.
>
> Encode::CN::HZ actually expects this and uses some arcane Perl code
> (which looks straightforward, but you have to know internals to
> understand it) to work around it.
>
> Other pure-Perl encoding implementations included with Encode.pm don’t work:
>
> open $fh, ">encoding(utf-7)", \$s;
> print $fh "a"x1023 . chr 256;
> __END__
>
> That produces malformed UTF8 messages.
>
> PerlIO::encoding should be caching the partial characters instead of
> passing them to Perl code.

Yeah, this is the general design of the system. PerlIO doesn't do
characters, it does bytes. While you're right that it could emulate
character semantics in Write(), it wouldn't be able to do the same in
Read() for variable-length encodings anyway, so the point is a bit
moot.

Leon
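
For readers following the thread, here is a minimal sketch of the byte
arithmetic behind the report, and of the "hold back the partial
character" idea on the decoding side. It is not the PerlIO::encoding
implementation. The 1024-byte cut is an assumption inferred from the
example in the report, and decode_chunks is a hypothetical helper name.
It relies on the documented behaviour of Encode::FB_QUIET, which
returns what was decoded so far and leaves the unprocessed bytes in
the source buffer.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# The payload from the report: 1023 ASCII characters followed by U+0100.
my $text  = ("a" x 1023) . chr(256);
my $bytes = encode('UTF-8', $text);    # 1023 one-byte chars + one two-byte char

# Cut at an assumed 1024-byte boundary. U+0100 is the two bytes C4 80 in
# UTF-8, so the cut lands between them and neither chunk is valid by itself.
my @chunks = (substr($bytes, 0, 1024), substr($bytes, 1024));

# Hypothetical helper: decode a list of byte chunks, carrying any
# incomplete trailing sequence over to the next chunk.
sub decode_chunks {
    my @input = @_;
    my ($pending, $out) = ('', '');
    for my $chunk (@input) {
        $pending .= $chunk;
        # With FB_QUIET, decode() stops at the first sequence it cannot
        # process and overwrites $pending with the unprocessed remainder.
        $out .= decode('UTF-8', $pending, Encode::FB_QUIET);
    }
    return ($out, $pending);
}

my ($decoded, $leftover) = decode_chunks(@chunks);

printf "total=%d first=%d second=%d\n",
    length($bytes), length($chunks[0]), length($chunks[1]);
print "round trip ok\n" if $decoded eq $text && $leftover eq '';
```

Note that this sketch cannot tell a truncated sequence from a genuinely
malformed one: a bad byte in the middle of the stream would stay in
$pending indefinitely, so real code would need a separate check for
that case.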


