[perl #115262] PerlIO::encoding produces malformed utf8

Father Chrysostomos via RT
October 14, 2012 14:51
On Sun Oct 14 14:49:40 2012, sprout wrote:
> PerlIO::encoding passes invalid strings to encoding implementations.

A local mail server seemed to think this was spam and refused to send
the message until I had deleted the body.  Here it is in full:

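use Encode::Encoding;
package footf8 {
  @ISA = Encode::Encoding;
  __PACKAGE__->Define('foo-tf8');  # register the name used in open() below
  sub encode($$;$) {
    my ($self, $buf, $chk) = @_;
    use Devel::Peek;
    Dump $buf;                     # show the string the layer handed over
    undef $_[1] if $chk;           # report the whole buffer as consumed
    utf8::encode $buf;
    $buf;                          # return the encoded octets
  }
}
open $fh, ">:encoding(foo-tf8)", \$s;
print $fh "a"x1023 . chr 256;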

That script dumps two malformed scalars, because the output is split in
the middle of chr 256.
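
To make the arithmetic behind that split concrete, here is a minimal sketch. It assumes the layer hands its buffer over in 1024-byte chunks; that figure is only inferred from the "a"x1023 in the test case and is not taken from the PerlIO::encoding source. The helper is_valid_utf8 is a hypothetical name introduced for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the octets form well-formed UTF-8.
# utf8::decode works in place, so it operates on the local copy.
sub is_valid_utf8 {
    my ($octets) = @_;
    return utf8::decode($octets) ? 1 : 0;
}

# The same payload as in the report: 1023 ASCII characters plus U+0100.
my $bytes = ("a" x 1023) . chr(256);
utf8::encode($bytes);          # U+0100 becomes the two octets 0xC4 0x80

# ASSUMPTION: a 1024-byte chunk boundary, used only for illustration.
my $chunk_size = 1024;
my $first = substr($bytes, 0, $chunk_size);
my $rest  = substr($bytes, $chunk_size);

printf "total octets: %d\n", length $bytes;
printf "first chunk ends with 0x%02X, well-formed: %d\n",
    ord(substr($first, -1)), is_valid_utf8($first);
printf "second chunk starts with 0x%02X, well-formed: %d\n",
    ord($rest), is_valid_utf8($rest);
```

Under that assumption the lead octet of chr 256 ends one chunk and its continuation octet begins the next, so neither chunk is well-formed on its own.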

Encode::CN::HZ actually expects this and uses some arcane Perl code
(which looks straightforward, but you have to know internals to
understand it) to work around it.

Other pure-Perl encoding implementations included with perl don’t work:

open $fh, ">:encoding(utf-7)", \$s;
print $fh "a"x1023 . chr 256;

That produces "Malformed UTF-8 character" warnings.

PerlIO::encoding should be caching the partial characters instead of
passing them to Perl code.
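
As a rough illustration of that idea (not the actual fix, which would live in the layer's C code), the sketch below holds back an incomplete trailing character and prepends it to the next chunk. The function split_at_char_boundary, the 1024-byte chunking, and the @delivered list are all assumptions made for this example, and it only handles standard UTF-8 sequences of up to four octets.

```perl
use strict;
use warnings;

# Hypothetical helper: split UTF-8 octets into the longest prefix that ends
# on a character boundary and the incomplete tail (which may be empty).
sub split_at_char_boundary {
    my ($octets) = @_;
    my $len = length $octets;

    # Step back over up to three trailing continuation octets (10xxxxxx).
    my ($i, $back) = ($len, 0);
    while ($i > 0 && $back < 3
           && (ord(substr($octets, $i - 1, 1)) & 0xC0) == 0x80) {
        $i--;
        $back++;
    }
    return ($octets, '') if $i == 0;    # nothing that looks like a lead octet

    # Work out how many octets the last lead octet announces.
    my $lead = ord(substr($octets, $i - 1, 1));
    my $need = $lead >= 0xF0 ? 4
             : $lead >= 0xE0 ? 3
             : $lead >= 0xC0 ? 2
             :                 1;
    my $have = $len - ($i - 1);

    # Incomplete final character: keep it back for the next round.
    return (substr($octets, 0, $i - 1), substr($octets, $i - 1))
        if $have < $need;
    return ($octets, '');
}

# Same payload as in the report, cut at an ASSUMED 1024-byte boundary.
my $bytes = ("a" x 1023) . chr(256);
utf8::encode($bytes);

my $pending = '';
my @delivered;    # what an encoding implementation would be given
for my $chunk (substr($bytes, 0, 1024), substr($bytes, 1024)) {
    my ($complete, $tail) = split_at_char_boundary($pending . $chunk);
    push @delivered, $complete if length $complete;
    $pending = $tail;
}

printf "chunk %d: %d octets\n", $_ + 1, length $delivered[$_]
    for 0 .. $#delivered;
```

With this approach each string passed on ends on a character boundary, at the cost of carrying a few octets over between calls.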

Site configuration information for perl 5.17.5:

Configured by sprout at Sat Sep 22 18:51:23 PDT 2012.

Summary of my perl5 (revision 5 version 17 subversion 5) configuration:
  Snapshot of: 451f421fe4742646fa2efbed0f45a19f0713d00f
    osname=darwin, osvers=10.5.0, archname=darwin-2level
    uname='darwin pint.local 10.5.0 darwin kernel version 10.5.0: fri
nov 5 23:20:39 pdt 2010; root:xnu-1504.9.17~1release_i386 i386 '
    config_args='-de -Dusedevel -DDEBUGGING'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=undef, usemultiplicity=undef
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=undef, use64bitall=undef, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
    cc='cc', ccflags ='-fno-common -DPERL_DARWIN -DDEBUGGING
-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include',
    optimize='-O3 -g',
    cppflags='-fno-common -DPERL_DARWIN -DDEBUGGING -fno-strict-aliasing
-pipe -fstack-protector -I/usr/local/include'
    ccversion='', gccversion='4.2.1 (Apple Inc. build 5664)',
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t',
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='env MACOSX_DEPLOYMENT_TARGET=10.3 cc', ldflags ='
-fstack-protector -L/usr/local/lib'
    libpth=/usr/local/lib /usr/lib
    libs=-ldbm -ldl -lm -lutil -lc
    perllibs=-ldl -lm -lutil -lc
    libc=, so=dylib, useshrplib=false, libperl=libperl.a
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=bundle, d_dlsymun=undef, ccdlflags=' '
    cccdlflags=' ', lddlflags=' -bundle -undefined dynamic_lookup
-L/usr/local/lib -fstack-protector'

Locally applied patches:

@INC for perl 5.17.5:

Environment for perl 5.17.5:
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PERL_BADLANG (unset)


Father Chrysostomos
