[perl #130199] Text::CSV::Encoded is incorrectly forced to parsewidechar

From: perlbug-followup
Date: November 28, 2016 18:25
Subject: [perl #130199] Text::CSV::Encoded is incorrectly forced to parsewidechar
Message ID: rt-4.0.24-24998-1480336442-250.130199-75-0@perl.org

# New Ticket Created by   
# Please include the string:  [perl #130199]
# in the subject line of all future correspondence about this issue. 
# <URL: https://rt.perl.org/Ticket/Display.html?id=130199 >



This is a bug report for perl from rafal@zorro.ztk-rp.eu,
generated with the help of perlbug 1.40 running under perl 5.20.2.


-----------------------------------------------------------------
[Please describe your issue here]
After upgrading from debian-wheezy to debian-jessie, HTML::Mason started
to behave strangely with respect to UTF-8 encoding. Previously, both web pages
and forms worked correctly (in UTF-8) without any special setup. As
of jessie with Apache 2.4, UTF-8 no longer works:
1. I had to add binmode(STDOUT,'UTF8') to modules.
2. I had to call decode_utf8($_) on data from forms before passing it
to psql-db.
I am filing this report with example code showing the erratic behavior of
Text::CSV::Encoded, since I could narrow the problem down to a test case of
just a few lines.

========================
#!/usr/bin/perl
use Text::CSV::Encoded;
open(my $FH, shift) or die "open";
binmode($FH, ":encoding(cp1250) :raw :bytes");
local $/ = "\r\n";
my $csv = Text::CSV::Encoded->new ( { encoding_in  => "cp1250",
                        binary => 1, eol => $/, sep_char => ';',
                } ) or die "Cannot use CSV: ".Text::CSV->error_diag ();
$\ = "\n";
while ( <$FH> ) {
	s/\s+$//;
	print;
	if ($csv->parse( $_ )) {
		print $csv->fields();
	}
}
__END__
10;"SPÓ£DZIELNIA
WARSZAWA";62;"TEST"
======================

In this example:
1. the test file (its contents are shown inline after __END__) contains two
specific characters from code page 1250, one directly after the other.
1a. this test file IS NOT UTF-8 encoded.
2. the input stream is correctly marked as CP1250.
3. the module gets correct information about the file's encoding.
... and yet, the module complains about encountering a "wide char", which in
the above setup should never be possible (I think).

The result of the above program is:
=======================
$ ./wide-char test-input 
10;"SPÓ£DZIELNIA
WARSZAWA";62;"TEST"
Wide character in subroutine entry at /usr/share/perl5/Text/CSV/Encoded/Coder/Encode.pm line 37, <$FH> chunk 1.
$
=======================

This result is incorrect, since the file does not contain any "wide chars".
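
One way to probe where a "wide char" could come from, offered as a sketch under
assumptions rather than a confirmed diagnosis: if the :encoding(cp1250) layer
is in effect on $FH, the lines read from it are already character strings, and
a second cp1250 decode (which encoding_in may trigger inside the module) would
then see a character above 0xFF. The snippet below uses only the core Encode
module. The byte values 0xD3 0xA3 are an assumption based on how the sample
line renders, and the variable names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Assumed raw CP1250 bytes for the start of the sample field:
# "SP", 0xD3 (O with acute), 0xA3 (L with stroke), "DZ".
my $bytes = "SP\xD3\xA3DZ";

# First decode: what an :encoding(cp1250) input layer would do.
my $chars = decode('cp1250', $bytes);

# Count characters that no longer fit in a single byte (code point > 0xFF).
my $has_wide = grep { ord($_) > 0xFF } split //, $chars;

# Second decode of the already-decoded string. This stands in for
# (hypothetically) applying encoding_in => "cp1250" to character data.
my $copy = $chars;
my $ok   = eval { decode('cp1250', $copy); 1 };
my $err  = $ok ? '' : $@;

printf "characters above 0xFF after first decode: %d\n", $has_wide;
print  "second decode failed: $err" unless $ok;
```

If the second decode fails here with a "Wide character" message, that would
be consistent with the data being decoded twice: once by the I/O layer and
once by the module. Reading the file without a decoding layer, or leaving out
encoding_in, would be the two things to try separately.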

[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
    category=core
    severity=high
---
Site configuration information for perl 5.20.2:

Configured by Debian Project at Fri Jul 22 15:47:27 UTC 2016.

Summary of my perl5 (revision 5 version 20 subversion 2) configuration:
   
  Platform:
    osname=linux, osvers=3.16.0-4-amd64, archname=x86_64-linux-gnu-thread-multi
    uname='linux himalia 3.16.0-4-amd64 #1 smp debian 3.16.7-ckt25-2+deb8u3 (2016-07-02) x86_64 gnulinux '
    config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -D_FORTIFY_SOURCE=2 -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Dldflags= -Wl,-z,relro -Dlddlflags=-shared -Wl,-z,relro -Dcccdlflags=-fPIC -Darchname=x86_64-linux-gnu -Dprefix=/usr -Dprivlib=/usr/share/perl/5.20 -Darchlib=/usr/lib/x86_64-linux-gnu/perl/5.20 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/x86_64-linux-gnu/perl5/5.20 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.20.2 -Dsitearch=/usr/local/lib/x86_64-linux-gnu/perl/5.20.2 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Dusesitecustomize -Duse64bitint -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Ud_ualarm -Uusesfio -Uusenm -Ui_libutil -Uversiononly -DDEBUGGING=-g -Doptimize=-O2 -Duseshrplib -Dlibperl=libperl.so.5.20.2 -des'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fwrapv -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2 -g',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fwrapv -fno-strict-aliasing -pipe -I/usr/local/include'
    ccversion='', gccversion='4.9.2', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
    libpth=/usr/local/lib /usr/lib/gcc/x86_64-linux-gnu/4.9/include-fixed /usr/include/x86_64-linux-gnu /usr/lib /lib/x86_64-linux-gnu /lib/../lib /usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib
    libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
    perllibs=-ldl -lm -lpthread -lc -lcrypt
    libc=libc-2.19.so, so=so, useshrplib=true, libperl=libperl.so.5.20
    gnulibc_version='2.19'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib -fstack-protector'

Locally applied patches:
    DEBPKG:debian/cpan_definstalldirs - Provide a sensible INSTALLDIRS default for modules installed from CPAN.
    DEBPKG:debian/db_file_ver - http://bugs.debian.org/340047 Remove overly restrictive DB_File version check.
    DEBPKG:debian/doc_info - Replace generic man(1) instructions with Debian-specific information.
    DEBPKG:debian/enc2xs_inc - http://bugs.debian.org/290336 Tweak enc2xs to follow symlinks and ignore missing @INC directories.
    DEBPKG:debian/errno_ver - http://bugs.debian.org/343351 Remove Errno version check due to upgrade problems with long-running processes.
    DEBPKG:debian/libperl_embed_doc - http://bugs.debian.org/186778 Note that libperl-dev package is required for embedded linking
    DEBPKG:fixes/respect_umask - Respect umask during installation
    DEBPKG:debian/writable_site_dirs - Set umask approproately for site install directories
    DEBPKG:debian/extutils_set_libperl_path - EU:MM: set location of libperl.a under /usr/lib
    DEBPKG:debian/no_packlist_perllocal - Don't install .packlist or perllocal.pod for perl or vendor
    DEBPKG:debian/prefix_changes - Fiddle with *PREFIX and variables written to the makefile
    DEBPKG:debian/fakeroot - Postpone LD_LIBRARY_PATH evaluation to the binary targets.
    DEBPKG:debian/instmodsh_doc - Debian policy doesn't install .packlist files for core or vendor.
    DEBPKG:debian/ld_run_path - Remove standard libs from LD_RUN_PATH as per Debian policy.
    DEBPKG:debian/libnet_config_path - Set location of libnet.cfg to /etc/perl/Net as /usr may not be writable.
    DEBPKG:debian/mod_paths - Tweak @INC ordering for Debian
    DEBPKG:debian/module_build_man_extensions - http://bugs.debian.org/479460 Adjust Module::Build manual page extensions for the Debian Perl policy
    DEBPKG:debian/prune_libs - http://bugs.debian.org/128355 Prune the list of libraries wanted to what we actually need.
    DEBPKG:fixes/net_smtp_docs - [rt.cpan.org #36038] http://bugs.debian.org/100195 Document the Net::SMTP 'Port' option
    DEBPKG:debian/perlivp - http://bugs.debian.org/510895 Make perlivp skip include directories in /usr/local
    DEBPKG:debian/deprecate-with-apt - http://bugs.debian.org/747628 Point users to Debian packages of deprecated core modules
    DEBPKG:debian/squelch-locale-warnings - http://bugs.debian.org/508764 Squelch locale warnings in Debian package maintainer scripts
    DEBPKG:debian/skip-upstream-git-tests - Skip tests specific to the upstream Git repository
    DEBPKG:debian/patchlevel - http://bugs.debian.org/567489 List packaged patches for 5.20.2-3+deb8u6 in patchlevel.h
    DEBPKG:debian/skip-kfreebsd-crash - http://bugs.debian.org/628493 [perl #96272] Skip a crashing test case in t/op/threads.t on GNU/kFreeBSD
    DEBPKG:fixes/document_makemaker_ccflags - http://bugs.debian.org/628522 [rt.cpan.org #68613] Document that CCFLAGS should include $Config{ccflags}
    DEBPKG:debian/find_html2text - http://bugs.debian.org/640479 Configure CPAN::Distribution with correct name of html2text
    DEBPKG:debian/perl5db-x-terminal-emulator.patch - http://bugs.debian.org/668490 Invoke x-terminal-emulator rather than xterm in perl5db.pl
    DEBPKG:debian/cpan-missing-site-dirs - http://bugs.debian.org/688842 Fix CPAN::FirstTime defaults with nonexisting site dirs if a parent is writable
    DEBPKG:fixes/memoize_storable_nstore - [rt.cpan.org #77790] http://bugs.debian.org/587650 Memoize::Storable: respect 'nstore' option not respected
    DEBPKG:debian/regen-skip - Skip a regeneration check in unrelated git repositories
    DEBPKG:fixes/regcomp-mips-optim - [perl #122817] http://bugs.debian.org/754054 Downgrade the optimization of regcomp.c on mips and mipsel due to a gcc-4.9 bug
    DEBPKG:debian/makemaker-pasthru - http://bugs.debian.org/758471 Pass LD settings through to subdirectories
    DEBPKG:fixes/perldoc-less-R - [rt.cpan.org #98636] http://bugs.debian.org/758689 Tell the 'less' pager to allow terminal escape sequences
    DEBPKG:fixes/pod_man_reproducible_date - http://bugs.debian.org/759405 Support POD_MAN_DATE in Pod::Man for the left-hand footer
    DEBPKG:fixes/io_uncompress_gunzip_inmemory - http://bugs.debian.org/747363 [rt.cpan.org #95494] Fix gunzip to in-memory file handle
    DEBPKG:fixes/socket_test_recv_fix - http://bugs.debian.org/758718 [perl #122657] Compare recv return value to peername in socket test
    DEBPKG:fixes/hurd_socket_recv_todo - http://bugs.debian.org/758718 [perl #122657] TODO checking the result of recv() on hurd
    DEBPKG:fixes/regexp-performance - [0fa70a0] http://bugs.debian.org/777556 [perl #123743] simpify and speed up /.*.../ handling
    DEBPKG:fixes/failed_require_diagnostics - http://bugs.debian.org/781120 [perl #123270] Report inaccesible file on failed require
    DEBPKG:fixes/array-cloning - http://bugs.debian.org/779357 [perl #124127] [902d169] fix cloning arrays with unused elements
    DEBPKG:fixes/perldb-threads - http://bugs.debian.org/779357 [perl #124127] [41ef2c6] lib/perl5db.pl: Restore noop lock prototype
    DEBPKG:fixes/CVE-2015-8607_file_spec_taint_fix - ensure File::Spec::canonpath() preserves taint
    DEBPKG:fixes/encode-unicode-bom - http://bugs.debian.org/798727 [rt.cpan.org #107043] Address https://rt.cpan.org/Public/Bug/Display.html?id=107043
    DEBPKG:debian/encode-unicode-bom-doc - http://bugs.debian.org/798727 Document Debian backport of Encode::Unicode fix
    DEBPKG:debian/kfreebsd-softupdates - http://bugs.debian.org/796798 Work around Debian Bug#796798
    DEBPKG:fixes/CVE-2016-2381_duplicate_env - remove duplicate environment variables from environ
    DEBPKG:debian/debugperl-compat-fix - [perl #127212] http://bugs.debian.org/810326 Disable PERL_TRACK_MEMPOOL for debugging builds
    DEBPKG:fixes/CVE-2015-8853_regexp_hang - http://bugs.debian.org/821848 [perl #123562] PATCH [perl #123562] Regexp-matching "hangs"
    DEBPKG:fixes/utf8_regexp_crash - http://bugs.debian.org/820328 [perl #124109] save_re_context(): do "local $n" with no PL_curpm
    DEBPKG:fixes/regcomp_whitespace_fix - http://bugs.debian.org/820328 [perl #124109] Perl_save_re_context(): re-indent after last commit
    DEBPKG:fixes/5.20.3/eval_label_crash - http://bugs.debian.org/822336 [perl #123652] eval {label:} crash
    DEBPKG:fixes/5.20.3/preserve_record_separator - http://bugs.debian.org/822336 [perl #123218] "preserve" $/ if set to a bad value
    DEBPKG:fixes/5.20.3/test_count_base_rs - http://bugs.debian.org/822336 Fix test count in t/base/rs.t
    DEBPKG:fixes/5.20.3/remove_get_magic - http://bugs.debian.org/822336 [perl #123739] Remove get-magic from $/
    DEBPKG:fixes/5.20.3/speed_up_scalar_g - http://bugs.debian.org/822336 [perl #123202] speed up scalar //g against tainted strings
    DEBPKG:fixes/5.20.3/accidental_all_features - http://bugs.debian.org/822336 Stop $^H |= 0x1c020000 from enabling all features
    DEBPKG:fixes/5.20.3/multidimensional_arrays_utf8 - http://bugs.debian.org/822336 [perl #124113] Make check for multi-dimensional arrays be UTF8-aware
    DEBPKG:fixes/5.20.3/unquoted_utf8_heredoc_terminators - http://bugs.debian.org/822336 Allow unquoted UTF-8 HERE-document terminators
    DEBPKG:fixes/5.20.3/parentheses_ambiguous_warning_utf8_functions - http://bugs.debian.org/822336 Fix "...without parentheses is ambuguous" warning for UTF-8 function names
    DEBPKG:fixes/5.20.3/leak_namepv_copy - http://bugs.debian.org/822336 [perl #123786] don't leak the temp utf8 copy of namepv
    DEBPKG:fixes/5.20.3/h2ph_hex_constants - http://bugs.debian.org/822336 h2ph: correct handling of hex constants for the preamble
    DEBPKG:fixes/5.20.3/leftbracket_XTERMORDORDOR - http://bugs.debian.org/822336 [perl #123711] Fix crash with 0-5x-l{0}
    DEBPKG:fixes/5.20.3/fatalize_warnings_unwinding - http://bugs.debian.org/822336 [perl #123398] don't fatalize warnings during unwinding (#123398)
    DEBPKG:fixes/5.20.3/setpgrp - http://bugs.debian.org/822336 Don’t treat setpgrp($nonzero) as setpgrp(1)
    DEBPKG:fixes/5.20.3/death_unwinding_crash - http://bugs.debian.org/822336 [perl #124156] RT #124156: death during unwinding causes crash
    DEBPKG:fixes/5.20.3/stashpvn_crash - http://bugs.debian.org/822336 [perl #125541] Fix crash with %::=(); J->${\"::"}
    DEBPKG:fixes/5.20.3/possessive_quantifier - http://bugs.debian.org/822336 [perl #125825] PATCH: [perl 125825] {n}+ possessive quantifier broken
    DEBPKG:fixes/5.20.3/quoted_code_crash - http://bugs.debian.org/822336 [perl #123712] Fix /$a[/ parsing
    DEBPKG:fixes/5.20.3/checking_sub_inwhat - http://bugs.debian.org/822336 [perl #123712] Don't check sub_inwhat
    DEBPKG:fixes/5.20.3/yylex_loop - http://bugs.debian.org/822336 Fix hang with "@{"
    DEBPKG:fixes/5.20.3/docs/op - http://bugs.debian.org/822336 Fix apidocs for OP_TYPE_IS(_OR_WAS) - arguments separated by |, not ,.
    DEBPKG:fixes/5.20.3/docs/encoding - http://bugs.debian.org/822336 perlpodspec: Corrections/adds to detecting =encoding
    DEBPKG:fixes/5.20.3/docs/SvPV_set - http://bugs.debian.org/822336 improve SvPV_set's docs, it really shouldn't be public API
    DEBPKG:fixes/5.20.3/docs/autodie - http://bugs.debian.org/822336 Fix warning message regarding "use autodie" and "use open".
    DEBPKG:fixes/5.20.3/docs/autodie_2_26 - http://bugs.debian.org/822336 perlunicook: Note that autodie >= 2.26 should be okay with "use open".
    DEBPKG:fixes/5.20.3/docs/setenv - http://bugs.debian.org/822336 Fix setenv() replacement documentation in perlclib
    DEBPKG:fixes/5.20.3/docs/clib_caution - http://bugs.debian.org/822336 perlhacktips: Add caution about clib ptr returns to static memory
    DEBPKG:fixes/5.20.3/docs/perlunicook_typos - http://bugs.debian.org/822336 Fix minor code typos in perlunicook
    DEBPKG:fixes/5.20.3/docs/ook_example - http://bugs.debian.org/822336 [perl #122322] Update OOK example in perlguts
    DEBPKG:fixes/5.20.3/docs/study_noop - http://bugs.debian.org/822336 perlfunc: mention that study() is currently a noop
    DEBPKG:fixes/CVE-2016-1238/remove-dot-when-loading - [perl #127834] (perl #127834) remove . from the end of @INC if complex modules are loaded
    DEBPKG:fixes/CVE-2016-1238/remove-dot-in-padwalker - [perl #127834] perl5db.pl: ensure PadWalker is loaded from standard paths
    DEBPKG:fixes/CVE-2016-1238/remove-dot-in-dist - [perl #127834] dist/: remove . from @INC when loading optional modules
    DEBPKG:fixes/CVE-2016-1238/remove-dot-in-cpan - [perl #127834] cpan/: remove . from @INC when loading optional modules
    DEBPKG:fixes/CVE-2016-1238/customized-encode - Update customized.dat for cpan/Encode/Encode.pm
    DEBPKG:debian/CVE-2016-1238/test-suite-without-dot - [perl #127810] Patch unit tests to explicitly insert "." into @INC when needed.
    DEBPKG:debian/CVE-2016-1238/eumm-without-dot - [perl #127810] Add PERL_USE_UNSAFE_INC support to EU::MM for fortify_inc support.
    DEBPKG:debian/CVE-2016-1238/cpan-without-dot - [perl #127810] Set PERL_USE_UNSAFE_INC for cpan usage
    DEBPKG:debian/CVE-2016-1238/mb-without-dot - Make Module::Build set PERL_USE_UNSAFE_INC
    DEBPKG:debian/CVE-2016-1238/sitecustomize-in-etc - Look for sitecustomize.pl in /etc/perl rather than sitelib on Debian systems
    DEBPKG:fixes/xsloader-eval - [rt.cpan.org #115808] http://bugs.debian.org/829578 Don’t let XSLoader load relative paths

---
@INC for perl 5.20.2:
    /etc/perl
    /usr/local/lib/x86_64-linux-gnu/perl/5.20.2
    /usr/local/share/perl/5.20.2
    /usr/lib/x86_64-linux-gnu/perl5/5.20
    /usr/share/perl5
    /usr/lib/x86_64-linux-gnu/perl/5.20
    /usr/share/perl/5.20
    /usr/local/lib/site_perl

---
Environment for perl 5.20.2:
    HOME=/home/rafal
    LANG=pl_PL.utf8
    LANGUAGE=en_US:en
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/home/rafal/bin
    PERL_BADLANG (unset)
    SHELL=/bin/bash

