
[perl #130199] Text::CSV::Encoded is incorrectly forced to parsewidechar

From: slaven@rezic.de via RT
Date: November 29, 2016 08:24
Subject: [perl #130199] Text::CSV::Encoded is incorrectly forced to parsewidechar
Message ID: rt-4.0.24-23498-1480407853-1946.130199-15-0@perl.org

On Mon, 28 Nov 2016 04:34:02 -0800, rafal@zorro.ztk-rp.eu wrote:
> 
> This is a bug report for perl from rafal@zorro.ztk-rp.eu,
> generated with the help of perlbug 1.40 running under perl 5.20.2.
> 
> 
> -----------------------------------------------------------------
> [Please describe your issue here]
> After upgrading from Debian wheezy to Debian jessie, HTML::Mason started
> to behave strangely with respect to UTF-8 encoding. Previously, both web
> pages and forms worked correctly (in UTF-8) without any special setup. As
> of jessie with Apache 2.4, UTF-8 no longer works:
> 1. I had to add binmode(STDOUT,'UTF8') to modules.
> 2. I had to call decode_utf8($_) on data from forms before passing it
> over to psql-db.
> I am filing this report with example code showing the erratic behavior
> of Text::CSV::Encoded, since I could narrow the problem down to a test
> case of just a few lines.
> 
> ========================
> #!/usr/bin/perl
> use Text::CSV::Encoded;
> open(my $FH, shift) or die "open";
> binmode($FH, ":encoding(cp1250) :raw :bytes");
> local $/ = "\r\n";
> my $csv = Text::CSV::Encoded->new ( { encoding_in  => "cp1250",
>                         binary => 1, eol => $/, sep_char => ';',
>                 } ) or die "Cannot use CSV: ".Text::CSV->error_diag();
> $\ = "\n";
> while ( <$FH> ) {
>         s/\s+$//;
>         print;
>         if ($csv->parse( $_ )) {
>                 print $csv->fields();
>         }
> }
> __END__
> 10;"SPӣDZIELNIA
> WARSZAWA";62;"TEST"
> ======================
> 
> In this example:
> 1. the test file (provided "inline" as <DATA>) contains two specific
> characters from code page 1250, one immediately after the other.
> 1a. this test file is NOT UTF-8 encoded.
> 2. the input stream is correctly marked as CP1250
> 3. the module gets correct information about the file's encoding
> ... and yet the module complains about encountering a "wide char",
> which in the above setup should never be possible (I think).
> 
> The result of the above program is:
> =======================
> $ ./wide-char test-input
> 10;"SPӣDZIELNIA
> WARSZAWA";62;"TEST"
> Wide character in subroutine entry at /usr/share/perl5/Text/CSV/Encoded/Coder/Encode.pm line 37, <$FH> chunk 1.
> $
> =======================
> 
> This result is incorrect, since the file does not contain any "wide chars".
> 
> [Please do not change anything below this line]
> -----------------------------------------------------------------
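
For reference, here is a minimal sketch (not taken from the report, and not offered as a diagnosis of this ticket) of one general way the "Wide character" error can arise with Encode. It assumes the two adjacent characters are the cp1250 octets 0xD3 0xA3; that byte pair is an assumption made for illustration. Read as cp1250 the pair is two characters, but the same pair is also a valid UTF-8 sequence for a single code point above 0xFF, and decoding a string that already holds such a character fails.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumed sample: two adjacent cp1250 octets, 0xD3 and 0xA3.
my $octets = "\xD3\xA3";

# Read as cp1250, they are two characters: U+00D3 and U+0141.
my $as_cp1250 = decode('cp1250', $octets);

# Read as UTF-8, the same two octets form one character: U+04E3.
my $as_utf8 = decode('UTF-8', $octets);

# Decoding an already-decoded string that holds a code point above
# 0xFF cannot work, because that string is no longer a byte string.
my $error = '';
eval { decode('cp1250', $as_utf8); 1 } or $error = $@;

printf "cp1250: %s\n", join ' ', map { sprintf 'U+%04X', ord } split //, $as_cp1250;
printf "UTF-8:  %s\n", join ' ', map { sprintf 'U+%04X', ord } split //, $as_utf8;
print  "second decode: $error";
```

Whether anything in the reported setup actually takes this path would have to be checked against the sample file.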

It seems to make a difference whether the CSV file has DOS or UNIX newlines, so can you attach the sample file? (In any case, with either DOS or UNIX newlines I don't see any difference in behavior between Debian's perl in wheezy and jessie.)
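
A quick way to tell which newline style a file uses is to count the line endings in raw mode. This is only a sketch; the helper name `count_line_endings` is made up for illustration, and the file name is expected as the first command-line argument.

```perl
use strict;
use warnings;

# Count CRLF (DOS) and bare LF (UNIX) line endings in a byte string.
sub count_line_endings {
    my ($bytes) = @_;
    my $crlf = () = $bytes =~ /\r\n/g;        # DOS-style endings
    my $lf   = () = $bytes =~ /(?<!\r)\n/g;   # LF not preceded by CR
    return (crlf => $crlf, lf => $lf);
}

# Hypothetical usage: pass the CSV file name on the command line.
if (@ARGV) {
    open(my $fh, '<:raw', $ARGV[0]) or die "open $ARGV[0]: $!";
    local $/;                                 # slurp the whole file
    my %n = count_line_endings(scalar <$fh>);
    print "CRLF: $n{crlf}, bare LF: $n{lf}\n";
}
```

Opening with `<:raw` keeps any CRLF translation or decoding layer out of the way, so the counts reflect the bytes on disk.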

---
via perlbug:  queue: perl5 status: open
https://rt.perl.org/Ticket/Display.html?id=130199
