[perl #130199] Text::CSV::Encoded is incorrectly forced to parsewidechar
From: James E Keenan via RT
Date: November 28, 2016 23:04
Subject: [perl #130199] Text::CSV::Encoded is incorrectly forced to parsewidechar
Message ID: rt-4.0.24-30716-1480374232-1358.130199-15-0@perl.org
On Mon, 28 Nov 2016 12:34:02 GMT, rafal@zorro.ztk-rp.eu wrote:
>
> This is a bug report for perl from rafal@zorro.ztk-rp.eu,
> generated with the help of perlbug 1.40 running under perl 5.20.2.
>
>
> -----------------------------------------------------------------
> [Please describe your issue here]
> After upgrading from debian-wheezy to debian-jessie HTML::Mason started
> to behave strangely with respect to UTF8 encoding. Earlier both web-pages
> and forms were working correctly (in UTF8) without any special setup. As
> of jessie with Apache 2.4 UTF8 no longer works.
> 1. I had to add binmode(STDOUT,'UTF8') to modules.
> 2. I had to decode_utf8($_) data from forms before passing them over
> to psql-db
> This report I file with example code of erratic behavior of
> Text::CSV::Encoded
> since I could narrow the problem to just a few lines of test-case.
>
> ========================
> #!/usr/bin/perl
> use Text::CSV::Encoded;
> open(my $FH, shift) or die "open";
> binmode($FH, ":encoding(cp1250) :raw :bytes");
> local $/ = "\r\n";
> my $csv = Text::CSV::Encoded->new ( { encoding_in => "cp1250",
> binary => 1, eol => $/, sep_char => ';',
> } ) or die "Cannot use CSV: ".Text::CSV->error_diag ();
> $\ = "\n";
> while ( <$FH> ) {
> s/\s+$//;
> print;
> if ($csv->parse( $_ )) {
> print $csv->fields();
> }
> }
> __END__
> 10;"SPӣDZIELNIA
> WARSZAWA";62;"TEST"
> ======================
>
> In this example:
> 1. the test file (provided "inline") as <DATA> contains two specific
> characters from CODE-PAGE-1250, one such char just after another.
> 1a. this test file IS-NOT UTF8 encoded.
> 2. the input stream is correctly marked as CP1250
> 3. the module gets correct information as to that file encoding
> ... and yet, the module complains about encountering a "wide-char", which in
> the above setup should not ever be possible (I think).
>
> The result of the above program is:
> =======================
> $ ./wide-char test-input
> 10;"SPӣDZIELNIA
> WARSZAWA";62;"TEST"
> Wide character in subroutine entry at
> /usr/share/perl5/Text/CSV/Encoded/Coder/Encode.pm line 37, <$FH> chunk 1.
> $
> =======================
>
> This result is incorrect, since the file does not contain any "wide chars".
>
It appears that the file does indeed contain data which, once decoded from cp1250, satisfies the condition required for the "Wide character" warning. Here's what pod/perldiag.pod in perl-5.24.0 says:
#####
=item Wide character in %s
(S utf8) Perl met a wide character (>255) when it wasn't expecting
one. This warning is by default on for I/O (like print). The easiest
way to quiet this warning is simply to add the C<:utf8> layer to the
output, e.g. C<binmode STDOUT, ':utf8'>. Another way to turn off the
warning is to add C<no warnings 'utf8';> but that is often closer to
cheating. In general, you are supposed to explicitly mark the
filehandle with an encoding, see L<open> and L<perlfunc/binmode>.
#####
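If I put your test data into a file and run it through 'od -c', I observe two bytes outside the ASCII range, octal 323 and 243 (0xD3 and 0xA3). Decoded as cp1250 they are Ó (U+00D3) and Ł (U+0141), and the second of these has a code point in the >255 range.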
#####
$ od -c warsaw.txt
0000000 1 0 ; " S P 323 243 D Z I E L N I A
0000020 \n W A R S Z A W A " ; 6 2 ; " T
0000040 E S T " \n
0000045
#####
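To check what those two bytes become after decoding, here is a minimal sketch using the core Encode module. It assumes the non-ASCII bytes are exactly the octal 323 243 pair shown by od; the variable names ($bytes, $chars, @report) are only illustrative and do not come from the original report.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The two non-ASCII bytes from the od output: octal 323 243 = 0xD3 0xA3.
my $bytes = "\xD3\xA3";

# Decode them as cp1250, which is what an :encoding(cp1250) layer does on read.
my $chars = decode('cp1250', $bytes);

# Report each resulting code point and whether it exceeds 255.
my @report = map {
    sprintf 'U+%04X %s', ord($_), (ord($_) > 255 ? 'wide' : 'not wide')
} split //, $chars;

print "$_\n" for @report;
```

On my reading of the cp1250 table this should report U+00D3 as not wide and U+0141 as wide, so a single decoded character is enough to trigger the warning on an output handle that has no encoding layer.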
Text::CSV::Encoded is not part of the Perl 5 core distribution, so I think including it in the test script muddies the waters. Here's a pure Perl reduction:
#####
$ cat 2-130199-text-csv-encoded.pl
# perl
use strict;
use warnings;
open(my $FH, '<', 'warsaw.txt') or die "open";
binmode($FH, ":encoding(cp1250)");
while ( <$FH> ) {
s/\s+$//;
print "$_\n";
}
close $FH or die "close";
#####
$ perl 2-130199-text-csv-encoded.pl
Wide character in print at 2-130199-text-csv-encoded.pl line 9, <$FH> line 1.
10;"SPÓŁDZIELNIA
WARSZAWA";62;"TEST"
#####
I think that warning is appropriate. However, I concede that I don't have much experience with 'cp1250', so I'm unclear what the expected behavior is. Other people on the list should comment.
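For comparison, here is a rough sketch of the same loop with an explicit encoding layer on the output handle, which is what the perldiag entry recommends. To keep it self-contained it makes two assumptions that are not in the original script: the contents of warsaw.txt are held in an in-memory string, and the output goes to an in-memory handle rather than STDOUT. The names $cp1250_bytes, $utf8_out and @warnings are illustrative only.

```perl
use strict;
use warnings;

# Stand-in for warsaw.txt: the same cp1250 bytes, held in memory
# (an assumption so the sketch runs without the file).
my $cp1250_bytes = "10;\"SP\xD3\xA3DZIELNIA\nWARSZAWA\";62;\"TEST\"\n";

# Collect any warnings instead of letting them go to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

# Input is decoded from cp1250 into Perl characters.
open(my $in, '<:encoding(cp1250)', \$cp1250_bytes) or die "open: $!";

# Output handle carries an explicit encoding layer, so characters
# above 255 are encoded instead of triggering "Wide character".
my $utf8_out = '';
open(my $out, '>:encoding(UTF-8)', \$utf8_out) or die "open: $!";

while ( my $line = <$in> ) {
    $line =~ s/\s+$//;
    print {$out} "$line\n";
}
close $in  or die "close: $!";
close $out or die "close: $!";

printf "warnings: %d, bytes written: %d\n", scalar(@warnings), length $utf8_out;
```

When printing to the terminal instead, the equivalent step would be binmode(STDOUT, ':encoding(UTF-8)') before the loop. This only addresses the reduction above; it does not say anything about what Text::CSV::Encoded does internally.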
Thank you very much.
---
via perlbug: queue: perl5 status: new
https://rt.perl.org/Ticket/Display.html?id=130199