[perl #120797] tell / getline problems on Win32 with unix-delimited files opened with encoding(UTF-8)
From: Christian Millour
Date: December 16, 2013 18:17
Subject: [perl #120797] tell / getline problems on Win32 with unix-delimited files opened with encoding(UTF-8)
Message ID: rt-4.0.18-18205-1387157803-1440.120797-75-0@perl.org
# New Ticket Created by Christian Millour
# Please include the string: [perl #120797]
# in the subject line of all future correspondence about this issue.
# <URL: https://rt.perl.org/Ticket/Display.html?id=120797 >
This is a bug report for perl from cm.perl@abtela.com,
generated with the help of perlbug 1.39 running under perl 5.18.1.
When you open a unix-delimited file (i.e., lines end in LF, not CRLF)
on Win32 with
my $io = new IO::File($filename, "<:encoding(...)")
a call to
tell $io;
seems to corrupt the handle / layers state to the point that the next
call to $io->getline does not return the next line as expected.
This is a serious problem as it precludes any use of
$io->input_line_number
(which makes a call to tell) for unix-delimited files opened this
way on Win32.
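
A minimal sketch of the sequence described above, for anyone who wants to
try it outside the attached test. The temporary file and its contents are
hypothetical stand-ins, not taken from the ticket. On an unaffected setup
the five lines should print in order; on an affected Win32 perl the lines
after the first tell are the ones to watch.

```perl
use strict;
use warnings;
use IO::File;
use File::Temp qw(tempfile);

# Hypothetical sample data: five ASCII lines with LF endings.
# binmode prevents Win32 from turning "\n" into CRLF on output.
my ($tmp, $filename) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} "line $_\n" for 1 .. 5;
close $tmp or die "close: $!";

my $io = IO::File->new($filename, "<:encoding(UTF-8)") or die "open: $!";
my @got;
while (defined(my $line = $io->getline)) {
    my $pos = tell $io;    # the call under suspicion
    push @got, [$pos, $line];
}
$io->close;

# Print the position reported by tell next to each line read.
printf "tell=%-3d %s", @$_ for @got;
```

Replacing the tell call with $io->input_line_number should exercise the
same path, since that method calls tell on the handle.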
This seems to be the reason why Pod-Eventual-0.094001 fails tests
on Win32. It calls input_line_number on handles opened by
Mixin-Linewise-0.102 with a default encoding of ":encoding(UTF-8)"
(the introduction of this default encoding was apparently the
rationale for the latest versions of these two modules).
Dist-Zilla, Pod-Weaver, Config-INI and other important CPAN distribs
depend on these.
The attached test file io_tell_encoding.t illustrates the problem
(it has been coded in the style of dist/IO/t/io_linenum.t in the
hope that it would facilitate its integration... If Test::More can
be used I will be happy to provide a more explicit version).
The test program first establishes a "reference" version of the list
of lines to be read (using a 'traditional' open) and then reads the
same file again using various "extensions" :
my $io = IO::File->new($File, "<:$encoding") or die $!;
for the following values of $encoding :
"encoding(UTF-8)", "encoding(iso-8859-1)", "", "raw", "crlf", "utf8"
any of which should be able to read without problem a pure ASCII,
unix-delimited file. Each line read (with $io->getline) is compared
with the reference.
In a first batch of tests there is a call to tell($io) after each
$io->getline.
This call is omitted in a second batch.
The test file is the test program itself (the comments at the end of
the program text were crafted to make it easier to see the problem).
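
For readers without the attachment, the structure just described can be
sketched roughly as follows. This is not the attached io_tell_encoding.t:
the helper name check_layer, the generated sample file and the output
format are assumptions made for illustration, and the empty layer is
opened with a plain "<" mode rather than "<:".

```perl
use strict;
use warnings;
use IO::File;
use File::Temp qw(tempfile);

# Hypothetical stand-in for the test file: pure ASCII, LF line endings.
my ($tmp, $file) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} "line $_ of the sample\n" for 1 .. 20;
close $tmp or die "close: $!";

# Reference copy of the lines, read with a plain open.
open my $ref_fh, '<', $file or die "open $file: $!";
my @reference = <$ref_fh>;
close $ref_fh;

# Read the file through one layer and compare each line with the
# reference. Returns "OK" or a description of the first mismatch.
sub check_layer {
    my ($layer, $call_tell) = @_;
    my $mode = length $layer ? "<:$layer" : "<";
    my $io = IO::File->new($file, $mode) or die "open with '$mode': $!";
    my $n = 0;
    while (defined(my $line = $io->getline)) {
        if ($call_tell) {
            my $pos = tell $io;    # first batch only
        }
        my $want = $reference[$n];
        return "line $n, expected '" . ($want // 'EOF') . "', got '$line'"
            if !defined $want || $line ne $want;
        $n++;
    }
    return $n == @reference
        ? "OK"
        : "only $n of " . scalar(@reference) . " lines read";
}

my @layers = ("encoding(UTF-8)", "encoding(iso-8859-1)",
              "", "raw", "crlf", "utf8");
my %result;
for my $call_tell (1, 0) {          # batch with tell, then batch without
    for my $layer (@layers) {
        my $r = check_layer($layer, $call_tell);
        $result{"$layer/$call_tell"} = $r;
        printf "%-24s tell=%d  %s\n", "'$layer'", $call_tell, $r;
    }
}
```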
When stored as a unix (LF delimited) file, this program yields
Taisha:~/devbin/tmp $ perl io_tell_encoding.t
1..12
# Running under perl version 5.018001 for MSWin32
# Current time local: Mon Dec 16 01:45:11 2013
# Current time GMT: Mon Dec 16 00:45:11 2013
# Using Test.pm version 1.26
not ok 1
# Test 1 got: "line 1, expected 'my $File;\n', got '5a6a7a8a9\n'"
(io_tell_encoding.t at line 40)
# Expected: "OK" (encoding = encoding(UTF-8), tell = 1)
# io_tell_encoding.t line 40 is: ok(test($encoding, $tell), "OK",
"encoding = $encoding, tell = $tell");
not ok 2
# Test 2 got: "line 1, expected 'my $File;\n', got '5a6a7a8a9\n'"
(io_tell_encoding.t at line 40 fail #2)
# Expected: "OK" (encoding = encoding(iso-8859-1), tell = 1)
ok 3
ok 4
ok 5
ok 6
ok 7
ok 8
ok 9
ok 10
ok 11
ok 12
Taisha:~/devbin/tmp $
We see that the test fails only for "encoding(...)" when tell($io)
is called.
If the program is stored as a CRLF delimited file it yields instead
Taisha:~/devbin/tmp $ perl io_tell_encoding.t
1..12
# Running under perl version 5.018001 for MSWin32
# Current time local: Mon Dec 16 01:47:14 2013
# Current time GMT: Mon Dec 16 00:47:14 2013
# Using Test.pm version 1.26
ok 1
ok 2
ok 3
not ok 4
# Test 4 got: "line 0, expected '#!./perl\n', got '#!./perl\r\n'"
(io_tell_encoding.t at line 40 fail #4)
# Expected: "OK" (encoding = raw, tell = 1)
# io_tell_encoding.t line 40 is: ok(test($encoding, $tell), "OK",
"encoding = $encoding, tell = $tell");
ok 5
ok 6
ok 7
ok 8
ok 9
not ok 10
# Test 10 got: "line 0, expected '#!./perl\n', got '#!./perl\r\n'"
(io_tell_encoding.t at line 40 fail #10)
# Expected: "OK" (encoding = raw, tell = 0)
ok 11
ok 12
Taisha:~/devbin/tmp $
Now the only encoding that fails is ':raw', which is normal (':raw'
does no CRLF translation, so the "\r" is kept) and unrelated to this
ticket.
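
The ':raw' result on a CRLF file can be seen in isolation with a small
sketch like the one below. The one-line sample file is a hypothetical
stand-in for the CRLF copy of the test program; only the layer names come
from the report.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical one-line file with an explicit CRLF terminator.
my ($tmp, $file) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} "#!./perl\r\n";
close $tmp or die "close: $!";

# Read the first line once through :raw and once through :crlf.
my %first_line;
for my $layer (qw(raw crlf)) {
    open my $fh, "<:$layer", $file or die "open <:$layer: $!";
    $first_line{$layer} = <$fh>;
    close $fh;
}

# Print each result with its terminators made visible.
for my $layer (qw(raw crlf)) {
    (my $shown = $first_line{$layer}) =~ s/\r/\\r/g;
    $shown =~ s/\n/\\n/g;
    print ":$layer => '$shown'\n";
}
```

With ':raw' the line keeps its "\r\n", while ':crlf' delivers it with a
bare "\n", which matches the mismatch reported for tests 4 and 10.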
I have tried to investigate further but after a few hours concluded
that this problem was way over my head :(
Thank you for your time and attention.
---
Flags:
category=core
severity=critical
---
Site configuration information for perl 5.18.1:
Configured by strawberry-perl at Tue Aug 13 19:21:46 2013.
Summary of my perl5 (revision 5 version 18 subversion 1) configuration:
Platform:
osname=MSWin32, osvers=4.0, archname=MSWin32-x86-multi-thread-64int
uname='Win32 strawberry-perl 5.18.1.1 #1 Tue Aug 13 19:20:13 2013
i386'
config_args='undef'
hint=recommended, useposix=true, d_sigaction=undef
useithreads=define, usemultiplicity=define
useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
use64bitint=define, use64bitall=undef, uselongdouble=undef
usemymalloc=n, bincompat5005=undef
Compiler:
cc='gcc', ccflags =' -s -O2 -DWIN32 -DPERL_TEXTMODE_SCRIPTS
-DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS -DUSE_PERLIO
-fno-strict-aliasing -mms-bitfields',
optimize='-s -O2',
cppflags='-DWIN32'
ccversion='', gccversion='4.7.3', gccosandvers=''
intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=12345678
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
ivtype='long long', ivsize=8, nvtype='double', nvsize=8,
Off_t='long long', lseeksize=8
alignbytes=8, prototype=define
Linker and Libraries:
ld='g++.exe', ldflags ='-s
-L"E:\cm\devbin\strawberry-perl-5.18.1.1-32bit-portable\perl\lib\CORE"
-L"E:\cm\devbin\strawberry-perl-5.18.1.1-32bit-portable\c\lib"'
libpth=E:\cm\devbin\strawberry-perl-5.18.1.1-32bit-portable\c\lib
E:\cm\devbin\strawberry-perl-5.18.1.1-32bit-portable\c\i686-w64-mingw32\lib
libs=-lmoldname -lkernel32 -luser32 -lgdi32 -lwinspool -lcomdlg32
-ladvapi32 -lshell32 -lole32 -loleaut32 -lnetapi32 -luuid -lws2_32 -lmpr
-lwinmm -lversion -lodbc32 -lodbccp32 -lcomctl32
perllibs=-lmoldname -lkernel32 -luser32 -lgdi32 -lwinspool
-lcomdlg32 -ladvapi32 -lshell32 -lole32 -loleaut32 -lnetapi32 -luuid
-lws2_32 -lmpr -lwinmm -lversion -lodbc32 -lodbccp32 -lcomctl32
libc=, so=dll, useshrplib=true, libperl=libperl518.a
gnulibc_version=''
Dynamic Linking:
dlsrc=dl_win32.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' '
cccdlflags=' ', lddlflags='-mdll -s
-L"E:\cm\devbin\strawberry-perl-5.18.1.1-32bit-portable\perl\lib\CORE"
-L"E:\cm\devbin\strawberry-perl-5.18.1.1-32bit-portable\c\lib"'
Locally applied patches:
---
@INC for perl 5.18.1:
E:/cm/devbin/strawberry-perl-5.18.1.1-32bit-portable/perl/site/lib
E:/cm/devbin/strawberry-perl-5.18.1.1-32bit-portable/perl/vendor/lib
E:/cm/devbin/strawberry-perl-5.18.1.1-32bit-portable/perl/lib
.
---
Environment for perl 5.18.1:
CYGWIN=nodosfilewarning
HOME=e:/cm
LANG (unset)
LANGUAGE (unset)
LD_LIBRARY_PATH (unset)
LOGDIR (unset)
PATH=E:\cm\devbin\strawberry-perl-5.18.1.1-32bit-portable\perl\site\bin;E:\cm\devbin\strawberry-perl-5.18.1.1-32bit-portable\perl\bin;E:\cm\devbin\strawberry-perl-5.18.1.1-32bit-portable\c\bin;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Program
Files (x86)\PuTTY;C:\Program Files (x86)\OpenOffice.org
3\program;C:\Program Files (x86)\QT Lite\QTSystem;c:\Program
Files\WinRAR;C:\Program Files (x86)\Calibre2\
PERL_BADLANG (unset)
SHELL (unset)