
[perl #120797] tell / getline problems on Win32 with unix-delimited files opened with encoding(UTF-8)

From: Christian Millour
Date: December 16, 2013 18:17
Subject: [perl #120797] tell / getline problems on Win32 with unix-delimited files opened with encoding(UTF-8)
Message ID: rt-4.0.18-18205-1387157803-1440.120797-75-0@perl.org

# New Ticket Created by  Christian Millour 
# Please include the string:  [perl #120797]
# in the subject line of all future correspondence about this issue. 
# <URL: https://rt.perl.org/Ticket/Display.html?id=120797 >


This is a bug report for perl from cm.perl@abtela.com,
generated with the help of perlbug 1.39 running under perl 5.18.1.

When you open a unix-delimited file (i.e., lines end in LF, not CRLF)
on Win32 with
     my $io = new IO::File($filename, "<:encoding(...)")
a call to
     tell $io;
seems to corrupt the handle / layers state to the point that the next
call to $io->getline does not return the next line as expected.

This is a serious problem as it precludes any use of
     $io->input_line_number
(which makes a call to tell) for unix-delimited files opened this
way on Win32.
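
Since the failure described here is triggered by tell, one possible
stopgap is to count lines by hand instead of asking the handle. The
following is only a sketch under that assumption, not something tested
on an affected Win32 perl; the helper name numbered_lines and the
temporary fixture file are made up for illustration.

```perl
use strict;
use warnings;
use IO::File;
use File::Temp qw(tempfile);

# Hypothetical helper: returns an iterator yielding (line_number, line)
# pairs. It keeps its own counter, so neither tell() nor
# input_line_number() is ever called on the handle.
sub numbered_lines {
    my ($io) = @_;
    my $n = 0;
    return sub {
        my $line = $io->getline;
        return unless defined $line;    # empty list at end of file
        return (++$n, $line);
    };
}

# Illustrative fixture: a small ASCII file with bare LF line endings.
my ($tmp, $name) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} "alpha\nbeta\ngamma\n";
close $tmp or die "close: $!";

my $io = IO::File->new($name, "<:encoding(UTF-8)") or die "open: $!";
my $next = numbered_lines($io);
my @seen;
while (my ($n, $line) = $next->()) {
    chomp $line;
    push @seen, "$n:$line";
}
$io->close;
print "$_\n" for @seen;
```

The counter lives in the closure rather than in $. so that nothing in
the loop depends on the handle's own bookkeeping.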

This seems to be the reason why Pod-Eventual-0.094001 fails tests
on Win32. It calls input_line_number on handles opened by
Mixin-Linewise-0.102 with a default encoding of ":encoding(UTF-8)"
(the introduction of this default encoding was apparently the
rationale for the latest versions of these two modules).

Dist-Zilla, Pod-Weaver, Config-INI and other important CPAN distribs
depend on these.

The attached test file io_tell_encoding.t illustrates the problem
(it has been coded in the style of dist/IO/t/io_linenum.t in the
hope that it would facilitate its integration... If Test::More can
be used I will be happy to provide a more explicit version).

The test program first establishes a "reference" list of the lines
to be read (using a 'traditional' open) and then reads the same file
again using various "extensions":

       my $io = IO::File->new($File, "<:$encoding") or die $!;

for the following values of $encoding :

       "encoding(UTF-8)", "encoding(iso-8859-1)", "", "raw", "crlf", "utf8"

any of which should be able to read a pure ASCII, unix-delimited
file without problems. Each line read (with $io->getline) is compared
with the reference.

In a first batch of tests there is a call to tell($io) after each
$io->getline. This call is omitted in a second batch.
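
The attached test file is not reproduced in this message. The
following is only a rough sketch of the procedure described above, not
the actual io_tell_encoding.t: it reads a small generated LF-delimited
fixture rather than the script itself, and the helper names
make_lf_file, read_reference and check_layer are made up for
illustration.

```perl
use strict;
use warnings;
use IO::File;
use File::Temp qw(tempfile);

# Hypothetical fixture: a pure-ASCII file with bare LF line endings.
sub make_lf_file {
    my ($fh, $name) = tempfile(UNLINK => 1);
    binmode $fh;                       # write "\n" as LF on every platform
    print {$fh} "line $_\n" for 1 .. 5;
    close $fh or die "close: $!";
    return $name;
}

# Reference list of lines, read with a plain open.
sub read_reference {
    my ($file) = @_;
    open my $fh, '<', $file or die "open $file: $!";
    my @lines = <$fh>;
    close $fh;
    return \@lines;
}

# Re-read $file through $layer and compare each line with the reference.
# Returns "OK" or a description of the first mismatch.
sub check_layer {
    my ($file, $ref, $layer, $call_tell) = @_;
    my $mode = length $layer ? "<:$layer" : "<";
    my $io = IO::File->new($file, $mode) or die "open $file ($mode): $!";
    for my $i (0 .. $#$ref) {
        my $got = $io->getline;
        $got = '<undef>' unless defined $got;
        return "line $i, expected '$ref->[$i]', got '$got'"
            if $got ne $ref->[$i];
        my $pos = tell($io) if $call_tell;   # the call under suspicion
    }
    $io->close;
    return "OK";
}

my $file = make_lf_file();
my $ref  = read_reference($file);
my %result;
for my $call_tell (1, 0) {
    for my $layer ("encoding(UTF-8)", "encoding(iso-8859-1)",
                   "", "raw", "crlf", "utf8") {
        my $status = check_layer($file, $ref, $layer, $call_tell);
        $result{"$layer/$call_tell"} = $status;
        print "layer='$layer' tell=$call_tell: $status\n";
    }
}
```

On a platform that is not affected, every combination should report
OK; the interesting rows are the two encoding(...) layers with tell
enabled.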

The test file is the test program itself (the comments at the end of
the program text were crafted to make it easier to see the problem).

When stored as a unix (LF delimited) file, this program yields

Taisha:~/devbin/tmp $ perl io_tell_encoding.t
1..12
# Running under perl version 5.018001 for MSWin32
# Current time local: Mon Dec 16 01:45:11 2013
# Current time GMT:   Mon Dec 16 00:45:11 2013
# Using Test.pm version 1.26
not ok 1
# Test 1 got: "line 1, expected 'my $File;\n', got '5a6a7a8a9\n'" (io_tell_encoding.t at line 40)
#   Expected: "OK" (encoding = encoding(UTF-8), tell = 1)
#  io_tell_encoding.t line 40 is: 	ok(test($encoding, $tell), "OK", "encoding = $encoding, tell = $tell");
not ok 2
# Test 2 got: "line 1, expected 'my $File;\n', got '5a6a7a8a9\n'" (io_tell_encoding.t at line 40 fail #2)
#   Expected: "OK" (encoding = encoding(iso-8859-1), tell = 1)
ok 3
ok 4
ok 5
ok 6
ok 7
ok 8
ok 9
ok 10
ok 11
ok 12
Taisha:~/devbin/tmp $


We see that the test fails only for "encoding(...)" when tell($io)
is called.

If the program is stored as a CRLF delimited file it yields instead


Taisha:~/devbin/tmp $ perl io_tell_encoding.t
1..12
# Running under perl version 5.018001 for MSWin32
# Current time local: Mon Dec 16 01:47:14 2013
# Current time GMT:   Mon Dec 16 00:47:14 2013
# Using Test.pm version 1.26
ok 1
ok 2
ok 3
not ok 4
# Test 4 got: "line 0, expected '#!./perl\n', got '#!./perl\r\n'" (io_tell_encoding.t at line 40 fail #4)
#   Expected: "OK" (encoding = raw, tell = 1)
#  io_tell_encoding.t line 40 is: 	ok(test($encoding, $tell), "OK", "encoding = $encoding, tell = $tell");
ok 5
ok 6
ok 7
ok 8
ok 9
not ok 10
# Test 10 got: "line 0, expected '#!./perl\n', got '#!./perl\r\n'" (io_tell_encoding.t at line 40 fail #10)
#    Expected: "OK" (encoding = raw, tell = 0)
ok 11
ok 12
Taisha:~/devbin/tmp $


Now the only encoding that fails is ':raw', which is expected and
unrelated to this ticket.

I have tried to investigate further but after a few hours concluded
that this problem was way over my head :(

Thank you for your time and attention.

---
Flags:
       category=core
       severity=critical
---
Site configuration information for perl 5.18.1:

Configured by strawberry-perl at Tue Aug 13 19:21:46 2013.

Summary of my perl5 (revision 5 version 18 subversion 1) configuration:

     Platform:
       osname=MSWin32, osvers=4.0, archname=MSWin32-x86-multi-thread-64int
       uname='Win32 strawberry-perl 5.18.1.1 #1 Tue Aug 13 19:20:13 2013 i386'
       config_args='undef'
       hint=recommended, useposix=true, d_sigaction=undef
       useithreads=define, usemultiplicity=define
       useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
       use64bitint=define, use64bitall=undef, uselongdouble=undef
       usemymalloc=n, bincompat5005=undef
     Compiler:
       cc='gcc', ccflags =' -s -O2 -DWIN32  -DPERL_TEXTMODE_SCRIPTS -DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS -DUSE_PERLIO -fno-strict-aliasing -mms-bitfields',
       optimize='-s -O2',
       cppflags='-DWIN32'
       ccversion='', gccversion='4.7.3', gccosandvers=''
       intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=12345678
       d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
       ivtype='long long', ivsize=8, nvtype='double', nvsize=8, Off_t='long long', lseeksize=8
       alignbytes=8, prototype=define
     Linker and Libraries:
       ld='g++.exe', ldflags ='-s -L"E:\cm\devbin\strawberry-perl-5.18.1.1-32bit-portable\perl\lib\CORE" -L"E:\cm\devbin\strawberry-perl-5.18.1.1-32bit-portable\c\lib"'
       libpth=E:\cm\devbin\strawberry-perl-5.18.1.1-32bit-portable\c\lib E:\cm\devbin\strawberry-perl-5.18.1.1-32bit-portable\c\i686-w64-mingw32\lib
       libs=-lmoldname -lkernel32 -luser32 -lgdi32 -lwinspool -lcomdlg32 -ladvapi32 -lshell32 -lole32 -loleaut32 -lnetapi32 -luuid -lws2_32 -lmpr -lwinmm -lversion -lodbc32 -lodbccp32 -lcomctl32
       perllibs=-lmoldname -lkernel32 -luser32 -lgdi32 -lwinspool -lcomdlg32 -ladvapi32 -lshell32 -lole32 -loleaut32 -lnetapi32 -luuid -lws2_32 -lmpr -lwinmm -lversion -lodbc32 -lodbccp32 -lcomctl32
       libc=, so=dll, useshrplib=true, libperl=libperl518.a
       gnulibc_version=''
     Dynamic Linking:
       dlsrc=dl_win32.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' '
       cccdlflags=' ', lddlflags='-mdll -s -L"E:\cm\devbin\strawberry-perl-5.18.1.1-32bit-portable\perl\lib\CORE" -L"E:\cm\devbin\strawberry-perl-5.18.1.1-32bit-portable\c\lib"'

Locally applied patches:


---
@INC for perl 5.18.1:
       E:/cm/devbin/strawberry-perl-5.18.1.1-32bit-portable/perl/site/lib
       E:/cm/devbin/strawberry-perl-5.18.1.1-32bit-portable/perl/vendor/lib
       E:/cm/devbin/strawberry-perl-5.18.1.1-32bit-portable/perl/lib
       .

---
Environment for perl 5.18.1:
       CYGWIN=nodosfilewarning
       HOME=e:/cm
       LANG (unset)
       LANGUAGE (unset)
       LD_LIBRARY_PATH (unset)
       LOGDIR (unset)
       PATH=E:\cm\devbin\strawberry-perl-5.18.1.1-32bit-portable\perl\site\bin;E:\cm\devbin\strawberry-perl-5.18.1.1-32bit-portable\perl\bin;E:\cm\devbin\strawberry-perl-5.18.1.1-32bit-portable\c\bin;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Program Files (x86)\PuTTY;C:\Program Files (x86)\OpenOffice.org 3\program;C:\Program Files (x86)\QT Lite\QTSystem;c:\Program Files\WinRAR;C:\Program Files (x86)\Calibre2\
       PERL_BADLANG (unset)
       SHELL (unset)


