develooper Front page | perl.perl5.porters | Postings from July 2001

Re: [ID 20010703.002] [5.6.2-to-be] PerlIO is EVIL

Thread Previous | Thread Next
From:
Nick Ing-Simmons
Date:
July 6, 2001 05:39
Subject:
Re: [ID 20010703.002] [5.6.2-to-be] PerlIO is EVIL
Message ID:
20010706132827.1279.4@bactrian.ni-s.u-net.com
Ilya Zakharevich <ilya@math.ohio-state.edu> writes:
>This is a bug report for perl from ilya@ia-ia.uucp,
>generated with the help of perlbug 1.33 running under perl v5.7.1.
>
>  #!perl -w
>
>  open F, "$^X -wle 'binmode STDOUT; print q()' |" or die;
>  my $c = getc F or warn;
>  $c eq "\n" or warn ord $c;
>  close F or die;
>
>is silent on usual perl, and warns "255" on 5.6.2-to-be.  No, I have
>no idea why it is 255, I constructd this example to demonstrate a
>*different* bug of PerlIO.
>
>The *actual* problem is that char-by-char input requires DUPLICATE pressing
>of ENTER key for this key to be seen by Perl.  Debugging this problem
>(via Term::ReadKey test suite) shows the following logic:
>
>  pp_getc() calls  is_eof() which does getc/ungetc
>	    calls  getc()
>
>  [BTW, I see no logic in this sequence of events.]

It is legacy scheme - for systems where we cannot snoop the buffer
we ask (e.g.) stdio to get a char, if it succeeds we are not at EOF.
But then we need to put char back.

>
>The problem is that ungetc() can't unget "\n" if this \n is the first char
>in the buffer, and quietly drops "\n" to the floor.  The (horrible) code
>looks like this:
>
>       if (ch == '\n')
>        {
>         if (b->ptr - 2 >= b->buf)
>          {
>
>Tracing through perlio.c shows how much EVIL it is.  All 80K of it.  

If the handle is in text mode then '\n' should have come from a CRLF pair,
so there should be room for it. 

>I 
>simply have no words for this code.  Extremely inefficient and bloated,
>doing enormous amount of work for trivial tasks - not even mentioning
>how buggy it turns out to be!   Apparently the principal task is to have
>CRNL translation, and the principal usage is on Dosish systems - which
>*already* do this with binary().

All the handles should be binary() at the OS level. The reason for crlf 
layer is that various CRLF->'\n' logic in C runtime on Win32 is 
even worse than what I have in perlio.c.

If OS2's CRLF logic is solid an efficent then it may make sense to 
NOT use crlf layer but just use OS2's read/write directly.


>
>In the example of Term::ReadKey, CRTL *knows* that TTYs should be opened
>in the binary mode.  PerlIO does not.  

So get Term::ReadKey to tell PerlIO it should be binary.

>There are miriads of such special
>cases.
>
>This code *duplicates* the work already done in the CRTL, thus lead to
>double translations (as easy examples show).  

That is NOT the intent. The intent is that crlf layer is used when:
 A. The OS CRTL does not do that (e.g. reading DOSISH files on UNIX).
 B. When the CRTL is buggy and inefficient (e.g. Borland and VC++ C on Win32).
    

>Not even mentioning that
>this code is probably 1/10 as efficient as the code in CRTL (and 5 times
>bulkier and sub-call-deeper) - in*****tent is the only word I could find
>for perlio.c.
>
>The task of CRNL translation is very quick and easy - with a proper
>implementation.  

The main pain is getting tell() and seek() to work right. 

>No need to extra copying, no nothing: just keep an extra
>pointer into the buffer.  Whatever is before the pointer is translated,
>whatever is after is not - and the char immediately after the pointer
>should be ignored.  On _fill() move the pointer to "\n" in the nearest
>"\r\n" pair, and replace the first "\r" by "\n".  That's all.  ungetc()
>modifies only the already-translated part, there is no need to untranslate
>things (which is impossible anyway).  

I agree it is "impossible" - but there is a _need_. If the handle was in 
text mode and then switched back to binary mode then the CRLF need to be 
"put back". (E.g. a CRLF MIME header on a file which then switches to embedded 
JPEG or whatever.).
If the stream is seek-able then safe approach is to seek and re-read.

>Similar things may be done for all
>the other needed translations.

Patches welcome. I wrote the CRLF layer very quickly.
But crlf layer does not copy either, we use the CRLF layer as the buffer
layer. 

>
>[It is also very efficient to report only the part until this buffer as
>available, if it is available.  This allows simple inlining of getc().]
>
>I do not even want to mention that in many cases all Perl needs is a
>pointer to the buffer, but PerlIO insists on copying the stuff...

PerlIO does _NOT_ insist on copying - that is exactly the sort of thing
the Win32 CRTL do which I am trying to avoid.
There may be the odd spot where implementation is not optimal.

>
>I do not think that PerlIO is ready for 5.6.2.
>
>Hope this helps,
>Ilya
>
>---
>Flags:
>    category=core
>    severity=critical
>---
>Site configuration information for perl v5.7.1:
>
>Configured by vera at Mon Jun 25 01:33:44 PDT 2001.
>
>Summary of my perl5 (revision 5.0 version 7 subversion 17) configuration:
>  Platform:
>    osname=os2, osvers=2.30, archname=os2
>    uname='os2 ia-ia 2 2.30 i386 '
>    config_args='-des -Dprefix=i:/perllib -Dusedevel'
>    hint=recommended, useposix=true, d_sigaction=define
>    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
>    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
>    use64bitint=undef use64bitall=undef uselongdouble=undef
>  Compiler:
>    cc='gcc', ccflags ='-Zomf -Zmt -DDOSISH -DOS2=2 -DEMBED -I. -D_EMX_CRT_REV_=63 -Wall',
>    optimize='-O2 -fomit-frame-pointer -malign-loops=2 -malign-jumps=2 -malign-functions=2 -s',
>    cppflags='-Zomf -Zmt -DDOSISH -DOS2=2 -DEMBED -I. -D_EMX_CRT_REV_=63 -Wall'
>    ccversion='', gccversion='2.8.1', gccosandvers=''
>    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
>    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
>    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=4
>    alignbytes=4, usemymalloc=y, prototype=define
>  Linker and Libraries:
>    ld='gcc', ldflags ='-Zexe -Zomf -Zmt -Zcrtdll -Zstack 32000 -Zlinker /e:2'
>    libpth=i:/emx.add/lib i:/emx/lib D:/DEVTOOLS/OPENGL/LIB I:/JAVA11/LIB i:/emx/lib/mt
>    libs=-lsocket -lm -lbsd
>    perllibs=-lsocket -lm -lbsd
>    libc=i:/emx/lib/mt/c_import.lib, so=dll, useshrplib=true, libperl=libperl.lib
>  Dynamic Linking:
>    dlsrc=dl_dlopen.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' '
>    cccdlflags='-Zdll', lddlflags='-Zdll -Zomf -Zmt -Zcrtdll -Zlinker /e:2'
>
>Locally applied patches:
>    DEVEL10906
>
>---
>@INC for perl v5.7.1:
>    lib/os2
>    lib
>    i:/perllib/lib/5.7.1/os2
>    i:/perllib/lib/5.7.1
>    i:/perllib/lib/site_perl/5.7.1/os2
>    i:/perllib/lib/site_perl/5.7.1
>    i:/perllib/lib/site_perl/5.00553/os2
>    i:/perllib/lib/site_perl/5.00553
>    i:/perllib/lib/site_perl
>    .
>
>---
>Environment for perl v5.7.1:
>    HOME=j:/home
>    LANG=EN_US
>    LANGUAGE (unset)
>    LD_LIBRARY_PATH (unset)
>    LOGDIR (unset)
>    PATH=i:\NETSCAPE\PROGRAM;d:\OPENDOC\EPM;d:\OPENDOC\OREXX;d:\OPENDOC\BIN;i:\DRV;i:\TOOLKIT\BETA\BIN\PMEISTER;i:\TOOLKIT\BETA\BIN;i:\TOOLKIT\BIN\OSARED;i:\TOOLKIT\SOM\COMMON;i:\TOOLKIT\SOM\BIN;i:\TOOLKIT\BIN;D:\OS2;D:\OS2\SYSTEM;i:\VABASIC\SOM\BIN;D:\OS2\INSTALL;D:\;D:\OS2\MDOS;D:\OS2\APPS;i:\BIN;i:\UTILS;i:\EMACS\19.33\BIN;i:\emx.add\BIN;i:\EMX\BIN;F:\PBM;i:\EMTEX\bin;i:\ckermit;I:\PAGETURN;d:\tcpip\bin;D:\TCPIP\UMAIL;D:\VIEWER\BIN;I:\XFree86\bin;i:\VABASIC\BIN;i:\SMART;I:\Java11\rmi-iiop\bin;I:\JAVA11\BIN;d:\JAVAOS2\bin;D:\MMOS2;I:\JAVA11\ICATJAVA\BIN;I:\gs\tools;I:\gs\gs7.00\bin;
>    PERLLIB_PREFIX=f:/perllib;i:/perllib
>    PERL_BADLANG (unset)
>    PERL_SH_DIR=i:/bin
>    SHELL (unset)
-- 
Nick Ing-Simmons
http://www.ni-s.u-net.com/


Thread Previous | Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About