develooper Front page | perl.perl5.porters | Postings from July 2008

Re: fixing the silently failing implicit-close bug (was magic crud)

From: Ed Avis
Date: July 30, 2008 00:54
Subject: Re: fixing the silently failing implicit-close bug (was magic crud)
Message ID: loom.20080730T064736-60@post.gmane.org

Tom Christiansen <tchrist <at> perl.com> writes:

>>But isn't all this effort on documenting the caveats and gotchas of
>><>, with only moderate success in educating the perl-using public, an
>>indicator that it might be better to step back and think of a better
>>way of doing it?

>It's not the input operator that requires that explanation, save for
>whether it's a fileglob or a readline.  The perl5.001 perlfunc manpage
>under open ended with:

>        $file =~ s#^(\s)#./$1#;
>        open(FOO, "< $file\0");
> 
>Although since then, that's gotten harder and harder to find as the PC
>police have had their own way when nobody was looking.

I guess partly because it's no longer as relevant.  If you want to open an
arbitrary filename, you just say open($fh, '<', $filename), which is a big
improvement; after all, easy things should be easy.
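
As a minimal sketch of that three-argument form (the helper name slurp_lines
and the odd file name are made up for illustration, and a Unix-like filesystem
is assumed), the name is taken literally whatever characters it contains:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: read all lines of a file whose name is taken literally.
sub slurp_lines {
    my ($filename) = @_;
    # Three-argument open: the mode is a separate argument, so whitespace,
    # '<', '>' or '|' in $filename are not interpreted specially.
    open(my $fh, '<', $filename) or die "Can't open '$filename': $!";
    my @lines = <$fh>;
    close($fh) or die "Can't close '$filename': $!";
    return @lines;
}

# Usage: a file name with a leading space and a trailing pipe character.
my $dir  = tempdir(CLEANUP => 1);
my $name = "$dir/ odd name |";
open(my $out, '>', $name) or die "Can't create '$name': $!";
print {$out} "hello\n" or die "Can't write '$name': $!";
close($out) or die "Can't close '$name': $!";

print slurp_lines($name);
```

No s#^(\s)#./$1# fix-up or trailing "\0" is needed, because the mode never
shares a string with the file name.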

>And that's all it takes.  Period.
> 
>Did you read it, or not?

All that is under open, not under <>.

>v5.10  The null filehandle <> is special: it can be used to
>v5.10  emulate the behavior of sed and awk.  Input from <> comes
>v5.10  either from standard input, or from each file listed on
>v5.10  the command line.  Here's how it works: the first time <>
>v5.10  is evaluated, the @ARGV array is checked, and if it is
>v5.10  empty, $ARGV[0] is set to "-", which when opened gives you
>v5.10  standard input.  The @ARGV array is then processed as a
>v5.10  list of filenames.  The loop
>v5.10
>v5.10      while (<>) {
>v5.10          ...                     # code for each line
>v5.10      }
>v5.10
>v5.10  is equivalent to the following Perl-like pseudo code:
>v5.10
>v5.10      unshift(@ARGV, '-') unless @ARGV;
>v5.10      while ($ARGV = shift) {
>v5.10          open(ARGV, $ARGV);
>v5.10          while (<ARGV>) {
>v5.10              ...         # code for each line
>v5.10          }
>v5.10      }

I suspect that most readers will take more notice of the one-line summary 'from
standard input, or from each file listed on the command line', which is how
many people expect the operator to work and IMHO how it should work.

That is much more likely than going through the pseudo code line by line and
cross-referencing each function to elsewhere in perlfunc in order to come across
a warning in the middle of the documentation for a now quite rarely used form of
open() that sometimes, instead of opening a file as its name suggests or as the
documentation above says, it will start running programs.

See some of the replies on earlier threads for a few data points showing that
some programmers (and surely not a wilfully stupid few) have indeed been misled
into thinking that <> simply reads stdin or files listed on the command line.
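
To make that 'plain' reading concrete, here is a rough sketch of a loop that
does only what the one-line summary says.  The name each_line is hypothetical,
not anything in core; it relies on three-argument open so that an argument such
as 'ls |' is treated as a file name rather than as a command to run.

```perl
use strict;
use warnings;

# Hypothetical helper: call $callback->($line, $source) for every line of
# every named file, or of standard input when no names are given.
sub each_line {
    my ($callback, @files) = @_;
    if (!@files) {
        while (my $line = <STDIN>) {
            $callback->($line, '-');
        }
        return;
    }
    for my $file (@files) {
        # Three-argument open: no pipes, no mode characters, no magic.
        open(my $fh, '<', $file) or do {
            warn "Can't open $file: $!\n";
            next;
        };
        while (my $line = <$fh>) {
            $callback->($line, $file);
        }
        close($fh) or warn "Can't close $file: $!\n";
    }
    return;
}
```

A simple filter would then call each_line(sub { print $_[0] }, @ARGV) in place
of while (<>).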
 
>Better to add something by the gz map in the doc (that
>nobody reads(?).

I suppose some programmers read it and some don't.  If you learn from 'Learning
Perl' that while (<>) is a correct way to write a simple filter program, why
would you mistrust that and go hunting through perlfunc for Easter eggs and
obiter dicta that might reveal hidden bugs in your program?

>Better to let people still use gz stuff and such 
>if they want to--many of us clearly do.

Yes, *if they want to*.  That's the key point.  Nobody says magic should be
unavailable, just that it needs a syntax that is a bit more explicit and isn't
confused with mere 'read the files'.

>>>To this day, Perl's implicit closing of files doesn't warn you of
>>>errors, let alone exit nonzero.  This makes it do wrong thing and not
>>>even tell you it did them wrong.
> 
>>I am glad you mentioned this, because I also think it's a pretty
>>egregious bug, but I was rather imagining you'd come out to defend it,
> 
>Huh?

If I might play devil's advocate for a moment:

The manual page clearly states that it ignores errors on close and always has
done.  Here's the perl1 manpage:

v1.0        while (<>) {
v1.0             ...            # code for each line
v1.0        }
v1.0
v1.0   is equivalent to
v1.0
v1.0        unshift(@ARGV, '-') if $#ARGV < $[;
v1.0        while ($ARGV = shift) {
v1.0             open(ARGV, $ARGV);
v1.0             while (<ARGV>) {
v1.0                  ...       # code for each line
v1.0             }
v1.0        }

Do you see anything in there that says

    close(ARGV) || die($!);

No?  Do people not even read the docs?  And here is 5.10:

v5.10      while (<>) {
v5.10          ...                     # code for each line
v5.10      }
v5.10
v5.10  is equivalent to the following Perl-like pseudo code:
v5.10
v5.10      unshift(@ARGV, '-') unless @ARGV;
v5.10      while ($ARGV = shift) {
v5.10          open(ARGV, $ARGV);
v5.10          while (<ARGV>) {
v5.10              ...         # code for each line
v5.10          }
v5.10      }

I get the very strong idea you haven't read that.  The <> operator is behaving
exactly as documented and so there is no bug.

Some people have been making lots of noise with alarmist scenarios of how it
could silently 'go wrong', so called, when the disk is full and output can't be
written.  But how realistic is that?  Every Unix system reserves a few percent
of space for the superuser, and if ordinary users are taking up too much disk
space, that is a matter of user education.  Even if it means running a
system-wide du(1), you should teach users that anyone filling the disk to the
maximum deserves what they get.

Besides, even though checking the result of close() gets rid of one error, it
doesn't make it safe in general.  The machine might lose power or crash before
the kernel's buffers have been flushed to disk.  The file might have been
unlinked partway through the script running, so writing to it appears to work,
but the data goes into the bit bucket as soon as you close().  The disk might
fail.  If you are in that kind of hostile environment then merely being paranoid
about checking close() is not enough to make you 'safe', though it might lull
you into a false sense of security.
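
For concreteness, the kind of checking under discussion looks roughly like the
sketch below.  The helper name write_lines is invented for illustration.  It
reports a failed print or close (a full disk, say) but, as the paragraph above
notes, does nothing about power loss, an unlinked file or a dying disk.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write lines to a file, dying on any reported I/O error.
sub write_lines {
    my ($path, @lines) = @_;
    open(my $fh, '>', $path) or die "Can't open $path: $!";
    for my $line (@lines) {
        print {$fh} $line or die "Can't write to $path: $!";
    }
    # Buffered data may only reach the kernel here, so a full disk often
    # shows up as a failed close rather than as a failed print.
    close($fh) or die "Can't close $path: $!";
    return scalar @lines;
}

# Usage: write two lines to a temporary file.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
close($tmp_fh) or die "Can't close $tmp_path: $!";
write_lines($tmp_path, "first\n", "second\n");
```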

For those who really want to overrule Larry's careful design decision that has
been there since perl1 and has therefore not needed to change since, there is a
special incantation that tchrist came up with, mentioned in the most prominent
place possible in the documentation (under 'how can I do an atexit()' in
perlfaq8... OBVIOUSLY) which gives them what they want.

So there is absolutely no need to change the core language, mess with the
documented and Larry-approved behaviour that has been there since perl 1.0, and
risk breaking some running code, just to fix a theoretical problem.

>>Since nobody in practice checks the return status of every print()
>>call, I would like to see an enhancement to perl where if any print()
>>on a filehandle fails, it sets a flag which is then checked by
>>close().  So you could be sure to catch file I/O errors sooner or
>>later, if not immediately.  But that is for another thread ;-p.
> 
>And just what do you *think* happens!?  See your local clearerr and
>ferror manpages in section 3 or 3S of your local mantree.  I'd be aghast
>if this behavior weren't carried out for the perlio implementation.

What about tied filehandles?  The author of a tied filehandle class has no
obligation to store a per-filehandle flag for whether an error has occurred. 
(Indeed, perltie doesn't mention anything about how to report error status,
which is rather a deficiency.)
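
As a sketch of what a cooperating class would have to do by hand, assuming the
'sticky flag checked by close()' behaviour proposed above: the class name
StickyErrorHandle and its fields are hypothetical, and nothing in perltie
requires a tied class to behave this way.

```perl
use strict;
use warnings;

# Hypothetical tied-handle class that remembers whether any PRINT failed
# and reports that from CLOSE.
package StickyErrorHandle;

sub TIEHANDLE {
    my ($class, $fh) = @_;
    return bless { fh => $fh, failed => 0 }, $class;
}

sub PRINT {
    my ($self, @items) = @_;
    my $ok = print { $self->{fh} } @items;
    $self->{failed} = 1 unless $ok;      # per-filehandle error flag
    return $ok;
}

sub CLOSE {
    my ($self) = @_;
    my $ok = close($self->{fh});
    # Succeed only if the close worked and no earlier print failed.
    return $ok && !$self->{failed};
}

package main;

# Usage: wrap an in-memory file so that the example has no side effects.
open(my $real, '>', \my $buffer) or die "Can't open in-memory file: $!";
tie(*OUT, 'StickyErrorHandle', $real);
print OUT "some data\n";
close(OUT) or die "A write or the close failed\n";
```

A class that omits the flag still satisfies the tie interface, which is the
problem: callers cannot rely on close() to surface an earlier failed print.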

>As for arbitrary filenames, you're forgetting some history again. Notice
>why there's no aux.sh hints file.  Try creating con.tmp or aux.plx on
>some systems.

I used to use MS-DOS and I am aware of COM, NUL and other crappy magic filenames
on that system.  Computer archaeologists may be able to give other examples. 
But I don't think that ought to stop <> being 8-bit clean for filenames today.

>A bug is a bug, and this is.  Always has been.  

-- 
Ed Avis <eda@waniasset.com>

