develooper Front page | perl.perl5.porters | Postings from July 2008

fixing the silently failing implicit-close bug (was magic crud)

From:
Tom Christiansen
Date:
July 29, 2008 19:07
Subject:
fixing the silently failing implicit-close bug (was magic crud)
Message ID:
19185.1217383672@chthon
In-Reply-To: Message from Ed Avis <eda@waniasset.com> 
   of "Tue, 29 Jul 2008 23:19:47 -0000." <loom.20080729T222121-334@post.gmane.org> 

> Tom Christiansen <tchrist <at> perl.com> writes:

>>> But if you think this is an important issue, wouldn't it make much
>>> more sense to teach people RIGHT NOW that they shouldn't rely on
>>> while (<>) automatically open files for them,

>> Because while you can lead a horse to water, but you can't make it drink.
>> And if you let it drink freely on its own, it may drink itself to death.
>> 
>> Which just shows that you can't win for trying.

>> I spent years on it--YEARS.  They still just won't learn.  Things
>> like this have been in the FAQ since BEFORE Camel-1 even; they've
>> been in plenty of the available documentation since then--

> [illustrative examples of how the documentation has deteriorated snipped]

> You are certainly right that you've spent a lot of effort on this and
> other aspects of the perl documentation, which is greatly appreciated
> BTW, and that it was probably clearer in older versions.

> But isn't all this effort on documenting the caveats and gotchas of
> <>, with only moderate success in educating the perl-using public, an
> indicator that it might be better to step back and think of a better
> way of doing it?  The mere fact that <> requires so much documentation
> and controversy, and arguments about when it is safe to use and when
> it isn't, and discussions about exactly the right incantation to add
> \0 to the end of your filenames, surely shows that its semantics are
> a bit hairy at the very least: too hairy for such a simple-looking
> operator that everyone uses without a second thought.

It's not the input operator that requires that explanation, save for
whether it's a fileglob or a readline.  The perl5.001 perlfunc manpage
under open ended with:

    The filename that is passed to open will have leading and trailing
    whitespace deleted.  In order to open a file with arbitrary weird
    characters in it, it's necessary to protect any leading and trailing
    whitespace thusly:

        $file =~ s#^(\s)#./$1#;
        open(FOO, "< $file\0");

Although since then, that's gotten harder and harder to find as the PC
police have had their own way when nobody was looking. It today survives
only in the perlopentut section I references before, which includes a
very clear explanation of why we have the behavior we have, and it's
merely what I've said before:

   The other important thing to notice is that, just as in
   the shell, any whitespace before or after the filename is
   ignored.  This is good, because you wouldn't want these to
   do different things:

       open INFO,   "<datafile"
       open INFO,   "< datafile"
       open INFO,   "<  datafile"

   Ignoring surrounding whitespace also helps for when you
   read a filename in from a different file, and forget to
   trim it before opening:

       $filename = <INFO>;         # oops, \n still there
       open(EXTRA, "< $filename") || die "can't open $filename: $!";

   This is not a bug, but a feature.  Because "open" mimics
   the shell in its style of using redirection arrows to
   specify how to open the file, it also does so with respect
   to extra whitespace around the filename itself as well.

And that's all it takes.  Period.

Did you read it, or not?

> (Of course, whatever warning appears under 'open' also needs to be
> duplicated under <>, since it is by no means obvious that <> is using
> the magical open internally.)

It always has been.  Sheesh!  Don't people read?   Here's
the perl1 manpage on it:

v1.0   The null filehandle <> is special and can be used to  emuĀ­
v1.0   late  the  behavior  of  sed and awk.  Input from <> comes
v1.0   either from standard input, or from each  file  listed  on
v1.0   the  command line.  Here's how it works: the first time <>
v1.0   is evaluated, the ARGV array is  checked,  and  if  it  is
v1.0   null,  $ARGV[0] is set to '-', which when opened gives you
v1.0   standard input.  The ARGV array is  then  processed  as  a
v1.0   list of filenames.  The loop
v1.0
v1.0        while (<>) {
v1.0             ...            # code for each line
v1.0        }
v1.0
v1.0   is equivalent to
v1.0
v1.0        unshift(@ARGV, '-') if $#ARGV < $[;
v1.0        while ($ARGV = shift) {
v1.0             open(ARGV, $ARGV);
v1.0             while (<ARGV>) {
v1.0                  ...       # code for each line
v1.0             }
v1.0        }
v1.0
v1.0   except that it isn't as cumbersome to say.  It really does
v1.0   shift array ARGV and put the current filename  into  variĀ­
v1.0   able  ARGV.  It also uses filehandle ARGV internally.  You
v1.0   can modify @ARGV before the first <> as long as you  leave
v1.0   the first filename at the beginning of the array.
v1.0
v1.0   If  you  want  to  set  @ARGV to you own list of files, go
v1.0   right ahead.  If you  want  to  pass  switches  into  your
v1.0   script, you can put a loop on the front like this:
v1.0
v1.0        while ($_ = $ARGV[0], /^-/) {
v1.0             shift;
v1.0             last if /^--$/;
v1.0             /^-D(.*)/ && ($debug = $1);
v1.0             /^-v/ && $verbose++;
v1.0             ...       # other switches
v1.0        }
v1.0        while (<>) {
v1.0             ...       # code for each line
v1.0        }
v1.0
v1.0   The <> symbol will return FALSE only once.  If you call it
v1.0   again after this it will assume you are processing another
v1.0   @ARGV  list, and if you haven't set @ARGV, will input from
v1.0   stdin.

And here we have the perlop manpage from 5.10.0:

v5.10  The null filehandle <> is special: it can be used to
v5.10  emulate the behavior of sed and awk.  Input from <> comes
v5.10  either from standard input, or from each file listed on
v5.10  the command line.  Here's how it works: the first time <>
v5.10  is evaluated, the @ARGV array is checked, and if it is
v5.10  empty, $ARGV[0] is set to "-", which when opened gives you
v5.10  standard input.  The @ARGV array is then processed as a
v5.10  list of filenames.  The loop
v5.10
v5.10      while (<>) {
v5.10          ...                     # code for each line
v5.10      }
v5.10
v5.10  is equivalent to the following Perl-like pseudo code:
v5.10
v5.10      unshift(@ARGV, '-') unless @ARGV;
v5.10      while ($ARGV = shift) {
v5.10          open(ARGV, $ARGV);
v5.10          while (<ARGV>) {
v5.10              ...         # code for each line
v5.10          }
v5.10      }
v5.10
v5.10  except that it isn't so cumbersome to say, and will
v5.10  actually work.  It really does shift the @ARGV array and
v5.10  put the current filename into the $ARGV variable.  It also
v5.10  uses filehandle ARGV internally--<> is just a synonym for
v5.10  <ARGV>, which is magical.  (The pseudo code above doesn't
v5.10  work because it treats <ARGV> as non-magical.)
v5.10
v5.10  You can modify @ARGV before the first <> as long as the
v5.10  array ends up containing the list of filenames you really
v5.10  want.  Line numbers ($.)  continue as though the input
v5.10  were one big happy file.  See the example in "eof" in
v5.10  perlfunc for how to reset line numbers on each file.
v5.10
v5.10  If you want to set @ARGV to your own list of files, go
v5.10  right ahead.  This sets @ARGV to all plain text files if
v5.10  no @ARGV was given:
v5.10
v5.10      @ARGV = grep { -f && -T } glob('*') unless @ARGV;
v5.10
v5.10  You can even set them to pipe commands.  For example, this
v5.10  automatically filters compressed arguments through gzip:
v5.10
v5.10      @ARGV = map { /\.(gz|Z)$/ ? "gzip -dc < $_ |" : $_ } @ARGV;
v5.10
v5.10  If you want to pass switches into your script, you can use
v5.10  one of the Getopts modules or put a loop on the front like
v5.10  this:
v5.10
v5.10      while ($_ = $ARGV[0], /^-/) {
v5.10          shift;
v5.10          last if /^--$/;
v5.10          if (/^-D(.*)/) { $debug = $1 }
v5.10          if (/^-v/)     { $verbose++  }
v5.10          # ...           # other switches
v5.10      }
v5.10
v5.10      while (<>) {
v5.10          # ...           # code for each line
v5.10      }
v5.10
v5.10  The <> symbol will return "undef" for end-of-file only
v5.10  once.  If you call it again after this, it will assume you
v5.10  are processing another @ARGV list, and if you haven't set
v5.10  @ARGV, will read input from STDIN.

I get the very strong idea you haven't read that.  

> Why not fix <> so it doesn't need all the explanation?  Make it just
> read the files, which can be adequately documented in one sentence.
> And add a new operator, just a couple of characters longer, that
> people can use to get all the handy magic behaviour, *if they want it*
> and having read the documentation.

It's not <> that's at issue, but that readline on the synthetic
filehandle ARGV does what it has always said it does.  And it does.

No New Operators, thank you very much.

Better to add something by the gz map in the doc (that
nobody reads(?).  

Better to create some ARGV:: modules for autoinstalling 
the filters, or "autoprotecting" the filenames.

Better to let people still use gz stuff and such 
if they want to--many of us clearly do.  Not having
a readonly open be anything else than that is fine.

>> Which brings me to my final complaint of the day.

>>     % perl -i.orig -pe 'print "this is more stuff\n"' it
>> 
>>     % echo $?
>>     0
>> 
>>     % ls -l it*
>>     0 -rw-r--r--  1 tchrist  wheel  0 Jul 29 15:05 it
>>     0 -rw-r--r--  1 tchrist  wheel  0 Jul 29 14:06 it.orig
>> 
>> To this day, Perl's implicit closing of files doesn't warn you of
>> errors, let alone exit nonzero.  This makes it do wrong thing and not
>> even tell you it did them wrong.

> I am glad you mentioned this, because I also think it's a pretty
> egregious bug, but I was rather imagining you'd come out to defend it,

Huh?

> point out that it has always been that way and why change it now, and
> anyway is clearly mentioned as a note to an addendum to a particular
> perlfaq answer.  I am glad we agree on this issue at least :-(.

What you mean in section 8?  I'm the one who came up with the END{}
block with closing STDOUT, remember.  I couldn't get people to believe
that silent implicit closes were a problem.  I'm especially amazed
that they didn't fix this when they created autovivving filehandles,
since those also autoclose.  

> It is not sufficient; I think it might sometimes be necessary, see
> below.

> Since nobody in practice checks the return status of every print()
> call, I would like to see an enhancement to perl where if any print()
> on a filehandle fails, it sets a flag which is then checked by
> close().  So you could be sure to catch file I/O errors sooner or
> later, if not immediately.  But that is for another thread ;-p.

And just what do you *think* happens!?  See your local clearerr and
ferror manpages in section 3 or 3S of your local mantree.  I'd be aghast
if this behavior weren't carried out for the perlio implementation.
IO::Handle documentation suggests that it has, or is otherwise emulated:

    =item $io->error

    Returns a true value if the given handle has experienced any errors
    since it was opened or since the last call to C<clearerr>, or if the
    handle is invalid. It only returns false for a valid handle with no
    outstanding errors.

    =item $io->clearerr

    Clear the given handle's error indicator. Returns -1 if the handle is
    invalid, 0 otherwise.

>> I've never convinced anybody this is important.  Since *every*
>> program should do this for correctness, it has to be in the run-time
>> system to avoid it ever being forgotten.

>> Sure, there's stuff like this you can do:
>> 
>>      END { close(STDOUT) || die "can't close stdout: $!" }

> Is even that sufficient?  Is it not possible a print() might fail but
> the next print() succeed (perhaps the disk was full, but then some
> space became available) and the error flag gets cleared?  Does it
> depend on whether stdout is opened to a file, a pipe, a tty, etc?

See above.

> [further example of incorrect behaviour]

>> +-------------------------------------------------------+
>> | Notice how even my cat is smarter than your perl!? :( |
>> +-------------------------------------------------------+

> I would also note that cat has absolutely no problem handing arbitrary
> filenames, be they >>>LOL<<<, |^^|, or anything else.  The only
> exception is the standard Unix convention of - for stdin.

> Of course, a perl filter program can be made to work with all possible
> filenames if you write the code longhand instead of with <>, and it
> can be made to handle file I/O robustly if you take the trouble to do
> open and close by hand and check everything scrupulously.  But you
> shouldn't have to.  The default should just work and be the safe,
> checked code unless you specify otherwise.

The system is "safe".  This is a tempest a teapot.

As for arbitrary filenames, you're forgetting some history again. Notice
why there's no aux.sh hints file.  Try creating con.tmp or aux.plx on
some systems.

If Larry had wanted Perl's open() to be identical to fopen(), he would
simply have called it that, and that's all it would ever have done. But
he wanted it to also be popen(), and he wanted it to deal with an idea
of a standard synonym for standard streams.  And please don't tell me
that you really believe /dev/stdin or /dev/fd/0 or whatnot were in vogue
during the 80s.  They weren't.  You can't rely on them today, either.
The addition of dup2() and fdopen() handling to Perl's open(), not to
mention the fork-open, all came later.  

The reason it all went under open() was to make open() easier to use,
not harder.  But then, people have been slowing pulling things out of
it to make it more like C.  We didn't even have the C version of open
for aeons and aeons.  No, really.  Perl1 had no sys*() functions at
all, save I guess for system(), but there's no SYS_tem #define in the
jumptable in <sys/syscall.h> as there is for things like SYS_open, 
SYS_read, and SYS_close.  And yes, this is a bit of a jest.

Perl4, meanwhile, *had* syscall() -- and sysread() and syswrite(), so we
wouldn't have to use syscall() to get them.  Oddly, its manpage used
SYS_write for the example.

    require 'syscall.ph';         # may need to run h2ph
    syscall(&SYS_write, fileno(STDOUT), "hi there\n", 9);

Which, oddly, still persists there, whereas the SYS_close example from
perlfaq5 might be more appropriate (despite the horribly malformatted
"helpful" bit appended after that).

But Perl4 certainly did *not* have sysopen() or sysseek().  Then again,
neither did perl5!  Well, not perl 5.001. 

Nor did perl5.001 or even perl5.001m.

We didn't get sysopen() until 5.002.  The change entry reads:

    NETaa15210: sysopen()
    From: Chip Salzenberg
    Files patched: doio.c keywords.pl lib/ExtUtils/typemap opcode.pl 
	opd/perlfunc.pod pp_hot.c pp_sys.c p roto.h toke.c
    Applied suggested patch.  Hope it works...

We briefly had a systell() function, but no more.  

Anybody else rememeber systell()?  

You have to look very closely and not blink.  It was there in 5.003_97f,
which was what was supposed to become 5.004.  The change log read:

    -----------------
    Version 5.003_97g
    -----------------

    This is it before _98.  No more last-minute features.  Really, I mean
    it this time.  No kidding.

     CORE LANGUAGE CHANGES

      Title:  "New operator systell()"
       From:  Chip Salzenberg
      Files:  doio.c ext/Opcode/Opcode.pm keywords.pl opcode.pl
	      pod/perldelta.pod pod/perldiag.pod pod/perlfunc.pod pp_sys.c
	      t/op/sysio.t toke.c

But that wasn't to last long at all, and Chip got to eat his words very
shortly thereafter with the *g* sub-sub-release of 5.003_97!

    -----------------
    Version 5.003_97g
    -----------------

    This one has two security bug fixes for buffer overflows.  Perl has
    not yet been searched to see if more fixes are needed.

     CORE LANGUAGE CHANGES

      Title:  "Improve sysseek(), remove systell(), fix Opcode"
       From:  Chip Salzenberg
      Files:  doio.c ext/Opcode/Makefile.PL ext/Opcode/Opcode.pm
	      ext/Opcode/Opcode.xs global.sym keywords.pl opcode.pl
	      pod/perldelta.pod pod/perldiag.pod pod/perlfunc.pod pp_sys.c
	      proto.h t/op/sysio.t toke.c

Seems to be there's still no sysreadline, but I've published source code
for that.  You need it if you're using sysread(), syswrite(), and C-
style select.  Otherwise you'll get wrong answers. I don't know if the
raw, unbuffered I/O layer makes any of this any easier.  I doubt it.

As for the input operator, go back and reread *its* purpose.  It does a
higher-level thing than fgets(), and with good intent and effect.

>> I firmly believe that this needs to be a part of *EVERY* Perl
>> program, which consequently means it should *not* be there, but 
>> in the core itself:
>>
>>     END { close(STDOUT) || die "can't close stdout: $!" }

> Yup, and also for STDERR (if closing STDERR fails you can attempt to
> print a warning message to it, which might or might not get through,
> but you can at least exit with nonzero status).

Possibly.  Probably.  Maybe.

>> Furthermore, I also believe all *IMPLICIT* closes that fail need to
>> generate a mandatory io warning,

> This also makes perfect sense.  

Easy for you to say.  Others have said otherwise in the past, and
I've not prevailed.  I don't know if those people are still alive
anymore though, for I don't recall who they were nor the thrust
of their arguments against such sanity.  I don't recall ever having
been convinced of whatever "reasoning" they used, but absence of
evidence is not to be confused with evidence of absence, so perhaps
I was and that memory is now hidden from me.

> Does anyone really object to this?  

They clearly did, or it would have been done.  But maybe this is one of
those rare periods in history where we can fix it when the PC police
aren't breathing down our throats.  One shall see.

> In principle it is a backwards incompatible change (code that
> previously chugged along without visible problems will now start
> producing warnings or exiting with nonzero status) but I can't see the
> point of taking compatibility to such extremes.

A bug is a bug, and this is.  Always has been.  
  
--tom



nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About