develooper Front page | perl.perl5.porters | Postings from February 2013

Don't patch perlopentut: rewrite it completely

Thread Next
Tom Christiansen
February 17, 2013 01:30
Don't patch perlopentut: rewrite it completely
Message ID:
SUMMARY: Let's rename perlopentut and write a new one from scratch.

The perlopentut manpage is trying to be something else than certain
people seem to want.  Just look at its horrible outline:

     Open à la shell
         Simple Opens
         Indirect Filehandles
         Pipe Opens
         The Minus File
         Mixing Reads and Writes
     Open à la C
         Permissions à la mode
     Obscure Open Tricks
         Re-Opening Files (dups)
         Dispelling the Dweomer
         Paths as Opens
         Single Argument Open
         Playing with STDIN and STDOUT
     Other I/O Issues
         Opening Non-File Files
         Opening Named Pipes
         Opening Sockets
         Binary Files
         File Locking
         IO Layers

Nearly every one of those is non-simple. That is not a tutorial, and it is not
just about open.  It is more like "FMTEYEWTK About open() and I/O in Perl". If
the goal is to create a document that teaches people how to use open() for the
simple cases, then there is no hope for perlopentut.  It cannot be turned into
that, because that is not what it is.  You have to start over.

Here's a better outline:

         Opening Text Files
         Opening Binary Files
         Opening Pipes
         Low-level File Opens via sysopen

See what I mean? Short, simple; no fancy stuff.  Just the facts, ma'am.
I might even agree that the final sysopen section should go.  Too hard.  
Not basic.  Ugly C programming.  There are FAQs for that sort of thing.

Because perlopentut is more ETC than TUT, I suggest that the original 
perlopentut be renamed to perlopenetc --and something else be written 
from scratch to take its place.

I have intentionally left out a great deal of bits that just do not
fit the goal of teaching people how to use open() in the most basic 
and common cases.  I did find that I was unable to completely avoid 
mentioning the actual I/O operators, but the focus is still on open
alone. Most notably, all the esoterica is gone. All we have left is:

     *  Only autovivified handles as arguments to open.
     *  Only three-argument open.

I believe those two points alone will fix the constant scolding, but 
we can do better than that.  Just look at what all has been removed:

     *  No more one- or two-argument open.
     *  No more tricks.
     *  No more reopening STDIN or STDOUT.
     *  No more trying to teach <>.
     *  No more magic ARGV.  
     *  No more magic minus.
     *  No more dup or dup2.
     *  No more sockets.
     *  No more forks.
     *  No more globs.
     *  No more symbol table games.
     *  No more read-write on text files; too hard; see FAQ.
     *  No more file locking; too hard; see FAQ.

Here's one idea in that direction; it's just a sketch, but 
the final version should not be much more complicated.  I'm
still undecided about sysopen: should it stay or should it go?



=encoding utf8

=head1 NAME

perlopentut - simple recipes for opening files and pipes in Perl


Whenever you do I/O on a file in Perl, you do so through what in Perl is
called a B<filehandle>.  A filehandle is an internal name for an external
file.  It is the job of the C<open> function to make the association
between the internal name and the external name, and it is the job
of the C<close> function to break that associations.

For your convenience, Perl sets up a few special filehandles that are
already open when you run.  These include C<STDIN>, C<STDOUT>, C<STDERR>,
and C<ARGV>.  Since those are pre-opened, you can use them right away
without having to go to the trouble of opening them yourself:

    print STDERR "This is a debugging message.\n";

    print STDOUT "Please enter something: ";
    $response = <STDIN> // die "how come no input?";
    print STDOUT "Thank you!\n";

    while (<ARGV>) { ... }

As you see from those examples, C<STDOUT> and C<STDERR> are output 
handles, and C<STDIN> and C<ARGV> are input handles.  Those are 
in all capital letters because they are reserved to Perl, much
like the C<@ARGV> array and the C<%ENV> hash are.  Their external
associations were set up by your shell.

For eveyrthing else, you will need to open it on your own. Although there 
are many other variants, the most common way to call Perl's open() function 
is with three arguments and one return value:

C<    I<OK> = open(I<HANDLE>, I<MODE>, I<PATHNAME>)>


=item I<OK> 

will be some defined value if the open succeeds, but 
C<undef> if it fails;

=item I<HANDLE> 

should be an undefined scalar variable to be filled in by the 
C<open> function if it succeeds;

=item I<MODE> 

is the access mode and the encoding format to open the file with;

=item I<PATHNAME> 

is the external name of the file you want opened.


Most of the complexity of the C<open> function lies in the many 
possible values that the I<MODE> parameter can take on.

One last thing before we show you how to open files: opening
files does not (usually) automatically lock them in Perl.  See
L<perlfaq4> for how to lock.

=head Opening Text Files

=head2 Opening Text Files for Reading

If you want to read from a text file, first open it in 
read-only mode like this:

    my $filename = "/some/path/to/a/textfile/goes/here";
    my $encoding = ":encoding(UTF-8)";
    my $handle   = undef;     # this will be filled in on success

    open($handle, "< $encoding", $filename)
        || die "$0: can't open $filename for reading: $!\n";

As with the shell, in Perl the C<< "<" >> is used to open the file in
read-only mode.  If it succeeds, Perl allocates a brand new filehandle for
you and fills in your previously undefined C<$handle> argument with a
reference to that handle.

Now you may use functions like C<readline>, C<read>, C<getc>, and
C<sysread> on that handle.  Probably the most common input function 
is the one that looks like an operator: 

    $line = readline($handle);
    $line = <$handle>;          # same thing

Because the C<readline> function returns C<undef> at end of file or
upon error, you will sometimes see it used this way:

    $line = <$handle>;
    if (defined $line) {
        # do something with $line
    else {
        # $line is not valid, so skip it

You can also just quickly C<die> on an undefined value this way:

    $line = <$handle> // die "no input found";

However, if hitting EOF is an expected and normal event, you 
would not to exit just because you ran out of input.  Instead,
you probably just want to exit an input loop.  Immediately
afterwards you can then test to see if there was an actual
error that caused the loop to terminate, and act accordingly:

    while (<$handle>) {
        # do something with data in $_
    if ($!) {
        die "unexpected error while reading from $filename: $!";

B<A Note on Encodings>: Having to specify the text encoding every time
might seem a bit of a bother.  To set up a default encoding for C<open> so
that you don't have to supply it each time, you can use the C<open> pragma:

    use open qw< :encoding(UTF-8) >;

Once you've done that, you can safely omit the encoding part of the
open mode:

    open($handle, "<", $filename)
        || die "$0: can't open $filename for reading: $!\n";

But never use the bare C<< "<" >> without having set up a default encoding
first.  Otherwise, Perl cannot know which of the many, many, many possible
flavors of text file you have, and Perl will have no idea how to correctly
map the data in your file into actual characters it can work with.  Other
common encoding formats including C<"ASCII">, C<"ISO-8859-1">,
C<"ISO-8859-15">, C<"Windows-1252">, C<"MacRoman">, and even C<"UTF-16LE">.
See L<perlunitut> for more about encodings.

=head2 Opening Text Files for Writing

On the other hand,  you want to write to a file, you first have to decide
what to do about any existing contents.  You have two basic choices here:
to preserve or to clobber.  

If you want to preserve any existing contents, then you want to open the
file in append mode.  As in the shell, in Perl you use C<<< ">>" >>> to
open an existing file in append mode, and creates the file if it does not
already exist.

    my $handle   = undef;   
    my $filename = "/some/path/to/a/textfile/goes/here";
    my $encoding = ":encoding(UTF-8)";

    open($handle, ">> $encoding", $filename)
        || die "$0: can't open $filename for appending: $!\n";

Now you can write to that filehandle using any of C<print>, C<printf>,
C<say>, C<write>, or C<syswrite>.

The file does not have to exist just to open it in append mode.  If the
file did not previously exist, then the append-mode open creates it for
you.  But if the file does previously exist, its contents are safe from
harm because you will be adding your new text past the end of the old text.

On the other hand, sometimes you want to clobber whatever might already be
there.  To empty out a file before you start writing to it, you can open it
in write-only mode:

    my $handle   = undef;   
    my $filename = "/some/path/to/a/textfile/goes/here";
    my $encoding = ":encoding(UTF-8)";

    open($handle, "> $encoding", $filename)
        || die "$0: can't open $filename in write-open mode: $!\n";

Here again Perl works just like the shell in that the C<< ">" >> clobbers
an existing file.

As with the append mode, when you open a file in write-only mode,
you can now write to that filehandle using any of C<print>, C<printf>,
C<say>, C<write>, or C<syswrite>.

What about read-write mode?  You should probably pretend it doesn't exist,
because opening text files in read-write mode is unlikely to do what you
would like.  See L<perlfaq4> for details.

=head1 Opening Binary Files

If the file to be opened contains binary data instead of text characters,
then the C<MODE> argument to C<open> is a little different.  Instead of 
specifying the encoding, you tell Perl that your data are in raw bytes.

    my $filename = "/some/path/to/a/binary/file/goes/here";
    my $encoding = ":raw :bytes"
    my $handle   = undef;     # this will be filled in on success

And then open as before, choosing C<<< "<" >>>, C<<< ">>" >>>, or 
C<<< ">" >>> as needed:

    open($handle, "< $encoding", $filename)
        || die "$0: can't open $filename for reading: $!\n";

    open($handle, ">> $encoding", $filename)
        || die "$0: can't open $filename for appending: $!\n";

    open($handle, "> $encoding", $filename)
        || die "$0: can't open $filename in write-open mode: $!\n";

Alternately, you can change to binary mode on an existing handle this way:

    binmode($handle)    || die "cannot binmode handle";

This is especially handy for the handles that Perl has already opened for you.

    binmode(STDIN)      || die "cannot binmode STDIN";
    binmode(STDOUT)     || die "cannot binmode STDOUT";

You can also pass C<binmode> an explicit encoding to change it on the fly.
This isn't exactly "binary" mode, but we still use C<binmode> to do it:

    binmode(STDIN,  ":encoding(MacRoman)")  || die "cannot binmode STDIN";
    binmode(STDOUT, ":encoding(UTF-8)")     || die "cannot binmode STDOUT";

Once you have your binary file properly opened in the right mode, you can 
use all the same Perl I/O functions as you used on text files.  However, 
you may wish to use the fixed-size C<read> instead of the variable-sized
C<readline> for your input.  

Here's an example of how to copy a binary file:

    my $BUFSIZ   = 64 * (2 ** 10);
    my $name_in  = "/some/input/file";
    my $name_out = "/some/output/flie";

    my($in_fh, $out_fh, $buffer);

    open($in_fh,  "<", $name_in)   || die "$0: cannot open $name_in for reading: $!";
    open($out_fh, ">", $name_out)  || die "$0: cannot open $name_out for writing: $!";

    for my $fh ($in_fh, $out_fh)  {
        binmode($fh)               || die "binmode failed";

    while (read($in_fh, $buffer, $BUFSIZ)) {
        unless (print $out_fh $buffer) {
            die "couldn't write to $name_out: $!";

    close($in_fh)       || die "couldn't close $name_in: $!";
    close($out_fh)      || die "couldn't close $name_out: $!";

=head1 Opening Pipes

To be announced.

=head1 Low-level File Opens via sysopen

To be announced.  Or deleted.

=head1 SEE ALSO

To be announced.


To be announced.

=head1 HISTORY

To be announced.

Thread Next Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at | Group listing | About