
Hypermodernize perlopentut.pod

From:
Tom Christiansen
Date:
July 16, 2011 09:32
Subject:
Hypermodernize perlopentut.pod
Message ID:
22100.1310833921@chthon
Mike,

Thank you for trying to help.  I hadn't realized that this was your first
posting.  I apologize for being so harsh last night.

My own personal opinion of perlopentut's main flaws is

    (1) its relegating all discussion of encodings to the end,
    and (2) its reliance on buggy implicit closes.

Here's a sample of some pod2text'd text for what I think needs to be dealt
with earlier, and perhaps how.  I hope this helps clarify my concerns and
offers a possible direction for improvement.

=======================================================================

    When doing I/O, you cannot help but come up against a key
    difference between how Perl thinks of strings and how the
    rest of the world thinks of them: Perl uses *decoded* strings,
    while the strings the rest of the world uses are *encoded*.

    Unlike in older versions of Perl, we now think of there being just
    decoded strings for internal use:

     *  A *decoded* string holds characters of arbitrarily large ordinals
        (ok, not infinitely large, but minimally 32 bits worth, and
        probably as many as 72).

     *  What these *decoded internal* strings actually hold is a mystery to
        anything *outside* of Perl.

    and encoded strings for external use:

     *  An *encoded* string holds characters whose ordinals are always under
        256 (that is, 8 bits only).

     *  What these *encoded external* strings actually hold is a mystery to
        anything *inside* of Perl.
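
    The difference between the two kinds of string can be seen in a
    minimal sketch like the one below. It assumes the core Encode module
    and uses hand-written UTF-8 byte escapes as sample data; the variable
    names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Encoded (external) form: the UTF-8 bytes for "cafe" with an acute accent
my $bytes = "caf\xC3\xA9";

# Decoded (internal) form: four abstract characters
my $chars = decode("UTF-8", $bytes);

printf "encoded length: %d\n", length $bytes;   # counts bytes
printf "decoded length: %d\n", length $chars;   # counts characters

# Every ordinal in the encoded string fits in 8 bits
my @wide = grep { ord($_) > 255 } split //, $bytes;
printf "ordinals over 255 in encoded string: %d\n", scalar @wide;

# A decoded string may hold ordinals well past 255
my $snowman = decode("UTF-8", "\xE2\x98\x83");   # the bytes for U+2603
printf "decoded ordinal: U+%04X\n", ord $snowman;
```

    The same "length" call gives different answers because the encoded
    string is counted in bytes and the decoded one in characters.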

    Trying to work with raw encoded strings is a real pain, and all the old
    "use locale" and "use bytes" stuff is a sop to the people who insist on
    doing so. That's why we suggest avoiding those complicated old models and
    using instead the simpler modern model:

     *  Always *decode* all *incoming* data as the very *first* thing.

     *  Always *encode* all *outgoing* data as the very *last* thing.

    Setting the encoding on the streams makes this happen automatically.
    Although there are implicit ways to set the encoding with command-line
    options or environment variables, or sometimes with the "use open"
    pragma, it's more common to explicitly set it with the 2nd argument to
    either of "binmode" or of "open" (provided that that "open" has at
    least 3 arguments). For example:

         binmode(STDIN,      ":encoding(cp1252)")
                       || die "can't binmode STDIN: $!";
         open(OUTPUT, "> :raw :encoding(UTF-16LE) :crlf", $filename)
                       || die "can't open $filename: $!";
         print OUTPUT while <STDIN>;
         close(OUTPUT) || die "couldn't close $filename: $!";
         close(STDIN)  || die "couldn't close STDIN: $!";

    Or, under the recommended "use autodie" pragma, more simply:

         use autodie;
         binmode(STDIN,      ":encoding(cp1252)");
         open(OUTPUT, "> :raw :encoding(UTF-16LE) :crlf", $filename);
         print OUTPUT while <STDIN>;
         close OUTPUT;
         close STDIN;

    There's no need for manually encoding or decoding if you have the
    streams set up to do that for you. Once you have the data, it is
    just a string of abstract characters.
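
    As a rough illustration of the streams doing the work, the sketch
    below round-trips one line through an in-memory file. The in-memory
    scalar, the handle names OUT and IN, and the sample text are all
    assumptions made for the demonstration; a real program would name an
    actual file.

```perl
use strict;
use warnings;

my $buffer = "";   # in-memory "file" standing in for a real one

# Encode on the way out: the layer does the work, print sees only characters
open(OUT, ">:raw:encoding(UTF-16LE)", \$buffer)
    || die "can't open in-memory output: $!";
print OUT "caf\x{E9}\n";
close(OUT) || die "couldn't close output: $!";

# $buffer now holds encoded bytes, two per character for this text
printf "bytes written: %d\n", length $buffer;

# Decode on the way in: readline hands back abstract characters again
open(IN, "<:raw:encoding(UTF-16LE)", \$buffer)
    || die "can't open in-memory input: $!";
my $line = <IN>;
close(IN) || die "couldn't close input: $!";

printf "characters read: %d\n", length $line;
```

    Nothing in the program calls an encoding or decoding function; the
    layers named in each "open" handle both directions.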

    That doesn't mean there's never a need for manual encoding or decoding.
    There certainly is, alas, but it is a far rarer thing (we hope). You may
    have to resort to it when working with databases, which being external
    sources normally give you encoded strings, or even when working with
    particularly painful text files, each of whose lines might be in any of
    several different encodings.
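
    For the mixed-encoding case, one possible approach is sketched below.
    It assumes each line is either valid UTF-8 or else cp1252, which is
    only a guess suited to this example; "decode_line" is a hypothetical
    helper, not part of any module.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: try strict UTF-8 first, else assume cp1252.
sub decode_line {
    my ($bytes) = @_;
    # FB_CROAK makes decode die on malformed input;
    # LEAVE_SRC keeps it from modifying $bytes in place.
    my $chars = eval { decode("UTF-8", $bytes, FB_CROAK | LEAVE_SRC) };
    return $chars if defined $chars;
    return decode("cp1252", $bytes);
}

# The same word as UTF-8 bytes, then as cp1252 bytes
my @raw     = ("na\xC3\xAFve", "na\xEFve");
my @decoded = map { decode_line($_) } @raw;

# Both now hold the same five abstract characters
print "same string\n" if $decoded[0] eq $decoded[1];
```

    Guessing like this can misidentify a line, so it is a fallback for
    bad data rather than a substitute for setting the stream encoding.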

=======================================================================

Yes, I didn't declare $filename.  This is a partial example; I didn't
say to use v5.14 either, and I think all source units should minimally have
use v#.## at their top.  But you can't realistically give a complete program
each time you give an example, or you'll go mad.  See perlfunc's many
examples for what calamity would come of that stratagem.

I leave as an exercise for the reader what ugliness necessarily ensues
if you try to make this simple example use only autovivified indirect
filehandles, whatever their scope, instead of the simpler direct filehandles
that I've used here.

Answer: It is not pretty, it is not simple, and it is not necessary.
        It breaks parallelism, fights against Perl's basic I/O model,
        and makes an easy thing harder than Perl designed it to be.

But easy things should be easy.  That's why I reject the complexity.

--tom
