From: Tom Christiansen
July 16, 2011 09:32
Message ID: 22100.1310833921@chthon
Thank you for trying to help. I hadn't realized that this was your first
posting. I apologize for being so harsh last night.
My own personal opinion of perlopentut's main flaws is
(1) its relegating all discussion of encodings to the end,
and (2) its reliance on buggy implicit closes.
Here's a sample of some pod2text'd text for what I think needs to be dealt
with earlier, and perhaps how. I hope this helps clarify my concerns and
offers a possible direction for improvement.
When doing I/O, you cannot help but come up against a key
difference between how Perl thinks of strings and how the
rest of the world thinks of them: Perl uses *decoded* strings,
while the strings the rest of the world uses are *encoded*.
Unlike in older versions of Perl, we now think of there being just
decoded strings for internal use:
* A *decoded* string holds characters of arbitrarily large ordinals
(ok, not infinitely large, but minimally 32 bits worth, and
probably as many as 72).
* What these *decoded internal* strings actually hold is a mystery to
anything *outside* of Perl.
and encoded strings for external use:
* An *encoded* string holds characters whose ordinals are always under
256 (that is, 8 bits only).
* What these *encoded external* strings actually hold is a mystery to
anything *inside* of Perl.
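
To make the distinction concrete, here is a minimal sketch using the core Encode module. The sample character and the variable names are illustrative assumptions, not something taken from perlopentut.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# A decoded string: a single character whose ordinal is well above 255.
my $decoded = "\x{263A}";                  # WHITE SMILING FACE

# Encoding it produces a string in which every ordinal fits in 8 bits.
my $encoded = encode("UTF-8", $decoded);

printf "decoded: length %d, ordinal 0x%X\n", length($decoded), ord($decoded);
printf "encoded: length %d, ordinals %s\n", length($encoded),
    join(" ", map { sprintf("0x%02X", ord($_)) } split(//, $encoded));

# Decoding the bytes again recovers the original abstract character.
my $round_trip = decode("UTF-8", $encoded);
print "round trip ok\n" if $round_trip eq $decoded;
```

The decoded string has length 1, while its UTF-8 encoded form has length 3 and holds only ordinals below 256.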
Trying to work with raw encoded strings is a real pain, and all the old
"use locale" and "use bytes" stuff is a sop to those people. That's why
we suggest avoiding those complicated old models and using instead the
simpler modern model to:
* Always *decode* all *incoming* data as the very *first* thing.
* Always *encode* all *outgoing* data as the very *last* thing.
Setting the encoding on the streams makes this happen automatically.
Although there are implicit ways to set the encoding with command-line
options or environment variables, or sometimes with the "use open"
pragma, it's more common to explicitly set it with the 2nd argument to
either of "binmode" or of "open" (provided that that "open" has at
least 3 arguments). For example:
    binmode(STDIN, ":encoding(...)")
        || die "can't binmode STDIN: $!";
    open(OUTPUT, "> :raw :encoding(UTF-16LE) :crlf", $filename)
        || die "can't open $filename: $!";
    print OUTPUT while <STDIN>;
    close(OUTPUT) || die "couldn't close $filename: $!";
    close(STDIN)  || die "couldn't close STDIN: $!";
Or, under the recommended "use autodie" pragma, more simply:
    open(OUTPUT, "> :raw :encoding(UTF-16LE) :crlf", $filename);
    print OUTPUT while <STDIN>;
There's no need for manually encoding or decoding if you have the
streams set up to do that for you. Once you have the data, it is
just a string of abstract characters.
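
As a rough, self-contained illustration of the streams doing the work, the sketch below writes a string through an encoding layer and reads it back through the same layer. The temporary file, the sample text, and the INPUT handle name are assumptions made for the example, and the ":crlf" layer is left out so that the byte count is the same on every platform.

```perl
use strict;
use warnings;
use autodie;
use File::Temp qw(tempfile);

# Hypothetical scratch file, removed automatically at exit.
my ($tmp_fh, $filename) = tempfile(UNLINK => 1);
close($tmp_fh);

# Seven abstract characters, two of them outside ASCII.
my $text = "caf\x{E9} \x{263A}\n";

# The layer encodes on the way out; no manual encode() call is needed.
open(OUTPUT, "> :raw :encoding(UTF-16LE)", $filename);
print OUTPUT $text;
close(OUTPUT);

# The same layer decodes on the way in; no manual decode() call either.
open(INPUT, "< :raw :encoding(UTF-16LE)", $filename);
my $line = <INPUT>;
close(INPUT);

my $bytes_on_disk = -s $filename;
printf "%d characters in memory, %d bytes on disk\n",
    length($line), $bytes_on_disk;
```

The program only ever handles characters; the two bytes per character exist only in the file.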
That doesn't mean there's never a need for manual encoding or decoding.
There certainly is, alas, but it is a far rarer thing (we hope). You may
have to resort to it when working with databases, which being external
sources normally give you encoded strings, or even when working with
particularly painful text files, each of whose lines might be in any of
several different encodings.
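
For that last, painful case, one possible approach is sketched below: read each line as raw bytes, try a strict UTF-8 decode, and fall back to a second encoding if that fails. The helper name decode_line and the choice of ISO-8859-1 as the fallback are assumptions for illustration only; real data would need its own list of candidate encodings.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: strict UTF-8 first, ISO-8859-1 as an assumed fallback.
sub decode_line {
    my ($bytes) = @_;

    # FB_CROAK makes decode() die on malformed input instead of substituting;
    # LEAVE_SRC keeps decode() from modifying $bytes in place.
    my $chars = eval {
        decode("UTF-8", $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };
    return $chars if defined $chars;

    # Every byte sequence is valid ISO-8859-1, so this cannot fail.
    return decode("ISO-8859-1", $bytes);
}

# Two raw lines, as they might arrive from a file opened with ":raw".
my $utf8_line   = "na\xC3\xAFve";       # UTF-8 bytes
my $latin1_line = "caf\xE9 au lait";    # ISO-8859-1 bytes, invalid as UTF-8

printf "%d and %d characters\n",
    length(decode_line($utf8_line)), length(decode_line($latin1_line));
```

Guessing like this is fragile, since some byte sequences are valid in more than one encoding, which is one more reason to prefer streams with a declared encoding wherever that is possible.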
Yes, I didn't declare $filename. This is a partial example; I didn't
say to use v5.14 either, and I think all source units should minimally have
use v#.## at their top. But you can't realistically give a complete program
each time you give an example, or you'll go mad. See perlfunc's many
examples for what calamity would come of that stratagem.
I leave as an exercise for the reader what ugliness necessarily ensues
if you try to make this simple example use only autovivified indirect
filehandles, whatever their scope, instead of the simpler direct filehandles
that I've used here.
Answer: It is not pretty, it is not simple, and it is not necessary.
It breaks parallelism, fights against Perl's basic I/O model,
and makes an easy thing harder than Perl designed it to be.
But easy things should be easy. That's why I reject the complexity.