perl.perl5.porters | Postings from July 2008

Creative and *routine* use of so-called "magic" ARGV (was [perl #2783] Security of ARGV using 2-argument open)

Tom Christiansen
July 28, 2008 20:36
In-Reply-To: Message from Mark Jason Dominus <> 
   of "Mon, 28 Jul 2008 14:38:07 EDT." <1217270287.13235.43.camel@localhost> 

> On Mon, 2008-07-28 at 09:58 +0100, Ed Avis wrote:

>> We've done a little survey here 

>I disagree.  You have not, 

Quite right; Abigail also picked up on that.

> and even if you had, it would probably not be
> worth anything anyway.

Right again.

>> and IIRC the answers were

> I have used that feature more than once.

Nor are you alone.  See perlopentut.

Although that doesn't say how *much* more often than once.  
I intend to quantify mine, at least a little bit.

> For example, I have a program that generates reports from my web
> server logs which begins:

>        #!/usr/bin/perl -lan
>        BEGIN { 
>          for (@ARGV) {
>            if (/\.gz$/) {
>              $_ = "gzip -dc $_ |";
>            }
>          }
>        }
>        ...

Even so.  

As far as I can tell from quick inspection, I have 138 programs that
make use of <ARGV> in some form or another.

And special magic-argv preprocessing, well, that I do *all* the time--quite
in keeping with Mark Mielke's wishes for still more magic, which is already
there if you but ask for it.  These extra magicalities could even be in a
module so I could stop writing them.  They tend to fall into one of the
following areas.  And this is a very *frequent* thing, especially the first
of the next two examples, as you yourself pointed out:

   @ARGV = map { /\.(gz|Z)$/
                    ? "gzip -dc < $_ |"
                    : $_
               } @ARGV;

   @ARGV = map { m#^\w+://# 
		    ? "lwp-request -m GET $_ |" 
		    : $_ 
	       } @ARGV;

And yes, I'm aware of the dollar not being persnicketily correct.  I also
don't think it matters; I see you didn't care, either.  One merely gets an
open failure, which isn't even fatal unless you use 
    use warnings FATAL => 'inplace' 
or some such similar incantation.
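
A small illustration of the anchor point, on the assumption that "the
dollar" above means the `$` regex anchor rather than something else. The
filename is made up for the demonstration; it only shows how `$` and `\z`
differ when a name happens to end in a newline.

```perl
use strict;
use warnings;

# A filename that ends in a newline: legal on Unix, if unusual.
# (Hypothetical name, chosen only to show the difference.)
my $odd = "foo.gz\n";

# `$` matches at the end of the string OR just before a final newline.
my $dollar_matches = ($odd =~ /\.(gz|Z)$/)  ? 1 : 0;

# `\z` matches only at the true end of the string.
my $z_matches      = ($odd =~ /\.(gz|Z)\z/) ? 1 : 0;

print "dollar: $dollar_matches, \\z: $z_matches\n";
```

With the `$` form such a name would be handed to gzip, and the worst that
follows is the open failure described above.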

Those who've not reread perlopentut lately (AHEM) may have forgotten how
in there I wrote this nicety:

   $pwdinfo = ( `domainname` =~ /^(\(none\))?$/ )
	       ? "< /etc/passwd"
	       :  "ypcat passwd |" ;

   open(PWD, $pwdinfo)
	   || die "can't open $pwdinfo: $!";

Another sort of @ARGV preprocessing involves deciding what to do on an
empty one.  I have three sorts of strategies I routinely employ, not 
mutually exclusively:

    1) To weed out undesirable files:

	@ARGV = grep { -f && -T } @ARGV;  

    2) To default to something other than <&=STDIN on empty @ARGV:

        @ARGV   = (".") 		unless @ARGV;  # recurse on directory
	@ARGV   = <*> 			unless @ARGV;
        @ARGV   = <*.*> 		unless @ARGV;
        @ARGV   = <*.[Cchy]> 		unless @ARGV;

	@ARGV   = grep { / \. (?: jpe?g | tiff? | nef ) $ /ix } <*.*> unless @ARGV;

	chomp(@ARGV = <STDIN>) unless @ARGV;  # possibly with $/ set to chr(0)

    3)  To warn the user that you're going to use "-" so they don't
	wonder what's happening when nothing does:

	if (@ARGV == 0 && -t STDIN && -t STDERR) {  # maybe omit -t STDERR
	    print STDERR "[$0: reading from stdin...]\n";
	}

	On BSD systems, I always "stty status ^t", which helps here,
	but most people don't know to hit ^T when they seem hung.

	Here I demo doing so:

	    % tcgrep foo
	    tcgrep: reading from stdin
	    load: 0.96  cmd: perl 28 [runnable] 0.16u 0.05s 0% 547k
	     0x3c021000 fdr_wait    15 -c--RW---f 0000 main     
	    load: 0.96  cmd: perl 28 [runnable] 0.16u 0.05s 0% 549k
	     0x3c021000 fdr_wait    15 -c--RW---f 0000 main     
	    load: 0.96  cmd: perl 28 [runnable] 0.16u 0.05s 0% 549k
	     0x3c021000 fdr_wait    15 -c--RW---f 0000 main     
	    load: 0.77  cmd: perl 28 [runnable] 0.16u 0.05s 0% 549k
	     0x3c021000 fdr_wait    15 -c--RW---f 0000 main     

As I said, these @ARGV mungings aren't mutually exclusive, making a
sequence like this one perfectly reasonable:

    if (@ARGV == 0) {
	if (-t STDIN) {
	    @ARGV = <*>;             # default to all nondot-files in cwd
	} else {
	    chomp(@ARGV = <STDIN>);  # get filename list from stdin
	}
    }

    # figure out which need pipe processing, and what kind
    @ARGV = map { m#^\w+://#   
		    ? /\.(gz|Z)$/ 
			? "lwp-request -m GET $_ | gzip -dc |"
			: "lwp-request -m GET $_ |" 
		    : /\.(gz|Z)$/ 
			? "gzip -dc < $_ |" 
			: $_ 
		} @ARGV;

    # and finally winnow out the non-text files
    @ARGV = grep { /\|\z/ || /^-\z/ || (-f && -T) } @ARGV;

If you feed an @ARGV or <STDIN> like this:


It will dutifully transform that mess into the following clean files,
all ready for <> processing:

    gzip -dc < foo.gz |
    lwp-request -m GET | gzip -dc |

Which I think is the very sort of WYSIWYG magic that Mark Mielke was 
hoping for.  As I've shown, it's not tough code to write, and it's
already supported in Perl.  Has been for quite a while, in fact.
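
For anyone who wants to watch the mechanism run without gzip or
lwp-request installed, here is a minimal sketch. It assumes a Unix-like
shell, and it uses the running perl (`$^X`) as a stand-in for the
decompressor; the temp file and the one-liner are invented for the demo.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Make a small plain file to stand in for an ordinary argument.
my ($fh, $plain) = tempfile(UNLINK => 1);
print {$fh} "from file\n";
close $fh or die "can't close $plain: $!";

# The second "file" is really a command: the trailing | tells the
# implicit two-argument open behind <> to read from its output.
# $^X is the running perl, used here only so the demo needs no gzip.
local @ARGV = ($plain, "$^X -e 'print qq(from pipe\\n)' |");

my @lines;
while (my $line = <>) {
    chomp $line;
    push @lines, $line;
}
print "$_\n" for @lines;
```

The loop neither knows nor cares which of its inputs was a file and which
was a pipe.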

Isn't that cool?

About using the input operator for fileglobbing, not just for readlining, 
this occurs in some fewer of my programs, only around 30, not over
100 the way <ARGV> processing does.  Here's a sort -dfu'd list of such:

    @ARGV = <*>;
    @ARGV = tsort(<*>) unless @ARGV;
    @catfiles = <${dir}/cat[12368l]/${arg}.*>;
    chown($<, -1, <*>)
    defined $opt_newmanifest ? $opt_newmanifest : "<*>",
    @dirs = grep {-d && $cwd !~ /${_}$/ } <*>;
    $ENV{MAIL} = <~tchrist/Mail/in.coming/personal>;
    @files = (@required, <*.pl>);
    for $file ( <*.pl> ) {
    for $need_perl (reverse sort </usr/local/bin/perl* /usr/bin/perl*>) {
    foreach $catdir ( <cat*> ) {
    foreach $chapdir ( <chap??> ) {
    foreach $dir ( <man?*> ) {
    foreach $lock ( </usr/spool/uucp/LCK..*> )  {
    foreach $mbox ( <*> ) {
    if ( (@name = <${fullpath}*>)  && "@name" !~ /\*$/ ) {
    $incpath = <~/Mail/outpostbox>;
    $logfile = shift || <~/logs/www/access_log>;
    page: foreach $page (<*.*>) {
    print sort {$a <=> $b} <[0-9]*>;
    while (<*.*>) {

There is an unhealthy prescriptivist and alarmist spirit haunting us in
this matter, and I'm increasingly concerned its pesterful stridency may
eventually wear people down enough to overrule good sense and previous fair
practice.  Breaking people's programs is a good way to drive them away, and
that is never a good thing.

For v5.10, I had to update a couple of old programs that used $* because
when they were written, ONE HAD NO CHOICE.  I don't think I've had to do
that since open(log, ">>/tmp/foolog") stopped working.  

Do you REALIZE how very long ago that was?  Almost nobody reading this list
remembers then.  That's how long it's been since we screwed people around.
Why in the world the anally-confused should want to do so again, and so 
soon, I completely fail to understand.

The thought of updating triple-digit numbers of my happily running scripts
that certain individuals would just as well see broken is really beyond
the conscionable--or its promulgators, conscientiousness.

The only thing I long for right now is Rafael's currently nonworking
inspiration of "<:utf8 someutf8file" or "<:crlf winfile.txt" and the like.
I sure would like that a lot, because it's a bit of a pain otherwise.
Running `file` on the input filename to figure out its flavor for
processing isn't always as nice as letting the user specify it directly and
individually.  DWIM may be one thing, but DWIS should always overrule it.
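
Until something like that works natively, one can approximate it by hand.
What follows is only a sketch under stated assumptions: the helper names
`split_layer_spec` and `open_with_layer` are hypothetical, the "<:layer
file" syntax is just the one quoted above, and this is not how perl itself
treats such an argument.

```perl
use strict;
use warnings;

# Hypothetical helper: split an argument such as "<:crlf winfile.txt"
# into (layer, filename).  Arguments without a layer prefix get "".
sub split_layer_spec {
    my ($arg) = @_;
    if ($arg =~ /\A<\s*((?::\w+(?:\([^)]*\))?)+)\s+(.+)\z/s) {
        return ($1, $2);
    }
    return ("", $arg);
}

# Hypothetical helper: three-argument open, so the layer is explicit
# and the filename is never reinterpreted.
sub open_with_layer {
    my ($arg) = @_;
    my ($layer, $file) = split_layer_spec($arg);
    open(my $fh, "<$layer", $file) or die "can't open $file: $!";
    return $fh;
}

my ($layer, $file) = split_layer_spec("<:crlf winfile.txt");
print "layer=$layer file=$file\n";
```

The cost is that one must loop over @ARGV oneself and give up plain <>,
which is exactly the bit of a pain mentioned above.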

I wonder whether we shouldn't have some ARGV:: modules or dwimmer::
pragmata that help run, or not run, these convenient transforms on @ARGV,
the way we already have for Getopt::this_and_that, which all update @ARGV
per expectation.  The anaylsayers can have their "no dwimmer::ARGV;" if
they want that to dispel the dweomer, and then blissfully leave the rest
of us--and our code!--in peace.
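
To make the idea concrete, here is a rough sketch of what one such module
might look like. The package name ARGV::Gunzip is hypothetical (no such
module is being claimed to exist), and the sample arguments are invented;
it merely wraps the gzip transform shown earlier in an import routine.

```perl
use strict;
use warnings;

package ARGV::Gunzip;   # hypothetical name, not an existing module

# Called by "use ARGV::Gunzip;" (or directly, as below): rewrite @ARGV
# in place.  @ARGV always lives in package main, so no qualifying needed.
sub import {
    @ARGV = map { /\.(gz|Z)\z/ ? "gzip -dc < $_ |" : $_ } @ARGV;
    return;
}

package main;

# Invented arguments for the demonstration.
@ARGV = ("notes.txt", "access_log.gz");
ARGV::Gunzip->import;
print "$_\n" for @ARGV;
```

In a real module the call would happen at `use` time, before any <> in
the main program, much as the Getopt modules do their work up front.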

Yes, I know I yet owe a cogent explanation detailing why the sound and fury
of the alarmicists is both uncalled for and unrealistic.  Even worse,
coddling to their demands through appeasement may even seduce them into a
dangerously unmerited sense of complacency, an illusion of false security
where none such exists.  I *do* have that argument ready for elaboration
and exposition, but while it is shorter than this missive, that's really
quite enough for tonight.

Good night.

