perl.perl5.porters | Postings from August 2001

Re: On "Command-line Wildcard Expansion"

Tye McQueen
August 8, 2001 08:56
Excerpts from the mail message of $Bill Luebkert:
) I'm using a native tcsh which expands args (within the funky env 
) limitations of Windoze.  I don't want any args passed in as '*fubar*' 
) to be expanded since they aren't file globs.

Of course.  That is why there needs to be an easy way to tell
Perl that you don't want globbing.

*** Skip to near the end for the important stuff ***

) The args need to be selectively globbed based on what they refer to.
) Only the shell can provide globbing properly, since by the time Perl 
) sees the args, I believe they are already stripped of ' and ", etc.

I don't agree with either of those.  Globbing needs to be done by
the same mechanism that does the quoting.  For the most common
Win32 shells, the shell does neither globbing nor quoting.  In
fact, the Win32 OS makes having a shell handle quoting robustly
nearly impossible since executed programs are given a command line
and not a list of command arguments.  So even if the shell does
quoting, it has to transform the resulting list of arguments back
into a single command line for passing to the other program and so
probably has to quote things again.

So the best a Win32 shell can do is to offer flexible quoting
schemes that it then puts into a canonical form so that the
applications only have to offer a rudimentary quoting scheme
(like the one supported by the standard C RTLs).
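
To make the round trip concrete, here is a minimal sketch in Python (not Perl, and not something Perl itself does). It leans on subprocess.list2cmdline, which flattens an argument list into one command line using the Microsoft C runtime quoting rules. The argument values are made up for illustration.

```python
import subprocess

# A shell that has already parsed its own quoting holds a list of arguments.
# These particular values are hypothetical.
args = ["perl", "script.pl", "my file.txt", "*.c"]

# Win32 process creation takes one command-line string, so the list has to
# be flattened again, re-quoting anything that contains whitespace.
command_line = subprocess.list2cmdline(args)
print(command_line)  # perl script.pl "my file.txt" *.c
```

Note that the flattened string carries no record of whether *.c was quoted in whatever the user originally typed, which is why the program on the receiving end cannot recover that information.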

) This leaves it up to the programmer to decide which args need possible 
) globbing and which don't.

And I really disagree with this.  I thought you just said that
only the shell can do globbing properly.  Surely you aren't
saying that the script writer should tell the shell which
args should be considered for globbing?  But never mind, as I
think I just misunderstand you and I hope that the rest of
this makes this point moot.

) > > And if Perl were to have built-in
) > > command-line glob()ing for Win32, I'd want an argument of
) > > "*.x" to remain "*.x" in the absence of any matching files.
) > > The only problem I see with the above code is that it
) > > _silently_ drops "*.x" from the command line.
) I assume you mean a quoted *.x should not be globbed - I agree.

If you have a shell that does globbing and quoting (and, being
under Win32, is forced to requote things before constructing the
command line that is given to Perl) then none of the above should
matter at all to you.  You are the exception and would just need
to note your exceptional setup by telling Perl to not bother to
glob at all.

But no, I'm saying that when Perl sees *.x without any quotes
whatsoever, it should stay as *.x if (and only if) there are no
files that match *.x.  This rather simplistic technique does quite
a good job in the majority of cases.  It isn't robust by any means,
but the default Win32 shells make a robust solution impossible for
the majority of Win32 Perl users.  Again, this wouldn't apply to
you, however.

So if your script takes a regex as an argument, the odds of the
regex also being a glob that matches at least one file are quite
small.  The perfect counter-example is the Unix "find" command.
You'll often write something like "find . -name '*.c' -print" when
you have *.c files in the current directory.  Aside from that, the
simplistic "glob if and only if matching files are found" is quite
good at distinguishing intended globs from things that aren't
intended to be globs.  To get better than that, you can get a new
shell.  But if you go out of your way to get a new shell, then you
can do a tiny bit of extra work to tell Perl that it should stop
being helpful in this respect.
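
The "glob if and only if matching files are found" rule is short enough to sketch. This is an illustration in Python rather than Perl, and the function names expand_if_matching and expand_args are hypothetical, chosen only for this example.

```python
import glob

def expand_if_matching(arg):
    # Expand arg as a file glob, but hand it back untouched when no
    # file matches, so that regexes and other "strange" arguments survive.
    matches = sorted(glob.glob(arg))
    return matches if matches else [arg]

def expand_args(args):
    # Apply the rule to each argument in turn, preserving argument order.
    expanded = []
    for arg in args:
        expanded.extend(expand_if_matching(arg))
    return expanded
```

In a directory holding a.c and b.c, expand_args(["*.c", "*.x"]) gives a.c, b.c and then the literal *.x.  The same call also shows the "find" problem: a *.c meant as a pattern for the script itself is expanded whenever *.c files happen to be present.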

Extending the Perl globbing code to handle single quotes to
prevent globbing makes a lot of sense and would make supporting a
Win32 find-like CLI possible.  Then the question is "do you leave
the single quotes in or strip them?"  Stripping them would break
scripts that are currently taking arguments that contain single
quotes.  Not stripping them would mean that scripts that want
"strange" arguments would have to know to strip the single quotes
themselves but only when run under Win32.  A middle ground could
be had by only stripping single quotes if they neatly surround
a single command-line argument.
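
One possible reading of that middle ground, sketched in Python for illustration only: the quotes are stripped when they are the first and last characters of an argument and no other single quote appears in between.  The function name and the exact definition of "neatly surround" are assumptions, not something settled in this thread.

```python
def strip_enclosing_single_quotes(arg):
    # Returns (text, was_quoted).  Quotes are removed only when they wrap
    # the whole argument and none appear inside it; anything else, such as
    # an apostrophe in the middle of a word, is passed through unchanged.
    if (len(arg) >= 2 and arg[0] == "'" and arg[-1] == "'"
            and "'" not in arg[1:-1]):
        return arg[1:-1], True
    return arg, False
```

A caller would skip globbing whenever was_quoted is true, so '*.c' reaches the script as a literal *.c while an argument like don't is left alone.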

Note that you may not even want to use double quotes as part of
your scheme for preventing globbing as Win32 requires double
quotes for files with spaces in their names and doesn't allow
quoting of just part of the file name and so "my *.doc" (with the
quotes) is the standard Win32 way of globbing for *.doc files
whose names start with "my ".  Also, it appears that Perl lets the
C RTL handle the double quotes so Perl doesn't currently even
see them.

) But are the quotes still there when Perl gets them - I thought not.

Well, "yes and no".  Perl doesn't currently see them, but that is
fairly easy to fix.  And I think that leads to the best solution
I've seen yet.

Now Win32 Perl already treats double quotes quite a bit
differently than a lot of Win32 programs.  So it might make sense
to just extend that "difference" to (not) globbing as well.  It
looks like this will require replacing the C RTL's "command line
to command arguments" conversion.  I like that idea.  We already
replace quite a few bits of C RTL that are broken and are also
already disabling the C RTL globbing because it is broken.  Let's
just replace the C RTL quoting-and-globbing with a good version!

*** Skip to here for the important stuff ***

Have Perl default to refetching the command line and rebuilding
argv[] using the C RTL's rather simplistic " and \" handling,
while globbing unquoted arguments that match at least one file.
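
A rough sketch of that proposal, written in Python purely as illustration (an actual implementation would live in Perl's Win32 C code).  It assumes a deliberately simplified rule set: whitespace separates arguments, a double quote toggles quoting, and \" is a literal double quote.  The real C RTL rules for runs of backslashes are more involved and are ignored here.  The names split_command_line and rebuild_argv are hypothetical.

```python
import glob

WILDCARDS = "*?"

def split_command_line(command_line):
    # Split a raw command line into (text, was_quoted) pairs.
    args, current = [], []
    in_quotes = was_quoted = started = False
    i = 0
    while i < len(command_line):
        ch = command_line[i]
        if ch == "\\" and command_line[i + 1:i + 2] == '"':
            current.append('"')        # backslash-quote is a literal quote
            started = True
            i += 2
            continue
        if ch == '"':
            in_quotes = not in_quotes  # quote characters are not kept
            was_quoted = started = True
        elif ch in " \t" and not in_quotes:
            if started:                # whitespace ends the current argument
                args.append(("".join(current), was_quoted))
            current, was_quoted, started = [], False, False
        else:
            current.append(ch)
            started = True
        i += 1
    if started:
        args.append(("".join(current), was_quoted))
    return args

def rebuild_argv(command_line):
    # Glob only unquoted arguments, and only when at least one file matches.
    argv = []
    for text, was_quoted in split_command_line(command_line):
        matches = []
        if not was_quoted and any(c in text for c in WILDCARDS):
            matches = sorted(glob.glob(text))
        argv.extend(matches if matches else [text])
    return argv
```

With a.c and b.c in the current directory, the command line script.pl *.c "*.c" *.x would come out as script.pl, a.c, b.c, *.c, *.x: the unquoted glob is expanded, the quoted one is protected, and the unmatched one is kept as typed.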

) If Perl can differentiate between quoted and non-quoted args, then 
) it might be possible to glob each non-quoted arg with a wild card in it.
) Otherwise it makes no sense to do so.

For the default Win32 shells, no quotes are stripped, so Perl can
tell which arguments are quoted if the suggestion in my previous
paragraph were accepted.

) As long as globing is not the default - fine.

And this gets to the crux of my position.  Globbing _should_
be the default because the default shell doesn't glob.  If
you have a non-default configuration for your shell, then it
isn't unreasonable for you to have to also change your Perl
configuration away from the default.

How to turn off globbing should be tied to the shell that launched
Perl (so you can use more than one shell and have one copy of
Perl behave correctly depending on the shell).  So an environment
variable seems the obvious choice to me.  I'd probably just add a
PERLGLOB environment variable.  I would have called it PERLNOGLOB,
since the default should be to glob, but I suspect it might evolve
into also being used to control _how_ Perl does globbing (if there
is enough desire to have single quotes prevent globbing, for example).
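
As a sketch of how such a switch might be consulted (in Python, for illustration only): the variable name PERLGLOB comes from the suggestion above, but the set of values that turn globbing off is purely an assumption, since no value syntax has been proposed.

```python
import os

# Hypothetical "off" spellings; nothing in this thread defines them.
OFF_VALUES = ("0", "no", "off", "none")

def globbing_enabled(environ=None):
    # Globbing is the default; it is disabled only when the launching
    # shell has exported PERLGLOB with one of the assumed "off" values.
    env = os.environ if environ is None else environ
    value = env.get("PERLGLOB")
    if value is None:
        return True
    return value.strip().lower() not in OFF_VALUES
```

Because the setting lives in the environment, a globbing shell such as tcsh could export it once in its startup file while the default Win32 shell leaves it unset, and a single Perl installation would behave correctly under both.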

