develooper Front page | perl.perl5.porters | Postings from September 2017

[perl #121440] spurious lstat calls in Perl_do_readline / glob

From:
James E Keenan via RT
Date:
September 30, 2017 02:36
Subject:
[perl #121440] spurious lstat calls in Perl_do_readline / glob
Message ID:
rt-4.0.24-12539-1506738974-819.121440-15-0@perl.org
On Mon, 17 Mar 2014 12:33:58 GMT, craigberry wrote:
> On 3/14/14, 3:51 PM, Craig A . Berry wrote:
> > # New Ticket Created by  Craig A. Berry
> > # Please include the string:  [perl #121440]
> > # in the subject line of all future correspondence about this issue.
> > # <URL: https://rt.perl.org/Ticket/Display.html?id=121440 >
> >
> >
> > In pp_hot.c in Perl_do_readline, each and every line returned by a
> > glob operation is run through the following check:
> >
> > for (t1 = SvPVX_const(sv); *t1; t1++)
> >     if (strchr("$&*(){}[]'\";\\|?<>~`", *t1))
> >             break;
> > if (*t1 && PerlLIO_lstat(SvPVX_const(sv), &PL_statbuf) < 0) {
> >     (void)POPs;             /* Unmatched wildcard?  Chuck it... */
> >     continue;
> > }
> >
> > where sv contains the line returned by calling sv_gets on the file
> > pointer returned by Perl_start_glob.  The code is ancient; the basic
> > logic hasn't changed since at least 5.000 twenty years ago.
> >
> > Checking each and every character of each and every filename for its
> > presence in a hard-coded list of pattern match characters is a fair
> > amount of processing, but is also non-portable and, more seriously,
> > causes many extra calls to the relatively slow lstat() in very common
> > use cases on non-Unix platforms.
> 
> What I forgot about when I posted is that this code only kicks in when
> PERL_EXTERNAL_GLOB is defined, which on most platforms means only for
> miniperl but on VMS means always.
> 
> > As far as I know (which may not be very for, so please correct me)
> > there would never be a reason to do this check on any but the first
> > line of input (I *think* C<IoLINES(io) == 1> would do this but
> > haven't checked).  By the time you have two lines returned from a
> > glob operation, you know you have at least two matches, so a check
> > for the "no match" case no longer makes sense.
> 
> Answering my own question, I believe the expectation is that multiple
> glob patterns can be processed in sequence, which means an unexpanded
> wildcard could be found anywhere in the input, such is the *.flirble
> in
> the following:
> 
> $ echo *.lst *.flirble *.SH
>  mkppport.lst utils.lst *.flirble Makefile.SH Policy_sh.SH cflags.SH
> config_h.SH makedepend.SH metaconfig.SH myconfig.SH runtests.SH
> 
> So this can't be fixed in the universal fashion I was hoping. The
> patch
> below fixes it on VMS by only checking for wildcard characters that
> are
> meaningful in the VMS glob implementation, thus saving the lstat()
> call
> on all the perfectly normal filenames having brackets, semicolons, and
> dollar signs.
> 
> This makes a glob of the Perl source tree 60% faster on first
> iteration
> and 80% faster on subsequent iterations (presumably because caching of
> directories doesn't get swamped out by all the extra lstat() calls).
> It
> doesn't do anything for paths containing backslashes on Windows, which
> may be of interest to some folks even though it only affects miniperl.
> 
> So I'll push this soonish unless someone has a better suggestion.
> 
> --- pp_hot.c.orig       2014-03-14 21:40:45 -0500
> +++ pp_hot.c    2014-03-15 18:00:37 -0500
> @@ -1698,7 +1698,11 @@ Perl_do_readline(pTHX)
>                 }
>             }
>             for (t1 = SvPVX_const(sv); *t1; t1++)
> +#ifdef __VMS
> +               if (strchr("*%?", *t1))
> +#else
>                 if (strchr("$&*(){}[]'\";\\|?<>~`", *t1))
> +#endif
>                         break;
>             if (*t1 && PerlLIO_lstat(SvPVX_const(sv), &PL_statbuf) <
> 0) {
>                 (void)POPs;             /* Unmatched wildcard?  Chuck
> it... */

You did commit that code, but the ticket was never closed:

#####
commit b51c3e77dbb7e510319342a73163b3fbb59baf5a
Author:     Craig A. Berry <craigberry@mac.com>
AuthorDate: Fri Mar 21 19:29:38 2014 -0500
Commit:     Craig A. Berry <craigberry@mac.com>
CommitDate: Fri Mar 21 20:55:03 2014 -0500
#####

Closing now.
-- 
James E Keenan (jkeenan@cpan.org)

---
via perlbug:  queue: perl5 status: new
https://rt.perl.org/Ticket/Display.html?id=121440



nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About