develooper Front page | perl.perl5.porters | Postings from January 2012

Re: whither study()?

Andy Dougherty
January 31, 2012 08:02
On Tue, 31 Jan 2012, demerphq wrote:

> On 31 January 2012 14:52, Andy Dougherty <> wrote:
> > On Mon, 30 Jan 2012, Ricardo Signes wrote:
> >
> >> * demerphq <> [2012-01-29T04:41:33]

> >> > Does anybody have any examples where it actually makes a difference?
> >>
> >> I second that question, but I only care if the difference is the kind of thing
> >> we want to keep around. ;)
> >
> > Yes, I've used it, and yes it has typically made a difference (around 5%
> > last several times I benchmarked it).  However, I've only used it for
> > simple patterns of straight ASCII text.  I haven't run into any corner
> > cases or subtle bugs, but I haven't stressed it too much either.
> If you have a case you can share I would really like to see it. My
> thinking is that other strategies might provide better results.

metaconfig (used to generate perl's Configure) uses study to some 
advantage.  Without study, a metaconfig run takes 58s on my system.  With 
study, it only takes 48s.

metaconfig makes a list of patterns (symbols it knows about) and looks 
through every file in the perl distribution looking for each of those 
symbols.  Abstracting what it does a bit, I've used the following program 
over the past many years to track the use of study.  I append it here for 
whatever it's worth.  Making the different patterns truly different would 
make a fairer test, but this is what I cooked up all those years ago.

# undef $/;                  # uncomment for the "slurp" runs
$search  = "while (<>) { \n";
$search .= "study;\n";       # drop this line for the "no study" runs
$patt = "abcdefghijklmnopqrstuvwxyz0000";
for ($i = 0; $i < 250; $i++) {
    $search .= 'print "$ARGV: $_" if ' . "/$patt/;\n";
}
$search .= "}\n";            # close the generated while loop
# print $search;
eval $search;

I tried both 4.036 and 5.000, both with and without the study.
I also tried adding an undef $/; to the beginning of the program.
(metaconfig uses over 500 patterns, but I ran out of patience:-)
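
As a rough illustration of the "truly different patterns" idea mentioned above, here is a minimal sketch, not the program that produced the timings in this message. It assumes Perl's magic string increment is an acceptable way to vary the pattern, and the helper name build_search is made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: build the benchmark loop as a string, the same way
# the original program does, but give every generated match its own pattern.
sub build_search {
    my ($count, $use_study) = @_;
    my $patt   = "abcdefghijklmnopqrstuvwxyz0000";
    my $search = "while (<>) {\n";
    $search   .= "study;\n" if $use_study;
    for (1 .. $count) {
        $search .= 'print "$ARGV: $_" if ' . "/$patt/;\n";
        $patt++;    # magic string increment: ...0000, ...0001, ...0002
    }
    $search .= "}\n";
    return $search;
}

# Only run the generated loop when files were named on the command line,
# so the script does not sit waiting on STDIN.
if (@ARGV) {
    my $search = build_search(250, 1);
    eval $search;
    die $@ if $@;
}
```

The patterns here still share a long common prefix, so this is only a small step toward a fairer test.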

A typical command line is (in the perl5.000 directory)
	time perl4.036 *.c *.h

Here are the results:

Perl           Study?   Slurp?    user time (sec)
perl4.036        No       No         344
perl4.036       Yes       No         220
perl5.000        No       No         680
perl5.000       Yes       No         675
perl4.036        No      Yes          25
perl4.036       Yes      Yes           8
perl5.000        No      Yes          26
perl5.000       Yes      Yes          26


These differences in performance on a basic pattern extraction problem are
a bit surprising.   It's especially puzzling that the study doesn't 
seem to buy much for perl5.000.  
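
To put numbers on those differences, the sketch below divides the "no study" user time by the "study" user time for each pair of rows in the table above. The figures are copied from that table; the helper name speedup is made up for the example.

```perl
use strict;
use warnings;

# Ratio of user time without study to user time with study.
sub speedup {
    my ($without, $with) = @_;
    return $without / $with;
}

# User times in seconds, copied from the table above.
my @runs = (
    # [ perl,       slurp?, no study, study ]
    [ 'perl4.036', 'No',  344, 220 ],
    [ 'perl5.000', 'No',  680, 675 ],
    [ 'perl4.036', 'Yes',  25,   8 ],
    [ 'perl5.000', 'Yes',  26,  26 ],
);

for my $run (@runs) {
    my ($perl, $slurp, $without, $with) = @$run;
    printf "%-10s slurp=%-3s  %.2fx\n",
        $perl, $slurp, speedup($without, $with);
}
```

A ratio near 1 means study made essentially no difference for that configuration.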

Yes, I realize that slurping in paragraphs or whole files runs much faster
-- unless you run out of memory, in which case it won't run at all:-(.  In
the interest of avoiding arbitrary limits, I usually use the default
line-at-a-time style -- that way some critical job won't bomb the night
before a presentation with "Out of memory!". 
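
For readers unfamiliar with the two reading styles, here is a minimal sketch of the difference. It is an illustration only, not part of the benchmark: the function name count_matching_records and the pattern used in the driver loop are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: read $file either one line at a time or as a single
# record (slurp), and report how many records were read and how many matched.
sub count_matching_records {
    my ($file, $pattern, $slurp) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";

    # With $/ undefined, one read returns the whole file.
    local $/ = $slurp ? undef : "\n";

    my ($records, $hits) = (0, 0);
    while (my $record = <$fh>) {
        $records++;
        $hits++ if $record =~ /$pattern/;
    }
    close $fh;
    return ($records, $hits);
}

# Compare both styles for each file named on the command line.
for my $file (@ARGV) {
    for my $slurp (0, 1) {
        my ($records, $hits) =
            count_matching_records($file, 'abcdefghijklmnopqrstuvwxyz0000', $slurp);
        printf "%s slurp=%d records=%d hits=%d\n",
            $file, $slurp, $records, $hits;
    }
}
```

In slurp mode the regex engine is entered once per file rather than once per line, which is consistent with the much lower times in the "Slurp? Yes" rows, at the cost of holding the whole file in memory.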

Yes, I also realize that perl4.036 is optimized, while perl5.000 is 
generally not.  Still, I hope it's helpful to identify places where it's 
worth the effort to optimize.

    Andy Dougherty

Update:  June 23, 2006

Perl                     Study?   user time (sec)
perl5.8.4-thread-multi    Yes        57
perl5.8.4-thread-multi     No        60

Update:  May 2, 2011
	time ./perl `awk '{print $1}' MANIFEST`  > /dev/null

Perl             Study?   Slurp?    user time (sec)
perl5.14.0-RC1     No       No        46.8
perl5.14.0-RC1    Yes       No        44.5
perl5.14.0-RC1     No      Yes         3.23
perl5.14.0-RC1    Yes      Yes         0.66