From: Nicholas Clark
Date: April 1, 2008 06:50
Subject: Re: Make built-in list functions continuous
Message ID: 20080401134947.GY79799@plum.flirble.org
On Tue, Apr 01, 2008 at 02:36:37AM -0400, wren argetlahm wrote:
> Nicholas Clark wrote:
> >I was wondering if you had more detail on this, as it's
> > not something currently in the core's TODO file:
> >
> > http://mirrors.develooper.com/perl/APC/perl-current/pod/perltodo.pod
Thanks for the excellent explanation.
> It was mainly something I ran into while optimizing some data-mining
> code a little while back. The margin for speedups is quite dramatic for
> large arrays, and simple cases can be dealt with by fairly trivial
> rewrites. I'm not sure how much more involved a comprehensive solution
> would be; the answer to that would depend on the approach taken (e.g.
> transforming the Perl code before running it, vs rewriting the offending
> functions) as well as the coverage for ferreting out everywhere it happens.
>
> The general genre of this sort of optimization is called
> "deforestation". If you're familiar with Haskell or other ML-like
I'm not personally, but given that there are over 400 subscribers to the
list, I suspect that there are quite a few people who are.
> @xs = map { f($_) } @ys;
>
> In that code, `map` will construct an array in memory, but when it
> returns it, the array will be copied cell by cell into a new array
> constructed as @xs. We can optimize this to remove that intermediate
> array by using the following code instead:
>
> @xs = ();
> push @xs, f($_) foreach @ys;
Interesting. I think that that would work in the general case for Perl 5 code.
(How easy it would be to implement is a separate question, which I come back to
below.)
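For anyone who wants to put numbers on it, a quick sketch along these lines
(using the core Benchmark module; f() here is just a stand-in for real
per-element work) would compare the two forms:

    use Benchmark qw(cmpthese);

    my @ys = 1 .. 100_000;
    sub f { $_[0] * 2 }    # stand-in for whatever f() really does

    cmpthese(-3, {
        map_assign => sub { my @xs = map { f($_) } @ys; },
        push_loop  => sub { my @xs; push @xs, f($_) foreach @ys; },
    });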
> A different approach would be to rewrite the original code to use a
> different (=) function which just makes @xs an alias to the intermediate
> array returned by map, rather than copying it. This would be along the
> lines of the lexical aliasing task on the perltodo, but it doesn't
> generalize. For example, when we have longer pipelines and so the
I don't think that that would generalise in the current Perl 5 implementation
without a re-write (others should correct me if I've missed something).
Ops push results onto the Perl stack, and are coded to do so directly, rather
than having a destination passed in.
I don't think (but I'm not certain) that the current op code would allow a
(non-tied) array to be aliased to a new stack frame.
Whilst we don't define the execution order of various types of Perl code,
there is only one interpreter, and it has always run the grep to completion
before starting the map, so there may be code in the wild that relies on that.
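Something along these lines (contrived, but I'd bet variants exist in the
wild) silently depends on that ordering:

    my @values = (1, 2, 3, 4);
    my $n = 0;
    # Relies on grep running to completion (leaving $n == @values)
    # before the map block first executes; a lazy pipeline would
    # interleave the two and divide by a partially-counted $n.
    my @fractions = map { $_ / $n } grep { $n++; 1 } @values;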
> We can transform the majority of the offending cases into foreach-loops
> which capture the fact that these chains of functions are actually
> forming a pipeline through which we're funneling the data streaming from
> @ys. But these rewrites aren't very amenable to using the Filter modules
> since the first argument to `grep` and `map` are blocks, so:
>
> @xs = map  { f1($_); f2($_); ...; fn($_) }
>       grep { p1($_); p2($_); ...; pn($_) } @ys;
>
> ...becomes:
>
> foreach (@ys) {
>     p1($_);
>     p2($_);
>     ...
>     if (pn($_)) {
>         f1($_);
>         f2($_);
>         ...
>         push @xs, fn($_);
>     }
> }
In the general case, I think that that would be better as
foreach (@ys) {
    {
        p1($_);
        p2($_);
        ...
    }
    if (pn($_)) {
        f1($_);
        f2($_);
        ...
        push @xs, fn($_);
    }
}
so that the scope of the grep block is exited before the scope of the map
block is entered, because that preserves at least some of the execution
ordering of the original code.
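To make that concrete, here's a contrived sketch (Guard is purely
illustrative) showing that every per-element scope of the grep block closes
before the map block first runs:

    {
        package Guard;
        sub new     { bless {}, shift }
        sub DESTROY { print "grep scope left\n" }
    }

    my @xs = map  { print "map block runs\n"; $_ }
             grep { my $g = Guard->new(); $_ % 2 }
             1 .. 4;

which prints "grep scope left" four times before "map block runs" appears at
all, and that is the ordering the inner bare block preserves.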
> Because of this ugliness, the rewrite needs to take place within the
> parsing/compilation pass in order to have access to the AST.
Which gets tricky, because there isn't an AST in the perl 5 interpreter.
(As I understand it, and this is not my strong point) the tokeniser and
parser spit out an optree directly, having already done some transformations.
Further necessary transformations are performed by the same functions that
are also performing some true optimisations (eg the ck_* functions that
"check" ops as they are built, and the peephole optimiser).
Constant folding is performed as the optree is assembled.
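You can watch the constant folding happen from the outside with B::Deparse,
which turns the optree back into Perl source:

    $ perl -MO=Deparse -e 'print 2 * 3 + 4'
    print 10;
    -e syntax OK

The multiplication and addition are gone by the time there is an optree to
decompile.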
> Granted, that's something of a contrived example, but you get the idea.
> There's also a related optimization which takes the composition of maps
> as the map of composition. I.e. these two lines are equivalent, but the
> latter is more efficient:
>
> @xs = map {f($_)} map {g($_)} @ys;
> @xs = map {$_ = f($_); g($_)} @ys;
I think you made a typo and meant

    @xs = map {$_ = g($_); f($_)} @ys;

given the right-to-left nature of the map pipeline. But in the general Perl
case, I don't think that the rewrite is equivalent, as the context of &g is
wrong:

    @xs = map {my @a = g($_); f(@a)} @ys;

except that isn't correct in the general case either, as the map pipeline
calls &f many times with exactly one argument, so I think it ends up as

    @xs = map {my @a = g($_); my @b; push @b, f($_) foreach @a; @b} @ys;
That version at least avoids needing a full-length temporary list, and would
combine with your previous re-write to become

    my @xs;
    foreach my $y (@ys) {
        push @xs, f($_) foreach g($y);
    }
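A quick sanity check of that combined form, with toy functions where g
deliberately returns two values per input (the case the naive rewrites get
wrong):

    sub g { ($_[0], $_[0] * 10) }
    sub f { $_[0] + 1 }

    my @ys = (1, 2);
    # @xs = map { f($_) } map { g($_) } @ys would give (2, 11, 3, 21)

    my @xs;
    foreach my $y (@ys) {
        push @xs, f($_) foreach g($y);
    }
    # @xs is likewise (2, 11, 3, 21), with no intermediate list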
> One method of implementing continuous functions is to use a lazy array
> like mentioned above; though integrating this into an otherwise non-lazy
> language can require thought. Another approach is to actually spawn off
I think that this would be too tricky in the current Perl 5 internals.
> threads and pass messages between them; while nicely parallel for
> multi-core folks, usually the overhead is too much unless the functions
> are doing some heavy computations (using a thread pool can reduce the
> overhead somewhat). Yet another approach is to use continuation passing
I don't think that this is viable, as Perl allows the functions called to
manipulate global state. If those functions were in separate threads, then
that global state would have to be shared, which ithreads can't do (and
5005threads tried and failed), so a pre-requisite for this is solving
threading, which is not a trivial problem.
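A small sketch of the problem (needs a perl built with ithreads):

    use threads;

    our $count = 0;
    threads->create(sub { $count++ })->join();
    print "$count\n";    # prints 0: the child incremented its own copy

Each new ithread clones all of the interpreter's data, so the child's
increment never reaches the parent unless the variable is explicitly marked
as shared.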
> in order to capture the flow of control; how nice this approach is
> depends on the implementation of closures. And of course there are other
> approaches as well.
I'm not sure that this is viable, as transparently taking a continuation from
inside a function means deciding on, and implementing, the semantics of the
interaction between lexical and dynamic scopes:
sub foo {
    local $bar::baz = glurpp();
    ...
    foreach (@_) {
        # presumably we might take a continuation here
        ...
    }
}
What happens to the value of $bar::baz whilst we temporarily jump outside the
function foo()?
> Going all the way and implementing continuous functions would be
> awesome, but it'd also be a good deal of work which might be better
> spent elsewhere. But even just doing the rewrites can improve
> performance for data-heavy programs by 160~200% (== +60~+100%), even
> though it doesn't catch everything. For someone who knows how the
> parsing/compilation sweep works (or can quickly learn) so they can
> manipulate the ASTs/opcode tree, adding the rewrites should be about the
> right size for a Google Summer of Code project.
I suspect that the number of people worldwide with good knowledge of the
entire parsing/compilation code is under 10. I suspect that I count as a
"fast learner" for the purposes of this exercise, and it took me an entire
full-time week to figure out how to optimise `reverse sort` and `foreach
(reverse @zam)` at the optree level, and I think a day for cutting [] and {}
construction from 3 ops to 1. I don't know how long it took Dave to
implement in-place sort (`@glurp = sort @glurp`), or Yves to implement a
faster `if (%hash)`, but these are the only optree-level optimisations that
I'm aware of in the past 3 years. I didn't find it easy at all.
The re-writes certainly look interesting, and I can't see any scope-related
reasons why they would fail in the general case, but I suspect that actually
implementing them in C from the ground up right now would tax even Robin
Houston (who knows the optree well, and now has a PhD in computer science).
What might be a much better start is to get optimize and B::Generate to the
state where it's viable to write the optree re-writers themselves in pure
Perl, and that way get the bugs out of them (and see where the real wins are).
That would make it easier for more people to prototype things in Perl, and
would much reduce the amount of human time needed to write C, down to just
"re-implement this" instead of "prototype, develop and debug".
Nicholas Clark