develooper Front page | perl.perl5.porters | Postings from April 2008

Re: Make built-in list functions continuous

Thread Previous | Thread Next
From:
David Nicol
Date:
April 11, 2008 15:26
Subject:
Re: Make built-in list functions continuous
Message ID:
934f64a20804111525w163d6b41lb3a46bae4581c6ae@mail.gmail.com
wren argetlahm has one > below:

I believe map is used for efficiency of programmer expression
rather than run time

>  It was mainly something I ran into while optimizing some data-mining code a
> little while back. The margin for speedups is quite dramatic for large
> arrays, and simple cases can be dealt with by fairly trivial rewrites. I'm
> not sure how much more involved a comprehensive solution would be; the
> answer to that would depend on the approach taken (e.g. transforming the
> Perl code before running it, vs rewriting the offending functions) as well
> as the coverage for ferreting out everywhere it happens.

I believe that my Macrame.pm module is a good start in the direction of
a tool to allow specifying rewrites (i cannot judge triviality) of perl source.

>  The problem is when you have a pipeline of functions and some function
> constructs an intermediate array which will then be read and destroyed right
> away by the next function in the pipeline.

Before reading this and discovering that your optimization is to lose the
copied temporary array, which might not actually happen in a series of
chained map or grep functions, I wanted to point out that from a cache
conservation perspective, batching all the data through a series of functions
one at a time will use less cache than loading all the functions on each
record, and this is the approach that is preferred by the current implementation
(unless there are gratuitous copies, which might be a side effect of passing
arguments on the stack.  Revising to use explicit temporary arrays instead
of the stack is suggested by Nicholas Clark.)


> A simple example of this is the
> following code where the pipeline is formed by the (=) assignment function
> and the `map` function:
>
>     @xs = map { f($_) } @ys;
>
>  In that code, `map` will construct an array in memory, but when it returns
> it, the array will be copied cell by cell into a new array constructed as
> @xs. We can optimize this to remove that intermediate array by using the
> following code instead:
>
>     @xs = ();
>     push @xs, f($_) foreach @ys;
>
>  This version constructs each element of @xs and then immediately pushes it
> through the "assignment" rather than building them up in memory first.

This transformation could be effected in current Macrame macro language
with something very similar to if not exactly

    cat > cmap.pm <<EOF
    use Filter::Macro;
    use Macrame;
    macro cmap 'my' @A { my @A; cmap @A }
    macro cmap @OUT = 'map'  '{BLOCK}' @IN {
        @OUT = ();
        for ( @IN ) {
              push @OUT, do { BLOCK }
        };
        @OUT
    }
    EOF

and the sample problem would become

   use cmap;  # introduce cmap map optimization keyword
   cmap @xs = map { f($_) } @ys;

(might be able to drop the keyword after adding support for infix macro
invocation to Macrame, so it can grab things to the left of the keyword.
At this revision, all grabbing is rightward, thus the keyword.)

idenfitying a situation where maps chain, so that no special keyword
would be required (although a special keyword might be a good idea, to
prevent breakage of things that have side effects designed around an assumption
that the entire early stage would run first -- something like a whole series of
macros for different combinations of map and grep, or even a recursive macro
generator that can produce n-long combinations inductively (Macrame is
capable of such things, see below)

Here cmap is like map but it runs its block in scalar context and
the block is expected to provide one output for each input, also the
array expression needs to be in parentheses. (http://tinyurl.com/6aerta)

    macro cmap '{BLOCK}' 'cmap' '{BLOCK2}'  cmap '{BLOCK3}' ARREXPR {
       do {
          my @_MAP_MACRO_RESULT;
          for (ARREXPR) {
             $_ = do {BLOCK};
             $_ = do {BLOCK2};
             push @_MAP_MACRO_RESULT, do {BLOCK3};
           }
           @_MAP_MACRO_RESULT
       }
    }


    macro cmap '{BLOCK}' 'cmap' '{BLOCK2}'  ARREXPR {
       do {
          my @_MAP_MACRO_RESULT;
          for (ARREXPR) {
             $_ = do {BLOCK};
             push @_MAP_MACRO_RESULT, do {BLOCK2};
           }
           @_MAP_MACRO_RESULT
       }
    }


To allow BLOCK to return zero or more than one result, we have
to check for that, re-introducing the temporaries (another reason
continuous function doesn't optimize from standard perl -- the
functions are not well-behaved, in a math sense)

    macro map '{BLOCK}' 'map' '{BLOCK2}'  ARREXPR {
       do {
          my @_MAP_MACRO_RESULT;
          for (ARREXPR) {
             for (BLOCK2) {
                 push @_MAP_MACRO_RESULT, do {BLOCK};
           }
           @_MAP_MACRO_RESULT
       }
    }

The cmap approach taken below could be used to create ever-deeper
nestings of these, uh, delivering the desired optimization as a source filter.


recursive macro generators capable of handling N repitions of syntax
are possible but tricky.  one approach is to define for one more rep
than you have, to redefine that number of reps, and also to define a new
generator containing one more rep.

    sub MACRAME::Expand::cmap_macro($){
        my $reps = shift; my $next = $reps + 1;

        my $signature = join "}' 'cmap' '{B", 1..$reps;

        my $operations = join "\n", map {
                   "\$_ = {B$_};"
        } 1..($reps-1);

        my $regurg = join '} cmap {B', 1 .. $reps;

    <<MACROMACROMACRO
    macro cmap '{B$signature}' 'cmap' {
        EXPAND cmap_macro($next);
        cmap {B$regurg} cmap
    }
    macro cmap '{B$signature}' ARREXPR {
       do {
          my \@_MAP_MACRO_RESULT;
          for (ARREXPR) {
             $operations
             push \@_MAP_MACRO_RESULT, do {B$reps};
           }
           \@_MAP_MACRO_RESULT
       }

    }
    MACROMACROMACRO
    }


   macro cmap '{B1}' 'cmap' '{B2}'  'cmap' '{B3}'  'cmap' {
         EXPAND cmap_macro(4);
         cmap {B1} cmap {B2} cmap {B3} cmap
   }



I'm sure there are bugs above -- it is, after all,  untested perl code that is
intended as input for a perl engine that produces perl code, including
some perl code that produces both perl code and input to said engine :)

The idea is, a long string of cmap { block } cmap ... will trigger all the
expansions, and each expansion creates the cmap macro at that level
and also another expander that handles one more cmap in the chain.

How do I get SOC to fund the polishing of Macrame?

Thread Previous | Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About