perl.perl5.porters | Postings from April 2006

code references in @INC and source filters

From: Nicholas Clark
Date: April 14, 2006 11:59
Subject: code references in @INC and source filters
Message ID: 20060414185850.GM48456@plum.flirble.org

It's documented that you can put code references in @INC.
It's documented that they are called with the prospective file name, and if
they return a file handle, that file handle is used to read the source code.
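
As a minimal sketch of that documented behaviour: the hook below serves a
module from memory rather than from disk. The module name
Hypothetical::Greeting and the %virtual hash are made up for illustration,
and the in-memory file handle is an assumption about what counts as a usable
handle on the perl you run this on.

```perl
use strict;
use warnings;

# Hypothetical module source kept in memory instead of on disk.
my %virtual = (
    'Hypothetical/Greeting.pm' => <<'END_SOURCE',
package Hypothetical::Greeting;
sub hello { return 'hello from a hook' }
1;
END_SOURCE
);

unshift @INC, sub {
    my ($self, $file) = @_;    # $file looks like 'Hypothetical/Greeting.pm'

    # Decline anything we don't know about, so the rest of @INC is tried.
    return unless exists $virtual{$file};

    open my $fh, '<', \$virtual{$file}
        or die "Cannot open in-memory handle for $file: $!";
    return $fh;                # require reads the source from this handle
};

require Hypothetical::Greeting;
print Hypothetical::Greeting::hello(), "\n";
```

Returning an empty list is what lets the hook coexist with the ordinary
directory entries in @INC.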

What's not documented, and has been true since they were first added in 5.6,
is that they can do slightly more than that. I think it hasn't been documented
because it's assumed that it's not quite finished. Certainly, there's a
comment in pp_ctl.c:

    /* I was having segfault trouble under Linux 2.2.5 after a
       parse error occured.  (Had to hack around it with a test
       for PL_error_count == 0.)  Solaris doesn't segfault --
       not sure where the trouble is yet.  XXX */


From checking the source to pp_require, I believe that the things you can
usefully return from your coderef in @INC are

1: Just a file handle. Strictly, this needs to be either a typeglob or a
   reference to a typeglob. (IO objects are blessed typeglob references, so
   that's fine.) The typeglob has to have a real file handle; tied handles
   don't work.

2: (file handle, code reference)
   The code reference is used as a simple source filter on each read of the
   file handle. (The tokeniser reads in lines, but other filters are free to
   read in blocks.)

3: (file handle, code reference, state)
   The third value is passed as a second parameter to the code reference, as
   described in 2.

4: (code reference)
   The code reference is called for each read. It's assumed to behave as a
   generator.

5: (code reference, state)
   The second value is passed as a second parameter to the code reference, as
   described in 4.
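
Cases 3 and 5 can be sketched as follows. This is an illustration under
assumptions, not a statement of what every perl does: it assumes the return
forms behave as the list above describes, that the filter sees each line in
$_, and that a return of 0 means end of file. The module names are
hypothetical, and an in-memory handle stands in for a file on disk.

```perl
use strict;
use warnings;

unshift @INC, sub {
    my ($self, $file) = @_;

    # Case 5: (code reference, state). The generator puts one line in $_
    # per call and returns 1, then returns 0 once the state is exhausted.
    if ($file eq 'Hypothetical/Generated.pm') {
        my @lines = (
            "package Hypothetical::Generated;\n",
            "sub three { return 3 }\n",
            "1;\n",
        );
        return (sub {
            my $queue = $_[1];          # the state arrives as $_[1]
            return 0 unless @$queue;    # end of file
            $_ = shift @$queue;
            return 1;
        }, \@lines);
    }

    # Case 3: (file handle, code reference, state). Each line read from
    # the handle is in $_ when the filter runs.
    if ($file eq 'Hypothetical/Filtered.pm') {
        my $source = "package Hypothetical::Filtered;\n"
                   . "sub word { return 'quiet' }\n"
                   . "1;\n";
        open my $fh, '<', \$source
            or die "Cannot open in-memory handle: $!";
        return ($fh, sub {
            # Nothing was read from the handle: treat it as end of file.
            return 0 unless defined($_) && length($_);
            s/quiet/$_[1]/;             # the state supplies the replacement
            return 1;
        }, 'LOUD');
    }

    return;    # decline everything else
};

require Hypothetical::Generated;
require Hypothetical::Filtered;
print Hypothetical::Generated::three(), "\n";
print Hypothetical::Filtered::word(), "\n";
```

Cases 2 and 4 are the same two subs without the trailing state value.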


Potentially cases 2-5 are actually quite interesting. But source filtering
Perl is hard (and I'd argue not used by anyone sane in production code), so
it's not that useful. Filter::Simple is supposed to make it less hard (but
I'd stick to my view about sanity). For the work I proposed to TPF I said:

  Perl 5.6 added the ability to put code references into @INC, which allows
  arbitrary code to be run before creating a file handle to pass back to
  C<require>. An additional undocumented feature was the ability to pass back
  a source filter as a second return value. The code for this filter exists in
  core, with comments that suggest that some bugs are unresolved. The
  functionality is also provided by Filter::Simple, which entered the core for
  5.8.0.
  
  If the existing filter functionality were migrated to Filter::Simple, then
  the duplicate code can be removed from the C source, reducing size and
  maintenance liability.
  
  These changes will be merged back to 5.8.x

This is an expansion of what's in perltodo.pod:

=head2 @INC source filter to Filter::Simple
  
The second return value from a sub in @INC can be a source filter. This isn't
documented. It should be changed to use Filter::Simple, tested and documented.

=cut

I'm not actually sure if this exact approach is viable. Not being a
Filter::Simple user myself, I've been RTFM, and it seems that the main power
comes from being able to call FILTER_ONLY with optional parameters. I can't
see how to cleanly fit that onto the return values from a code reference.
Moreover, the public API to Filter::Simple doesn't really support this, as
it's designed to be put into your own package and your package used as a
filter, like this:

=head1 SYNOPSIS

 # in MyFilter.pm:

     package MyFilter;

     use Filter::Simple;
     
     FILTER { ... };

     # or just:
     #
     # use Filter::Simple sub { ... };

 # in user's code:

     use MyFilter;

     # this code is filtered

     no MyFilter;

     # this code is not

=cut


So, actually it looks like what you really want to be able to do is have a
code reference that injects the single line "use MyFilter;\n" ahead of the
module loaded from disk, then supplies all lines from disk unchanged, letting
Filter::Simple (as configured via MyFilter) do the real work. Something that
works like this:

$ cat Shout.pm
package Shout;

use Filter::Simple;

FILTER_ONLY (string => sub { $_ =~ tr/a-z/A-Z/; });

1;
$ cat M.pm
package M;

sub z {
    $c = "Hello world$/";
    print $c;
}

1;
__END__
$ cat M1.pm
package M1;

sub z {
    $c = "Hello world$/";
    print $c;
}

1;
__END__
$ cat testfilter.pl
use warnings;
use strict;

BEGIN {
    unshift @INC, sub {
        return unless $_[1] =~ /^[A-Za-z]\.pm$/;
        my $fh;
        foreach (grep {!ref} @INC) {
            open $fh, '<', "$_/$_[1]" and last;
        }
        return unless $fh;
        my $state
            = q{
                use Shout;
               };
        (sub {
            return -1 unless defined $_[1];
            if (!ref $_[1]) {
                $_[1] =~ m/([^\n]*)\n?/gs;
                $_ = $1;
                if (defined pos $_[1]) {
                    return 1;
                }
                $_[1] = $fh;
            }
            $! = 0;
            $_ = readline $_[1];
            # $! is set on error, 0 if not. Returning 0 means EOF; negative means error
            defined $_ ? 1 : -$!;
        }, $state);
    }
}

use M;
use M1;
M::z;
M1::z;
__END__
$ perl testfilter.pl
HELLO WORLD
Hello world


only a lot less verbose, without the whole (sub {...}, $state) complexity,
and without ignoring $_[0] (block mode/line mode).

So I was wondering if we should document the current functionality, and
add a new feature: if you return a scalar reference (instead of a code
reference), then that scalar is treated as $state is above, and fed line by
line downstream. If you return only a scalar reference, then it behaves like
the generator, and exhausting it means "EOF". If instead you return a list
of (file handle, scalar reference), the scalar is fed in first, then lines
are read from the file handle.

This would make it very simple to generate virtual files from a code entry
in @INC, and to brute force add filters onto existing files.
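
For comparison, the "fed in first, then read from the file handle" behaviour
can be approximated with a plain generator. The sketch below does not use
the proposed scalar reference return at all. It assumes the generator
convention from case 4 (line in $_, return 1, then 0 at end of file);
prefixed_source and Hypothetical::Halver are made-up names, and an in-memory
handle stands in for the module found on disk.

```perl
use strict;
use warnings;

# Hypothetical helper: build a generator that yields the lines of $prefix
# first, then the lines read from $fh.
sub prefixed_source {
    my ($prefix, $fh) = @_;
    my @pending = split /^/, $prefix;    # one element per line, newlines kept
    return sub {
        if (@pending) {
            $_ = shift @pending;
            return 1;
        }
        my $line = readline $fh;
        return 0 unless defined $line;   # end of file
        $_ = $line;
        return 1;
    };
}

unshift @INC, sub {
    my ($self, $file) = @_;
    return unless $file eq 'Hypothetical/Halver.pm';

    # Stand-in for opening the real module from one of the @INC directories.
    my $body = "package Hypothetical::Halver;\n"
             . "sub half { return 7 / 2 }\n"
             . "1;\n";
    open my $fh, '<', \$body
        or die "Cannot open in-memory handle: $!";

    # Inject a pragma ahead of the unchanged module source.
    return prefixed_source("use integer;\n", $fh);
};

require Hypothetical::Halver;
print Hypothetical::Halver::half(), "\n";
```

Because "use integer;" is injected ahead of the module's own first line, the
division in half() is compiled as integer division, which is the same trick
as injecting "use Shout;" above.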

There's one feature it might be nice to have that this doesn't let you do -
have a way of returning a stack of source filters that subsequently get
applied to the file handle returned by something later on in @INC.
Maybe that's what an array reference return could be for?

Thoughts? (Aside from "that was long")

Nicholas Clark
