perl.perl5.porters | Postings from September 2002

[RFC] jit.pm - load modules and subroutines just in time

From: Elizabeth Mattijsen
Date: September 28, 2002 15:37
Subject: [RFC] jit.pm - load modules and subroutines just in time
Message ID: 4.2.0.58.20020929002612.02f58ae0@mickey.dijkmat.nl
Before uploading to CPAN, I would like to ask p5p for comments on my latest
endeavour: jit.pm.  An installable version is available from:

  http://www.liz.nl/CPAN/jit-0.01.tar.gz

All the functionality described in the pod should be working.  The pod
follows.

Thanks in advance for any comments, remarks and flames (not too hot, please).


Liz
==============================================================================
=head1 NAME

jit - load modules and subroutines just in time

=head1 SYNOPSIS

   use jit;            # default, same as 'autoload'

   use jit 'autoload'; # export AUTOLOAD handler to this namespace

   use jit 'ondemand'; # load subroutines after __END__ when requested, default

   use jit 'now';      # load subroutines after __END__ now

   use jit ();         # same as qw(dontscan inherit)

   use jit 'dontscan'; # don't scan module until it is really needed

   use jit 'inherit';  # do NOT export AUTOLOAD handler to this namespace

=head1 DESCRIPTION

The "jit" pragma allows a module developer to give the application developer
more options with regard to optimizing for memory or CPU usage.  The "jit"
pragma gives more control over the moment when subroutines are loaded and
start taking up memory.  This allows the application developer to optimize for
CPU usage, by loading all of a module at compile time and thus reducing the
amount of CPU used during the execution of an application.  Alternatively, the
application developer can optimize for memory usage, by loading subroutines
only when they are actually needed, at the cost of extra CPU during execution.

The "jit" pragma combines the best of both worlds, L<AutoLoader> and
L<SelfLoader>, and adds some more features.

In a situation where you want to use as little memory as possible, the "jit"
pragma (in the context of a module) is a drop-in replacement for L<AutoLoader>.
But for situations where you want to have a module load everything it could
ever possibly need (e.g. when starting a mod_perl server in pre-fork mode), the
"jit" pragma can be used (in the context of an application) to have all
subroutines of a module loaded without having to make any change to the source
of the module in question.

So the typical use inside a module is to have:

  package Your::Module;
  use jit;

in the source, and to place all subroutines that you want to be loadable on
demand after the (first) __END__.

If an application developer decides that all subroutines should be loaded
at compile time, (s)he can say in the application:

  use jit 'now';
  use Your::Module;

This will cause the subroutines of Your::Module to all be loaded at compile
time.

=head1 MODES OF OPERATION

There are basically two places where you can call the "jit" pragma:

=head2 inside a module

When you call the "jit" pragma inside a module, you're basically enabling that
module for "just in time" loading of subroutines.  As with AutoLoader, any
subroutines that should be loaded on demand should be located B<after> an
__END__ line.

If no parameters are specified with the C<use jit>, then the "autoload"
parameter is assumed.  Whether the module's subroutines are loaded at compile
time or on demand is determined by the calling application.  If the
application doesn't specify a mode, the "ondemand" keyword is assumed as well.

=head2 inside an application

When you call the "jit" pragma inside an application, you're specifying when
subroutines will be loaded by "jit" enhanced modules.  As an application
developer, you can use two keywords: "ondemand" and "now".

If an application does not call the "jit" pragma, the "ondemand" keyword will
be assumed.  With "ondemand", subroutines will only be loaded when they are
actually executed.  This saves memory at the expense of extra CPU the first
time the subroutine is called.
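A minimal sketch of the on-demand idea in plain Perl, assuming the source after __END__ has already been split into one source string per subroutine. The package Demo and the %source table are hypothetical and are not jit's internals: the AUTOLOAD handler compiles a subroutine the first time it is called and then jumps to it with C<goto>, so later calls never reach AUTOLOAD again.

```perl
package Demo;
use strict;
use warnings;

# Hypothetical stand-in for the source after __END__:
# subroutine name => source text that has not been compiled yet.
my %source = (
    double => 'sub double { return 2 * $_[0] }',
    square => 'sub square { return $_[0] * $_[0] }',
);

our $AUTOLOAD;

sub AUTOLOAD {
    my $name = $AUTOLOAD;
    $name =~ s/.*:://;
    return if $name eq 'DESTROY';        # never try to load a destructor
    my $code = $source{$name}
        or die "Undefined subroutine &$AUTOLOAD called";
    eval "package Demo; $code; 1" or die $@;   # compile on first call only
    no strict 'refs';
    goto &$AUTOLOAD;                     # re-dispatch with the original @_
}

package main;

my $result = Demo::double(21);           # triggers compilation of double()
my $square = defined &Demo::square ? 'loaded' : 'not loaded';
print "$result, square is $square\n";    # prints "42, square is not loaded"
```

Only the subroutine that was actually called takes up memory; square() stays a string until someone calls it.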

The "now" keyword indicates that all subroutines of all modules that are
enhanced with the "jit" pragma will be loaded at compile time (thus using
more memory, but B<not> incurring extra CPU overhead the first time a
subroutine is executed).
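The difference between "ondemand" and "now" can be pictured as a single application-wide flag that is consulted when a module hands over its deferred source. This is a hypothetical sketch: the package MyJit, C<$MODE>, C<%pending> and C<register()> are illustrative names, not the API of jit.pm. In "now" mode every subroutine is compiled immediately; in "ondemand" mode it is only remembered, for an AUTOLOAD handler to compile later.

```perl
package MyJit;    # hypothetical stand-in, not the real jit.pm
use strict;
use warnings;

our $MODE = 'ondemand';    # an application would set this to 'now'
our %pending;              # "Package::name" => source still to compile

sub register {
    my ($package, %subs) = @_;
    for my $name (sort keys %subs) {
        if ($MODE eq 'now') {
            # pay the compile cost (CPU and memory) up front
            eval "package $package; $subs{$name}; 1" or die $@;
        }
        else {
            # defer: only remember the source for later
            $pending{"${package}::$name"} = $subs{$name};
        }
    }
    return;
}

package main;

MyJit::register('Lazy', hello => 'sub hello { "hi" }');
{
    local $MyJit::MODE = 'now';    # as if the application said "use jit 'now'"
    MyJit::register('Eager', hello => 'sub hello { "hi" }');
}

my $lazy  = defined &Lazy::hello  ? 'compiled' : 'deferred';
my $eager = defined &Eager::hello ? 'compiled' : 'deferred';
print "Lazy: $lazy, Eager: $eager\n";    # prints "Lazy: deferred, Eager: compiled"
```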

=head1 KEYWORDS

The following keywords are recognized with the C<use> command:

=head2 ondemand

The "ondemand" keyword indicates that subroutines of modules that are enhanced
with the "jit" pragma will only be loaded when they are actually called.

If the "ondemand" keyword is used in the context of an application, all
modules that are subsequently C<use>d will be forced to load subroutines
only when they are actually called (unless the module itself forces a specific
setting).

If the "ondemand" keyword is used in the context of a module, it indicates
that the subroutines of that module should B<always> be loaded only when they
are actually needed.  Since this takes away the choice from the application
developer, the use of the "ondemand" keyword in module context is not
encouraged.  See also the L</now> and L</dontscan> keywords.

=head2 now

The "now" keyword indicates that subroutines of modules that are enhanced
with the "jit" pragma will be loaded at compile time.

If the "now" keyword is used in the context of an application, all modules
that are subsequently C<use>d will be forced to load all subroutines at
compile time (unless the module forces a specific setting itself).

If the "now" keyword is used in the context of a module, it indicates that the
subroutines of that module should B<always> be loaded at compile time.  Since
this takes away the choice from the application developer, the use of the
"now" keyword in module context is not encouraged.  See also the L</ondemand>
keyword.

=head2 dontscan

The "dontscan" keyword only makes sense when used in the context of a module.
Normally, when a module that is enhanced with the "jit" pragma is compiled,
the source after the __END__ is scanned for the locations of the subroutines.
This makes compiling the module a little slower, but allows a faster
(initial) lookup of not yet loaded subroutines during execution.

If the "dontscan" keyword is specified, this scanning of the source is
skipped at compile time.  However, as soon as an attempt is made to execute
a subroutine from this module, the source is scanned first, before the
subroutine in question is loaded.
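Deferring the scan amounts to building the location index lazily and caching it. A minimal sketch, assuming the source after __END__ is available as a string; the LazyIndex class and its methods are hypothetical and not part of jit. The counter shows that the scan runs once, on the first lookup, and never at construction time.

```perl
package LazyIndex;    # hypothetical helper, not part of jit
use strict;
use warnings;

my $scans = 0;        # how often a source text was scanned

sub new {
    my ($class, $source) = @_;
    return bless { source => $source, where => undef }, $class;
}

# Scan for "sub NAME" at the start of a line, but only on first use.
sub locations {
    my ($self) = @_;
    $self->{where} ||= do {
        $scans++;
        my %where;
        while ($self->{source} =~ /^sub[ \t]+(\w+)/mg) {
            $where{$1} = $-[0];    # offset of "sub NAME" in the source
        }
        \%where;
    };
    return $self->{where};
}

sub scans { return $scans }

package main;

my $index  = LazyIndex->new("sub a { 1 }\nsub b { 2 }\n");
my $before = LazyIndex->scans;            # nothing scanned yet
my $offset = $index->locations->{b};      # first request triggers the scan
$index->locations;                        # second request reuses the index
my $after  = LazyIndex->scans;
print "scans before: $before, offset of b: $offset, scans after: $after\n";
# prints "scans before: 0, offset of b: 12, scans after: 1"
```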

So, you should use the "dontscan" keyword if you are reasonably sure that you
will only need subroutines from the module in special cases.  In all other
cases it will make more sense to have the source scanned at compile time.

The "dontscan" keyword will be ignored if an application developer forces
subroutines to be loaded at compile time with the L</now> keyword.

=head2 autoload

The "autoload" keyword only makes sense when used in the context of a module.
It indicates that a generic AUTOLOAD subroutine will be exported to the
module's namespace.  It is selected by default if you use the "jit" pragma
without parameters in the source of a module.  See also the L</inherit> keyword
to B<not> export the generic AUTOLOAD subroutine.
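Exporting a generic AUTOLOAD boils down to aliasing one subroutine into the caller's symbol table from C<import>, the same technique L<AutoLoader> uses. A minimal sketch with a hypothetical package MyJit standing in for the pragma; its AUTOLOAD only reports what it would load. Because the aliased subroutine still belongs to package MyJit, Perl puts the requested name into C<$MyJit::AUTOLOAD>.

```perl
package MyJit;    # hypothetical stand-in for the jit pragma
use strict;
use warnings;

our $AUTOLOAD;

# Alias our generic AUTOLOAD into the namespace of the calling package.
sub import {
    my $caller = caller;
    no strict 'refs';
    *{"${caller}::AUTOLOAD"} = \&AUTOLOAD;
    return;
}

# The requested name arrives in $MyJit::AUTOLOAD, because the aliased
# subroutine was compiled in package MyJit.
sub AUTOLOAD {
    my $sub = $AUTOLOAD;
    return if $sub =~ /::DESTROY$/;
    # a real loader would compile $sub here and then goto &$sub
    return "would load $sub";
}

package Your::Module;
BEGIN { MyJit->import }    # what "use MyJit;" would do at compile time

package main;

my $message = Your::Module::hello();
print "$message\n";        # prints "would load Your::Module::hello"
```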

=head2 inherit

The "inherit" keyword only makes sense when used in the context of a module.
It indicates that B<no> AUTOLOAD subroutine will be exported to the module's
namespace.  This can e.g. be used when you need to have your own AUTOLOAD
routine.  That AUTOLOAD routine should then contain:

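  # $sub must hold the fully qualified name of the requested
  # subroutine, e.g. the value of $Your::Module::AUTOLOAD
  $jit::AUTOLOAD = $sub;
  goto &jit::AUTOLOAD;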

to access the "jit" pragma functionality.  Another case for the "inherit"
keyword is a sub-class of a module that is itself "jit" enhanced.
In that case, inheritance causes the AUTOLOAD subroutine of the base
class to be used, thereby accessing the "jit" pragma automagically (hence
the name of the keyword, of course).  See also the L</autoload> keyword to
have the module use the generic AUTOLOAD subroutine.
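A minimal sketch, in plain Perl without jit itself, of why a sub-class can rely on the AUTOLOAD of its base class. The packages Base and Adapted and the %source table are hypothetical. When C<< Adapted->hello >> is not found anywhere in the hierarchy, Perl calls C<Base::AUTOLOAD> with C<$Base::AUTOLOAD> set to C<Adapted::hello>, so the loader can compile the sub-class's own version. Note that Perl only consults an inherited AUTOLOAD for method calls; a plain function call such as C<Adapted::hello()> does not find it, and recent Perls treat that case as a fatal error.

```perl
package Base;    # hypothetical base class with the generic AUTOLOAD
use strict;
use warnings;

# Hypothetical source table keyed by fully qualified name, standing in
# for what a loader would find after __END__ in each file.
my %source = (
    'Base::hello'    => 'sub hello { return "hello from Base" }',
    'Adapted::hello' => 'sub hello { return "hello from Adapted" }',
);

our $AUTOLOAD;

sub AUTOLOAD {
    my $sub = $AUTOLOAD;                  # e.g. "Adapted::hello"
    return if $sub =~ /::DESTROY$/;
    my ($package) = $sub =~ /^(.*)::/;
    my $code = $source{$sub} or die "Undefined subroutine &$sub called";
    eval "package $package; $code; 1" or die $@;
    no strict 'refs';
    goto &$sub;                           # re-dispatch with the original @_
}

package Adapted;    # sub-class without an AUTOLOAD of its own
our @ISA = ('Base');

package main;

my $greeting = Adapted->hello;
print "$greeting\n";    # prints "hello from Adapted"
```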

=head1 DIFFERENCES WITH SIMILAR MODULES

There are a number of (core) modules that more or less do the same thing as
the "jit" pragma.

=head2 AutoSplit / AutoLoader

The "jit" pragma is very similar to the AutoSplit / AutoLoader combination.
The main difference is that the splitting takes place when the "jit" import
is called in a module, and that no external files are created.  Instead,
just the offsets and lengths are recorded in a hash (when "ondemand" is active),
or all the source after __END__ is evalled (when "now" is active).
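Recording offsets and lengths instead of writing files can be sketched as a single pass over the text after __END__. This is a hypothetical scanner, not the code in jit.pm: it assumes each subroutine runs from its C<sub NAME> line up to the next one, so anything in between (blank lines, comments) is attributed to the preceding subroutine.

```perl
use strict;
use warnings;

# Hypothetical scanner: map each "sub NAME" that starts a line in $source
# to [offset, length], where a subroutine runs up to the next "sub" line.
sub scan_subs {
    my ($source) = @_;
    my @starts;
    while ($source =~ /^sub[ \t]+(\w+)/mg) {
        push @starts, [ $1, $-[0] ];
    }
    my %where;
    for my $i (0 .. $#starts) {
        my ($name, $start) = @{ $starts[$i] };
        my $end = $i < $#starts ? $starts[ $i + 1 ][1] : length $source;
        $where{$name} = [ $start, $end - $start ];
    }
    return \%where;
}

my $after_end = "sub foo { 1 }\nsub bar { 2 }\n";
my $where     = scan_subs($after_end);
my ($offset, $length) = @{ $where->{bar} };
print "bar: offset $offset, length $length\n";    # prints "bar: offset 14, length 14"
print substr($after_end, $offset, $length);       # prints "sub bar { 2 }"
```

Only two integers per subroutine are kept; the source text itself is read again when the subroutine is needed.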

From a module developer's point of view, the advantage is that you do not need
to install a module before you can test it.  From an application developer's
point of view, you have the flexibility of having everything loaded now or
later (on demand).

From a memory usage point of view, the "jit" offset/length hash takes up more
memory than the equivalent AutoLoader setup.  On the other hand, accessing the
source of a subroutine may generally be faster because the file is more likely
to reside in the operating system's buffers already.

As an extra feature, the "jit" pragma allows an application to force all
subroutines to be loaded at compile time, which is not possible with AutoLoader.

=head2 SelfLoader

The "jit" pragma also has some functionality in common with the SelfLoader
module, but it offers more granularity: with SelfLoader, all subroutines that
are not loaded directly will be loaded if B<any> not yet loaded subroutine is
requested.  SelfLoader also adds complexity if your module needs to use the
<DATA> handle.  So the "jit" pragma gives more flexibility and fewer development
complexities.  And of course, with the "jit" pragma an application can force
all subroutines to be loaded at compile time when needed.

=head1 CAVEATS

Currently you may not have multiple packages in the same file, nor can you
have fully qualified subroutine names.

The parser that looks for package names and subroutines is not very smart.
This is intentional, as making it smarter would make it a lot slower, while
probably still not making it smart enough.  Therefore, the C<package> and
C<sub> keywords B<must> be at the start of a line, and the name of the C<sub>
B<must> be on the same line as the C<sub> keyword.
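The restrictions above follow naturally from a line-based pattern match. A hypothetical classifier with the same rules (keyword at the start of the line, name on the same line) shows which lines such a parser would recognize; it is an illustration, not the parser in jit.pm. In this sketch a fully qualified C<sub Your::Module::foo> is misread as a sub named C<Your>.

```perl
use strict;
use warnings;

# Hypothetical line classifier: keywords must start the line and the
# name must follow on the same line.
sub classify {
    my ($line) = @_;
    return "package $1" if $line =~ /^package[ \t]+([\w:]+)/;
    return "sub $1"     if $line =~ /^sub[ \t]+(\w+)/;
    return 'ignored';
}

for my $line ('package Your::Module;', 'sub foo {', '  sub indented {',
              'sub', 'sub Your::Module::foo {') {
    printf "%-26s => %s\n", $line, classify($line);
}
```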

=head1 EXAMPLES

Some code examples.  Please note that these are only fragments of an actual
situation.

=head2 base class

  package Your::Module;
  use jit;

Exports the generic AUTOLOAD subroutine and adheres to whatever the application
developer specifies as mode of operation.

=head2 sub class

  package Your::Module::Adapted;
  our @ISA = qw(Your::Module);
  use jit ();

Does B<not> export the generic AUTOLOAD subroutine, but inherits it from its
base class.  Also implicitly specifies the "dontscan" keyword, causing the
source of the module to be scanned only when the first not yet loaded
subroutine is about to be executed.  If you only want the "inherit"
keyword functionality, then you must specify that explicitly:

  package Your::Module::Adapted;
  our @ISA = qw(Your::Module);
  use jit 'inherit';

=head2 custom AUTOLOAD

  package Your::Module;
  use jit 'inherit';

  sub AUTOLOAD {
    if ( some_condition() ) {    # placeholder for your own test
      $jit::AUTOLOAD = $Your::Module::AUTOLOAD;
      goto &jit::AUTOLOAD;       # let jit load the sub from after __END__
    }
    # do your own stuff
  }

If you want to use your own AUTOLOAD subroutine, but still want to use the
functionality offered by the "jit" pragma, you can use the above construct.

=head2 mod_perl prefork

  use jit 'now';
  use Your::Module;

In pre-fork mod_perl applications (the default mod_perl applications before
mod_perl 2.0), it is advantageous to load all possible subroutines when the
Apache process is started.  This is because the operating system will share
memory between the forked child processes using a mechanism called "Copy On
Write".  So even though this takes more memory initially, that cost is easily
offset by the gains of having everything shared.  Loading a not yet loaded
subroutine in a child process will cause otherwise shared memory to become
unshared, thereby increasing the overall memory usage: because of
fragmentation of allocated memory, the amount that becomes unshared is
typically a lot more than the extra memory used by the subroutine itself.

=head2 threaded applications and mod_perl worker

  use Your::Module;

Threaded Perl applications, of which mod_perl applications using the "worker"
module are a special case, function best when subroutines are only loaded when
they are actually needed.  This is caused by the nature of the threading model
of Perl, in which all data-structures are B<copied> to each thread (essentially
forcing them to become unshared as far as the operating system is concerned).

Benchmarks have shown that the overhead of the extra CPU is easily offset by
the reduction of the amount of data that needs to be copied (and processed)
when a thread is created.

=head1 TODO

The coordinates of a subroutine in a module (start offset, number of bytes)
are stored in a hash in the jit namespace.  Ideally, this information should
be stored in the stash of the module to which they apply.  Then the internals
that check for the existence of a subroutine would see that the subroutine
doesn't exist (yet), but that there is an offset and length (and, implicitly,
a file from %INC) from which the source could be read and evalled.
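Turning such a (file, offset, length) entry back into a subroutine only takes a seek, a read and an eval. A minimal sketch with a hypothetical C<load_from_file()> helper; the temporary file stands in for a module found through %INC, and the file is read in raw mode so that offsets are plain byte positions.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read $length bytes at $offset from $file and
# compile them into $package.
sub load_from_file {
    my ($file, $package, $offset, $length) = @_;
    open my $fh, '<:raw', $file or die "Cannot open $file: $!";
    seek $fh, $offset, 0 or die "Cannot seek in $file: $!";
    my $got = read $fh, my $code, $length;
    die "Short read from $file" unless defined $got && $got == $length;
    close $fh;
    eval "package $package; $code; 1" or die $@;
    return;
}

# Demo: a throw-away "module" with one subroutine after __END__.
my $text = "package Demo;\n1;\n__END__\nsub triple { return 3 * \$_[0] }\n";
my ($out, $file) = tempfile(UNLINK => 1);
binmode $out;
print {$out} $text;
close $out;

my $offset = index $text, 'sub triple';
load_from_file($file, 'Demo', $offset, length($text) - $offset);
my $tripled = Demo::triple(5);
print "$tripled\n";    # prints 15
```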

Loading all of the subroutines might better be handled inside the Perl parser,
by having it skip __END__ when the global "now" flag is set.

Possibly we should use the <DATA> handle from a module if there is one, or dup
it and use that, rather than opening the file again.



