
This is PDD #1--a high-level overview of the perl system

Dan Sugalski
January 3, 2001 12:50
Here's PDD #1, the first of the internals perl documents. (Bcc'd to the
RFC librarian, so he doesn't get a zillion replies)

----Cut here----
=head1 TITLE

A high-level overview of the perl system

=head1 VERSION

=head2 CURRENT

    Maintainer: Dan Sugalski
    Class: Meta
    PDD Number: 1
    Version: 1
    Status: Developing
    Last Modified: 02 January 2001
    PDD Format: 1
    Language: English

=head2 HISTORY

None--this is the first version

=head1 CHANGES

None. (Yet...)


This PDD provides a high-level overview of the perl system. 


=head2 Major components

The perl system generally looks like this:

    |                   Embedding App                    |
    |          |            |             |              |
    |  parser <-> compiler <-> optimizer <-> interpreter |
    |          |            |             |              |
    |                 Extensions to perl                 |

=item Parser

The parser takes source code of some sort (presumably perl source, but
we're not picky--if you want to write a parser module that takes C,
Python, or Klingon, that's OK with us) and creates a syntax tree from
that source.

The parser module is designed to be extended with both perl and
compiled languages, and much of the parser is written in perl. (This
is the plan, at least.) Generally there will be one parser, though
there's no reason that there can't be multiple independent parsers.

=item Bytecode compiler

The bytecode compiler module takes a syntax tree from the parser and
emits an unoptimized stream of bytecode. This code is suitable for
passing straight to the interpreter, though it is probably not going
to be very fast.
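
As a rough illustration of the handoff from parser to bytecode
compiler, here is a minimal Python sketch. Everything in it is an
assumption made for the example: the tuple-based tree shape, the
opcode names (PUSH, ADD, MUL), and the compile_tree function are
hypothetical and are not part of any perl design.

```python
# Hypothetical sketch: a syntax tree as nested tuples, compiled to a flat
# list of (opcode, operand) pairs for a stack machine. All names invented.

def compile_tree(node):
    """Emit unoptimized bytecode for a tiny arithmetic syntax tree."""
    if isinstance(node, (int, float)):
        return [("PUSH", node)]
    op, left, right = node
    opcode = {"+": "ADD", "*": "MUL"}[op]
    # Post-order walk: emit both operands first, then the operator.
    return compile_tree(left) + compile_tree(right) + [(opcode, None)]

# Tree for 2 + 3 * 4, as a parser might hand it over.
tree = ("+", 2, ("*", 3, 4))
bytecode = compile_tree(tree)
```

The output is deliberately naive: every constant is pushed and every
operation is emitted, with no attempt to simplify anything.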

=item Optimizer

The optimizer module takes the bytecode stream from the compiler and
optionally the syntax tree the bytecode was generated from, and
optimizes the bytecode.
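
One kind of work an optimizer might do is constant folding. The Python
sketch below is only an illustration under invented assumptions: the
(opcode, operand) instruction format and the fold_constants function
are hypothetical, and the sketch looks at the bytecode stream alone,
ignoring the optional syntax tree.

```python
# Hypothetical peephole pass over an invented (opcode, operand) bytecode.

def fold_constants(code):
    """Replace PUSH a, PUSH b, ADD/MUL with a single PUSH of the result."""
    fold = {"ADD": lambda a, b: a + b, "MUL": lambda a, b: a * b}
    out = []
    for op, arg in code:
        if (op in fold and len(out) >= 2
                and out[-1][0] == "PUSH" and out[-2][0] == "PUSH"):
            b = out.pop()[1]
            a = out.pop()[1]
            out.append(("PUSH", fold[op](a, b)))
        else:
            # Anything that is not a foldable operation passes through.
            out.append((op, arg))
    return out

unoptimized = [("PUSH", 2), ("PUSH", 3), ("PUSH", 4),
               ("MUL", None), ("ADD", None)]
optimized = fold_constants(unoptimized)
```

Instructions the pass does not understand are copied through
unchanged, so the result is still a valid stream for the interpreter.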

=item Interpreter

The interpreter module takes the bytecode stream from either the
optimizer or the bytecode compiler and executes it. There must always
be at least one interpreter module available for any program that can
handle all of perl, since it's required for use statements and BEGIN
blocks.

While there must be at least one interpreter, there may be multiple
interpreter modules linked into an executable. This would be the case,
for example, for programs that produce Java bytecode, where one of
the interpreter modules would take the bytecode stream and spit out
Java bytecode instead of interpreting it.
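
The idea of interchangeable interpreter modules can be sketched in
Python as two classes sharing one interface. The class names, the run
method, and the instruction format are assumptions made up for this
example, and the text emitted by the second class is an invented
listing, not real Java bytecode.

```python
# Hypothetical sketch: two "interpreter modules" with the same interface.

class ExecutingInterpreter:
    """Runs the bytecode on a small stack machine."""
    def run(self, code):
        stack = []
        for op, arg in code:
            if op == "PUSH":
                stack.append(arg)
            elif op == "ADD":
                b, a = stack.pop(), stack.pop()
                stack.append(a + b)
            elif op == "MUL":
                b, a = stack.pop(), stack.pop()
                stack.append(a * b)
            else:
                raise ValueError(f"unknown opcode: {op}")
        return stack.pop()

class TranslatingInterpreter:
    """Same interface, but emits text for another target instead of running."""
    def run(self, code):
        return "\n".join(op.lower() if arg is None else f"{op.lower()} {arg}"
                         for op, arg in code)

code = [("PUSH", 2), ("PUSH", 3), ("ADD", None)]
result = ExecutingInterpreter().run(code)
listing = TranslatingInterpreter().run(code)
```

Because both classes accept the same stream, the rest of the system
does not need to know which one is plugged in.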

=head2 Independent subsystems

Perl also has a number of subsystems that are independent of any
single module.

=item PerlIO subsystem

The PerlIO subsystem provides source- and platform-independent
asynchronous I/O to perl. With this, perl 6 is independent of C's
stdio system. (And good riddance--it sucks.) How this maps to an OS's
underlying I/O code is not generally perl's concern, and a platform
isn't obligated to provide asynchronous I/O.

Additionally, the PerlIO subsystem allows a program to push filters
onto an input stream if necessary, to manipulate the data before it is
presented to a perl program. 
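
To make the filter idea concrete, here is a small Python sketch. The
FilteredInput class and its push_filter method are hypothetical names
invented for the example, the reads are synchronous for simplicity,
and the choice that the most recently pushed filter runs first is
only an assumption.

```python
import io

# Hypothetical sketch of filters pushed onto an input stream.

class FilteredInput:
    """Wraps a text stream and applies pushed filters to each line read."""
    def __init__(self, stream):
        self.stream = stream
        self.filters = []

    def push_filter(self, func):
        self.filters.append(func)

    def readline(self):
        line = self.stream.readline()
        # Assumption: the most recently pushed filter sits closest to
        # the raw stream, so it sees the data first.
        for func in reversed(self.filters):
            line = func(line)
        return line

src = FilteredInput(io.StringIO("Hello\r\nWorld\r\n"))
src.push_filter(str.upper)
src.push_filter(lambda s: s.replace("\r\n", "\n"))
first = src.readline()
second = src.readline()
```

The program reading from src never sees the raw line endings or the
original case; it only sees what the filters hand it.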

=item Regex engine

The regular expression engine is somewhat decoupled from the guts of
perl. Its job is to turn regexes into objects, and apply those regex
objects to strings.
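
The split between building a regex object and applying it to strings
can be seen in Python's standard re module, used here purely as an
analogy. Nothing below says how perl's own engine is or will be
built, and the pattern and sample strings are made up.

```python
import re

# Step 1: turn the regex into a reusable object.
address = re.compile(r"(\w+)@(\w+)\.org")

# Step 2: apply that same object to as many strings as needed.
hit = address.search("mail dan@example.org today")
miss = address.search("no address here")

user = hit.group(1)
host = hit.group(2)
```

The compiled object carries no reference to the strings it will be
matched against, which is what lets the engine stand apart from the
rest of the system.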

=head2 API levels

=item Embedding

The embedding API is the set of calls exported to the embedding
application. This is a small, simple set of calls, requiring minimal
effort to use.

The goal is to provide an interface that a competent programmer who is
uninterested in perl can use to provide access to a perl interpreter
within another application with very little programming or
intellectual effort. Generally it should take less than thirty minutes
for a simple interface, though more complete integration will take
longer.

Backwards binary compatibility at this level is guaranteed across the
life of perl 6.

=item Extensions

The extension API is the set of calls exported to perl
extensions. They provide access to most of the things an extension
needs to do, while hiding the implementation details. (So that, for
example, we can change the way scalars are stored without having to
rewrite, or even recompile, an extension.)

Binary compatibility is a serious goal, though it may be broken if
absolutely necessary.

=item Guts

The guts-level APIs are the routines used within a component. These
aren't guaranteed to be stable, and shouldn't be used outside a
component. (For example, an extension to the interpreter shouldn't
call any of the parser's internal routines)

No binary compatibility is guaranteed, and routines here may be
changed without notice.


One of the explicit goals of perl 6 is to generate Java bytecode and
.NET code, as well as to run on small devices such as the Palm. The
modular nature of perl 6 makes this reasonably straightforward.

=item Perl for small platforms

For small platforms, the parser, compiler, and optimizer modules are
replaced with a small bytecode loader module which reads in perl
bytecode and passes it to the interpreter for execution. No string
eval, do, use, or require is available, though loading of precompiled
modules via do, use, or require may be supported.

=item Bytecode compilation

One straightforward use of modular perl is to precompile perl source
into bytecode and save it for later use. This is easily done by having
a second interpreter module. The standard perl interpreter is used
during compilation to evaluate BEGIN blocks and suchlike things, but a
simple freeze-to-disk module is used when mainline execution
begins. Then, rather than executing the bytecode, it gets frozen to
disk for later loading.
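
The two-module arrangement can be sketched in Python as follows. All
of it is assumed for illustration: the execute, freeze, and thaw
functions, the instruction format, the use of JSON as the on-disk
format, and the program.pbc file name are hypothetical.

```python
import json
import os
import tempfile

# Hypothetical sketch: one module executes, the other freezes to disk.

def execute(code):
    """Stand-in for the standard interpreter: run a tiny stack program."""
    stack = []
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
    return stack[-1] if stack else None

def freeze(code, path):
    """Stand-in for the freeze-to-disk module: save instead of executing."""
    with open(path, "w") as fh:
        json.dump(code, fh)

def thaw(path):
    """Load frozen bytecode back so it can be executed later."""
    with open(path) as fh:
        return [tuple(instr) for instr in json.load(fh)]

begin_code = [("PUSH", 1), ("PUSH", 1), ("ADD", None)]   # compile-time work
mainline = [("PUSH", 40), ("PUSH", 2), ("ADD", None)]    # program body

begin_result = execute(begin_code)      # BEGIN-time code runs immediately
path = os.path.join(tempfile.mkdtemp(), "program.pbc")
freeze(mainline, path)                  # mainline is saved, not run
later_result = execute(thaw(path))      # a later run loads and executes it
```

Only the mainline code reaches the file; the compile-time work has
already been done by the time freezing starts.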

=item Perl in, Java (or whatever) out

This is a variant of the bytecode compilation. Instead of freezing the
bytecode to disk, it's instead translated to something else. That
something could be Java bytecode or .NET code, or an executable of
some sort. Perl could also be a front end to other modular compilers
such as gcc or Compaq's GEM compiler system.

=item Standalone pieces

Each piece of perl can, with enough support hidden away (in the form
of an interpreter for the parsing module, for example), stand on its
own. This means it's feasible to have separate executables that parse
perl to a syntax tree, turn a syntax tree into bytecode, optimize the
bytecode, and execute the bytecode.

This allows us to develop pieces independently--the first version of
the parser, for example, can be written mainly in perl 5 using an
embedded interpreter. It also means we can have a standalone optimizer
which can spend a lot of time grovelling over bytecode, far more than
you might want to devote to optimizing one-liners or code that'll run
only once or twice.

=item The perl assembler

The parser and bytecode compiler can be replaced with a unit that will
eat a textual representation of the bytecode--essentially a perl
assembler. This can be useful in a number of ways, allowing programs
to emit perl bytecode without having to know the gory details of the
binary interface, or in fact having perl immediately available at
all. (It also means we can cobble up real perl programs without having
a full parser built yet, though that's more an issue of initial
implementation than anything else)
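
A toy version of such an assembler might look like the Python sketch
below. The textual syntax (one instruction per line, # for comments),
the opcode names, and the assemble function are all invented for the
example; no actual perl bytecode format is implied.

```python
# Hypothetical sketch of a textual assembler for an invented bytecode.

def assemble(text):
    """Turn a textual listing into (opcode, operand) pairs."""
    code = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and blanks
        if not line:
            continue
        parts = line.split()
        op = parts[0].upper()
        arg = int(parts[1]) if len(parts) > 1 else None
        code.append((op, arg))
    return code

source = """
    push 6      # first operand
    push 7
    mul
"""
program = assemble(source)
```

A tool that emits text like source needs no knowledge of how the
instructions are eventually laid out in binary form.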
