[svn:parrot] r33678 - trunk/docs/book
December 8, 2008 16:21
Message ID: 20081209002059.4109CCB9AF@x12.develooper.com
Date: Mon Dec 8 16:20:58 2008
New Revision: 33678
[Book] small updates to chapter 11, and adding some content to chapter 12
--- trunk/docs/book/ch12_opcodes.pod (original)
+++ trunk/docs/book/ch12_opcodes.pod Mon Dec 8 16:20:58 2008
@@ -4,107 +4,224 @@
-The smallest executable component is not the compilation unit or even the subroutine,
-but is in fact the opcode. Opcodes in PASM, like opcodes in other assembly languages,
-are individual instructions that implement low-level operations in Parrot N<In the
-world of microprocessors, the word "opcode" typically refers to the numeric identifier
-for each instructions. The human-readable word used in the associated assembly language
-is called the "mnemonic". An assembler, among other tasks, is responsible for converting
-mnemonics into opcodes for execution. In Parrot, instead of referring to an instruction
-by different names depending on what form it's in, we just call them all "opcodes">. Of
-course the list of things that qualify as "low-level" in Parrot can be pretty advanced
-compared to the functionality supplied by regular assembly language opcodes.
+The smallest executable component is not the compilation unit or even
+the subroutine, but is in fact the opcode. Opcodes in PASM, like opcodes
+in other assembly languages, are individual instructions that implement
+low-level operations in Parrot N<In the world of microprocessors, the
+word "opcode" typically refers to the numeric identifier for each
+instruction. The human-readable word used in the associated assembly
+language is called the "mnemonic". An assembler, among other tasks, is
+responsible for converting mnemonics into opcodes for execution. In
+Parrot, instead of referring to an instruction by different names
+depending on what form it's in, we just call them all "opcodes">. Of
+course the list of things that qualify as "low-level" in Parrot can be
+pretty advanced compared to the functionality supplied by regular
+assembly language opcodes.
-Before we talk about opcodes, we have to a little bit of talking about the various
-runcores that invoke them.
+Before we talk about opcodes, we have to talk a little about
+the various runcores that invoke them.
-During execution, the runcore is like the heart of Parrot. The runcore controls calling
-the various opcodes with the correct data, and making sure that program flow moves
-properly. Some runcores, such as the I<precomputed C goto runcore> are optimized for
-speed and don't perform many tasks beyond finding and dispatching opcodes. Other runcores,
-such as the I<GC-Debug>, I<debug> and I<profiling> runcores help with typical software
-maintenance and analysis tasks. Different runcores, because of the way they are structured,
-require the opcodes to be compiled into different forms. Because of this, understanding
-opcodes first requires an understanding of the Parrot runcores.
+During execution, the runcore is like the heart of Parrot. The runcore
+controls calling the various opcodes with the correct data, and making
+sure that program flow moves properly. Some runcores, such as the
+I<precomputed C goto runcore>, are optimized for speed and don't perform
+many tasks beyond finding and dispatching opcodes. Other runcores,
+such as the I<GC-Debug>, I<debug> and I<profiling> runcores, help with
+typical software maintenance and analysis tasks. We'll talk about all
+of these throughout the chapter.
+Different runcores, because of the way they are structured, require the
+opcodes to be compiled into different forms. Because of this,
+understanding opcodes first requires an understanding of the Parrot
+runcores.
=head3 Types of Runcores
-Parrot has multiple runcores. Some are useful for particular maintenance tasks, some are
-only available as optimizations in certain compilers, some are intended for general use,
-and some are just interesing flights of fancy with no practical benefits. One runcore that
-we've already seen is the debugging runcore which prompts the user for commands between
-executing each opcode. Another valuable maintenance runcore is the GC dubug core (which runs a
-full sweep of the garbage collector between each opcode).
+Parrot has multiple runcores. Some are useful for particular maintenance
+tasks, some are only available as optimizations in certain compilers,
+some are intended for general use, and some are just interesting flights
+of fancy with no practical benefits. Here we list the various runcores,
+their uses, and their benefits.
=item* Slow Core
-The slow core is a basic runcore design that treats each opcode as a separate function
-at the C level. Each function is called, and returns the address of the next opcode
-to be called by the core. The slow core performs bounds checking to ensure that the next
-opcode to be called is properly in bounds. Because of this modular approach where opcodes
-are treated as separate executable entities many other runcores, especially diagnostic and
-maintenance cores are based on this design.
+The slow core is a basic runcore design that treats each opcode as a
+separate function at the C level. Each function is called, and returns
+the address of the next opcode to be called by the core. The slow core
+performs bounds checking to ensure that the next opcode to be called is
+properly in bounds, and not somewhere random in memory. Because of this
+modular approach, where opcodes are treated as separate executable
+entities, many other runcores, especially diagnostic and maintenance
+cores, are based on this design.
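The slow core design described above can be sketched as a toy model. This is not Parrot's C implementation: the handler names, the three-opcode table, and the flat list used as bytecode are all assumptions made for illustration. Each opcode is a separate function that returns the address of the next opcode, and the loop checks that address before dispatching.

```python
# Toy model of a slow-core style dispatch loop (hypothetical names throughout).
# Bytecode is a flat list: an opcode number followed by its operands.

def op_end(regs, code, pc):
    # Stop execution by returning no next address.
    return None

def op_set(regs, code, pc):
    # set <reg>, <const>: store a constant in a register.
    regs[code[pc + 1]] = code[pc + 2]
    return pc + 3

def op_add(regs, code, pc):
    # add <dest>, <src>: dest += src.
    regs[code[pc + 1]] += regs[code[pc + 2]]
    return pc + 3

# The opcode number is the index into this table.
OP_TABLE = [op_end, op_set, op_add]

def run_slow_core(code, num_regs=4):
    regs = [0] * num_regs
    pc = 0
    while pc is not None:
        # Bounds check: refuse to dispatch outside the bytecode.
        if not 0 <= pc < len(code):
            raise IndexError(f"opcode address {pc} is out of bounds")
        handler = OP_TABLE[code[pc]]
        pc = handler(regs, code, pc)  # each op returns the next address
    return regs

program = [1, 0, 2,   # set r0, 2
           1, 1, 40,  # set r1, 40
           2, 0, 1,   # add r0, r1
           0]         # end
print(run_slow_core(program))  # prints [42, 40, 0, 0]
```

Because every opcode is an ordinary function call, extra work such as tracing or prompting can be added inside the loop without touching the handlers, which is presumably why the diagnostic cores build on this shape.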
=item* Fast Core
-The fast core is a bare-bones core that doesn't do any of the bounds-checking or context
-updating that the slow core does.
+The fast core is a bare-bones core that doesn't do any of the
+bounds-checking or context updating that the slow core does. The fast
+core is the way Parrot should run, and is used to find and debug places
+where execution strays outside of its normal bounds.
=item* Computed Goto Core
-I<Computed Goto> is a feature of some C compilers where a label is treated as a piece of
-data that can be stored in an array. Each opcode is simply a label in a very large
-function, and the labels are stored in an array. Calling an opcode is as easy as taking
-that opcode's number as the index of the label array, and calling the associated label.
-Sound complicated? It is a little, especially to C programmers who are not used to these
-kinds of features, and who have been taught that the C<goto> keyword is to be avoided.
+I<Computed Goto> is a feature of some C compilers where a label is
+treated as a piece of data that can be stored in an array. Each opcode
+is simply a label in a very large function, and the labels are stored
+in an array. Calling an opcode is as easy as taking that opcode's number
+as the index of the label array, and jumping to the associated label.
+Sound complicated? It is a little, especially to C programmers who are
+not used to these kinds of features, and who have been taught that the
+C<goto> keyword is to be avoided.
-As was mentioned earlier, not all compilers support computed goto, which means that this
-core will not be built on platforms that don't support it.
+As was mentioned earlier, not all compilers support computed goto, which
+means that this core will not be built on platforms that don't support it.
=item* Precomputed Goto Core
-Thought the Computed Goto core was hard enough to understand? Precomputed goto takes the
-concept a little further.
+Thought the Computed Goto core was hard enough to understand? Precomputed
+goto takes the concept a little further.
=item* Tracing Core
=item* Profiling Core
+The profiling core analyzes the performance of Parrot, and helps to
+determine where bottlenecks and trouble spots are in the programs that
+run on top of Parrot.
=item* GC Debug Core
+Parrot's garbage collector has been known as a weakness in the system
+for several years. In fact, the garbage collector and memory management
+subsystem was one of the last systems to be improved and rewritten before
+the release of version 1.0. It's not that garbage collection isn't
+important, but instead that it was so hard to do earlier in the project.
+Early on when the GC was such a weakness, and later when the GC was under
+active development, it was useful to have an operational mode that would
+really exercise the GC and find bugs that otherwise could hide by sheer
+chance. The GC debug runcore was this tool. The core executes a complete
+collection iteration between every single opcode. The throughput
+performance is terrible, but that's not the point: it's almost guaranteed
+to find problems in the memory system if they exist.
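The idea behind the GC debug core, a full collection between every opcode, can be sketched as follows. This is only an analogy: it uses Python's own `gc.collect()` as a stand-in for Parrot's collector, and `run_gc_debug` and the list of callables standing in for opcodes are hypothetical.

```python
import gc

def run_gc_debug(ops):
    """Run each zero-argument callable in ops, forcing a collection after it."""
    collections = 0
    for op in ops:
        op()
        gc.collect()       # force a complete collection pass between "opcodes"
        collections += 1
    return collections

log = []
ops = [lambda: log.append("a"), lambda: log.append("b")]
print(run_gc_debug(ops), log)  # prints 2 ['a', 'b']
```

The cost is one full collection per instruction, which matches the terrible throughput the text describes. The payoff is that an object wrongly left unreachable is likely to be reclaimed right after the opcode that dropped it, instead of whenever a collection happens to run.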
=item* Debug Core
+The debug core works like a normal software debugger, such as GDB. The
+debug core executes each opcode, and then prompts the user to enter a
+command. These commands can be used to continue execution, step to the
+next opcode, or examine and manipulate data from the executing program.
-Opcodes are the smallest logical execution element in Parrot. An individual opcode
-corresponds, in an abstract kind of way, with a single machine code instruction
-for a particular hardware processor architecture. The difference is that Parrot's
-opcodes can perform some very complex tasks. Also, Parrot's opcodes can be dynamically
-loaded in from a special library file called a I<dynop library>. We'll talk about
-dynops a little bit later
+Opcodes are the smallest logical execution element in Parrot. An
+individual opcode corresponds, in an abstract kind of way, with a single
+machine code instruction for a particular hardware processor
+architecture. The difference is that Parrot's opcodes can perform some
+very complex and high-level tasks. Also, Parrot's opcodes can be
+dynamically loaded in from a special library file called a I<dynop
+library>. We'll talk about dynops a little bit later.
=head3 Opcode naming
+To the PIR and PASM programmers, opcodes appear to be polymorphic. That
+is, some opcodes appear to have multiple argument formats. This is just an
+illusion, however. Parrot opcodes are not polymorphic, although certain
+features enable it to appear that way. Different argument list formats
+are detected during parsing and translated into separate, and unique,
=head3 Opcode Multiple Dispatch
=head2 Writing Opcodes
-Writing Opcodes, like writing PMCs, is done in a C-like language which is later
-compiled into C code by the X<opcode compiler> opcode compiler. The opcode script
-represents a thin overlay on top of ordinary C code: All valid C code is valid
-Opcode script. There are a few neat additions that make writing Opcodes easier.
+Writing Opcodes, like writing PMCs, is done in a C-like language which is
+later compiled into C code by the X<opcode compiler> opcode compiler. The
+opcode script represents a thin overlay on top of ordinary C code: All
+valid C code is valid opcode script. There are a few neat additions that
+make writing opcodes easier. This script is very similar to that used to
+define PMCs. The C<INTERP> constant, for instance, is always available
+in opcodes, just as it is in VTABLE and METHOD declarations. Unlike
+VTABLEs and METHODs, opcodes are defined with the C<op> keyword.
+Opcodes are written in files with the C<.ops> extension. The core
+operation files are stored in the C<src/ops/> directory.
=head3 Opcode Parameters
+Each opcode can take any fixed number of input and output arguments. These
+arguments can be any of the four primary data types--INTVALs, PMCs, NUMBERS
+and STRINGs--but can also be one of several other types of values including
+LABELs, KEYs and INTKEYs.
+Each parameter can be an input, an output or both, using the C<in>, C<out>,
+and C<inout> keywords respectively. Here is an example:
+ op Foo (out INT, in NUM)
+This opcode could be called like this:
+ $I0 = Foo $N0 # in PIR syntax
+ Foo $I0, $N0 # in PASM syntax
+When Parrot parses through the file and sees the C<Foo> operation, it
+converts it to the real name C<Foo_i_n>. The real name of an opcode
+is its name followed by an underscore-separated ordered list of
+the parameters to that opcode. This is how Parrot appears to use
+polymorphism: It translates the overloaded opcode common names into
+longer unique names depending on the parameter list of that opcode. Here
+is a list of some of the variants of the C<add> opcode:
+ add_i_i # $I0 += $I1
+ add_n_n # $N0 += $N1
+ add_p_p # $P0 += $P1
+ add_i_i_i # $I0 = $I1 + $I2
+ add_p_p_i # $P0 = $P1 + $I0
+ add_p_p_n # $P0 = $P1 + $N0
+This isn't a complete list, but you should get the picture. Each different
+combination of parameters translates to a different unique operation, and
+each operation is remarkably simple to implement. In some cases, Parrot
+can even use its multi-method dispatch system to call opcodes which are
+heavily overloaded, or for which there is no exact fit but the parameters
+could be coerced into different types to complete the operation. For
+instance, attempting to add a STRING to a PMC might coerce the string into
+a numerical type first, and then dispatch to the C<add_p_p_n> opcode. This
+is just an example, and the exact mechanisms may change as more opcodes
+are added or old ones are deleted.
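The naming scheme described above (short name plus an underscore-separated list of parameter types) can be sketched as a small helper. The function `full_op_name` and the `TYPE_SUFFIX` table are hypothetical and are not part of Parrot's parser. The `i`, `n` and `p` suffixes come from the `Foo_i_n` and `add` examples in the text; the `s` suffix for strings is an assumption by analogy.

```python
# Hypothetical mapping from a parameter type to its one-letter suffix.
TYPE_SUFFIX = {"INT": "i", "NUM": "n", "STR": "s", "PMC": "p"}

def full_op_name(short_name, param_types):
    """Build the long, unique opcode name from the short name and its types."""
    suffixes = [TYPE_SUFFIX[t] for t in param_types]
    return "_".join([short_name] + suffixes)

print(full_op_name("Foo", ["INT", "NUM"]))         # prints Foo_i_n
print(full_op_name("add", ["PMC", "PMC", "NUM"]))  # prints add_p_p_n
```

Seen this way, the apparent polymorphism is only a lookup: the parser picks the long name from the argument list, and each long name is a separate opcode.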
=head3 Opcode Control Flow
+Some opcodes have the ability to alter control flow of the program they
+are in. There are a number of control behaviors that can be implemented,
+such as an unconditional jump in the C<goto> opcode, or a subroutine
+call in the C<call> opcode, or the conditional behavior implemented by C<if>.
+At the end of each opcode you can call a C<goto> operation to jump to the
+next opcode to execute. If no C<goto> is performed, control flow will
+continue like normal to the next operation in the program. In this way,
+opcodes can easily manipulate control flow. Opcode script provides a
+number of keywords to alter control flow:
+=item * NEXT()
+C<NEXT> contains the address of the next opcode in memory. You don't
+need to call C<goto NEXT()>, however, because the default behavior for
+all opcodes is to automatically jump to the next opcode in the program
+N<You can do this if you really want to, but it really wouldn't help you
+any>. The C<NEXT> keyword is frequently used in places like the C<invoke>
+opcode to create a continuation to the next opcode to return to after
+the subroutine returns.
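The control-flow rule described above (fall through to the next opcode unless the opcode performs a C<goto>) can be modelled with a toy interpreter. All names here are hypothetical and this is not opcode script: a handler returns `None` to fall through, or a target address to jump, and `next_pc` plays the role that `NEXT()` plays in the text.

```python
# Toy model of opcode control flow (hypothetical names, not Parrot's API).

def op_inc(regs, next_pc):
    regs["I0"] += 1
    return None                   # no goto: continue with the next opcode

def op_goto(regs, next_pc, target):
    return target                 # unconditional jump

def op_if_lt(regs, next_pc, limit, target):
    # Conditional jump: branch only while I0 < limit.
    return target if regs["I0"] < limit else None

def run(code):
    """code is a list of (handler, args) pairs; the list index is the address."""
    regs = {"I0": 0}
    trace = []
    pc = 0
    while 0 <= pc < len(code):
        handler, args = code[pc]
        trace.append(pc)
        next_pc = pc + 1          # the address NEXT() would name
        target = handler(regs, next_pc, *args)
        pc = next_pc if target is None else target
    return regs, trace

program = [
    (op_inc, ()),          # 0
    (op_if_lt, (3, 0)),    # 1: loop back to 0 while I0 < 3
    (op_goto, (4,)),       # 2: skip address 3
    (op_inc, ()),          # 3: never executed
    (op_inc, ()),          # 4
]
regs, trace = run(program)
print(regs["I0"], trace)  # prints 4 [0, 1, 0, 1, 0, 1, 2, 4]
```

Passing `next_pc` to every handler mirrors the use of `NEXT` in the C<invoke> case: an opcode that transfers control elsewhere can still record where execution should resume.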
=head2 The Opcode Compiler