[RFC] Debug Segment, HLL Debug Segment And Source Segment

From:
Jonathan Worthington
Date:
September 20, 2005 15:52
Subject:
[RFC] Debug Segment, HLL Debug Segment And Source Segment
Message ID:
00f301c5be36$00c30760$0300a8c0@SERVER
Hi,

The current format of the debug segment in Parrot packfiles (.pbc files), as 
documented in doc/parrotbyte.pod, only allows for a single source file to be 
named.  This has been insufficient ever since we got .include directives, 
because one packfile can then be built from several source files; it also 
means that there's nothing sensible that pbc_merge can do with the debug 
segments it finds in input files.

WHAT WE HAVE NOW
Currently, we store two things:
1) The filename of a single source file, as an additional field in the 
header
2) The line number in the source file for each bytecode instruction, as the 
segment's opcode stream

WHAT SOURCE?
The debug segment as we currently have it relates to PIR and PASM source 
files, not to high level language source files.  The PIR parser currently 
accepts a directive that looks like this:
    #line 'filename'
This is for compilers to supply the line numbers and file names of HLL 
source files.  At present nothing is done with these directives after they 
are parsed, but the data they provide should go into a separate HLL debug 
segment.

As the needs of the PASM/PIR debug segments and the HLL debug segments would 
seem to be the same, this proposal will detail a single format that should 
work for both of them.  If it is determined that the HLL debug segment needs 
something more sophisticated, this proposal still stands for the PASM/PIR 
debug segment.

SOURCE SEGMENTS
This is currently mentioned in parrotbyte.pod; the idea would seem to be 
that this segment can contain source code.  I suspect the intention of it 
was to store the source code of high level languages rather than PASM or 
PIR.  I think the doc is correct in stating that this segment is currently 
unused.  However, it will likely be used in the future, so it makes sense to 
consider its future existence now while re-designing the debug segment(s).

FORMAT PROPOSAL
The aims of the new format, intended for both the PASM/PIR debug segment and 
the HLL debug segment, are:
1) Supporting multiple input files
2) Allowing for a reference into the source segment in place of a filename
3) Still being space-efficient on disk

The opcode stream will contain one line number per bytecode instruction. No 
information as to what file that line is in will be stored in this stream. 
(This is pretty much the same as what we have now.)

The header (after the standard stuff that every header has) will start with 
a count of the source file to bytecode position mappings that it contains.

  0 (relative)
  +----------+----------+----------+----------+
  | number of source => bytecode mappings     |
  +----------+----------+----------+----------+

A source to bytecode position mapping simply states that the bytecode from 
the specified offset up to the offset in the next mapping (or, if there is 
no next mapping, up to the end of the bytecode) has its source in 
location X.
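
To illustrate the range rule above, here is a minimal sketch in Python (not 
Parrot code).  The in-memory list of (offset, source) pairs and the name 
source_for are hypothetical, and the file names are made up for the example; 
the only thing taken from the proposal is that a mapping covers everything 
from its own offset up to the next mapping's offset.

```python
from bisect import bisect_right

# Hypothetical in-memory form of the header: (bytecode_offset, source) pairs,
# already sorted by offset. The "source" values are just labels here.
mappings = [(0, "main.pir"), (40, "util.pir"), (95, "main.pir")]


def source_for(offset, mappings):
    """Return the source covering `offset`, or None if no mapping covers it.

    A mapping covers every offset from its own start up to (but not
    including) the start of the next mapping. The last mapping is treated
    as open-ended, since this sketch does not know where the bytecode ends.
    """
    starts = [start for start, _ in mappings]
    # Index of the last mapping whose start is <= offset.
    index = bisect_right(starts, offset) - 1
    if index < 0:
        return None
    return mappings[index][1]
```

With the list above, offset 39 still belongs to main.pir and offset 40 is 
the first one that belongs to util.pir.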

A mapping always starts with the offset in the bytecode, followed by the 
type of the mapping.

  0 (relative)
  +----------+----------+----------+----------+
  |              bytecode offset              |
  +----------+----------+----------+----------+

  4
  +----------+----------+----------+----------+
  |               mapping type                |
  +----------+----------+----------+----------+

There are 3 mapping types.

Type 0 means there is no source available for the bytecode starting at the 
given offset. No further data is stored with this type of mapping; the next 
mapping continues immediately after it.

Type 1 means the source is available in a file. A NUL-terminated string 
containing the filename follows.

Type 2 means the source is available in a source segment. Another integer 
follows, which will specify which source file in the source segment to use.
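
The header layout described above could be written and read back roughly as 
follows.  This is only a sketch in Python under several assumptions that the 
proposal does not settle: integers are taken to be 4 bytes (as the diagrams 
suggest) and little-endian, filenames are encoded as UTF-8, and strings are 
not padded to a word boundary.  The names pack_mappings and unpack_mappings 
are hypothetical.

```python
import struct

# Mapping types from the proposal.
NO_SOURCE, SOURCE_FILE, SOURCE_SEGMENT = 0, 1, 2


def pack_mappings(mappings):
    """Serialise (offset, type, payload) triples, preceded by their count."""
    out = [struct.pack("<I", len(mappings))]
    for offset, kind, payload in mappings:
        out.append(struct.pack("<II", offset, kind))
        if kind == SOURCE_FILE:
            # Assumption: UTF-8 filename, NUL-terminated, no padding.
            out.append(payload.encode("utf-8") + b"\0")
        elif kind == SOURCE_SEGMENT:
            out.append(struct.pack("<I", payload))
        elif kind != NO_SOURCE:
            raise ValueError(f"unknown mapping type {kind}")
    return b"".join(out)


def unpack_mappings(data):
    """Parse bytes produced by pack_mappings back into a list of triples."""
    (count,) = struct.unpack_from("<I", data, 0)
    pos = 4
    result = []
    for _ in range(count):
        offset, kind = struct.unpack_from("<II", data, pos)
        pos += 8
        if kind == NO_SOURCE:
            payload = None          # nothing follows a type 0 mapping
        elif kind == SOURCE_FILE:
            end = data.index(b"\0", pos)
            payload = data[pos:end].decode("utf-8")
            pos = end + 1
        elif kind == SOURCE_SEGMENT:
            (payload,) = struct.unpack_from("<I", data, pos)
            pos += 4
        else:
            raise ValueError(f"unknown mapping type {kind}")
        result.append((offset, kind, payload))
    return result


example = [(0, SOURCE_FILE, "main.pir"), (40, SOURCE_SEGMENT, 3),
           (95, NO_SOURCE, None)]
```

Round-tripping example through pack_mappings and unpack_mappings gives back 
the same list.  A real packfile reader would take the word size and byte 
order from the packfile header rather than hard-coding them as done here.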

Note that the mappings must be ordered by ascending bytecode offset; a 
mapping for offset 100 cannot follow a mapping for offset 200, for 
example.
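
A reader or a tool such as pbc_merge could check this ordering rule with 
something like the Python sketch below.  The function name check_offsets is 
hypothetical, and treating two mappings at the same offset as an error is an 
assumption; the proposal only rules out offsets that go backwards.

```python
def check_offsets(offsets):
    """Return the index of the first out-of-order offset, or None if all are fine.

    Assumption: offsets must be strictly increasing, so a repeated offset
    is reported as well as one that goes backwards.
    """
    for i in range(1, len(offsets)):
        if offsets[i] <= offsets[i - 1]:
            return i
    return None
```

For the example in the text, check_offsets([0, 200, 100]) reports index 2, 
the mapping for offset 100 that follows the one for offset 200.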

COMPATIBILITY
This change is incompatible with the current debug segment format, but 
that's OK since we're still in development.

Comments on this would be very welcome, even if it's as simple as "looks OK 
to me" or "looks terrible to me".  :-)

Thanks,

Jonathan

