perl.perl6.language

Re: Compiler writing tools

Larry Wall
February 2, 2004 20:35
On Mon, Feb 02, 2004 at 02:09:33AM -0700, Luke Palmer wrote:
: I've been writing a lot of compilers recently, and figuring as how Perl
: 6 is aiming to replace yacc, I think I'll share some of my positive and
: negative experiences.   Perhaps Perl 6 can adjust itself to help me out
: a bit.  :-)

Perl 6 is designed to be adjusted, but it would be quite an AI feat
for it to adjust itself.  :-)

: =over
: =item * RegCounter
: I have a class called RegCounter which is of immense use, but could be
: possibly more elegant.  It's a tied hash that, upon access, generates a
: new name and stores it in a table for later retrieval under the same
: name.  
: It has a method called C<next> that returns a new RegCounter that shares
: the same counter, and puts whatever was in that one's "ret" slot into
: whatever argument was given to C<next>, by default "next".
: The first <[^a-z]> characters in the name are passed along to the
: generated register name, defaulting to a target-specific string (for
: instance, I use $P for Parrot programs).
: So I can do, for instance:
:     method if_statement::code($rc) { # $rc is the regcounter
:           self.item[0].code($'condition'))
:         ~ "unless $rc{condition}, $rc{Lfalse}\n"
:         ~ self.item[1].code($
:         ~ "$rc{Lfalse}:\n"
:     }

What do you want Perl 6 to do for you here?
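
For readers who want the RegCounter behaviour spelled out, here is a minimal
sketch in Python rather than Perl 6. It is one reading of the prose
description above, not Luke's actual class: the names table, counter and
default_prefix are made up for illustration, and it assumes that next(name)
makes the child's "ret" slot an alias for the parent's slot called name.

```python
import itertools
import re


class RegCounter:
    """Maps symbolic names to freshly generated register or label names."""

    def __init__(self, default_prefix="$P", counter=None):
        # Hypothetical default: "$P" is the Parrot prefix mentioned in the post.
        self.default_prefix = default_prefix
        # The counter is shared by a RegCounter and everything made by next().
        self.counter = counter if counter is not None else itertools.count()
        self.table = {}

    def __getitem__(self, name):
        # First access generates a name; later accesses return the same one.
        if name not in self.table:
            # Leading characters outside a-z (the "L" in "Lfalse") become the prefix.
            prefix = re.match(r"[^a-z]*", name).group(0) or self.default_prefix
            self.table[name] = f"{prefix}{next(self.counter)}"
        return self.table[name]

    def next(self, name="next"):
        # Assumption: the child's "ret" slot aliases this object's `name` slot.
        child = RegCounter(self.default_prefix, self.counter)
        child.table["ret"] = self[name]
        return child


rc = RegCounter()
condition_rc = rc.next("condition")
print(condition_rc["ret"], rc["condition"], rc["Lfalse"])
```

With this reading, code generated for the condition writes its result to its
own "ret" register, and the enclosing statement finds the same register under
"condition".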

: =item * Concatenations
: The code example you just saw gets much, much uglier if there is added
: complexity.  One of my compilers returns lists of lines, the other
: concatenates strings, and they're both pretty hard to read -- especially
: when there are heredocs all over the place (which happens frequently).
: I think $() will help somewhat, as will interpolating method calls, but
: for a compiler, I'd really like PHP-like parse switching.  That is, I
: could do something like (I'll use $< and $> for <? and ?>):
:     method logical_or_expression::code($rc) {
:         <<EOC;
:             null $rc{ret}
:             $< for @($self.item[0]) -> $item { $>
:                   $item.code($
:                   if $rc{next}, $rc{Ldone}
:             $< } $>
:             $rc{Ldone}:
:         EOC
:     }

This seems to me to fall into the category of useful language warpings,
but not necessarily for mandatory public consumption.  String literals
are parsed by the main parser in Perl 6, unlike in Perl 5.  So a
grammatical munging should be doable.  "All is fair if you predeclare" and
all that...

By the way, the first production language I ever wrote was an
inside-out language where control commands were embedded in text that
was to be output by default.  So I'm not knocking your proposal.

: For this case, I think it would also be a good idea to have a string
: implementation somewhere that stores things as "ropes", a list of
: strings, so that immense copying isn't necessary.

Well, I suggested something like this early in the design of Parrot,
but it doesn't seem to have flown in the general case.  On the other
hand, the string abstraction ought to be big enough to hide alternate
implementations behind it.  The whole "is from" notion is built on that.
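
As a rough illustration of the "list of strings" idea, here is a minimal
sketch in Python. It is not how Parrot or Perl 6 implements strings; the
Rope class and its methods are hypothetical names. Pieces are only
referenced while the code is being built, and the text is copied once, when
the final string is requested.

```python
class Rope:
    """Append-only string builder that stores pieces and joins them once."""

    def __init__(self, *pieces):
        self.pieces = []
        for piece in pieces:
            self.append(piece)

    def append(self, piece):
        # Appending another Rope keeps a reference to it; nothing is copied.
        self.pieces.append(piece)
        return self

    def __iter__(self):
        # Walk nested ropes with an explicit stack instead of recursion.
        stack = [iter(self.pieces)]
        while stack:
            try:
                piece = next(stack[-1])
            except StopIteration:
                stack.pop()
                continue
            if isinstance(piece, Rope):
                stack.append(iter(piece.pieces))
            else:
                yield piece

    def __str__(self):
        # The only full copy happens here, when the text is actually needed.
        return "".join(self)


body = Rope("if $P1, L2\n")
code = Rope("null $P0\n", body, "L2:\n")
print(str(code), end="")
```

A code() method in a compiler could then return a Rope built from its
children's ropes, rather than concatenating strings at every level.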

: =item * Comments
: We've already gone over this, but it'd be good to have the ability for
: parsers to (somehow) "feed" into one another, so that you can do
: comments without putting a <comment> in between every grammar rule (or
: mangling things to do that somehow), or search and replace, which has
: the disadvantage of being unable to disable comments during parts of the
: parse.  $Parse::RecDescent::skip works well, but I don't think it's
: general enough.

Agreed.  I do think you want the comments in the grammar, if for no
other reason than it provides a hook to do something with the comment
if you retarget the grammar from normal compilation to, say, code
translation.  I don't think it's out of the realm of possibility for
Perl 6 to support strings with embedded objects as funny characters.
In the limit, a string could be composed of nothing but a stream
of objects.  (As a hack, one can embed illegal Unicode characters
(above U+10FFFF) that map an integer to an array of objects, but
maybe we can do better from a GC perspective.)
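
The placeholder hack can be sketched in Python, with one substitution that
should be stated plainly: Python strings cannot hold code points above
U+10FFFF, so this sketch uses Supplementary Private Use Area-A (U+F0000
through U+FFFFD) instead. ObjectString and its methods are hypothetical
names, and the sketch assumes the ordinary text never contains characters
from that area.

```python
PUA_BASE = 0xF0000  # first code point of Supplementary Private Use Area-A
PUA_LAST = 0xFFFFD  # last code point of that area


class ObjectString:
    """Text in which some characters stand for entries in a side array."""

    def __init__(self):
        self.text = ""
        self.objects = []

    def add_text(self, s):
        self.text += s

    def add_object(self, obj):
        # Each object gets one placeholder character encoding its array index.
        index = len(self.objects)
        if PUA_BASE + index > PUA_LAST:
            raise OverflowError("too many embedded objects")
        self.objects.append(obj)
        self.text += chr(PUA_BASE + index)

    def items(self):
        # Yield plain characters as str and placeholders as their objects.
        for ch in self.text:
            code = ord(ch)
            if PUA_BASE <= code < PUA_BASE + len(self.objects):
                yield self.objects[code - PUA_BASE]
            else:
                yield ch
```

A grammar could match a comment, wrap it in an object, and leave a single
placeholder character in the stream, so that a later pass (a code
translator, say) can still get at the comment.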

: =item * Line Counting
: It is I<essential> that the regex engine is capable (perhaps off by
: default) of keeping track of your line number.

By all means!  A compiler must absolutely never emit an inaccurate line
number if it can help it.  Few things are as irritating as "...bailing
out near line 100."  If we don't provide an explicit lexical analysis
pass that handles this, then the regex engine must somehow.  Though I
haven't really thought much about the *how* part of the somehow.
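
One possible "how", sketched in Python with its re module rather than the
Perl 6 regex engine: record the offset at which each line starts, then turn
any match offset into a line number with a binary search. LineIndex and
line_of are hypothetical names, and this is only one of several ways to do
it.

```python
import bisect
import re


class LineIndex:
    """Converts character offsets in a source text to 1-based line numbers."""

    def __init__(self, text):
        # Offsets at which each line begins; line 1 always starts at offset 0.
        self.starts = [0] + [m.end() for m in re.finditer(r"\n", text)]

    def line_of(self, offset):
        # The number of line starts at or before the offset is the line number.
        return bisect.bisect_right(self.starts, offset)


source = "a = 1\nb = 2\nc = @\n"
index = LineIndex(source)
bad = re.search(r"@", source)
print(f"unexpected character at line {index.line_of(bad.start())}")
```

Because the index is built once per source text, the matcher itself does not
have to count newlines as it goes; the cost is paid only when a line number
is actually reported.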

