Front page | perl.perl6.compiler |
Postings from January 2005
Re: Let the hacking commence!
From: Luke Blanshard
Date: January 8, 2005 09:15
Subject: Re: Let the hacking commence!
Message ID: 41E01545.6030905@blanshard.us
Luke Palmer wrote:
>>[By the way, shouldn't this grammar be called "Perl" rather than
>>"Perl6::Grammar"?...
>>
>>
>Grammars and classes share a namespace, so I think Perl::Grammar is
>correct...
>
>
I got the name Perl for the grammar from S05, which also gives this example:
    given $source_code {
        $parsetree = m/<Perl.prog>/;
    }
>># Whitespace definition for Perl code.
>>rule ws() {
>> # Case 1: Unicode space characters, comments, or POD blocks, or
>> # any combination thereof.
>> [ \s | «comment» | «pod» ]+
>>
>>
>I changed your «comment» and «pod» to <comment> and <pod>. We don't
>have a policy yet on what we're capturing and how, so I'm just leaving
>all the angle brackets single. Once we decide how our resultant data
>structure should look, we can go back and change them.
>
>
Good idea.
>> # Case 2: We're looking at a non-word-constituent or EOF,
>> # meaning zero-width counts as whitespace.
>> | <before \W> | $
>>
>> # Case 3: We must be looking at a word constituent. We match
>> # whitespace at BOF or after a non-word-constituent.
>> | ^ | <after \W>
>>
>>
>
>I'm going to kill these last two cases. The rules for where whitespace
>is optional are more complex than whether you're on a word constituent
>or not. The user of the ws rule is going to know whether whitespace is
>optional or required in a particular position, so he can put <ws> or
><ws>? as he needs to. Also, if we're being good little boys, we'll be
>putting backtracking colons after our identifier matches, so a <ws> rule
>will never show up in the middle of an identifier.
>
>
I'm not sure this will work, unless you get rid of the :w's everywhere
in this grammar. My understanding of how :w works (from S05) is that it
puts <ws> in place of every whitespace sequence in the rule. This means
that <ws> has to be smart enough to match the empty string at particular
places. These two cases are my take on where those particular places
should be for Perl code -- though I may well be missing something!
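To make that reading of :w concrete, here is a rough Python sketch (not
Perl 6, and not part of the grammar under discussion). It assumes :w
simply joins the atoms of a rule with <ws>. The names WS, sigspace,
IF_STMT, and matches are made up for illustration, and POD blocks are
left out.

```python
import re

# Hypothetical Python rendering of the three-case ws rule quoted above.
# POD blocks are omitted to keep the sketch short.
WS = (
    r"(?:"
    r"(?:\s|#[^\n]*(?:\n|\Z))+"   # Case 1: whitespace and comments, one or more
    r"|(?=\W)|\Z"                 # Case 2: before a non-word character, or at EOF
    r"|\A|(?<=\W)"                # Case 3: at BOF, or after a non-word character
    r")"
)


def sigspace(*atoms):
    """Join atoms with WS, an assumed approximation of what :w does."""
    return re.compile(WS.join(atoms))


# Roughly: rule if_stmt :w { if \( \w+ \) }
IF_STMT = sigspace(r"if", r"\(", r"\w+", r"\)")


def matches(text):
    """Return True when the whole text matches the sketched rule."""
    return IF_STMT.fullmatch(text) is not None
```

With the zero-width cases in place, "if(x)" and "if (x)" both match,
while a two-word pattern such as sigspace(r"my", r"\w+") rejects "myx".
If cases 2 and 3 are dropped, "if(x)" fails unless the whitespace is
made optional by hand.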
>>}
>>
>># Comment definition for Perl code.
>>rule comment() {
>> # A hash ("#"), then everything through the next newline or EOF.
>> <'#'> .*? [ \n | $ ]
>>}
>>
>>
>I factored <'#'> out into <comment_introducer>. We're putting all token
>characters into their own rules so it's easy for extenders to change
>them.
>
>
Also a good idea, though of course the fact that comment is a rule means
that extenders can already do this with a little more work.
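As a rough illustration of the difference, here is a Python sketch in
which classes stand in for grammars. The class names are hypothetical,
and the "//" comment syntax is only an example of what an extender
might want.

```python
import re


class PerlGrammar:
    # Hypothetical stand-in for the grammar; the token lives in its own "rule".
    comment_introducer = r"\#"

    @classmethod
    def comment(cls):
        # Introducer, then everything through the next newline or EOF.
        return re.compile(cls.comment_introducer + r"[^\n]*(?:\n|\Z)")


class SlashSlashGrammar(PerlGrammar):
    # With the token factored out, an extender changes only the token.
    comment_introducer = r"//"


class OverrideGrammar(PerlGrammar):
    # Without it, the extender restates the whole comment rule.
    @classmethod
    def comment(cls):
        return re.compile(r"//[^\n]*(?:\n|\Z)")
```

Both subclasses accept "// hi", but only the first one picks up later
changes to the body of the base comment rule.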
>>...
>>
>>
>Okay, it's in. I can't say it's correct, since I've never been very
>good at writing regexes, and this is certainly more like a regex than
>like a grammar. When I wasn't sure how something worked, I just assumed
>you did it right.
>
>
Ouch -- I hope somebody (Larry?) gives it a once-over. I'm not a regex
guru either; I find myself writing them only every year or two, nowhere
near often enough to assume I've done it right.
>However, we'd like to eventually make the POD rule less like a match and
>more like a parse. The POD sections are going to be stored as metadata
>for the program to grab if it needs to. Right now, it just pretends
>it's all a comment.
>
>
That makes sense. So the plan is to change POD syntax to not require a
blank line before each command line? I think that will help a lot.
Maybe I'll take a crack at expanding this to more completely parse the
POD, while it's still fresh in my mind.
Luke