Re: Let the hacking commence!

Luke Blanshard
January 8, 2005 09:15
Luke Palmer wrote:

>>[By the way, shouldn't this grammar be called "Perl" rather than
>Grammars and classes share a namespace, so I think Perl::Grammar is
I got the name Perl for the grammar from S05, which also gives this example:

    given $source_code {
        $parsetree = m/<Perl.prog>/;
    }

>># Whitespace definition for Perl code.
>>rule ws() {
>>      # Case 1: Unicode space characters, comments, or POD blocks, or
>>      # any combination thereof.
>>    [ \s | «comment» | «pod» ]+
>I changed your «comment» and «pod» to <comment> and <pod>.  We don't
>have a policy yet on what we're capturing and how, so I'm just leaving
>all the angle brackets single.  Once we decide how our resultant data
>structure should look, we can go back and change them.  
Good idea.

>>      # Case 2: We're looking at a non-word-constituent or EOF,
>>      # meaning zero-width counts as whitespace.
>>  | <before \W> | $
>>      # Case 3: We must be looking at a word constituent.  We match
>>      # whitespace at BOF or after a non-word-constituent.
>>  | ^ | <after \W>
>I'm going to kill these last two cases.  The rules for where whitespace
>is optional are more complex than whether you're on a word constituent
>or not.  The user of the ws rule is going to know whether whitespace is
>optional or required in a particular position, so he can put <ws> or
><ws>? as he needs to.  Also, if we're being good little boys, we'll be
>putting backtracking colons after our identifier matches, so a <ws> rule
>will never show up in the middle of an identifier.
I'm not sure this will work, unless you get rid of the :w's everywhere 
in this grammar.  My understanding of how :w works (from S05) is that it 
puts <ws> in place of every whitespace sequence in the rule.  This means 
that <ws> has to be smart enough to match the empty string at particular 
places.  These two cases are my take on where those particular places 
should be for Perl code -- though I may well be missing something!
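
A minimal sketch of that reading of :w, written in Python with the re
module as a stand-in for Perl 6 rules.  The names WS, expand_w and
matches are hypothetical, \A and \Z stand in for ^ and $, and the POD
alternative is left out.  It only illustrates why <ws> would need the
zero-width cases if :w puts it wherever the rule text has whitespace:

```python
import re

# Hypothetical stand-in for the <ws> rule quoted above (POD omitted).
WS = (
    r"(?:"
    r"(?:\s|#[^\n]*(?:\n|\Z))+"   # Case 1: spaces and/or comments
    r"|(?=\W)|\Z"                 # Case 2: before a non-word char, or at EOF
    r"|\A|(?<=\W)"                # Case 3: at BOF, or after a non-word char
    r")"
)

def expand_w(rule: str) -> str:
    """Replace every whitespace run in the rule text with WS."""
    # A function is used as the replacement so the backslashes in WS
    # are inserted literally rather than parsed as escapes.
    return re.sub(r"\s+", lambda _m: WS, rule.strip())

def matches(rule: str, text: str) -> bool:
    """True if the expanded rule matches the whole of text."""
    return re.fullmatch(expand_w(rule), text) is not None

print(matches(r"if \( x \)", "if(x)"))     # True: WS matches empty next to "(" and ")"
print(matches(r"if \( x \)", "if ( x )"))  # True: WS consumes the spaces
print(matches(r"if x", "ifx"))             # False: no WS case applies between "f" and "x"
print(matches(r"if x", "if # note\n x"))   # True: a comment counts as whitespace
```

With only Case 1, the first call would fail, which is the concern with
dropping the last two cases while keeping :w.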

>># Comment definition for Perl code.
>>rule comment() {
>>      # A hash ("#"), then everything through the next newline or EOF.
>>    <'#'> .*? [ \n | $ ]
>I factored <'#'> out into <comment_introducer>.  We're putting all token
>characters into their own rules so it's easy for extenders to change
Also a good idea, though of course the fact that comment is a rule means 
that extenders can already do this with a little more work.
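
As a rough Python illustration of factoring out the introducer (again
using re as a stand-in for Perl 6 rules; comment_pattern is a
hypothetical name, not anything in the grammar):

```python
import re

def comment_pattern(introducer: str = "#") -> re.Pattern:
    """Introducer, then everything through the next newline or EOF."""
    # DOTALL lets "." match newlines; the lazy ".*?" still stops at the
    # first newline because the following group can match there.
    return re.compile(re.escape(introducer) + r".*?(?:\n|\Z)", re.DOTALL)

source = "x = 1 # one\ny = 2 # two"
print(comment_pattern().findall(source))            # ['# one\n', '# two']

# An extender changes only the introducer, not the rest of the rule.
print(comment_pattern("//").findall("a // b\nc"))   # ['// b\n']
```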

>Okay, it's in.  I can't say it's correct, since I've never been very
>good at writing regexes, and this is certainly more like a regex than
>like a grammar.  When I wasn't sure how something worked, I just assumed
>you did it right.
Ouch -- I hope somebody (Larry?) gives it a once-over.  I'm not a regex 
guru either; I find myself writing them every year or two.  That's nowhere 
near often enough for me to assume I've done it right, anyway.

>However, we'd like to eventually make the POD rule less like a match and
>more like a parse.  The POD sections are going to be stored as metadata
>for the program to grab if it needs to.  Right now, it just pretends
>it's all a comment.
That makes sense.  So the plan is to change POD syntax to not require a 
blank line before each command line?  I think that will help a lot.  
Maybe I'll take a crack at expanding this to more completely parse the 
POD, while it's still fresh in my mind.

