character classes in p6 rules

From: Patrick R. Michaud
Date: May 11, 2005 17:59
Subject: character classes in p6 rules
Message ID: 20050512010020.GB31936@pmichaud.com

I now have a basic implementation for enumerated character classes in 
the grammar engine (i.e., <[xyz]>, <-[xyz]>, <[x..z]>, and <-[x..z]>).

I didn't see it specified anywhere, but are the \d, \D, \s, \S, etc.
metacharacters still supposed to work inside of an enumerated character 
class, as they do in Perl 5?   Or in p6 do we always use
<+<digit>+[xyz]>, <-<digit>>, <+<sp>>, <-<sp>>, etc.?

(Yes, I know that normally the absence of any spec to the contrary
indicates that we're still using p5 semantics, but this one is worth 
verification for me.)
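
For readers who want to see the Perl 5 behavior being asked about, here is a minimal sketch in Python, whose `re` module follows the Perl 5 convention on this point: `\d` and `\s` keep their meaning inside a bracketed class. The helper name `matches_all` is hypothetical, and the Perl 6 spellings in the comments are only rough equivalents, not a statement of what the spec requires.

```python
import re

# Perl 5 style: shorthand classes remain usable inside brackets.
digits_or_xyz = re.compile(r"[\dxyz]+")   # roughly <+<digit>+[xyz]>
not_digit = re.compile(r"[^\d]")          # roughly <-<digit>>


def matches_all(pattern, text):
    """Return True when the entire string matches the compiled pattern."""
    return pattern.fullmatch(text) is not None
```

For example, `matches_all(digits_or_xyz, "12xz9")` succeeds, while `matches_all(digits_or_xyz, "12a")` fails because `a` is neither a digit nor one of `x`, `y`, `z`.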

While I'm on the subject, let me just ramble a bit -- there are 
times when <alpha>, <digit>, <upper>, etc. give me a bad feeling 
-- they look a little too much like subrules to me, especially 
when looking at <+<alpha>> and the like.  I keep wondering about 
things like <+<ident>> and <-<expr>>.

And something like  C<< rx / <alpha>* / >>  may generate a lot
of not-very-useful one-character captures into $/<alpha>, so that
we'll typically want to get in the habit of writing 

    rx / <?alpha>* /
    rx / <+<alpha>>* /

and then have the engine recognize when this occurs so it
can optimize to a much faster character class op rather than
a lot of calls to a separate subrule.

Plus, <+<alpha>> just looks plain ugly and unbalanced to me.  
Somehow I'd like to get rid of those inner angles, so 
that we always use  <+alpha>, <+digit>, <-sp>, <-punct> to 
indicate named character classes, and specify combinations 
with constructions like  <+alpha+punct-[aeiou]>  and  <+word-[_]>.  
We'd still allow <[abc]> as a shortcut to <+[abc]>.
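
One way to read the proposed combinations is as left-to-right set arithmetic over characters. The Python sketch below models that reading only; it is not how the grammar engine is implemented. The `NAMED` table, the ASCII-only universe, and the function name `char_class` are all assumptions made for illustration, and ranges such as `x..z` inside brackets are deliberately not handled.

```python
import re
import string

# Hypothetical named classes, limited to ASCII to keep the sketch small.
NAMED = {
    "alpha": set(string.ascii_letters),
    "digit": set(string.digits),
    "punct": set(string.punctuation),
    "word": set(string.ascii_letters + string.digits + "_"),
    "sp": set(string.whitespace),
}

# Assumed universe for a leading "-" term: the 128 ASCII characters.
UNIVERSE = {chr(code) for code in range(128)}

# One term: a sign followed by either [chars] or a class name.
TERM = re.compile(r"([+-])(?:\[([^\]]*)\]|(\w+))")


def char_class(spec):
    """Evaluate a spec such as '+alpha+punct-[aeiou]' into a set of characters."""
    if spec.startswith("["):
        spec = "+" + spec            # <[abc]> is shorthand for <+[abc]>
    # A leading "-" subtracts from everything; a leading "+" builds from nothing.
    result = set(UNIVERSE) if spec.startswith("-") else set()
    pos = 0
    while pos < len(spec):
        term = TERM.match(spec, pos)
        if term is None:
            raise ValueError(f"cannot parse {spec!r} at offset {pos}")
        sign, enumerated, name = term.groups()
        chars = set(enumerated) if enumerated is not None else NAMED[name]
        if sign == "+":
            result |= chars
        else:
            result -= chars
        pos = term.end()
    return result
```

Under these assumptions, `char_class("+alpha+punct-[aeiou]")` contains `b` and `!` but not `a`, and `char_class("+word-[_]")` contains letters and digits but not the underscore.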

To me this looks cleaner overall, makes it clear we're doing a
one-character non-capturing match, and may enable a few optimization
possibilities.  (I'm sure that with enough effort we can get 
equivalent optimizations out of the existing syntax, and we may
need them anyway in the long run, but this might simplify that a 
fair bit.)

I haven't thought far ahead to the question of whether
character classes would continue to occupy the same namespace
as rules (as they do now) or if they become specialized kinds
of rules or what.  I'll just leave it at this for now and
see what the rest of p6l thinks.

Pm
