Re: comprehensive list of perl6 rule tokens

From: Jeff 'japhy' Pinyan
Date: May 26, 2005 16:09
Subject: Re: comprehensive list of perl6 rule tokens
Message ID: Pine.LNX.4.61.0505261855070.19144@perlmonk.org

On May 26, Patrick R. Michaud said:

> On Tue, May 24, 2005 at 08:25:03PM -0400, Jeff 'japhy' Pinyan wrote:
>> I have looked through the latest
>> revisions of Apo05 and Syn05 (from Dec 2004) and come up with the
>> following list:
>>
>>   http://japhy.perlmonk.org/perl6/rules.txt
>
> I'll review the list below, but it's also worthwhile to read
>
>   http://www.nntp.perl.org/group/perl.perl6.language/21120
>
> which is Larry's latest missive on character classes, and
>
>   http://www.nntp.perl.org/group/perl.perl6.language/20985
>
> which describes the capturing semantics (but be sure to note
> the lengthy threads that follow concerning changes in the
> indexing from $1, $2, ... to $0, $1, ... ).

I'll check them out.  Right now, I'm really only concerned with syntax 
rather than implementation.  Perl6::Rule::Parser will only parse the rule 
into a tree structure.

> 	&	a&b		N	conjunction
> 		&var		N	subroutine
>
> I'm not sure that "&var" means subroutine anymore.  A05 does mention

Ok.  If it goes away, I'm fine with that.

> 	x**{n..m}	N	previous atom n..m times
>
> Keeping in mind that the "n..m" can actually be any sort of closure

Yeah, I know.

> 	(	(x)		Y	capture 'x'
> 	)			Y	must match opening '('
>
> It may be worth noting that parens not only capture, they also
> introduce a new scope for any nested subpattern and subrule captures.

Ok.  I don't think that'll affect me right now.

> 	:ignorecase	N	case insensitivity :i
> 	:global		N	match globally :g
> 	:continue	N	start scanning after previous match :c
>        ...etc
>
> I'm not sure these are "tokens" in the sense of "single unit of purpose"
> in your original message.  I think these are all adverbs, and the "token"
> is just the initial C<:> at the beginning of a group.

I understand, but that set is particularly important to me, because as far 
as I am concerned, the rule

   /abc/

is the object Perl6::Rule::Parser::exact->new('abc'), whereas the rule

   /:i abc/

is the object Perl6::Rule::Parser::exactf->new('abc') -- this is using 
node terminology from Perl 5, where "exactf" means "exact with case 
folding".
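
A minimal sketch of that idea, written in Python purely for illustration (it 
is not the Perl6::Rule::Parser code, and the names Exact, ExactF, 
literal_node and parse_simple_rule are hypothetical).  It assumes a rule made 
of nothing but an optional leading :i (or :ignorecase) and a literal, and 
shows the adverb selecting the node class rather than becoming a node itself:

```python
from dataclasses import dataclass


@dataclass
class Exact:
    """Literal text, matched case-sensitively (cf. Perl 5's EXACT node)."""
    text: str


@dataclass
class ExactF:
    """Literal text, matched with case folding (cf. Perl 5's EXACTF node)."""
    text: str


def literal_node(text, ignorecase=False):
    # The adverb is not a node of its own; it only changes which
    # node class the following literal is built from.
    return ExactF(text) if ignorecase else Exact(text)


def parse_simple_rule(src):
    """Parse an optional leading ':i' / ':ignorecase' followed by a literal."""
    src = src.strip()
    ignorecase = False
    for adverb in (":ignorecase", ":i"):
        if src.startswith(adverb + " "):
            ignorecase = True
            src = src[len(adverb):].lstrip()
            break
    return literal_node(src, ignorecase)
```

Under those assumptions, parse_simple_rule("abc") builds an Exact node and 
parse_simple_rule(":i abc") builds an ExactF node.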

> 	:keepall	N	all rules and invoked rules remember everything
>
> That's now  ":parsetree" according to Damian's proposed capture rules.

Ok.  I haven't seen those yet.

> 	<commit>	N	backtracking fails completely
> 	<cut>		N	remove what matched up to this point from the string
> 	<after P>	N	we must be after the pattern P
> 	<!after P>	N	we must NOT be after the pattern P
> 	<before P>	N	we must be before the pattern P
> 	<!before P>	N	we must NOT be before the pattern P
>
> As with ':words', etc., I'm not sure that these qualify as "tokens"
> when parsing the regex -- the tokens are actually "<" or "<!" and

I understand.  Luckily this new syntax will enable me to abstract things 
in the parser.

   my $obj = $S->object(assertion => $name, $neg);
   # where $name is the part after the < or <!
   # and $neg is a boolean denoting the presence of !

Since there are no longer different prefixes for every type of assertion, I 
no longer need a specific class of object for each one.
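
To make the "one generic assertion object" idea concrete, here is a rough 
Python sketch (illustrative only; Assertion and parse_assertion are 
hypothetical names, and the regex covers only named assertions of the form 
<name ...> and <!name ...>).  The name and the negation flag are just fields 
on a single class:

```python
import re
from dataclasses import dataclass


@dataclass
class Assertion:
    name: str      # the part after '<' or '<!', e.g. 'after'
    negated: bool  # True when the '!' prefix was present
    arg: str       # whatever follows the name, e.g. the pattern P


# '<', optional '!', a name, optional whitespace-separated argument, '>'.
ASSERTION = re.compile(r"<(!?)(\w+)(?:\s+(.*?))?>\Z", re.S)


def parse_assertion(src):
    m = ASSERTION.match(src)
    if m is None:
        raise ValueError(f"not a named assertion: {src!r}")
    neg, name, arg = m.groups()
    return Assertion(name=name, negated=(neg == "!"), arg=arg or "")
```

With this, "<after P>" and "<!before P>" differ only in the name and negated 
fields; no per-assertion class is required.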

>        <?ws>		N	match whitespace by :w rules
> 	<?sp>		N	match a space character (chr 32 ONLY)
>
> Here the token is "<?", indicating a non-capturing subrule.

Right.

> 	<$rule>		N	indirect rule
> 	<::$rulename>	N	indirect symbolic rule
> 	<@rules>	N	like '@rules'
> 	<%rules>	N	like '%rules'
> 	<{ code }>	N	code produces a rule
> 	<&foo()>	N	subroutine returns rule
> 	<( code )>	N	code must return true or backtracking ensues
>
> Here the leading tokens are actually "<$", "<::$", "<@", "<%", "<{", "<&",
> and "<(", and I suspect we have "<?$", "<?::$", "<?@", and "<!$", "<!::$",
> "<!@", etc. counterparts.

Per your second message, <!@rules> would mean <!before <@rules>>, right?

>                            Of course, one could claim that these are
> really separated as in "<", "?", and "$" tokens, but PGE's parser currently
> treats them as a unit to make it easier to jump directly into the correct
> handler for what follows.

Yes, so does mine. :)
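
For illustration, treating the prefixes as units can be sketched as a 
longest-prefix lookup.  This is Python rather than either parser's real code, 
the handler names are made up, and the table lists only the prefixes 
mentioned so far in this thread:

```python
# Hypothetical handler names keyed by the leading token of an assertion.
PREFIX_HANDLERS = {
    "<::$": "indirect_symbolic_rule",
    "<$": "indirect_rule",
    "<@": "rule_array",
    "<%": "rule_hash",
    "<{": "code_rule",
    "<&": "sub_rule",
    "<(": "code_assertion",
    "<?": "noncapturing_subrule",
    "<!": "negated_assertion",
    "<": "capturing_subrule",
}

# Try longer prefixes first so "<::$" is never mistaken for a bare "<".
_ORDERED = sorted(PREFIX_HANDLERS, key=len, reverse=True)


def classify(src, pos=0):
    """Return (handler_name, new_pos) for the assertion starting at pos."""
    for prefix in _ORDERED:
        if src.startswith(prefix, pos):
            return PREFIX_HANDLERS[prefix], pos + len(prefix)
    raise ValueError(f"no assertion starts at offset {pos}")
```

The parser can then jump straight to the handler named by the prefix and 
continue from new_pos, instead of re-examining "<", "?" and "$" one 
character at a time.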

> 	<[a-z]>		N	character class
> 	<+alpha>	N	character class
> 	<-[a-z]>	N	complemented character class
>
> The tokens for character class manipulation are currently "<[", "<+",
> and "<-", although that's not officially documented in A05 or S05 yet.
> Also, ranges are now <[a..z]> -- an unescaped hyphen appearing in an
> enumerated character class generates a warning.
>
> 	<+\w-[0-9]>	N	character class "arithmetic"
>
> I'm not sure that it's been decided/documented that \w, \s, etc.
> can appear in character class arithmetic (although it seems like it
> should).

The new character class idiom is going to confuse me for a while.  I'll 
have to read the above URL in which Larry sheds light.

> 	<prop:X>	N	Unicode property match
> 	<-prop:X>	N	complemented Unicode property match
>
> Here "prop" is just a subrule (or character class) similar to
> <+alpha>, <+digit>, etc.  Also, note that <prop:X> is a capturing
> subrule, while <+prop:X> would be a character class match (and presumably
> not capture).

I think I'll wait to handle Unicode properties until a syntax has been 
agreed upon... <prop:X>, <X>, <prop(X)>, etc.

> 	<rule>		N	match rule (and capture to $rule)
> 	<?rule>		N	match rule (don't capture)
> 	<<rule>>	N	match rule (don't capture)
>
> Do we still have the <<rule>> syntax, or was that abandoned in
> favor of <?rule> ?  (I know there are still some remnants of <<...>>
> in S05 and A05, but I'm not sure they're intentional.)

I saw <<...>> in A/S 05, but if they're accidental, then I just won't deal 
with it.

And, what's the deal with <RULE> capturing?  Does that mean I have to 
write <?digit> everywhere instead of <digit> unless I want a capture?  Eh, 
I guess \d exists for that reason...

>> Thanks for your help.  Unless you're difficult.
>
>    "You're welcome"  unless $Pm ~~ /<?difficult>/;

Difficulty nonexistent.

-- 
Jeff "japhy" Pinyan         %  How can we ever be the sold short or
RPI Acacia Brother #734     %  the cheated, we who for every service
http://japhy.perlmonk.org/  %  have long ago been overpaid?
http://www.perlmonks.org/   %    -- Meister Eckhart
