
[svn:perl6-synopsis] r14542 - doc/trunk/design/syn

From: larry
Date: May 17, 2008 14:37
Subject: [svn:perl6-synopsis] r14542 - doc/trunk/design/syn
Message ID: 20080517213738.A0740CBAC4@x12.develooper.com
Author: larry
Date: Sat May 17 14:37:37 2008
New Revision: 14542

Modified:
   doc/trunk/design/syn/S05.pod

Log:
Clarifications to how tied longest tokens are handled under LTM


Modified: doc/trunk/design/syn/S05.pod
==============================================================================
--- doc/trunk/design/syn/S05.pod	(original)
+++ doc/trunk/design/syn/S05.pod	Sat May 17 14:37:37 2008
@@ -14,9 +14,9 @@
    Maintainer: Patrick Michaud <pmichaud@pobox.com> and
                Larry Wall <larry@wall.org>
    Date: 24 Jun 2002
-   Last Modified: 7 May 2008
+   Last Modified: 18 May 2008
    Number: 5
-   Version: 78
+   Version: 79
 
 This document summarizes Apocalypse 5, which is about the new regex
 syntax.  We now try to call them I<regex> rather than "regular
@@ -2094,8 +2094,14 @@
 expressions).  A logical alternation using C<|> then takes two or
 more of these lists and dispatches to the alternative that matches
 the longest token prefix.  This may or may not be the alternative
-that comes first lexically.  (However, in the case of a tie between
-alternatives, the textually earlier alternative does take precedence.)
+that comes first lexically.
+
+However, if two alternatives match at the same length, the tie is
+broken by one of two methods.  If the alternatives are in different
+grammars, standard MRO (method resolution order) determines which
+one to try first.  If the alternatives are in the same grammar, the
+textually earlier alternative takes precedence.  (If a grammar's rules
+are defined in more than one file, the results are undefined.)
 
 This longest token prefix corresponds roughly to the notion of "token"
 in other parsing systems that use a lexer, but in the case of Perl
@@ -2150,6 +2156,11 @@
 Greedy quantifiers and character classes do not terminate a token pattern.
 Zero-width assertions such as word boundaries are also okay.
 
+Because such assertions can be part of the token, the lexer engine must
+be able to recover from the failure of such an assertion and backtrack
+to the next best token candidate, which might be the same length or shorter,
+but can never be longer than the current candidate.
+
 For a pattern that starts with a positive lookahead assertion,
 the assertion is assumed to be more specific than the subsequent
 pattern, so the lookahead's pattern is treated as the longest token;
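
The tie-breaking and recovery rules added in this revision can be
illustrated with a small model. What follows is only a sketch in Python,
not how any Perl 6 implementation does it: the names Alternative,
mro_rank, position, and dispatch are all hypothetical, the position of a
grammar in the MRO is reduced to a single integer (0 meaning the most
derived grammar), and a zero-width assertion is modelled as a predicate
checked at the end of the matched token prefix.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


def always_true(text: str, end: int) -> bool:
    """Default assertion: nothing extra to check."""
    return True


def at_word_boundary(text: str, end: int) -> bool:
    """Zero-width check: the token must not run into another word character."""
    return not text[end:end + 1].isalnum()


def followed_by_paren(text: str, end: int) -> bool:
    """Zero-width check: the token must be followed by an opening paren."""
    return text[end:end + 1] == "("


@dataclass
class Alternative:
    name: str
    token: str       # regex standing in for the declarative token prefix
    mro_rank: int    # assumption: 0 = most derived grammar, larger = further up the MRO
    position: int    # textual order within its own grammar
    assertion: Callable[[str, int], bool] = always_true


def ordered_candidates(alts: List[Alternative], text: str) -> List[Tuple[Alternative, int]]:
    """Return every matching alternative with its token length, best first."""
    found = []
    for alt in alts:
        m = re.match(alt.token, text)
        if m:
            found.append((alt, m.end()))
    # Longest token first; ties go to MRO order, then to textual order.
    found.sort(key=lambda c: (-c[1], c[0].mro_rank, c[0].position))
    return found


def dispatch(alts: List[Alternative], text: str) -> Optional[Tuple[str, int]]:
    """Try candidates in order, falling back when an assertion fails."""
    for alt, length in ordered_candidates(alts, text):
        # The list is sorted by descending length, so a fallback candidate
        # is the same length or shorter, never longer.
        if alt.assertion(text, length):
            return alt.name, length
    return None


ALTS = [
    Alternative("call", r"[a-z]+", 0, 0, followed_by_paren),
    Alternative("keyword_if", r"if", 0, 1, at_word_boundary),
    Alternative("identifier", r"[a-z]+", 0, 2),
    Alternative("inherited_identifier", r"[a-z]+", 1, 0),
]

if __name__ == "__main__":
    for sample in ["foo(", "foo bar", "if x", "iffy", "123"]:
        print(repr(sample), "->", dispatch(ALTS, sample))
```

In this model, "foo bar" first selects the call alternative, whose
assertion fails, and the dispatcher then falls back to the identifier
alternative of the same length. The inherited identifier is never chosen
here, because the alternative in the more derived grammar wins the tie
regardless of textual position.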


