Tom Hughes wrote: > > In message <20001128193022.A11924@pembro33.pmb.ox.ac.uk> > Simon Cozens <simon@cozens.net> wrote: > > > In a sense, though, you're right; this is a general problem. I'm currently > > trying to work out a design for a tokeniser, and it seems to me that > > there's going to be a lot of communicating of "hints" between the > > tokeniser, the lexer and the parser. > > You have to be vary careful about downward communication from the > parser to the lexer if there's any lookahead involved as you can > find that you're trying to affect the lexing of tokens which are > already in the lookahead buffer of the parser. > > Backtracking may well be better than lookahead here as you can always > jump back a bit after you change the lexer's state ;-) The difference may be more illusory than real. In either case, you have to undo something: your recursion stack for backtracking, some parser state for lookahead. And in both cases, undoing those things is a hell of a lot easier than undoing the code that was run when you recognized (or thought you recognized) some chunk of tokens as an anonymous sub or whatever. For example, say you stuck an entry for the subroutine into the package's symbol table. You'd better have kept the original, in case you were wrong -- but you might not want to keep all originals, or you'll blow your memory. Perhaps we can avoid doing anything significant during parsing (when will BEGIN{} run?), but perhaps not. Handling the parser's state can be done in a backtracking DFA-like or a direct NFA-like way. The NFA way is to keep track of all possible parse states and advance each one in parallel based on the next token. The DFA way is recursive descent, backing out of blind alleys and trying again, keeping a single working hypothesis alive at a time. The DFA approach is probably easier to undo user code in, because in the NFA case you have to consider each token under the assumptions of all possible parses up to that point. The NFA case has the advantage that you never have to back up, so you can permanently forget about a token as soon as it whizzes by. Perl5 is parseable with a single token of lookahead and lots of parser/lexer communication. Sort of. It would be nice to prevent it from getting any worse. We could pretend to support full DWIMmery by telling the user when it fails: 10 print foo bar(); 13 sub foo { ... } DWIMmery badness 10000: Sorry, but I screwed up by assuming 'foo' was a direct object in line 10, and only found out on line 13. Would you mind predeclaring 'foo' somewhere before line 10? ...but that would be weird. print foo bar(); eval "sub foo { $code }"; print foo bar();Thread Previous | Thread Next