[PATCH] Re: [perl #74022] Parser hangs on some Unicode numbers and symbols in identifiers

From: Father Chrysostomos
Date: April 26, 2010 03:09
Subject: [PATCH] Re: [perl #74022] Parser hangs on some Unicode numbers and symbols in identifiers
Message ID: 925BA8AF-93B8-41CE-B8EF-9ACA730A2D1F@cpan.org

This was broken by change 30148
(<http://perl5.git.perl.org/perl.git/commitdiff/8158862b92>), which
introduces \p{OtherIDContinue} into the set of characters matched by
\p{ID_Continue}.

In Perl_yylex in toke.c:

     switch (*s) {
     default:
	if (isIDFIRST_lazy_if(s,UTF))
	    goto keylookup;

isIDFIRST_lazy_if returns true for any character in ID_Continue that is
not a digit. See handy.h:

/* The ID_Start of Unicode is quite limiting: it assumes a L-class
 * character (meaning that you cannot have, say, a CJK character).
 * Instead, let's allow ID_Continue but not digits. */
#define isIDFIRST_utf8(p)	(is_utf8_idcont(p) && !is_utf8_digit(p))

Then further down (in toke.c):

       keylookup: {
...(8 lines snipped)...
	s = scan_word(s, PL_tokenbuf, sizeof PL_tokenbuf, FALSE, &len);

S_scan_word has:

	else if (UTF && UTF8_IS_START(*s) && isALNUM_utf8((U8*)s)) {

So characters in \p{OtherIDContinue}, such as U+0387 and U+1369, are
treated as the first character of a keyword by isIDFIRST_lazy_if, but
scan_word rejects them and does not advance, since it doesn’t use
isIDFIRST_lazy_if except after a ‘'’. The lexer then jumps to keylookup
again at the same position, so we get an infinite sequence of
zero-length keywords and the parser hangs.

I think scan_word should be using is_utf8_idcont, rather than  
isALNUM_utf8. The attached patch makes it do just this.
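
To make the failure mode concrete, here is a minimal standalone sketch.
It is not the toke.c code: every name in it is hypothetical, it works on
ASCII only, and '.' merely stands in for an Other_ID_Continue character
such as U+0387. Where the real lexer would spin forever, the toy one
returns -1.

```c
#include <assert.h>
#include <stddef.h>

/* Toy character classes (assumptions, not Perl's real tables). */
int toy_is_digit(char c) { return c >= '0' && c <= '9'; }
int toy_is_alnum(char c) {
    return (c >= 'a' && c <= 'z') || toy_is_digit(c) || c == '_';
}
/* '.' plays the part of an Other_ID_Continue character. */
int toy_is_idcont(char c) { return toy_is_alnum(c) || c == '.'; }
/* Mirrors the shape of isIDFIRST_utf8: ID_Continue but not a digit. */
int toy_is_idfirst(char c) { return toy_is_idcont(c) && !toy_is_digit(c); }

typedef int (*toy_pred)(char);

/* Consume characters accepted by word_char; return the end of the word. */
const char *toy_scan_word(const char *s, toy_pred word_char) {
    assert(s != NULL);
    while (*s && word_char(*s))
        s++;
    return s;
}

/* Count the words in s. The dispatch test is always toy_is_idfirst;
 * the scanner's predicate is a parameter so both variants can be tried.
 * Returns -1 if the scanner is entered but consumes nothing, which is
 * the point where a real lexer would loop forever. */
int toy_count_words(const char *s, toy_pred word_char) {
    int words = 0;
    while (*s) {
        if (toy_is_idfirst(*s)) {
            const char *end = toy_scan_word(s, word_char);
            if (end == s)
                return -1;      /* zero-length word: no progress */
            words++;
            s = end;
        } else {
            s++;                /* not a word start: skip it */
        }
    }
    return words;
}
```

With the mismatched pair, toy_count_words("ab.cd", toy_is_alnum) stops
at the '.' and returns -1. With the scanner switched to the same class
the dispatch test is built on, toy_count_words("ab.cd", toy_is_idcont)
consumes the whole string as one word.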
