
[PATCH] Re: [perl #74022] Parser hangs on some Unicode numbers and symbols in identifiers

Father Chrysostomos
April 26, 2010 03:09
This was broken by change 30148, which
introduces \p{OtherIDContinue} into the set of characters matched by

In Perl_yylex in toke.c:

     switch (*s) {
     default:
	if (isIDFIRST_lazy_if(s,UTF))
	    goto keylookup;

isIDFIRST_lazy_if returns true for characters in ID_Continue that are
not digits. See handy.h:

/* The ID_Start of Unicode is quite limiting: it assumes a L-class
  * character (meaning that you cannot have, say, a CJK character).
  * Instead, let's allow ID_Continue but not digits. */
#define isIDFIRST_utf8(p)	(is_utf8_idcont(p) && !is_utf8_digit(p))


Then further down (in toke.c):

       keylookup: {
...(8 lines snipped)...
	s = scan_word(s, PL_tokenbuf, sizeof PL_tokenbuf, FALSE, &len);

S_scan_word has:

	else if (UTF && UTF8_IS_START(*s) && isALNUM_utf8((U8*)s)) {

So characters in \p{OtherIDContinue}, such as U+0387 and U+1369, are
treated as the first character of a keyword by isIDFIRST_lazy_if, but
scan_word rejects them and does not advance, since it doesn’t use
isIDFIRST_lazy_if except after a ‘'’. The parser therefore loops
forever, producing an endless run of zero-length keywords.

I think scan_word should be using is_utf8_idcont, rather than  
isALNUM_utf8. The attached patch makes it do just this.
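
To make the mismatch concrete, here is a small standalone model of it. This is only a sketch, not code from toke.c: every toy_* name is made up for illustration, plain ASCII stands in for UTF-8, and '.' plays the part of a \p{OtherIDContinue} character such as U+0387. Where the real parser would spin, the model returns -1 so that it can be run safely.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT the real perl macros.
 * '.' models a \p{OtherIDContinue} character such as U+0387. */
int toy_is_alnum(int c)   { return isalnum(c) || c == '_'; }
int toy_is_idcont(int c)  { return toy_is_alnum(c) || c == '.'; }
/* Same shape as isIDFIRST_utf8: ID_Continue, but not a digit. */
int toy_is_idfirst(int c) { return toy_is_idcont(c) && !isdigit(c); }

typedef int (*toy_pred)(int);

/* Model of scan_word: advance while word_char accepts the character. */
const char *toy_scan_word(const char *s, toy_pred word_char)
{
    while (*s && word_char((unsigned char)*s))
        s++;
    return s;
}

/* Model of the yylex loop. Returns the number of words scanned, or -1
 * when the scanner is entered but consumes nothing (the hang). */
int toy_lex(const char *s, toy_pred word_char)
{
    int words = 0;

    assert(s != NULL);
    while (*s) {
        if (toy_is_idfirst((unsigned char)*s)) {
            const char *end = toy_scan_word(s, word_char);
            if (end == s)
                return -1;      /* zero-length keyword: no progress */
            s = end;
            words++;
        } else {
            s++;                /* anything else: skip one character */
        }
    }
    return words;
}
```

Calling toy_lex("ab.cd", toy_is_alnum) models the current behaviour: the '.' passes the first-character test, the scanner rejects it, and the call returns -1. Calling toy_lex("ab.cd", toy_is_idcont) models the proposed change: both tests agree on '.', so the scanner consumes it and the call returns 1.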
