On 12/29/2010 03:57 AM, Dave Mitchell wrote:
> On Fri, Dec 17, 2010 at 08:11:15AM -0700, Tom Christiansen wrote:
>>> I've always wondered why a lone } or ] does not need escaping (they're
>>> only special after an opening { or [ has been seen), but a lone ) does.
>>
>> So have I. It could be worse: things like quantifiers still
>> need escaping to be made literals even if they couldn't quantify
>> something, such as at the beginning of a string. A (poor) argument
>> could be made that in such a position, escaping isn't necessary
>> to infer function, and it seems to me some nasty regex dialects
>> do just that. I certainly don't care for it.
>>
>>> And I don't think Perl5 ever will. There's so much code out there that
>>> doesn't escape \W characters outside of the dozen mentioned above (and
>>> if we see a newbie escaping a \W outside of the dozen, we pick on him),
>>
>> Now that you mention it, you're right, we do. Hadn't thought of that.
>
>
> Ok. How about the following resolution: we change it so that utf8 strings
> get chr(128)-chr(255) escaped, so that it matches the non-utf8 case, and
> leave chars > 255 unescaped. In some future world if chars > 255 start
> having special meaning to the regex engine, then we start escaping them
> too.
>

This proposal and all others died in 5.14 for lack of consensus. This leaves the Unicode bug extant for quotemeta, and I would like to get it fixed. Tom has told me privately that he's ok with changing things to get consistent rules for UTF8- vs non-UTF8 encoded strings.

I'm thinking we should just do what the original trouble ticket asks for, and what the documentation has always said, and that is to quote everything that matches [^a-zA-Z0-9_]. This agrees with the first part of Dave's proposal, but also escapes all chars above Latin1.

I'm reopening this publicly now, in order to try to get resolution in the next week or so, so that we can do something for 5.16.

Either proposal is easy to implement, and fast in CPU cycles.

If we do this, does that close the door on later changing to use the pattern syntax should it ever become necessary? I think that it doesn't. This thread included extensive discussion on that.
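
To make the difference between the two proposals concrete, here is a rough sketch of the two rules. It is written in Python rather than as a patch to quotemeta, purely as a model: the function names are made up for illustration, and a Python string has no UTF8 flag, so the sketch only shows what each rule would escape once the result no longer depends on the internal encoding.

```python
import string

# The characters the documentation says quotemeta leaves alone: [a-zA-Z0-9_]
ASCII_WORD = set(string.ascii_letters + string.digits + "_")


def quote_latin1_only(text):
    """Model of Dave's proposal (hypothetical name).

    Escape every code point up to 255 that is not in [a-zA-Z0-9_],
    and leave code points above 255 unescaped.
    """
    return "".join(
        ch if ch in ASCII_WORD or ord(ch) > 255 else "\\" + ch
        for ch in text
    )


def quote_all_nonword(text):
    """Model of the [^a-zA-Z0-9_] rule (hypothetical name).

    Escape every code point that is not in [a-zA-Z0-9_],
    including those above Latin1.
    """
    return "".join(ch if ch in ASCII_WORD else "\\" + ch for ch in text)


# 'a', '.', 'b', U+00E9 (Latin1 range), U+263A (above Latin1)
sample = "a.b\u00e9\u263a"

# ascii() keeps the output printable on any terminal encoding.
print(ascii(quote_latin1_only(sample)))
print(ascii(quote_all_nonword(sample)))
```

The two functions agree on everything up to chr(255) and differ only on U+263A, which is the whole of the disagreement between the proposals.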