karl williamson
January 31, 2010 09:46
RFC: What to do about something like qr/\N{A/
I'm working on a patch to the tokenizer.  In the past, \N in regexes was 
special: it was always followed by {NAME}, naming a character.  Now \N 
can also stand alone as a substitute for dot that matches any character 
except newline, whether or not the /s modifier is in effect.  So it is 
analogous to dot.

But consider:

./perl -Ilib -w -E "say 'B{A' =~ qr/.{A/"

But the corresponding:

./perl -Ilib -w -E "say 'B{A' =~ qr/\N{A/"
Missing right brace on \N{} in regex;

So they don't act analogously.  (Escaping the brace with a backslash, 
as in qr/\N\{A/, works.)

If the character after the brace is a digit, they do act analogously:

./perl -Ilib -w -E "say 'B{1' =~ qr/.{1/"
./perl -Ilib -w -E "say 'B{1' =~ qr/\N{1/"
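
As a rough illustration, here is a minimal Perl sketch of the decision as described above, given the pattern text that immediately follows "\N".  The function name current_rule and its string results are hypothetical; this is not the actual tokenizer code.

```perl
use strict;
use warnings;

# Hypothetical sketch (not perl's real tokenizer) of how "\N" is treated
# today, based only on the behavior described in this message.
sub current_rule {
    my ($rest) = @_;    # pattern text immediately after "\N"

    # No brace at all: the new meaning, any character except newline.
    return 'non-newline' unless $rest =~ /^\{/;

    # A digit after the brace: treated the same way as after a dot.
    return 'non-newline' if $rest =~ /^\{\d/;

    # Anything else after the brace starts a {NAME}; it must be closed.
    return 'named char' if $rest =~ /^\{[^}]*\}/;
    return 'error: Missing right brace on \N{}';
}

for my $rest ('{A', '{1', '{SPACE}', '') {
    printf "%-10s => %s\n", "\\N$rest", current_rule($rest);
}
```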

One option is to leave the behavior as it is.  This breaks no 
pre-existing code, as \N always had to be followed by {NAME}, and still 
must be outside of patterns.

Alternatively, \N could be treated as special only when it is followed 
by a pair of braces enclosing nothing but characters that could appear 
in a Unicode name (the space character, \w restricted to ASCII, and 
hyphen, I think, without looking it up).  But then, if someone forgets 
the closing brace, they won't get the diagnostic they have always 
gotten, and the pattern won't do what they probably wanted, as in:

./perl -Ilib -w -E "say 'B{SPACE' =~ qr/\N{SPACE/"  # HYPOTHETICAL
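
The alternative rule can be sketched the same way.  This is a hypothetical illustration only: the function name proposed_rule is made up, the name alphabet (ASCII word characters, space, hyphen) is the tentative one given above, and checking for quantifiers such as {3} first is an assumption about how such a rule would avoid colliding with them.

```perl
use strict;
use warnings;

# Hypothetical sketch of the proposed rule: "\N" names a character only
# when followed by a closed brace pair containing plausible name characters.
sub proposed_rule {
    my ($rest) = @_;    # pattern text immediately after "\N"

    # Assumption: quantifier forms {n}, {n,}, {n,m} keep their dot-like meaning.
    return 'quantified non-newline' if $rest =~ /^\{\d+(?:,\d*)?\}/;

    # Assumed name alphabet: ASCII letters, digits, underscore, space, hyphen.
    return 'named char' if $rest =~ /^\{[A-Za-z0-9_ \-]+\}/;

    # Anything else, including an unclosed "{SPACE": "\N" matches any
    # non-newline character and the "{" is taken literally, with no error.
    return 'non-newline';
}

for my $rest ('{SPACE}', '{SPACE', '{3}', '{A') {
    printf "%-10s => %s\n", "\\N$rest", proposed_rule($rest);
}
```

Note how the unclosed "\N{SPACE" silently falls through to the last case, which is exactly the lost diagnostic described above.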

I'm inclined to keep the existing behavior, but thought I should check 
with you all first.  We could also extend it so that any non-word 
character following the left brace behaves the way a digit does now.
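
A hypothetical sketch of that extension: only a letter or underscore after the brace starts a name (so a forgotten "}" is still caught), while a digit, space, punctuation, or anything else gets the dot-like treatment.  The name extended_rule is made up, and restricting "word character" to ASCII letters and underscore (digits already behave this way) is an assumption.

```perl
use strict;
use warnings;

# Hypothetical sketch of the suggested extension, not perl's actual code.
sub extended_rule {
    my ($rest) = @_;    # pattern text immediately after "\N"

    # Only "{" followed by an ASCII letter or underscore begins a name;
    # everything else leaves "\N" meaning "any character except newline".
    return 'non-newline' unless $rest =~ /^\{[A-Za-z_]/;

    # A name must still be closed, so the old error is preserved.
    return 'named char' if $rest =~ /^\{[^}]*\}/;
    return 'error: Missing right brace on \N{}';
}

for my $rest ('{+', '{1', '{A', '{LATIN SMALL LETTER A}') {
    printf "%-25s => %s\n", "\\N$rest", extended_rule($rest);
}
```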

