I wonder if people realize that it is possible to redefine the standard properties. You can, for example, invert the meaning of \p{ASCII}. (Fortunately, I believe that this can only be done for the current package, and I haven't found a way for it to leak outside that.) I was unaware of this, and I can't find it documented.

It is therefore possible that someone could do this accidentally. Just defining a subroutine that has the same name as a Unicode property removes the use of that Unicode property for the package, and Perl tries to call the subroutine instead:

    perl -E '"a" =~ /\p{ASCII}/; sub ASCII { say "fooled you" } '
    fooled you

(Unlike standard properties, the name is case sensitive.)

It has been agreed that ANYOF regexp nodes, which are used for [bracketed] character classes and \p{} properties, should change to use inversion lists for specifying the characters (above 255) to match, and it is my intention to do this for 5.16. For efficiency, and for accuracy (certain characters need special handling under /i), it is best to calculate at compile time the complete list of things that the node can match.

However, this can't currently be done for user-defined (or redefined) properties. This is because the regex can currently be compiled before the subroutine that defines the property is encountered in the input, as in the example above. This could be avoided if a 'use subs' were required, but that raises some backward compatibility issues.

The one solution I can think of is to defer to runtime the compilation of patterns that contain \p, the same as already happens with patterns that contain interpolated variables. By then all applicable subroutines will have been defined, and the problem goes away.

So, my questions are:

1) Is it right that we allow system properties to be overridden?
2) Is my approach of deferring compilation of \p patterns reasonable?