
RFC: Issues with user-defined \p{} properties

From: karl williamson
Date: November 5, 2010 09:41
Subject: RFC: Issues with user-defined \p{} properties
Message ID: 4CD4336F.9000801@khwilliamson.com

I wonder if people realize that it is possible to redefine the standard 
properties?  You can invert the meaning, for example, of \p{ASCII}. 
(Fortunately, I believe that this can only be done for the current 
package, and I haven't found a way for it to leak outside that.)  I was 
unaware of this, and can't find it documented, so someone could easily 
do it by accident.  Merely defining a subroutine with the same name as 
a Unicode property disables that property for the package; Perl tries 
to call the subroutine instead:

  perl -E '"a" =~ /\p{ASCII}/; sub ASCII { say "fooled you" } '
  fooled you

(Unlike the names of standard properties, the subroutine name is case 
sensitive.)
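
For comparison, here is a minimal sketch of the form that perlunicode documents for user-defined properties: a subroutine whose name begins with In or Is and which returns a string of hexadecimal code points or tab-separated ranges, one per line. The name IsLowerVowel and the set it describes are made up for illustration; the subroutine is placed before the match so that the result does not depend on when the pattern is compiled.

```perl
use strict;
use warnings;

# Hypothetical property name, chosen for illustration only.
# Defined before use so the pattern can see it whenever it is compiled.
sub IsLowerVowel {
    # One hexadecimal code point (or "start<TAB>end" range) per line.
    return join "\n", qw(0061 0065 0069 006F 0075), "";
}

# Keep only the strings that contain a character with the property.
my @hits = grep { $_ =~ /\p{IsLowerVowel}/ } qw(a b e z);
print "@hits\n";    # prints "a e"
```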

It has been agreed that ANYOF regexp nodes, which are used for 
[bracketed] character classes and \p{} properties, should change to use 
inversion lists to specify the characters (above 255) that they match.  
I intend to do this for 5.16.
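
For readers who have not met inversion lists: the idea is to store a set of code points as a sorted list of the points at which membership flips, so a lookup is a binary search plus a parity check. The following is only a sketch of the general technique in Perl, not the regex engine's implementation; the list contents and the function name in_invlist are assumptions made for illustration.

```perl
use strict;
use warnings;

# Inversion list for the illustrative set U+0100..U+017F plus U+0400..U+04FF.
# Each element is a code point at which membership flips (in, out, in, out).
my @invlist = (0x100, 0x180, 0x400, 0x500);

# Return 1 if $cp belongs to the set described by the list, else 0.
sub in_invlist {
    my ($list, $cp) = @_;
    my ($lo, $hi) = (0, scalar @$list);

    # Binary search: count how many list elements are <= $cp.
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($list->[$mid] <= $cp) { $lo = $mid + 1 }
        else                      { $hi = $mid }
    }

    # An odd count means $cp lies inside one of the ranges.
    return $lo % 2;
}

for my $cp (0x41, 0x100, 0x180, 0x450) {
    printf "U+%04X %s\n", $cp, in_invlist(\@invlist, $cp) ? "in" : "out";
}
```

Because a set and its complement differ only by whether the list starts at 0, inverting a property costs almost nothing in this representation.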

For efficiency, and for accuracy (certain characters need special 
handling under /i), it is best to calculate at compile time the complete 
list of things that the node can match.  However, this can't currently 
be done for user-defined (or redefined) properties, because the regex 
can be compiled before the subroutine that defines the property has 
been seen in the input, as in the example above.

This could be avoided if a 'use subs' were required, but that raises 
some backward compatibility issues.

The one solution I can think of is to defer to run time the compilation 
of patterns that contain \p, as already happens with patterns that 
contain interpolated variables.  By then, all applicable subroutines 
will have been defined, and the problem goes away.
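
The effect of such a deferral can be imitated today by routing the pattern through a variable, which forces compilation at run time. This is a sketch of the idea only, not the proposed change to the regex compiler; the property name IsHexLetter is hypothetical. Named subroutines are installed while the file is being compiled, so by the time the qr// statement runs, the subroutine exists even though it appears later in the source.

```perl
use strict;
use warnings;

# The pattern source lives in a variable, so qr// compiles it when this
# statement executes, after the whole file (including the sub below)
# has been parsed.
my $source = '\p{IsHexLetter}';
my $re     = qr/$source/;

# Record 1 or 0 for each test string, depending on whether it matches.
my @flags;
push @flags, ($_ =~ $re ? 1 : 0) for qw(c g F);
my $result = join ",", @flags;
print "$result\n";    # prints "1,0,1"

# Hypothetical user-defined property, deliberately defined after its use.
sub IsHexLetter {
    return "0041\t0046\n0061\t0066\n";    # A-F and a-f
}
```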

So, my questions are:

1) Is it right that we allow system properties to be overridden?

2) Is my approach of deferring compilation of \p patterns reasonable?
