develooper Front page | perl.perl5.porters | Postings from August 2013

RFC: What to do about warning: "Code point 0xFOO is not Unicode, all \p{} matches fail; all \P{} matches succeed"

Karl Williamson
August 25, 2013 05:17
Unicode properties are defined only for code points 0 through 0x10FFFF,
yet Perl allows the expression of code points up to UV_MAX, a much
larger number.  That means that the Unicode behavior for the remainder
is undefined.  We have chosen to make that behavior be as described in
the warning: every \p{} match against such a code point fails, and
every \P{} match succeeds.  I believe it was me who came up with the
warning, and that it was introduced in Perl 5.16.  (And I don't have
the tuits/energy to fully research it right now, as I believe that is
tangential anyway to this post.)
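
To make the behavior concrete, here is a minimal sketch that tests one
ordinary character, the last Unicode code point, and the first code
point past it against a property and its complement.  The helper name
classify() and the choice of \p{Alphabetic} are only for illustration,
and the 'non_unicode' warnings category is switched off so the output
is not cluttered.  Going by the warning text, the above-Unicode code
point should fail \p{} and succeed on \P{}; exactly which warning, if
any, appears when the category is left on depends on the Perl version.

```perl
use strict;
use warnings;

# Hypothetical helper: report how one character fares against
# \p{Alphabetic} and its complement \P{Alphabetic}.
sub classify {
    my ($char) = @_;
    no warnings 'non_unicode';    # silence the warning under discussion
    return (
        p => ($char =~ /\p{Alphabetic}/ ? 1 : 0),
        P => ($char =~ /\P{Alphabetic}/ ? 1 : 0),
    );
}

# 'A', the highest Unicode code point, and the first one beyond Unicode.
for my $cp (0x41, 0x10FFFF, 0x110000) {
    my %r = classify(chr $cp);
    printf "0x%X  \\p{Alphabetic}: %s  \\P{Alphabetic}: %s\n",
        $cp,
        ($r{p} ? "match" : "no match"),
        ($r{P} ? "match" : "no match");
}
```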

Most code will never deal with such large code points, and hence will 
never encounter the warning.  But if it does happen, the warning will 
likely be displayed many times, even in the same regex match when 
backtracking occurs over the large code point(s).

There are also bugs in the implementation.  It was quite wrong in Perl
v5.16, as Tom Christiansen discovered, and was largely fixed for v5.18.
But the warning still doesn't get displayed if the regex node that
contains the \p{} or \P{} is optimized into something besides the
normal one.  It can also be displayed twice for the same code point
even when there is no backtracking: the first time for the regex
optimizer's synthetic start class, before regular matching begins.

I have been thinking of what to do.  One potential solution is to make
this a once-only (per thread) message.  That means that displaying it
would set a per-interpreter variable that prevents it from ever being
displayed again on the current thread.
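
The once-only idea can be sketched at user level with a __WARN__
handler.  This is only an illustration of the semantics, not the core
change itself, which would live in the interpreter.  The variable
$already_warned is a hypothetical stand-in for the per-interpreter
flag, and the repeated warn() calls merely simulate the regex engine
raising the message again and again during backtracking.

```perl
use strict;
use warnings;

my $already_warned = 0;    # hypothetical stand-in for the per-interpreter flag
my @shown;                 # messages that actually got through

$SIG{__WARN__} = sub {
    my ($msg) = @_;
    # Let the \p{} warning through the first time only.
    if ($msg =~ /is not Unicode, all \\p\{\} matches fail/) {
        return if $already_warned++;
    }
    push @shown, $msg;
    print STDERR $msg;
};

# Simulate the same warning being raised five times in one match.
warn sprintf(
    "Code point 0x%X is not Unicode, all \\p{} matches fail; all \\P{} matches succeed\n",
    0x110000
) for 1 .. 5;

printf "warnings shown: %d\n", scalar @shown;
```

Other warnings pass through the handler untouched; only the message
under discussion is suppressed after its first appearance.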

But then it occurred to me: is this message really necessary?  Perhaps 
we should just get rid of it altogether, and make sure the pod 
documentation is very clear about this possibility for the very rare 
program that is affected.
