develooper Front page | perl.perl5.porters | Postings from September 2013

Re: RFC: What to do about warning: "Code point 0xFOO is not Unicode, all \p{} matches fail; all \P{} matches succeed"

Nicholas Clark
September 13, 2013 09:16
On Fri, Sep 13, 2013 at 08:55:03AM -0000, Father Chrysostomos wrote:
> Nicholas Clark wrote:
> > 1) this means that it's still viable to use out-of-range code points for
> >    "internal" purposes without generating so many warnings that they get
> >    turned off
> > 2) it permits a warning that is useful to leave on by default
> > 3) the warning can be made fatal for strict(er) behaviour
> > 
> > 
> > But what it doesn't (directly) offer is a way for a Unicode purist to treat
> > as fatal any attempt to match an out-of-range code point.
> I have said this before and am going to say it again:  I think it's
> wrong for the regular expression engine ever to warn (let alone croak)
> based on data passed to it.

Why just the regular expression engine?
Surely this reasoning applies equally well to any other operator that might
warn based on data passed to it, such as arithmetic operators being used on

> It just means that data fed to some module that doesn't care what data
> it's getting (it just does some matches and then passes it through)
> will cause it to blow up because fatal warnings happened to be on.
> Or it will cause the module to produce warnings even though there is
> nothing wrong with the code, and the caller has every reason to pass
> non-Unicode data through it.

Fatal warnings are not on by default. Code which asks for fatal warnings is
getting exactly what it asked for. (Just like distributions which make Pod
warnings into fatal tests.)
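
A minimal sketch of the opt-in nature of fatal warnings, using the long-standing 'numeric' category rather than the regex warning under discussion (whose exact behaviour depends on the perl version). The subroutine names add_lenient and add_strict are hypothetical, chosen only for illustration:

```perl
use strict;
use warnings;

# Default warnings: a non-numeric argument warns, but a result is
# still produced.
sub add_lenient {
    my ($x, $y) = @_;
    return $x + $y;
}

# Explicit opt-in: within this scope only, 'numeric' warnings are fatal.
sub add_strict {
    use warnings FATAL => 'numeric';
    my ($x, $y) = @_;
    return $x + $y;
}

my @warned;
local $SIG{__WARN__} = sub { push @warned, $_[0] };

my $lenient = add_lenient(1, "two");          # warns and carries on
my $strict  = eval { add_strict(1, "two") };  # dies; caught by eval
my $error   = $@;
```

Only the code that wrote FATAL sees an exception; the caller of add_lenient just gets a message on STDERR (here captured in @warned).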

Warnings are warnings. And intended to be *warnings*, not errors.

And I could say the same thing about some code which is adding values,
which gets passed strings (or references) by mistake.

If the code isn't validating its inputs, then the warning is useful.
And if the code's behaviour is correct when given such inputs, then it
will generate warnings which in that case are spurious.
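
As an illustration of the arithmetic case, here is a small sketch, assuming a hypothetical total() helper that does no input validation. The warning is the only signal that a string slipped in by mistake:

```perl
use strict;
use warnings;

# Hypothetical helper: sums its arguments without validating them.
sub total {
    my $sum = 0;
    $sum += $_ for @_;
    return $sum;
}

# Collect warnings instead of printing them, so they can be inspected.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# "three" is not numeric: it is treated as 0, and perl warns about it.
my $result = total(1, 2, "three");
```

Without the warning, the caller would see a plausible-looking sum and have no hint that one input was silently treated as zero.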

> The end result is that I have to monkey-patch to get
> things to work.

I don't see why this follows either. If your code is using a construction
which is documented to warn, and you know that the code in question is
behaving as designed even with inputs that will warn, then the code should
disable the warning.
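
A sketch of what disabling it lexically might look like. This assumes the warning lives in the 'non_unicode' category (a subcategory of 'utf8', available since perl 5.14); looks_alphabetic is a hypothetical pass-through helper, not code from any real module:

```perl
use strict;
use warnings;

# Hypothetical helper: it only inspects the data, and is designed to
# behave correctly for any code point, including those above U+10FFFF.
sub looks_alphabetic {
    my ($string) = @_;

    # Assumption: 'non_unicode' is the category covering this warning.
    # The pragma is lexical, so callers' warning settings are untouched.
    no warnings 'non_unicode';

    return $string =~ /\p{Alphabetic}/ ? 1 : 0;
}

my @seen;
{
    local $SIG{__WARN__} = sub { push @seen, @_ };
    looks_alphabetic(chr(0x110000));   # a code point beyond Unicode
    looks_alphabetic("abc");
}
```

Because the pragma is scoped to the subroutine body, the rest of the program keeps the warning enabled.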

Just like any other warning being generated in similar circumstances.

You seem to be saying "I disagree with this warning completely and it should
not happen in any code I run". If your entire codebase has been audited to
be sure that it works correctly, that's a viable position to take.

But way too many codebases that I've dealt with have so much legacy and
third-party code that you can't be sure of anything like this, and it's
not commercially viable to spend developer time untangling the black boxes.
It's scary how much ill-understood code businesses are running on, but that's
commercial reality, and most firms would go bust if they diverted resources
to trying to fix it. Which is sad, but how it is. In this context, warnings
are useful.

I fail to see why your argument about warnings from the regular expression
engine doesn't generalise to any other class of warnings triggered by data
at runtime, which is why I believe that it's not valid to single out the
regular expression engine.

Nicholas Clark
