perl.perl5.porters | Postings from November 2008

Re: PATCH [perl #59342] chr(0400) =~ /\400/ fails for >= 400

Glenn Linderman
November 3, 2008 21:31
On approximately 11/3/2008 7:04 PM, came the following characters from 
the keyboard of karl williamson:
> Abigail wrote:
>> On Sat, Oct 25, 2008 at 07:23:29PM -0600, Tom Christiansen wrote:
>>> Dear Glenn and Karl,
>>>   +=============================+
>>>   | SUMMARY of Exposition Below |
>>>   v-----------------------------v
>>>   *  I fully agree there's a bug.
>>>   *  I believe Karl has produced a reasonable patch to fix it.
>>>   *  I wonder what *else* might/should also change in tandem
>>>      with this estimable amendment so as to:
>>>      -  avoid evoking or astonishing any hobgoblins of
>>>         foolish inconsistency (ie: breaking bad expectations)
>>>      -  what (if any?) backwards-compat story might need
>>>         spinning (ie, breaking code, albeit cum credible Apologia)
>> I am seldomly in favour of new warnings for existing code, but perhaps
>> use of \NNN, with NNN > 377 in a regexp should trigger a warning, as
>> its behaviour is surprising - not to mention some code out there may
>> rely on the current (buggy) behaviour.
>> Abigail
> So what's the answer?  I don't think it should be an error for >=400.
> I think a warning would be ok, and I know enough to put such a warning 
> into regcomp.c.  And at compile time, one could handle >8-bit machines 
> by testing sizeof(char).  But what to do besides warn?  Assume they 
> meant Unicode, as character classes do now and as my patch would do?  And I 
> don't know the rest of the code.  The only other place that grok_oct() 
> is called is from the oct() function.  My limited knowledge of how perl 
> works suggests that a call to this is put on the stack when evaluating a 
> double-quotish constant.  I don't know enough about the code at this 
> time to know how to add a warning to that.
> I think we're all agreed that there is a bug.  And that there should be 
> consistency of handling, unlike now.  I await your responses.

I think it should be an error on an 8-bit machine (but you'd need to 
test MAXCHAR, not sizeof(char), which is always 1), because it is a 
useless way of specifying a Unicode codepoint. It is useless because it 
can reach only a small fraction of the possible codepoints (\400 through 
\777, i.e. U+0100 through U+01FF), and the documentation for codepoints 
is all in hexadecimal, not octal.
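To make the arithmetic and the width test concrete, here is a minimal C 
sketch. parse_octal3() and fits_in_native_char() are hypothetical helpers 
written for illustration, not perl's grok_oct(); they assume at most three 
octal digits, as with \NNN escapes. Because sizeof(char) is 1 by definition 
in C, the sketch tests UCHAR_MAX from <limits.h>, the standard counterpart 
of the MAXCHAR idea.

```c
#include <limits.h>

/* Hypothetical helper (not perl's grok_oct): parse up to three octal
 * digits starting at s, i.e. the text after the backslash in "\400".
 * Stores the number of digits consumed in *len. */
static unsigned parse_octal3(const char *s, int *len)
{
    unsigned value = 0;
    int i = 0;
    while (i < 3 && s[i] >= '0' && s[i] <= '7') {
        value = value * 8 + (unsigned)(s[i] - '0');
        i++;
    }
    *len = i;
    return value;
}

/* sizeof(char) is always 1, whatever the width of a char, so it cannot
 * distinguish an 8-bit platform from a wider one.  UCHAR_MAX (like
 * CHAR_BIT) reports the real range of a native char. */
static int fits_in_native_char(unsigned value)
{
    return value <= UCHAR_MAX;
}
```

On a platform with 8-bit chars, "400" parses to 256 (U+0100) and "777" to 
511 (U+01FF), so the values that fail fits_in_native_char() are exactly 
those 256 codepoints. A fourth digit, as in "4001", is not consumed.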

But if you don't think it should be an error, then my second choice is 
to let it silently be a Unicode codepoint, which is what your patch 
already does.
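If \400 is taken to mean U+0100, it no longer fits in a single byte, so 
the string holding it needs a multibyte representation. As a rough 
illustration, assuming standard UTF-8, here is a plain encoder written 
for this example (not perl's internal routine):

```c
#include <stddef.h>

/* Encode codepoint cp as standard UTF-8 into buf (at least 4 bytes).
 * Returns the number of bytes written, or 0 if cp is not a valid
 * Unicode scalar value (a surrogate, or above 0x10FFFF). */
static size_t utf8_encode(unsigned long cp, unsigned char *buf)
{
    if (cp < 0x80) {                      /* 1 byte: ASCII */
        buf[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 2 bytes: covers \400..\777 */
        buf[0] = (unsigned char)(0xC0 | (cp >> 6));
        buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)     /* surrogates are not encodable */
        return 0;
    if (cp < 0x10000) {                   /* 3 bytes */
        buf[0] = (unsigned char)(0xE0 | (cp >> 12));
        buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 4 bytes */
        buf[0] = (unsigned char)(0xF0 | (cp >> 18));
        buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For the whole \400 to \777 range the result is two bytes, from C4 80 
(U+0100) to C7 BF (U+01FF).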

I'm not in favor of warnings for this sort of thing, but regexps and 
quoted strings should handle it the same way: either both should 
produce the warning, or neither should.
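The consistency point can be sketched as a single decision function that 
both the regexp compiler and the quoted-string parser would call, so the 
two cannot drift apart. Everything here is hypothetical: the enums, the 
function name, and the idea of a selectable policy merely illustrate the 
three options discussed in this thread (error, warning, or silent 
codepoint); they are not existing perl code or options.

```c
#include <limits.h>

/* Hypothetical policies for an octal escape above the native char range. */
enum octal_policy { OCT_ERROR, OCT_WARN, OCT_CODEPOINT };

/* What the caller should do with the escape. */
enum octal_result { OCT_OK, OCT_WARNED, OCT_REJECTED };

/* Single shared decision point.  If both the regexp compiler and the
 * quoted-string parser go through here, "\400" means the same thing in
 * both, and either both warn or neither does. */
static enum octal_result
classify_octal_escape(unsigned value, enum octal_policy policy,
                      unsigned *codepoint)
{
    *codepoint = value;
    if (value <= UCHAR_MAX)
        return OCT_OK;              /* fits in a native char */
    switch (policy) {
    case OCT_ERROR:
        return OCT_REJECTED;        /* caller reports an error */
    case OCT_WARN:
        return OCT_WARNED;          /* caller warns, keeps the codepoint */
    case OCT_CODEPOINT:
    default:
        return OCT_OK;              /* silently a Unicode codepoint */
    }
}
```

Because the policy is decided in one place, whichever answer is chosen 
applies to /\400/ and "\400" alike.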

I'm quite happy to let the pumpking decide when there are questions of 
whether or not to add new warnings or errors.

Glenn
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking
