
Re: PATCH [perl #59342] chr(0400) =~ /\400/ fails for >= 400

November 12, 2008 23:30
2008/11/13 Tom Christiansen:
>> My understanding is that in a regex, if you have 3 matches, that "\333"
>> might be more ambiguous than you are assuming.
>>> There is GREAT reason *not* to delete it, as the quantity of code you would
>>> see casually rendered illegal is incomprehensibly large, with the work
>>> involved in updating code, databases, config files, and educating
>>> programmers and users incalculably great.  To add insult to injury, this
>>> work you would see thrust upon others, not taken on yourself.
>> Yep, that's a great reason.
> I'm glad you agree, easily suffices to shut off the rathole.
> And in case it doesn't, the output below will convince anyone
> that we ***CANNOT*** remove \0ctal notation.  Larry would never
> allow you to break so many people's code.  It would be the worst
> thing Perl has ever done to its users.  It verges upon the insane.

First please separate what Glenn said from what Rafael and I said,
which is that it might be a good idea to deprecate octal IN REGULAR
EXPRESSIONS.

I spoke perhaps more harshly than I meant originally, which is what
kicked this off. I should have said "strongly discouraged" and not

Obviously from a back compat viewpoint we can't actually remove octal
completely FROM THE REGEX ENGINE. At the very least there is a large
amount of code that either generates octal sequences or contains them

But we sure can say in the docs that "it is recommended that you do not
use octal in regular expressions in new code as it is ambiguous as to
how they will be interpreted, especially low value octal (excepting
\0) can easily be mistaken for a backreference".

> Grepping for \\\d *ONLY* in the indented code segments of the standard pods:

Oh c'mon! You of all people must know a whole whack of ways to count
them. You don't have to include them all in a mail. Gmail didn't even
let me see the full list. The list also is a bit off-topic* as very
few of those are actually in regular expressions, and amusingly the
second item in your list isn't octal. Illustrating the problem nicely.

Personally I dislike ambiguous syntax and think it should in general
be avoided, and that maybe we should do something to make it easier to
see when there is ambiguous syntax. And I especially dislike ambiguous
syntax that can be made to change meaning by action at a distance. If
I concatenate a pattern that contains an octal sequence to a pattern
that contains a bunch of capture buffers the meaning of the "octal"
changes. That is bad.
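
The action-at-a-distance case can be sketched in a few lines. This is only a
minimal illustration, assuming a perl that follows the documented rule that
\10 is a backreference only once ten capture groups have been opened before
it. The variable names are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical fragment: whoever wrote it meant octal 10, i.e. chr(8).
my $fragment = '\10';

# On its own (fewer than 10 capture groups) \10 is read as octal.
my $alone = qr/^${fragment}\z/;
my $alone_matches_backspace = ("\x08" =~ $alone) ? 1 : 0;

# Prepend ten capture groups and the very same text becomes a
# backreference to group 10.
my $prefix   = '(a)' x 10;
my $combined = qr/^${prefix}${fragment}\z/;
my $combined_matches_backspace = ((('a' x 10) . "\x08") =~ $combined) ? 1 : 0;
my $combined_matches_backref   = (('a' x 11) =~ $combined) ? 1 : 0;

print "alone, backspace:      $alone_matches_backspace\n";
print "combined, backspace:   $combined_matches_backspace\n";
print "combined, eleventh 'a': $combined_matches_backref\n";
```

The fragment is never edited, yet what it matches depends on how many
capture groups happen to precede it in the final pattern.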

Assuming that grok_oct() consumes at most 3 octal digits, I think we
can apply Karl's patch. However I do think we should recommend against
using octal IN REGULAR EXPRESSIONS. And should note that while you CAN
use octal to represent codepoints up to 511 it is strongly recommended
that you don't.
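
As a rough sketch of what the three-digit limit means in a pattern: this
assumes a perl in which /\400/ is accepted as octal (the behaviour the patch
is after), so it is not a statement about any particular release. The hex
spelling is shown only as one unambiguous alternative.

```perl
use strict;
use warnings;

my $char = chr(0400);    # code point 256

# Assumption: this perl reads \400 in a pattern as an octal escape.
my $three_digits = ($char =~ /^\400\z/) ? 1 : 0;

# At most three octal digits are consumed, so the trailing "0" in
# \4000 is an ordinary literal character.
my $fourth_is_literal = (($char . '0') =~ /^\4000\z/) ? 1 : 0;

# The same code point written in hex cannot be read as a backreference.
my $hex_form = ($char =~ /^\x{100}\z/) ? 1 : 0;

print "three digits:      $three_digits\n";
print "fourth is literal: $fourth_is_literal\n";
print "hex form:          $hex_form\n";
```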

Also I have a concern that Karl's patch merely modifies the behaviour
in the regular expression engine. It doesn't do the same for other
strings. If it is going to be legal it should be legal everywhere.

Anyway there's no need to flood the list with grep output or proclaim
that if people don't get your point that you will appeal to the BDFL.
We are all nice rational people here and in general if you point out
the flaws in our logic we will admit it. And you have made your point,
and would have made your point regardless of the hyperbole and drama.

* Glenn changed the topic of this subthread somewhat by taking an idea
and seeing how far he could run with it. But the original topic was
octal IN REGULAR EXPRESSIONS, so let's keep it on that subject.
perl -Mre=debug -e "/just|another|perl|hacker/"
