2008/11/13 Tom Christiansen <tchrist@perl.com>:
>> My understanding is that in a regex, if you have 3 matches, that "\333"
>> might be more ambiguous than you are assuming.
>
>>> There is GREAT reason *not* to delete it, as the quantity of code you would
>>> see casually rendered illegal is incomprehensibly large, with the work
>>> involved in updating code, databases, config files, and educating
>>> programmers and users incalculably great. To add insult to injury, this
>>> work you would see thrust upon others, not taken on yourself.
>
>> Yep, that's a great reason.
>
> I'm glad you agree, easily suffices to shut off the rathole.
>
> And in case it doesn't, the output below will convince anyone
> that we ***CANNOT*** remove \0ctal notation. Larry would never
> allow you to break so many people's code. It would be the worst
> thing Perl has ever done to its users. It verges upon the insane.
>

First, please separate what Glenn said from what Rafael and I said,
which is that it might be a good idea to deprecate octal IN REGULAR
EXPRESSIONS.

I spoke perhaps more harshly than I meant originally, which is what
kicked this off. I should have said "strongly discouraged" and not
"deprecated".

Obviously, from a back-compat viewpoint, we can't actually remove octal
completely FROM THE REGEX ENGINE. At the very least there is a large
amount of code that either generates octal sequences or contains them
IN REGULAR EXPRESSIONS.

But we sure can say in the docs that "it is recommended that you do not
use octal in regular expressions in new code, as it is ambiguous how
they will be interpreted; low-value octal in particular (excepting \0)
can easily be mistaken for a backreference".

>
> Grepping for \\\d *ONLY* in the indented code segments of the standard pods:

Oh c'mon! You of all people must know a whole whack of ways to count
them. You don't have to include them all in a mail. Gmail didn't even
let me see the full list.
The list is also a bit off-topic*, as very few of those are actually in
regular expressions, and amusingly the second item in your list isn't
octal. Illustrating the problem nicely.

Personally I dislike ambiguous syntax and think it should in general be
avoided, and that maybe we should do something to make it easier to see
when there is ambiguous syntax. And I especially dislike ambiguous
syntax that can be made to change meaning by action at a distance. If I
concatenate a pattern that contains an octal sequence to a pattern that
contains a bunch of capture buffers, the meaning of the "octal" changes.
That is bad.

Assuming that grok_oct() consumes at most 3 octal digits, I think we
can apply Karl's patch. However, I do think we should recommend against
using octal IN REGULAR EXPRESSIONS, and should note that while you CAN
use octal to represent codepoints up to 511, it is strongly recommended
that you don't.

I also have a concern that Karl's patch merely modifies the behaviour
in the regular expression engine. It doesn't do the same for other
strings. If it is going to be legal, it should be legal everywhere.

Anyway, there's no need to flood the list with grep output or proclaim
that if people don't get your point you will appeal to the BDFL. We
are all nice rational people here, and in general if you point out the
flaws in our logic we will admit it. And you have made your point, and
would have made your point regardless of the hyperbole and drama.

Cheers,
yves

* Glenn changed the topic of this subthread somewhat by taking an idea
and seeing how far he could run with it. But the original topic was
octal IN REGULAR EXPRESSIONS, so let's keep it on that subject.

-- 
perl -Mre=debug -e "/just|another|perl|hacker/"
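
For anyone who wants to see the action-at-a-distance problem described
above, here is a minimal sketch. The variable names and the (x) prefix
are made up for illustration; the only assumption is the documented
rule that a two-digit escape such as \11 is read as octal when fewer
than 11 capture groups have been opened, and as a backreference
otherwise.

```perl
use strict;
use warnings;

# A fragment written with a two-digit escape. On its own no capture
# groups are open, so \11 is read as octal 011, i.e. a tab.
my $fragment = 'a\11b';

# A hypothetical prefix that happens to open 11 capture groups.
my $prefix = '(x)' x 11;

my $alone    = qr/$fragment/;
my $combined = qr/$prefix$fragment/;

my $x11 = 'x' x 11;

# Standalone: \11 means "tab".
print "alone, a<TAB>b:    ", ("a\tb" =~ $alone ? "match" : "no match"), "\n";

# Concatenated: \11 now means "whatever group 11 captured", here "x".
print "combined, a<TAB>b: ", ("${x11}a\tb" =~ $combined ? "match" : "no match"), "\n";
print "combined, axb:     ", ("${x11}axb" =~ $combined ? "match" : "no match"), "\n";
```

The fragment's text never changes, yet after concatenation it stops
matching a tab and starts matching a repeated "x". Writing the escape
with a leading zero (\011) or as \x09 should avoid the ambiguity.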