develooper Front page | perl.perl5.porters | Postings from November 2008

Re: PATCH [perl #59342] chr(0400) =~ /\400/ fails for >= 400

From: Rafael Garcia-Suarez
Date: November 12, 2008 07:04
Subject: Re: PATCH [perl #59342] chr(0400) =~ /\400/ fails for >= 400
Message ID: b77c1dce0811120703r5d7476b5n83d2cbd896f7f111@mail.gmail.com

2008/11/4 demerphq <demerphq@gmail.com>:
> 2008/10/25 karl williamson <public@khwilliamson.com>:
>> So we have an existing bug.  Sometimes \400 matches \400, and sometimes it
>> matches \01\00, depending on what I would call spooky action at a distance.
>> (This means that \777 sometimes already matches \777 now.) I'm trying to
>> get rid of these inconsistencies.
>
> I've been out of touch for a while so I'm not entirely up to speed on
> what exactly you have in mind. But I think that you cannot eliminate
> the inconsistencies in octal/backref escapes. Especially some aspects
> of spooky action at a distance, as that interacts with the number of
> capture buffers in the LAST pattern matched.
>
> So basically the docs should say, if they do not already, "it is
> *strongly* recommended that you do NOT use octal in any form in a
> regex" (except perhaps in a charclass definition).
>
> And I would argue that in 5.12 we should make them warn, and then in
> 5.14 make them illegal or ONLY mean capture buffers.

I agree with Yves here.

I don't think it's worth changing the meaning of \400 in double-quoted
strings, or making it warn. However, in regexps, it's too dangerously
inconsistent and should be deprecated. As a first step, a deprecation
warning seems in order.

However, I see some value in still allowing [\000-\377] character
ranges, for example. Do we really want to deprecate that as well?
This doesn't seem necessary.
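
To illustrate why the character class case looks safe, here is a minimal
sketch. It assumes only that a backreference cannot occur inside a
bracketed class, so the octal reading is the only one available there.
The variable names are mine, and the \x{100} line is just one
unambiguous way to spell chr(0400) outside a class, not a proposal.

```perl
use strict;
use warnings;

# Inside a bracketed character class a backreference is not possible,
# so \000 and \377 can only be read as octal escapes.
my $any_byte = qr/\A[\000-\377]\z/;

# Outside a class, a braced hex escape names the code point without
# any octal/backreference ambiguity. (Illustrative alternative only.)
my $cp256 = qr/\A\x{100}\z/;

for my $case (
    [ 'chr(0)    in [\000-\377]', chr(0),    $any_byte ],
    [ 'chr(0377) in [\000-\377]', chr(0377), $any_byte ],
    [ 'chr(0400) in [\000-\377]', chr(0400), $any_byte ],
    [ 'chr(0400) =~ \x{100}',     chr(0400), $cp256    ],
) {
    my ($label, $string, $re) = @$case;
    printf "%-26s %s\n", $label, ($string =~ $re ? "match" : "no match");
}
```

Only the third case should fail to match, since chr(0400) lies outside
the 0..0377 range.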

> Consider: /\1/ means the first capture buffer of the previous match, and
> \17 means the _seventeenth_ capture buffer of the previous match IFF
> the previous match contains 17 or more capture buffers; otherwise
> it means \x{F}.
>
> In short: resolving the inconsistencies in octal notation in regex
> notation would appear to be impossible.
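
A small sketch of the \17 ambiguity, for anyone who wants to try it.
It relies on the rule as perlrebackslash describes it: a multi-digit
\N is taken as a backreference only if the pattern has already seen at
least N capture groups, and as octal otherwise. The helper
pattern_with_groups() is hypothetical and exists only for this
illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: build a pattern with $n capture groups, each
# matching a single "a", followed by the escape \17.
sub pattern_with_groups {
    my ($n) = @_;
    my $src = ('(a)' x $n) . '\17';
    return qr/$src/;
}

# 16 groups seen before \17: read as the octal escape for chr(15).
my $octal = pattern_with_groups(16);

# 17 groups seen before \17: read as a backreference to group 17.
my $backref = pattern_with_groups(17);

my $s16 = ('a' x 16) . chr(15);   # 16 captures, then a literal chr(15)
my $s17 = ('a' x 17) . 'a';       # 17 captures, then a repeat of group 17

print "16 groups vs. chr(15): ",
    ($s16 =~ /\A$octal\z/ ? "match" : "no match"), "\n";
print "17 groups vs. 'a':     ",
    ($s17 =~ /\A$backref\z/ ? "match" : "no match"), "\n";
```

Both lines should report a match, even though the same four characters
\17 mean two different things in the two patterns.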

Error messages are a mess, too. This one is correct:
$ perl -wE '/\8/'
Reference to nonexistent group in regex; marked by <-- HERE in m/\8
<-- HERE / at -e line 1.

This one shows clearly that we're using a regexp that matches
"\x{1}8", but why is there a duplicated warning? Double magic?
$ perl -wE '/\18/'
Illegal octal digit '8' ignored at -e line 1.
Illegal octal digit '8' ignored at -e line 1.
Use of uninitialized value $_ in pattern match (m//) at -e line 1.
