perl.perl5.porters | Postings from June 2013

Re: Regex \8 and \9 after literals no longer work

From: demerphq
Date: June 25, 2013 22:38
Subject: Re: Regex \8 and \9 after literals no longer work
Message ID: CANgJU+XBRRt=vGiOB2No2yNgKhvTJokkD=dappAoh0kPKkcwew@mail.gmail.com
On 25 June 2013 20:26, demerphq <demerphq@gmail.com> wrote:
> On 25 June 2013 16:55, Michael Schroeder <mls@suse.de> wrote:
>>
>> Hi Porters,
>>
>> commit #726ee55d breaks matching of \8 and \9 if they come after a
>> literal:
>>
>>     use re 'debug';
>>     my $a = '(((((((((x)))))))))foo\9';
>>     my $b = 'xfoox';
>>     $b =~ /$a/;
>>
>> Output:
>>
>> Final program:
>>    1: OPEN1 (3)
>>    3:   OPEN2 (5)
>>    5:     OPEN3 (7)
>>    7:       OPEN4 (9)
>>    9:         OPEN5 (11)
>>   11:           OPEN6 (13)
>>   13:             OPEN7 (15)
>>   15:               OPEN8 (17)
>>   17:                 OPEN9 (19)
>>   19:                   EXACT <x> (21)
>>   21:                 CLOSE9 (23)
>>   23:               CLOSE8 (25)
>>   25:             CLOSE7 (27)
>>   27:           CLOSE6 (29)
>>   29:         CLOSE5 (31)
>>   31:       CLOSE4 (33)
>>   33:     CLOSE3 (35)
>>   35:   CLOSE2 (37)
>>   37: CLOSE1 (39)
>>   39: EXACT <foo9> (41)
>>   41: END (0)
>>
>> Note the "foo9" exact match. A workaround is to use \g9, of course,
>> but the perlre man page says: "C<\1> through C<\9> are always
>> interpreted as backreferences".
>>
>> (The change breaks the latex2html package, btw.)
>
> Thanks for the report. I agree this is a bug. I am looking into a fix.

I ended up pushing the following:

f1e1b256c5c1773d90e828cca6323c53fa23391b

which makes a multi-digit backslash escape illegal when it starts with
8 or 9 and its value is larger than the number of capture buffers in
the pattern.
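A minimal Perl sketch of the reported case, assuming it runs on a perl recent enough to include the commit above. With the fix, the `\9` after the literal `foo` compiles as a backreference to group 9 instead of being folded into an `EXACT <foo9>` node, so the subject from the report matches and a literal "xfoo9" no longer does. The variable names are illustrative; the report's `$a` and `$b` are avoided because they are Perl's special `sort` variables.

```perl
use strict;
use warnings;

# The pattern from the report, in single quotes so that "\9" reaches the
# regex compiler untouched. Assumes a perl that includes the fix.
my $pattern = '(((((((((x)))))))))foo\9';
my $subject = 'xfoox';

# With the fix, \9 refers back to group 9, which captured "x", so the
# pattern requires "x", then "foo", then another "x".
my $matches_report = $subject =~ /$pattern/ ? 1 : 0;

# The broken reading (a literal "foo9") would have matched this string;
# with the fix it must not.
my $matches_literal = 'xfoo9' =~ /$pattern/ ? 1 : 0;

# \g{9} is an unambiguous spelling, equivalent to the \g9 workaround.
my $matches_g = $subject =~ /(((((((((x)))))))))foo\g{9}/ ? 1 : 0;

print "report subject: $matches_report, literal foo9: $matches_literal, ",
      "\\g{9}: $matches_g\n";
```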

IOW, /\87/ is now a fatal error, rather than being treated as
/\x{00}87/ or as /87/ with a warning. My rationale for this is that we
have two precedents to consider:

a) a case like /\9/ where we would die with an error about a
backreference to a non-existent buffer.
b) a case like "\9" where we would warn, and then treat the escape as "9".

IMO the precedent for the regex wins over the precedent of the
double-quoted string.
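The two precedents can be reproduced with a short Perl sketch, assuming a reasonably modern perl with warnings enabled: compiling a pattern that contains `\9` but has no capture groups dies, while the double-quoted string `"\9"` only warns and yields `"9"`. The pattern is built at run time so that `eval` BLOCK can catch the compile error, and `eval` STRING is used for the string case so the warning is raised while a `__WARN__` handler is installed.

```perl
use strict;
use warnings;

# (a) Regex precedent: \9 in a pattern with no capture groups refers to a
#     nonexistent group, which dies when the pattern is compiled.
my $pattern = 'foo\9';    # single quotes: the backslash reaches the regex compiler
my $re = eval { qr/$pattern/ };
my $regex_error = $@;
print 'regex \9 -> ', (defined $re ? "compiled\n" : "died: $regex_error");

# (b) String precedent: "\9" in a double-quoted string is an unrecognized
#     escape; perl warns and keeps the "9". eval STRING compiles the string
#     while our warning handler is active, so the warning is captured.
my @warnings;
my $string = do {
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    eval q{ "\9" };
};
print "string \"\\9\" -> '$string' with ", scalar(@warnings), " warning(s)\n";
```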

The rules for handling backreferences are pretty arcane. \118 could
mean the 118th capture buffer, if it exists, or it could mean
"\x{09}8" (the octal escape \11 followed by a literal "8"). In other
words, not only do we change the base we interpret it in, we also
change the number of digits we consider part of the escape!
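Both readings of `\118` can be observed directly. This Perl sketch, assuming a modern perl, compiles `\118` once in a pattern with no capture groups, where it falls back to the octal escape `\11` (a TAB) followed by a literal `8`, and once after 118 capture groups, where it is a backreference to group 118.

```perl
use strict;
use warnings;

# No capture groups: 118 exceeds the group count, so \118 is read as the
# octal escape \11 (TAB, \x{09}) followed by a literal "8".
my $octal_reading = "\t8" =~ /^\118$/ ? 1 : 0;

# 118 capture groups: \118 is now a backreference to group 118.
my $groups  = '(x)' x 118;            # groups 1..118, each capturing "x"
my $pattern = "^$groups" . '\118$';   # single quotes keep the backslash
my $backref_reading = ('x' x 119) =~ /$pattern/ ? 1 : 0;

print "no groups:  \\118 matches TAB then '8'? $octal_reading\n";
print "118 groups: \\118 matches a 119th 'x'? $backref_reading\n";
```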

This patch does not change that behavior; it affects only escapes
starting with 8 or 9, as they have no reasonable interpretation as
octal but do have a reasonable interpretation as backreferences.
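To see that only a leading 8 or 9 is affected, this Perl sketch compiles four two-digit escapes in patterns with no capture groups, assuming a perl that includes the commit above. `\17` and `\77` still fall back to octal character escapes (`chr(15)` and `?`), while `\87` and `\97` are treated as backreferences to nonexistent groups and die at compile time. The `%outcome` labels are illustrative only.

```perl
use strict;
use warnings;

# None of these patterns has a capture group, so every escape below is
# larger than the number of groups. Assumes a perl with the fix applied.
my %outcome;
for my $escape (qw(\17 \77 \87 \97)) {
    my $digits = substr $escape, 1;
    my $re = eval { qr/^$escape$/ };
    if (defined $re) {
        # Leading digit 1-7: an octal character escape, e.g. \17 is chr(15).
        $outcome{$escape} = chr(oct $digits) =~ $re ? 'octal' : 'unexpected';
    } else {
        # Leading 8 or 9: a backreference to a group that does not exist,
        # which is a fatal error when the pattern is compiled.
        $outcome{$escape} = 'fatal';
    }
    print "$escape: $outcome{$escape}\n";
}
```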

I personally think maybe we should warn on something like \118, but I
leave that debate for another day.

cheers,
Yves
PS: Karl also worked on a fix for this, but I got mine wrapped up a bit
quicker. He may push follow-up patches.


--
perl -Mre=debug -e "/just|another|perl|hacker/"


