develooper Front page | perl.perl5.porters | Postings from January 2012

Re: [perl #109206] regexes: . different from [^\n]

From:
demerphq
Date:
January 27, 2012 07:33
Subject:
Re: [perl #109206] regexes: . different from [^\n]
Message ID:
CANgJU+WPFu_-ORm05+PohE2P8tBVywPCKQQg2GmAqKP+JY6GLg@mail.gmail.com
On 27 January 2012 15:40, l.mai@web.de <perlbug-followup@perl.org> wrote:
> # New Ticket Created by  l.mai@web.de
> # Please include the string:  [perl #109206]
> # in the subject line of all future correspondence about this issue.
> # <URL: https://rt.perl.org:443/rt3/Ticket/Display.html?id=109206 >
>
>
>
> This is a bug report for perl from l.mai@web.de,
> generated with the help of perlbug 1.39 running under perl 5.14.2.
>
>
> -----------------------------------------------------------------
> [Please describe your issue here]
>
> % perl -wle '$_ = "\n"; print $+[0] while /[^\n]*/g'
> 0
> 1
>
> % perl -wle '$_ = "\n"; print $+[0] while /.*/g'
> 0
>
> I think this is a bug because in the absence of /s '.' should match any
> character except newline, i.e. be equivalent to '[^\n]'. The two programs
> should produce identical output.
>
> I also think the first result is correct because there are two zero-length
> matches in "\n", one at the beginning of the string and one at the end.
> In conclusion: it looks like /.*/g is broken.

This problem is caused by a broken optimisation: the ANCH_MBOL
optimisation. Notice the principal difference between these two outputs:

$ perl  -Mre=Debug,DUMP -wle '$_ = "\n"; print pos($_),":",$+[0] while /.*/g'
Compiling REx ".*"
Final program:
   1: STAR (3)
   2:   REG_ANY (0)
   3: END (0)
anchored(MBOL) implicit minlen 0
0:0
Freeing REx: ".*"

$ perl  -Mre=Debug,DUMP -wle '$_ = "\n"; print pos($_),":",$+[0] while /[^\n]*/g'
Compiling REx "[^\n]*"
Final program:
   1: STAR (13)
   2:   ANYOF[\0-\11\13-\377][{unicode_all}] (0)
  13: END (0)
minlen 0
0:0
1:1
Freeing REx: "[^\n]*"

The optimisation is enabled by this block of code in regcomp.c. Notice the comment:

 /* turn .* into ^.* with an implied $*=1 */

I have to admit I have not checked to see what the heck $*=1 means.

        else if ((!sawopen || !RExC_sawback) &&
            (OP(first) == STAR &&
            PL_regkind[OP(NEXTOPER(first))] == REG_ANY) &&
            !(r->extflags & RXf_ANCH) && !(RExC_seen & REG_SEEN_EVAL))
        {
            /* turn .* into ^.* with an implied $*=1 */
            const int type =
                (OP(NEXTOPER(first)) == REG_ANY)
                    ? RXf_ANCH_MBOL
                    : RXf_ANCH_SBOL;
            r->extflags |= type;
            r->intflags |= PREGf_IMPLICIT;
            first = NEXTOPER(first);
            goto again;
        }
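
To make the effect of the implicit anchor concrete, here is a toy model
of the /g loop, written in Python only so it can be run without a perl
build. It is a sketch under assumptions, not a description of what
pp_match or the regex engine really do: the function names are made up,
and the rule in mbol_allows() (a multiline ^ may match at the start of
the string or just after a newline, but not at the very end of the
string) is my reading of how MBOL behaves, not something checked against
the source for this report.

```python
def run_end(s, start):
    """End offset of a greedy run of non-newline characters from start."""
    end = start
    while end < len(s) and s[end] != "\n":
        end += 1
    return end


def mbol_allows(s, pos):
    """Assumed model of a multiline ^: start of string, or just after a
    newline as long as we are not already at the end of the string."""
    return pos == 0 or (pos < len(s) and s[pos - 1] == "\n")


def global_match_ends(s, implicit_mbol):
    """Model `print $+[0] while /.*/g`; return the list of end offsets."""
    ends = []
    pos = 0
    empty_at = None  # offset of the previous zero-length match, if any
    while True:
        hit = None
        for start in range(pos, len(s) + 1):
            # With the implicit anchor, only MBOL positions are tried.
            if implicit_mbol and not mbol_allows(s, start):
                continue
            end = run_end(s, start)
            if end == start and start == empty_at:
                continue  # do not return the same empty match twice
            hit = (start, end)
            break
        if hit is None:
            return ends
        start, end = hit
        ends.append(end)
        empty_at = end if end == start else None
        pos = end


print(global_match_ends("\n", implicit_mbol=False))  # models [^\n]*
print(global_match_ends("\n", implicit_mbol=True))   # models .* with the anchor
```

In this model the unanchored scan finds empty matches ending at offsets
0 and 1, while the anchored scan never tries offset 1 and stops after
the first match, which is the same shape as the two outputs above.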

The following patch disables the optimization:

diff --git a/regcomp.c b/regcomp.c
index 668f8f7..12d0ac0 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -5235,7 +5235,7 @@ reStudy:
            first = NEXTOPER(first);
            goto again;
        }
-       else if ((!sawopen || !RExC_sawback) &&
+       else if (0 && (!sawopen || !RExC_sawback) &&
            (OP(first) == STAR &&
            PL_regkind[OP(NEXTOPER(first))] == REG_ANY) &&
            !(r->extflags & RXf_ANCH) && !(RExC_seen & REG_SEEN_EVAL))


Producing this output:
$ ./perl -Ilib -Mre=Debug,DUMP -wle '$_ = "\n"; print pos($_),":",$+[0] while /.*/g'
Compiling REx ".*"
Final program:
   1: STAR (3)
   2:   REG_ANY (0)
   3: END (0)
minlen 0
0:0
1:1
Freeing REx: ".*"

I have not committed this patch as I don't know what effects it might
have; however, as it is a "conversion optimization" I would assume it
can be safely disabled until the underlying logic is fixed. I will
note that fixing it might be tricky: the relevant code is spread
out over pp_hot.c and CALLREG_INTUIT_START(), and is particularly
hairy anyway. It always makes me kinda cringe when I look at pp_match.

Yves



-- 
perl -Mre=debug -e "/just|another|perl|hacker/"
