On 27 January 2012 15:40, l.mai@web.de <perlbug-followup@perl.org> wrote:
> # New Ticket Created by  l.mai@web.de
> # Please include the string:  [perl #109206]
> # in the subject line of all future correspondence about this issue.
> # <URL: https://rt.perl.org:443/rt3/Ticket/Display.html?id=109206 >
>
>
> This is a bug report for perl from l.mai@web.de,
> generated with the help of perlbug 1.39 running under perl 5.14.2.
>
>
> -----------------------------------------------------------------
> [Please describe your issue here]
>
> % perl -wle '$_ = "\n"; print $+[0] while /[^\n]*/g'
> 0
> 1
>
> % perl -wle '$_ = "\n"; print $+[0] while /.*/g'
> 0
>
> I think this is a bug because in the absence of /s '.' should match any
> character except newline, i.e. be equivalent to '[^\n]'. The two programs
> should produce identical output.
>
> I also think the first result is correct because there are two zero-length
> matches in "\n", one at the beginning of the string and one at the end.
> In conclusion: it looks like /.*/g is broken.

This problem is caused by a broken optimisation: the ANCH_MBOL
optimisation. Notice the principal difference between these two outputs
(the "anchored(MBOL) implicit" flag on the first one):

$ perl -Mre=Debug,DUMP -wle '$_ = "\n"; print pos($_),":",$+[0] while /.*/g'
Compiling REx ".*"
Final program:
   1: STAR (3)
   2:   REG_ANY (0)
   3: END (0)
anchored(MBOL) implicit minlen 0
0:0
Freeing REx: ".*"

$ perl -Mre=Debug,DUMP -wle '$_ = "\n"; print pos($_),":",$+[0] while /[^\n]*/g'
Compiling REx "[^\n]*"
Final program:
   1: STAR (13)
   2:   ANYOF[\0-\11\13-\377][{unicode_all}] (0)
  13: END (0)
minlen 0
0:0
1:1
Freeing REx: "[^\n]*"

It is enabled by this block of code in regcomp.c. Notice the comment:

    /* turn .* into ^.* with an implied $*=1 */

I have to admit I have not checked to see what the heck $*=1 means.
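For context: the $*=1 in that comment refers to the old global $* variable (removed in Perl 5.10), which made every match behave as if /m were in effect, so the optimisation treats .* as if it were ^.* under /m. perlre documents that under /m, ^ matches after any newline except one that is the last character of the string. That would explain why the implicitly anchored search never tries offset 1 in "\n". The following is a simplified C model of the two //g loops from the report, not a transcription of regexec.c or pp_hot.c. The names match_all, dotstar_end and mbol_ok are hypothetical, and the assumption that the anchored search still tries the current pos() before restricting itself to line starts belongs to this sketch alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: end offset of the greedy match of .* without /s,
   i.e. [^\n]*, when the match starts at offset p. */
static size_t dotstar_end(const char *s, size_t len, size_t p)
{
    while (p < len && s[p] != '\n')
        p++;
    return p;
}

/* Hypothetical helper following perlre: under /m, ^ matches at the start
   of the string or after any newline, except a newline that is the last
   character of the string. */
static bool mbol_ok(const char *s, size_t len, size_t p)
{
    return p == 0 || (p < len && s[p - 1] == '\n');
}

/* Simplified model of a //g loop over the pattern .* that records $+[0]
   for each match in ends[] and returns the number of matches.
   When implicit_mbol is true, every start offset other than the current
   pos() must also satisfy mbol_ok(); this models the implicit ANCH_MBOL
   flag and is an assumption, not the real regexec logic. */
static size_t match_all(const char *s, bool implicit_mbol,
                        size_t *ends, size_t max)
{
    assert(s != NULL && ends != NULL);
    size_t len = strlen(s), pos = 0, n = 0;
    bool had_zerolen = false;

    while (n < max) {
        /* After a zero-length match, the next match must end beyond pos. */
        size_t minend = pos + (had_zerolen ? 1 : 0);
        bool found = false;

        for (size_t start = pos; start <= len; start++) {
            if (implicit_mbol && start != pos && !mbol_ok(s, len, start))
                continue;
            size_t end = dotstar_end(s, len, start);
            if (end >= minend) {
                ends[n++] = end;
                had_zerolen = (end == start);
                pos = end;
                found = true;
                break;
            }
        }
        if (!found)
            break;
    }
    return n;
}
```

Calling match_all("\n", false, ends, 8) yields the two end offsets 0 and 1, like the [^\n]* program, while match_all("\n", true, ends, 8) yields only 0, like the buggy .* program. In this model the two modes agree on a string such as "a\nb" (1, 1, 3, 3); they only diverge when the remaining zero-length match sits after a trailing newline.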
    else if ((!sawopen || !RExC_sawback) &&
        (OP(first) == STAR &&
            PL_regkind[OP(NEXTOPER(first))] == REG_ANY) &&
        !(r->extflags & RXf_ANCH) && !(RExC_seen & REG_SEEN_EVAL))
    {
        /* turn .* into ^.* with an implied $*=1 */
        const int type =
            (OP(NEXTOPER(first)) == REG_ANY)
                ? RXf_ANCH_MBOL
                : RXf_ANCH_SBOL;
        r->extflags |= type;
        r->intflags |= PREGf_IMPLICIT;
        first = NEXTOPER(first);
        goto again;
    }

The following patch disables the optimisation:

diff --git a/regcomp.c b/regcomp.c
index 668f8f7..12d0ac0 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -5235,7 +5235,7 @@ reStudy:
             first = NEXTOPER(first);
             goto again;
         }
-        else if ((!sawopen || !RExC_sawback) &&
+        else if (0 && (!sawopen || !RExC_sawback) &&
             (OP(first) == STAR &&
                 PL_regkind[OP(NEXTOPER(first))] == REG_ANY) &&
             !(r->extflags & RXf_ANCH) && !(RExC_seen & REG_SEEN_EVAL))

Producing this output:

$ ./perl -Ilib -Mre=Debug,DUMP -wle '$_ = "\n"; print pos($_),":",$+[0] while /.*/g'
Compiling REx ".*"
Final program:
   1: STAR (3)
   2:   REG_ANY (0)
   3: END (0)
minlen 0
0:0
1:1
Freeing REx: ".*"

I have not committed this patch as I don't know what effects it might
have. However, as it is a "conversion optimization", I would assume it can
be safely disabled until the underlying logic is fixed.

I will note, though, that fixing it might be tricky: the relevant code is
spread out over pp_hot.c and CALLREG_INTUIT_START(), and is particularly
hairy anyway. It always makes me kinda cringe when I look at pp_match.

Yves

-- 
perl -Mre=debug -e "/just|another|perl|hacker/"