
Re: [perl #119793] /x on \Q#foo\E doesn't match '#foo', # becomes special

From: Nicholas Clark
Date: September 18, 2013 14:56
Subject: Re: [perl #119793] /x on \Q#foo\E doesn't match '#foo', # becomes special
Message ID: 20130918145619.GE66035@plum.flirble.org

On Sun, Sep 15, 2013 at 12:44:18AM -0700, bulk88 wrote:

> Following perlre, I used \Q\E to escape things from /x.
> The \Q\E line in perlre was added in June 2006 in
> http://perl5.git.perl.org/perl.git/commitdiff/1031e5dba2bc40203b5942f84d3d2bc335470dba .
> 
> _______________________________________________
> $_ = '# These definitions are from config.sh (via C:\p519\src\lib/Config.pm).';
> if(/\Q# These definitions are from config.sh (via \E/x) {print 1;}
> else {print 0;}
> ______________________________________________
> With /x modifier, it prints 0, without prints 1. The bug is the "#" is still special after \Q\E, but only under /x, and doesn't become a normal dead character. This contradicts the perlre suggestion. It also contradicts that \Q\E are supposed to be the same as quotemeta() according to jamesw on irc. This line from perlfunc supports that in bulk88's opinion http://perl5.git.perl.org/perl.git/blob/HEAD:/pod/perlfunc.pod#l5334 .
> 
> ______________________________________________
> $_ = '# These definitions are from config.sh (via C:\p519\src\lib/Config.pm).';
> my $x = quotemeta;
> print /$x/x;
> ______________________________________________
> prints 1.

Nice bug. I'm not sure whether the behaviour is as simple as # being a "dead"
character or not. It differs depending on whether \E is present:

$ perl -e 'warn "a# " =~ /\Qa# \E/ ? "M" : "."'
M at -e line 1.
$ perl -e 'warn "a# " =~ /\Qa# / ? "M" : "."'
M at -e line 1.
$ perl -e 'warn "a# " =~ /\Qa# \E/x ? "M" : "."'
. at -e line 1.
$ perl -e 'warn "a# " =~ /\Qa# /x ? "M" : "."'
M at -e line 1.

It seems to have happened between 5.000 and 5.001:

$ ./perl -e 'warn "a# " =~ /\Qa# \E/ ? "M" : "."'
M at -e line 1.
$ ./perl -e 'warn "a# " =~ /\Qa# / ? "M" : "."'
M at -e line 1.
$ ./perl -e 'warn "a# " =~ /\Qa# \E/x ? "M" : "."'
. at -e line 1.
$ ./perl -e 'warn "a# " =~ /\Qa# /x ? "M" : "."'
M at -e line 1.

bisect.pl --target miniperl --start=perl-5.000 --end=perl-5.001 -e 'if ("a#b" !~ /\Qa#b\E/x) { exit 1 }'

reports:

commit 748a93069b3d16374a9859d1456065dd3ae11394
Author: Larry Wall <lwall@netlabs.com>
Date:   Sun Mar 12 22:32:14 1995 -0800
    Perl 5.001
    [See the Changes file for a list of changes]


The symptom may relate to the code referred to by this entry in its Changes:

+NETaa13369: # is now a comment character, and \# should be left for regcomp.
+From: Simon Parsons
+Files patched: toke.c
+ It was not skipping the comment when it skipped the white space, and construct
+ an opcode that tried to match a null string.  Unfortunately, the previous
+ star tried to use the first character of the null string to optimize where
+ to recurse, so it never matched.


but the fact that even back then it differs depending on \E being present or
implicit makes me think that it's actually a bug somewhere else.

And -Dr suggests something far more screwed up:

$ ./perl -Dr -e 'warn "a# " =~ /\Qa# \E/x ? "M" : "."'
Compiling REx "a\#\ \\E"
rarest char \ at 3
Final program:
   1: EXACT <a# \\E> (4)
   4: END (0)
anchored "a# \E" at 0 (checking anchored isall) minlen 5 
Enabling $` $& $' support (0x7).

EXECUTING...

String shorter than min possible regex match (3 < 5)
. at -e line 1.
Freeing REx: "a\#\ \\E"


It seems that the trailing \E is being left in the string, and then ending up
as something which the engine attempts to match against.

Looks like 5.001 makes exactly the same mistake:

$ ./miniperl -Dr -e 'warn "a# " =~ /\Qa# \E/x ? "M" : "."'
rarest char \ at 3
first 14 next 97 offset 0
 1:BRANCH(15)
 5:EXACTLY(15) <a# \E>
15:END(0)
start `a# \E' minlen 5 

EXECUTING...

. at -e line 1.
$ ./miniperl -Dr -e 'warn "a# " =~ /\Qa# \E/ ? "M" : "."'
rarest char # at 1
first 14 next 97 offset 0
 1:BRANCH(13)
 5:EXACTLY(13) <a# >
13:END(0)
start `a# ' minlen 3 

EXECUTING...

 1:BRANCH       <a# >
 5:EXACTLY      <a# >
13:END          <>
M at -e line 1.
$ ./miniperl -Dr -e 'warn "a# " =~ /\Qa# /x ? "M" : "."'
rarest char # at 1
first 14 next 97 offset 0
 1:BRANCH(13)
 5:EXACTLY(13) <a# >
13:END(0)
start `a# ' minlen 3 

EXECUTING...

 1:BRANCH       <a# >
 5:EXACTLY      <a# >
13:END          <>
M at -e line 1.

> I don't know what the correct behavior here, and if there is a doc problem here or a regexp bug or what but something has to change. Tested with Perl 5.12.3 and Perl 5.19.4.


I think that the implementation is at fault here: conceptually,
\Q\E processing is meant to be an earlier step than comment stripping,
hence \Q\E should protect a # from being treated as a comment too.

Nicholas Clark
