develooper Front page | perl.perl5.porters | Postings from November 2016

Re: [perl #129803] Regexp syntax check when variables contained with\Q...\E

From: Abigail
Date: November 11, 2016 10:36
Subject: Re: [perl #129803] Regexp syntax check when variables contained with\Q...\E
Message ID: 20161111104001.GB1565@almanda.fritz.box

On Tue, Oct 04, 2016 at 07:29:05AM -0700, Ed Avis wrote:
> # New Ticket Created by  "Ed Avis" 
> # Please include the string:  [perl #129803]
> # in the subject line of all future correspondence about this issue. 
> # <URL: https://rt.perl.org/Ticket/Display.html?id=129803 >
> 
> 
> 
> This is a bug report for perl from eda@waniasset.com,
> generated with the help of perlbug 1.40 running under perl 5.22.2.
> 
> 
> -----------------------------------------------------------------
> [Please describe your issue here]
> 
> When possible, regular expressions are compiled and checked at program compile time.
> This program gives an error even though the sub f() is never called:
> 
>     sub f { /(/ }
> 
> But when the regexp includes variables, this early checking cannot be done.
> The following code will only give an error if and when f() is called:
> 
>     $x = 'a';
>     sub f { /($x/ }
> 
> In general, this has to be so.  Perl can't know what the possible
> values of $x will be at run time.  If $x contains ')' then the regexp
> is well-formed.
> 
> However, when building a regexp you may choose to use the \Q...\E
> mechanism as a safer alternative to raw string interpolation.
> Whatever appears inside \Q...\E is effectively matched as a literal
> string, with regexp metacharacters like ) not having their usual
> effect.  (As an implementation detail, this might work by \Q...\E
> carefully escaping all such characters with backslashes before parsing
> the regexp, but from the user's point of view you can consider it a
> way to match literal text contained in a scalar.)
> 
> If the variable appears in the regexp protected by \Q...\E then the
> syntax checks can still happen as normal.
> 
>     sub f { /(\Q$x\E/ }
> 
> No matter the value of $x at run time, this will never be a valid
> regexp; it will always have an unbalanced ( at the start.  Perl could
> warn for this regexp at compile time just as it warns for /(/.
> 
> One possible way to do this would be to try making a munged version of
> the regexp, replacing all \Q...\E fragments of the regexp with the
> dummy construct (?:).  If the resulting munged regexp does not contain
> any variables, then it can be used as a compile-time check; the munged
> regexp will be syntactically valid if and only if the original regexp
> is syntactically valid for all possible uses.
> 
> There might be a better way to do this checking depending on
> implementation details of the regexp engine.
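
The three cases in the report can be reproduced with a string eval, which separates compiling a sub from calling it. This is a minimal sketch: the helper names compile_sub and run_error are hypothetical, and the package variable $x stands in for the one in the report.

```perl
use strict;
use warnings;

our $x = 'a';

# Compile, but do not call, an anonymous sub whose body is the given source.
# Returns the code ref, or undef with the compile error left in $@.
sub compile_sub {
    my ($body) = @_;
    return eval "sub { $body }";
}

# Call a compiled sub against a dummy subject string.
# Returns '' on success, or the run-time error text.
sub run_error {
    my ($code) = @_;
    local $_ = 'subject';
    my $ok = eval { $code->(); 1 };
    return $ok ? '' : $@;
}

# Constant pattern: rejected as soon as the sub is compiled.
my $constant = compile_sub('/(/');
print "constant: ", (defined $constant ? "compiled\n" : "compile error: $@");

# Interpolated pattern: compiles; validity depends on $x at call time.
my $interp = compile_sub('/($x/');
# \Q...\E pattern: also compiles, but should fail for every value of $x.
my $quoted = compile_sub('/(\Q$x\E/');

for my $value ('a', ')') {
    $x = $value;
    for my $case ([interpolated => $interp], [quoted => $quoted]) {
        my ($name, $code) = @$case;
        my $err = run_error($code);
        print "$name, x='$value': ", ($err ? "run-time error\n" : "ok\n");
    }
}
```

Only the interpolated pattern with $x set to ')' is expected to run without error; the \Q...\E form escapes the ')' and leaves the opening parenthesis unbalanced.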


I don't see much benefit in this. You'd be adding an additional
compilation of a regexp at compile time, but then you'd have to throw
away the compiled result, as it's a different pattern from what would
really be there. The only win is that if you write a regexp with a
syntax error, you get the error earlier. But once you have fixed the
error, you keep paying the speed penalty each time you run the
program. And if you don't make a mistake in the first place, you pay
the speed penalty anyway.
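
To make the extra step concrete, here is a rough user-level sketch of the munged check. It is not how perl itself would implement it: the function name munged_check is hypothetical, the \Q...\E substitution is naive (it ignores escaped backslashes), and the test for leftover variables is a crude heuristic.

```perl
use strict;
use warnings;

# Takes the source text of a pattern (not a compiled regexp).
# Returns undef if variables remain outside \Q...\E (no early check possible),
# '' if the munged pattern compiles, or the error text if it does not.
sub munged_check {
    my ($src) = @_;

    # Replace each \Q...\E span (or \Q to end of pattern) with (?:).
    (my $munged = $src) =~ s/\\Q.*?(?:\\E|\z)/(?:)/gs;

    # Crude heuristic: a sigil followed by a word character or '{'
    # is taken to be an interpolated variable.
    return if $munged =~ /[\$\@][\w{]/;

    # The extra compilation; the compiled object is thrown away afterwards.
    my $ok = eval { my $discarded = qr/$munged/; 1 };
    return $ok ? '' : $@;
}

for my $src ('(\Q$x\E', '(\Q$x\E)', '($x') {
    my $result = munged_check($src);
    print "$src: ",
        !defined $result ? "cannot check early\n"
      : $result eq ''    ? "munged pattern compiles\n"
      :                    "error: $result";
}
```

The qr// object built inside the eval is never used for matching, so every pattern checked this way is compiled once more than it is today.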



Abigail
