develooper Front page | perl.perl5.porters | Postings from May 2016

Re: [perl #128213] No deprecation warning on literal left curlybracket in /.{/ etc

Karl Williamson
May 23, 2016 19:56
On 05/23/2016 04:51 AM, Tom Wyant via RT wrote:
> $ perl -c -Mre=debug -e '/\d{/'
> Compiling REx "\d{"
> Unescaped left brace in regex is deprecated, passed through in regex; marked by <-- HERE in m/\d{ <-- HERE / at -e line 1.
> Final program:
>     1: POSIXU[\d] (2)
>     2: EXACT <{> (4)
>     4: END (0)
> anchored "{" at 1 (checking anchored) stclass POSIXU[\d] minlen 2
> -e syntax OK
> Freeing REx: "\d{"
> and similarly for \D, \s, \S, \v, \V, \N, \h. Which is why I expected a warning (5.24.0) or error (5.25.1) on /.{/.
> On the other hand, the comment seems to say none of these should warn. The 'Quantifiers' section of the Perl 5.24.0 and 5.25.1 perlre seems to say that all left curlys not part of a quantifier are literals and deprecated, but omits the "first character in the pattern" exception that appears in the 5.25.1 perldelta. None of which seems to help, though. :-(
> ---
> via perlbug:  queue: perl5 status: open

Thank you for finding this. If you hadn't found this now, there would 
have been much bigger problems later on.

This situation is the result of oversights on my part.

The goal of the deprecation is two-fold:
    1) to allow us to extend the language;
    2) to catch typos in {m,n} quantifiers that currently silently 
compile into something unintended.

As an example of item 2), before 5.22, if you say qr/a{3, 4}/, the blank 
makes it not a quantifier, and the result is to match the exact sequence 
"a{3, 4}".  That may or may not be what you wanted.  It's quite possible 
that you meant to match "a" 3 or 4 times and didn't realize that blanks 
are not allowed in a quantifier.  But Perl can't read your mind, and so 
currently has to assume you meant the only legal interpretation.  It 
would also be nice to allow spaces in quantifiers, or to make the 
lower bound optional.  None of those can be done today.
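
To make the distinction concrete, here is a minimal sketch contrasting a 
strict quantifier with the explicitly escaped literal form.  It 
deliberately leaves out the unescaped qr/a{3, 4}/ form, because how that 
compiles depends on the Perl version in use; the variable names are 
illustrative only.

```perl
use strict;
use warnings;

# Strict quantifier: no blanks, so this means "a" repeated 3 or 4 times.
my $quantified = qr/^a{3,4}$/;

# Escaped braces: the literal text "a{3, 4}", with no reliance on how an
# unescaped '{' is treated.
my $literal = qr/^a\{3, 4\}$/;

for my $text ('aaa', 'aaaa', 'aa', 'a{3, 4}') {
    printf "%-9s quantified=%d literal=%d\n", $text,
        ($text =~ $quantified ? 1 : 0),
        ($text =~ $literal    ? 1 : 0);
}
```

Escaping the braces states the intent outright, so the pattern does not 
depend on how an unescaped '{' happens to be interpreted.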

All the other extensions that are envisioned involve using backslash 
sequences such as qr/\w{latin, greek}/.  This extension would mean to 
match \w, but only in the latin or greek scripts.  This might be useful 
in something that parses mathematical equations but wants to exclude 
look-alike characters from other scripts, or for a web server that 
wants to exclude look-alike malicious addresses.  paypal, for example, 
can be written almost entirely in cyrillic, redirecting the unwary to a 
scam website.

However, { is often used in patterns to mean a literal left brace. 
Making this change will disrupt existing code.  See
At the time we made the decision to go ahead with this change, during 
the development of 5.15, it was deemed worth the breakage.

But another goal would be to minimize this disruption.  So I tried to 
raise the warning only where the left brace could, with our plans, mean 
something other than a literal left brace.  It turns out that there is 
lots of code like qr/{..../.  That '{', since it's the first thing, can 
only be a left brace.  And similarly in qr/^{.../ or qr/ ... ({...) 
.../, it can only be a left brace.  So there is no need to disturb code 
where there is not going to be ambiguity.  And this cuts down the amount 
of disruption significantly.
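
The three unambiguous positions named above (start of pattern, right 
after '^', right after '(') can be sketched as a simple check.  This is 
only an illustration under stated assumptions, not how the regex compiler 
decides: brace_must_be_literal is a hypothetical helper, it works on the 
raw pattern string, and it ignores bracketed character classes, /x 
whitespace and every other context.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Perl: true if the '{' at offset $pos
# sits where nothing precedes it that could take a quantifier.
sub brace_must_be_literal {
    my ($pattern, $pos) = @_;
    return 0 unless substr($pattern, $pos, 1) eq '{';
    return 1 if $pos == 0;                      # qr/{.../

    my $prev = substr($pattern, $pos - 1, 1);
    return 0 unless $prev eq '^' || $prev eq '(';

    # Count the backslashes directly before $prev.  An odd count means
    # $prev is escaped (a literal '^' or '('), which could be quantified.
    my ($slashes) = substr($pattern, 0, $pos - 1) =~ /(\\*)\z/;
    return length($slashes) % 2 == 0 ? 1 : 0;
}

for my $case (['{foo', 0], ['^{foo', 1], ['a({b)', 2], ['.{3, 4}', 1]) {
    my ($pattern, $pos) = @$case;
    printf "%-8s %s\n", $pattern,
        brace_must_be_literal($pattern, $pos) ? 'unambiguous' : 'ambiguous';
}
```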

But what you have found is that I missed some cases where it should have 
been deprecated.  qr/.{3, 4}/ could have been meant to be a quantifier, 
and should have warned.

We can't just forbid things like that without a deprecation cycle.  I 
intend to add the deprecation message for the missing cases shortly.  I 
believe these all involve '{' being used in quantifiers.  That means 
there can't be any other changes to quantifier handling until probably 5.30.

I think that 5.26 should go out with the current fatal error for the 
contexts where we previously warned, and 5.28 is free to re-purpose the 
'{' uses in these contexts.

And, I want to make the rule simple to follow.  I think it's better to 
simply say that unescaped literal '{' uses are deprecated, and then add 
a caveat that this is enforced only where there is ambiguity of intent.  
The current text can be improved in that regard.

And then where's the cutoff?  qr/.{/ could not have been meant to be a 
quantifier of '.'.  But what about qr/.{3/?  That could be meant to be a 
quantifier.  We could use an edit distance calculation to see how far 
away from a legal quantifier the text is, but I don't see the need for 
it.  I think qr/.{/ should be deprecated, same as qr/.{3, 4}/.
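
As a rough sketch of that simpler rule, the check below only asks whether 
the text starting at '{' is a strictly legal quantifier ({n}, {n,} or 
{n,m}, no blanks), with no edit-distance scoring.  The helper name 
starts_with_strict_quantifier is hypothetical and the code stands apart 
from the actual regex compiler.

```perl
use strict;
use warnings;

# Hypothetical helper: does $text begin with a strictly legal quantifier,
# i.e. {n}, {n,} or {n,m} with no blanks?
sub starts_with_strict_quantifier {
    my ($text) = @_;
    return $text =~ /\A\{\d+(?:,\d*)?\}/ ? 1 : 0;
}

# Under the rule proposed above, an unescaped '{' that follows a
# quantifiable atom and is not a strict quantifier would be deprecated,
# however near or far it is from a legal one.
for my $tail ('{3,4}', '{3,}', '{3}', '{3, 4}', '{3', '{') {
    printf "%-7s %s\n", $tail,
        starts_with_strict_quantifier($tail)
            ? 'quantifier'
            : 'literal, would be deprecated';
}
```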
