perl.perl5.porters | Postings from January 2018

Re: RFC: deprecate literal \v in patterns except under /x; /[#]/xx

From:
demerphq
Date:
January 1, 2018 17:43
Subject:
Re: RFC: deprecate literal \v in patterns except under /x; /[#]/xx
Message ID:
CANgJU+XMafv+w=Ww=18AsSHmsGh1Oe4mLXqyGXE67CL2QmgJyg@mail.gmail.com
On 1 January 2018 at 02:17, Father Chrysostomos <sprout@cpan.org> wrote:
> Karl Williamson wrote:
>> I am proposing two deprecations
>>
>> First, using literal vertical space, such as a form feed or new line, in
>> a regular expression pattern unless that pattern is /x.  My guess is
>> that this is extremely uncommon, and that just about all such
>> occurrences would be from forgetting the /x.  So deprecating this should
>> affect hardly anyone.
>
> What about generated code?

Indeed.

> Why introduce a discrepancy between different
> quote-like operators with regard to vertical whitespace?

FWIW, I don't follow you here. The parser and the regex engine already
handle various escapes differently from the other quote operators, in
that expansion of various escapes is delayed so we can distinguish
\x{7c} from a raw |, for instance.

So at that level there is precedent for what Karl is asking for. We expect that:

  my $str="a\x{7c}b";
  s/$str/c/;

be equivalent to

  s/a|b/c/;

and we expect

  s/a\x{7c}b/c/;

and

  my $str= "a\\x{7c}b";
  s/$str/c/;

to be equivalent to

  s/a\|b/c/;

So, backwards compatibility aside, there is actually precedent for
what Karl wants to do.
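
For anyone who wants to check the four cases above, here is a minimal
sketch. The target string "a|b" and the variable names are made up for
illustration; the point is only that an alternation replaces the first
"a", while a literal pipe consumes the whole string.

```perl
use strict;
use warnings;

my $target = "a|b";    # hypothetical input: a, literal pipe, b

# Double-quoted string: \x{7c} is already a raw '|' by the time the
# string is interpolated, so the pattern is the alternation a|b.
my $str = "a\x{7c}b";
(my $r1 = $target) =~ s/$str/c/;

# Written directly: same as the interpolated case above.
(my $r2 = $target) =~ s/a|b/c/;

# Escape written inside the pattern: the regex engine sees \x{7c}
# and treats it as a literal pipe.
(my $r3 = $target) =~ s/a\x{7c}b/c/;

# Escaped backslash in the string: the pattern text is a\x{7c}b.
my $str2 = "a\\x{7c}b";
(my $r4 = $target) =~ s/$str2/c/;

# Explicitly escaped pipe, for comparison.
(my $r5 = $target) =~ s/a\|b/c/;

print "interpolated raw pipe:  $r1\n";
print "literal alternation:    $r2\n";
print "escape in pattern:      $r3\n";
print "interpolated escape:    $r4\n";
print "backslashed pipe:       $r5\n";
```

The first two results should agree with each other, and the last three
should agree with each other.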

On the other hand, if I understand the proposal correctly, I think it
would break any code that puts a raw newline in a constructed pattern.

This:

my $nl="\n";
my $str="foo" . $nl . "bar" . $nl;
s/$str/whatever/g;

should not warn, and I don't think the regex engine can tell this apart from

s/foo
bar
/whatever/g;

Maybe it can, in which case I would have less of an objection.
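
A rough way to look at this from Perl space is sketched below. It
compiles both forms with qr// and compares them; the input string and
variable names are assumptions for illustration. It only shows that the
two compiled patterns look and behave the same to user code. It says
nothing about whether the tokenizer could still flag the literal form
before interpolation happens.

```perl
use strict;
use warnings;

my $nl  = "\n";
my $str = "foo" . $nl . "bar" . $nl;

# Pattern built by interpolating a string that contains raw newlines.
my $built = qr/$str/;

# Pattern with the newlines typed directly into the source.
my $literal = qr/foo
bar
/;

# Compare the stringified forms of the two compiled patterns.
my $same_text = ("$built" eq "$literal") ? 1 : 0;

# Compare behaviour on a hypothetical input.
my $input = "xx foo\nbar\n yy";
(my $via_built   = $input) =~ s/$built/whatever/g;
(my $via_literal = $input) =~ s/$literal/whatever/g;

print "same stringification: $same_text\n";
print "same result: ", ($via_built eq $via_literal ? 1 : 0), "\n";
```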

> Will
> quotemeta start quoting these characters so that we can do
> eval "/\Q$string\E/"?

Quotemeta already does quote these characters. FWIW it escapes the
following codepoints:

0 .. 47, 58 .. 64, 91 .. 94, 96, 123 .. 255, 847, 4447 .. 4448, 5760,
6068 .. 6069, 6155 .. 6158, 8192 .. 8254, 8257 .. 8275, 8277 .. 8303,
8592 .. 9311, 9472 .. 10101, 10132 .. 11263, 11776 .. 11903, 12288 ..
12291, 12296 .. 12320, 12336, 12644, 64830 .. 64831, 65024 .. 65039,
65093 .. 65094, 65279, 65440, 65520 .. 65528, 119155 .. 119162, 917504
.. 921599
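
The low end of that list can be reproduced with a short script along
these lines. This is a sketch that only covers code points 0 .. 255, and
it assumes no "use locale" and no unicode_strings feature in scope, since
either can change how the 128 .. 255 range is treated. The ranges() helper
is a made-up name for collapsing consecutive numbers.

```perl
use strict;
use warnings;

# Collapse a sorted list of integers into "a .. b" style ranges.
sub ranges {
    my @cp = @_;
    my @out;
    while (@cp) {
        my $start = my $end = shift @cp;
        $end = shift @cp while @cp && $cp[0] == $end + 1;
        push @out, $start == $end ? $start : "$start .. $end";
    }
    return join ", ", @out;
}

# A code point counts as "escaped" if quotemeta changes the string.
my @escaped = grep { quotemeta(chr($_)) ne chr($_) } 0 .. 255;
my $summary = ranges(@escaped);

# Vertical whitespace: newline, vertical tab, form feed.
my $vspace_ok = 1;
for my $ch ("\n", "\x0b", "\f") {
    $vspace_ok = 0 unless quotemeta($ch) eq "\\" . $ch;
}

print "$summary\n";
print "vertical whitespace quoted: $vspace_ok\n";
```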

>> Second, in order to make the new /xx modifier more useful (and yes, this
>> should have been made experimental) I want to deprecate '#' occurring in
>> a bracketed character class.
>
> *Only* under /xx.  Do not forget that [x] is a common way of
> escaping x.

Agreed. I have seen people write [#] to escape a comment char, and
even recommend it.
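
To illustrate why people reach for [#], here is a small sketch of how
'#' behaves under plain /x. The test strings and variable names are
invented for the example, and it deliberately sticks to /x rather than
/xx so it runs on perls that predate /xx.

```perl
use strict;
use warnings;

# Unescaped '#' under /x starts a comment, so everything after it is
# dropped and the pattern is effectively just \Aa.
my $as_comment = "ab" =~ /\Aa#b\z/x ? 1 : 0;

# Inside a bracketed class the '#' is an ordinary character.
my $class_hit  = "a#b" =~ /\Aa[#]b\z/x ? 1 : 0;
my $class_miss = "ab"  =~ /\Aa[#]b\z/x ? 1 : 0;

# A backslash does the same job as the bracketed class.
my $backslash  = "a#b" =~ /\Aa\#b\z/x ? 1 : 0;

print "bare # (comment) matches 'ab':   $as_comment\n";
print "[#] matches 'a#b':               $class_hit\n";
print "[#] matches 'ab':                $class_miss\n";
print "\\# matches 'a#b':                $backslash\n";
```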

I do like the idea of rejecting this under /xx, though.

cheers,
Yves


-- 
perl -Mre=debug -e "/just|another|perl|hacker/"
