
Re: [perl #116639] regex optimiser wrongly rejects certain matches involving embedded comments

From: Abigail
Date: October 19, 2015 14:26
Subject: Re: [perl #116639] regex optimiser wrongly rejects certain matches involving embedded comments
Message ID: 20151019142606.GA8552@almanda.fritz.box

On Mon, Oct 19, 2015 at 11:33:55AM +0200, demerphq wrote:
> On 19 October 2015 at 04:50, Ricardo Signes <perl.p5p@rjbs.manxome.org> wrote:
> > * Karl Williamson <public@khwilliamson.com> [2015-10-18T00:16:44]
> >> Just to make sure everyone understands.
> >>
> >> Currently (?#...) comments are allowed even when there is no /x.  We
> >> probably have to support that in the places where it's been that way all
> >> along, but we could decide to not support them in the places that I just
> >> added, when not under /x.  Thus, we could say that you can't split a
> >> quantifier from its atom except under /x.
> >
> > Thanks, I was confused.
> >
> >> I don't have an opinion on this.
> >
> > I'm not strongly opinionated on this, but:  I think that I would find it useful
> > to say:
> >
> >   If you want to put comments into a regular expression, you have two
> >   options.  You use /x and then insert space and any kind of comments between
> >   tokens, or you can skip /x and use (?#...) between tokens.
> >
> > That is: always allow (?#...) in those places where space and comments become
> > allowed under /x.
> 
> 
> I really don't like this. A) it complicates the regex engine, and B) it
> makes a mockery of what an expert would consider to be one token.
> 
> so for instance to *me*: a{1,10} is a single token.

To me, it isn't.

To Perl, it isn't either, as 

    /a {1,10}/x

matches up to 10 a's:

    $ perl -wE '"aaaaaaaaaa" =~ /a {1,10}/x; say $& // "UNDEF"'
    aaaaaaaaaa
    $

And it's not that /x just removes any whitespace it finds, as it
doesn't remove the whitespace inside the braces:

    $ perl -wE '"aaaaaaaaaa" =~ /a { 1,10}/x; say $& // "UNDEF"'
    UNDEF
    $

(In this case, it tries to match the literal string "a{1,10}" -- one
can wonder whether that's the most useful thing it could do).


Compiling /a {1,10}/x shows the tokens perl thinks there are:

   1: CURLY {1,10} (5)
   3:   EXACT <a> (0)
   5: END (0)


> 
> a(?#whatever){1,10}
> 
> is two "tokens".
> 


I don't see any benefit in suddenly disallowing a space (under /x)
between 'a' and '{1,10}', nor in allowing a space there but not a
comment. I don't see myself readily using an inline comment there,
but I also don't think Perl should enforce a particular coding
style upon its users. That just makes the language less flexible,
and prohibits creativity.



Abigail
