
Re: [perl #128213] No deprecation warning on literal left curlybracket in /.{/ etc

From: demerphq
Date: May 25, 2016 04:04
Subject: Re: [perl #128213] No deprecation warning on literal left curlybracket in /.{/ etc
Message ID: CANgJU+VzKkq9i2MfhmUpkGQqgxxA-s86_TmRyLjLoEZjwZxJOw@mail.gmail.com

On 25 May 2016 at 05:31, Zefram <zefram@fysh.org> wrote:
> demerphq wrote:
>>My second reaction to this is that without a quantifier or alternation
>>(?:) is semantically invisible, and should that make a difference?
>
> It should indeed be semantically invisible, but it is *syntactically*
> visible, and yes, that does and must make a difference.  /(?:ab)*/
> is not the same thing as /ab*/.

Yes, I know. (Hence why I said "without a quantifier or alternation".)
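
For those reading along, a minimal sketch of the distinction being discussed: once a quantifier is involved, the grouping changes what the quantifier applies to. The input string and variable names here are only illustrative.

```perl
use strict;
use warnings;

my $input = 'ababab';

# The star applies to the whole group "ab".
my ($grouped) = $input =~ /\A((?:ab)*)/;

# The star applies only to the "b".
my ($bare) = $input =~ /\A(ab*)/;

print "grouped: $grouped\n";   # grouped: ababab
print "bare:    $bare\n";      # bare:    ab
```

Without the quantifier, /(?:ab)/ and /ab/ match the same strings, which is the case I was referring to.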

>>So I would expect /^*/ and /(?:^)*/ to produce the same compiled
>>optree, and the same errors.
>
> If they're syntactically legal (as they currently are), they should
> indeed produce the same optree, and they currently do.  However, it's not
> essential for them to have the same syntactic legality.

No, it's not essential. But IMO it is prima facie *saner*.

> Hypothetically,
> if we were to forbid applying a quantifier directly to /^/, we'd be
> giving /^/ the same syntactic status as an already-quantified term,
> such as /a{3}/.  It's not legal to apply another quantifier to /a{3}/;
> /a{3}{4}/ will generate a "nested quantifiers" error, and /^{4}/ could
> similarly give an error.  But it's perfectly legal to *semantically*
> nest quantifiers; you just need some extra grouping to get past the
> syntactic hurdle.  /(?:a{3}){4}/ is legal and works, and in the same
> vein /(?:^){4}/ would have to be legal.

Well, I dunno. This isn't as obvious to me as it seems to be to you.
I could easily see us making /a{3}{4}/ legal and equivalent to
/(?:a{3}){4}/ (I don't see why it should be illegal, except for
implementation convenience). And similarly I could easily see us
making /(?:^){4}/ be treated the same as /^{4}/.
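
As a rough illustration of the current behaviour under discussion, the sketch below compiles both spellings at run time and reports whether each is accepted. The helper name compile_error is made up for this example, and the exact wording of the error message may differ between perl versions.

```perl
use strict;
use warnings;

# Hypothetical helper: compile a pattern given as a string and
# return the error text, or an empty string if it compiled.
sub compile_error {
    my ($src) = @_;
    my $re = eval { qr/$src/ };
    return defined $re ? '' : $@;
}

# A quantifier applied directly to an already-quantified term.
my $stacked = compile_error('a{3}{4}');

# The same semantics, with an extra group to get past the syntax check.
my $grouped = compile_error('(?:a{3}){4}');

print "stacked: ", ($stacked ? "rejected" : "accepted"), "\n";
print "grouped: ", ($grouped ? "rejected" : "accepted"), "\n";

# The grouped form matches exactly twelve "a" characters.
my $twelve = ('a' x 12) =~ /\A(?:a{3}){4}\z/ ? 1 : 0;
my $eleven = ('a' x 11) =~ /\A(?:a{3}){4}\z/ ? 1 : 0;
print "twelve: $twelve, eleven: $eleven\n";
```

The stacked form is rejected with a "Nested quantifiers" error, while the grouped form compiles and behaves as a{12}.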

>
>>guess is the difference between erroring at parse time, versus
>>optimization time.
>
> No, I would expect /(?:^){4}/ not to error at all.  The optimiser would
> be free to reduce it to a single SBOL op, just as it is already free to
> do that with /^{4}/.

I think that "No" is superfluous. :-) My point is that if we error at
optimisation time then we have compiled opcodes to look at, and we
would have no way to distinguish /(?:^)*/ from /^*/, but we also
wouldn't have to worry about weird edge cases like comments or
whitespace in between the assertion and the quantifier.

Perhaps I should mention that with the way the regex engine is
implemented you have two options: do things at parse time, when you
don't have any opcodes to work with, or do them at optimisation time,
when you do. (I dislike calling this an "optree" as "Railway Normal
Form" really isn't very tree-like.)

Note that some of your arguments seem to be predicated on the assumption
that if we allow /(?:^foo)+/ we have to allow /(?:^)+/, which isn't so
obvious to me.

>>Historically I have done everything I could to get us out of special
>>casing things based on the spelling of the pattern[1], and to instead
>>use the optree representation instead, so doing the opposite here
>>feels wrong. Although I am open minded about it.
>
> There are some things that should be based on the parse tree of the
> pattern, and others that should be based on the optree.  The acceptability
> of directly applying a quantifier is a syntactic matter that should be
> based on the parse tree.

There is no parse tree in the regex engine. :-(

Adding one is on my todo list. :-)

> The parse tree does not include whitespace
> that is insignificant due to /x.

The opcodes produced do not include the whitespace.

> Semantically, a quantifier is *always*
> acceptable, so the optree is not a concern here.  The optimiser may have
> things it can usefully do with quantifiers, but those should be limited
> to optimisation.  The optimiser should work entirely on the optree.

Again there is no optree in the regex engine. Not in the classic sense
of the term optree. We go directly from parse to compiled opcodes, and
then we optimise the opcodes. IOW, the compiled program and the
"optree" are one and the same.

Anyway, I will have to think about this a bit, I think I am generally
with you but not everything is as clear to me as it seems to be to
you. (For those reading along it might be worth noting that one needs
to consider more types of assertion than simply "start of line" or
"end of line".)

No doubt in a day or two I will say something like "I get Zefram's
point now, and we should do what he suggests".  I do appreciate you
taking the time to explain!

Cheers!
Yves

-- 
perl -Mre=debug -e "/just|another|perl|hacker/"
