perl.perl5.porters | Postings from October 2015

Re: [perl #116639] regex optimiser wrongly rejects certain matches involving embedded comments

From: Abigail
Date: October 14, 2015 11:43
Subject: Re: [perl #116639] regex optimiser wrongly rejects certain matches involving embedded comments
Message ID: 20151014114305.GA21025@almanda.fritz.box

On Tue, Oct 13, 2015 at 03:42:29PM -0700, Karl Williamson via RT wrote:
> I may have closed this prematurely.  I had not read the extensive
> commentary on this when I closed it, only the original report.  So I
> had forgotten the controversy over what should happen.
> 
> To recap what has happened in blead:  It turns out that no one
> (including me) thought about nextchr()'s behavior when the pattern is
> UTF-8 encoded.  It did a simple ++ of the parse position, which is the
> wrong thing to do when the character is a multi-byte character.  It would
> point to the 2nd byte of that, hence the tests it did after the increment
> for white space under /x would fail for white space that was multi-byte.
> When I tried to write tests after fixing that, I discovered that nothing
> I came up with would reliably fail.  And valgrind showed that there were
> reads outside the buffer of garbage data.  That led to me fixing a
> bunch of nextchr calls, and that led to making all such stuff uniform.
> And that led to this bug being fixed.
> 
> But do we really want a (?#...) comment between a character and its
> quantifier?

I'd vote yes. For consistency. See below.

> I can see both sides of the issue, so am now bringing it up
> to discussion again.  blead is now in a state where it would be easy to
> add the ability to choose which places allow (?#...) and which forbid
> it, but allow white space and regular # comments, both only under /x.
> We could allow (?#...) only under /x in such cases if we choose.
> It's easy to change it to do any of this, and I'm willing to do the work,
> once a decision has been made as to what to do.
> 
> My only stance on this is that I think (but am convince-able the other
> way) that under /x, anywhere there is a # comment, should also allow a
> (?#...) comment

I agree. And I'd throw whitespace in as well: anywhere we ignore
whitespace under /x, we should allow a # comment, and hence should
allow a (?#...) comment.

Ignorable whitespace between a character and its quantifier(s) is allowed:

    $ perl -wE 'say "aa" =~ /^a + +$/x'
    1
    $

A comment there is also allowed:

    $ perl -wE 'say "aa" =~ /^a # Foo
    + # Bar
    +$/x'
    1
    $

If we have different rules for whitespace and (?#...), they won't be easy
to document properly, and they won't be easy to learn.



Abigail
