On Tue, Oct 13, 2015 at 03:42:29PM -0700, Karl Williamson via RT wrote:
> I may have closed this prematurely. I had not read the extensive
> commentary on this when I closed it, only the original report. So I
> had forgotten the controversy over what should happen.
>
> To recap what has happened in blead: It turns out that no one
> (including me) thought about nextchr()'s behavior when the pattern is
> UTF-8 encoded. It did a simple ++ of the parse position, which is the
> wrong thing to do when the character is a multi-byte character. It would
> point to the 2nd byte of that, hence the tests it did after the increment
> for white space under /x would fail for white space that was multi-byte.
> When I tried to write tests after fixing that, I discovered that nothing
> I came up with would reliably fail. And valgrind showed that there
> reads outside the buffer of garbage data. That led to me fixing a
> bunch of nextchr calls, and that led to making all such stuff uniform.
> And that led to this bug being fixed.
>
> But do we really want a (?#...) comment between a character and its
> quantifier?

I'd vote yes. For consistency. See below.

> I can see both sides of the issue, so am now bringing it up
> to discussion again. blead is now in a state where it would be easy to
> add the ability to choose which places allow (?#...) and which forbid
> it, but allow white space and regular # comments, both only under /x.
> We could allow (?#...) only under /x in such cases if we choose.
> It's easy to change it to do any of this, and I'm willing to do the work,
> once a decision has been made as to what to do.
>
> My only stance on this is that I think (but am convince-able the other
> way) that under /x, anywhere there is a # comment, should also allow a
> (?#...) comment

I agree. And I'd throw whitespace in as well: anywhere we ignore
whitespace under /x, we should allow a # comment, and hence, should
allow a (?#...) comment.

Ignorable whitespace between a character and its quantifier(s) is allowed:

    $ perl -wE 'say "aa" =~ /^a + +$/x'
    1
    $

A comment there is also allowed:

    $ perl -wE 'say "aa" =~ /^a # Foo
                             + # Bar
                             +$/x'
    1
    $

If we have different rules for whitespace and (?#), it won't be easy to
document properly, and it won't be easy to learn.


Abigail