perl.perl5.porters | Postings from December 2010

Re: RFC: Restatement of /a regex proposal

From: karl williamson
Date: December 8, 2010 20:24
Subject: Re: RFC: Restatement of /a regex proposal
Message ID: 4D005993.70309@khwilliamson.com

Abigail wrote:
> On Sun, Dec 05, 2010 at 10:01:03PM -0700, karl williamson wrote:
>> Another wrinkle.  In looking through the code I identified several more  
>> possible things that might ought to be restricted to ASCII by /a.  Does  
>> anyone have an opinion on these?:
>>
>> \h
>>
>> \v
>>
>> \R
>>
>> \X
> 
> 
> I have an opinion.
> 
> I hardly see any code using \h, \v, \R and \X, and if it's used,
> it's seldomly *mis*used to mean just the ASCII subset of their 
> meaning. They are new enough to never had a pre-5.6 meaning which
> is still present in books and documentation. There has never been
> an opportunity for people to make mistakes as with \d, \w and \s.
> 
> I see /a as a way to correct (or revert) the changes introduced in
> 5.6. As there's no need to revert \h, \v, \R, I rather not see their
> meaning change under /a.
> 
> As for \X, I think /\X/a having match only ASCII characters is rather
> pointless. For the same reason, I don't think /\C/a should match a
> different set of characters than /\C/ does.
> 
> 
> Deep down, I really only care about \d, \D, \w, and \W. [[:posix:]] 
> I see so infrequently used, that while I think it's nice to be fixed,
> it doesn't bother me that much. And \s matching outside of the ASCII
> range usually doesn't lead to potential problems.
> 
> 
> 
> Abigail
> 

I have come to the opinion that /a should not prohibit an ASCII-range
character from matching a non-ASCII-range character under /i.  That may
be a worthwhile thing to implement, and I'm willing to do it, but I
think it should be separable from /a, so it needs another flag.
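
For illustration, here is a minimal sketch of the kind of match in
question.  It relies on the behavior of released Perl (5.14 and later),
which postdates this message: there, /a on its own leaves /i matching
alone, and doubling the modifier (/aa) is what forbids ASCII/non-ASCII
matches.  The variable names are just for this example.

```perl
use strict;
use warnings;
use 5.014;    # the /a and /aa modifiers exist from Perl 5.14 onward

# KELVIN SIGN (U+212A) is a non-ASCII character that case-folds to ASCII "k".
my $kelvin = "\x{212A}";

# Plain /i: Unicode case folding lets ASCII "k" match the Kelvin sign.
my $match_i   = ($kelvin =~ /k/i)   ? 1 : 0;

# /a alone does not change case-insensitive matching.
my $match_ia  = ($kelvin =~ /k/ia)  ? 1 : 0;

# /aa forbids an ASCII character matching a non-ASCII one under /i.
my $match_iaa = ($kelvin =~ /k/iaa) ? 1 : 0;

print "i=$match_i ia=$match_ia iaa=$match_iaa\n";    # i=1 ia=1 iaa=0
```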

My reasoning is that I think /a should be something we could reasonably
recommend as a default for people who aren't heavily into Unicode (for
those who are, /u should be the recommended default).  Perhaps /a
should even be selected by 'use 5.14'.  And I think the /i change makes
that less desirable; I'm struggling to put why I think so into words,
perhaps someone can help me out.  Just now I reread the entire set of
threads on this topic, and I think Abigail expressed it fairly well:
"I'm afraid that if we put too much functionality in /a, we end up with
something no one is actually going to use - because it will do
something unintended. And that something will be different for everyone."

Anyway, I'm essentially in accord with Abigail.  I didn't think \X
should be affected by /a, but I thought someone might make a case for
it, so I mentioned it.

I do think it should affect \w and \d.  I could go either way on \s;
perhaps the deciding factor would be how easily users can remember its
effect.  Similarly with \h and \v.
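
As a rough sketch of what is at stake, the snippet below checks a few
non-ASCII characters against these escapes with and without /a.  It
shows the behavior of released Perl (5.14 and later), which postdates
this message and is not necessarily what was being proposed here: /a
restricts \d, \w and \s to ASCII and leaves \h alone.  The sample
characters and variable names are chosen only for illustration.

```perl
use strict;
use warnings;
use 5.014;    # the /a modifier exists from Perl 5.14 onward

my $digit  = "\x{0663}";    # ARABIC-INDIC DIGIT THREE
my $letter = "\x{0101}";    # LATIN SMALL LETTER A WITH MACRON
my $space  = "\x{2003}";    # EM SPACE

# Without /a, the escapes use their Unicode definitions.
my $d_default = ($digit  =~ /\d/) ? 1 : 0;
my $w_default = ($letter =~ /\w/) ? 1 : 0;
my $s_default = ($space  =~ /\s/) ? 1 : 0;

# With /a, \d, \w and \s match only ASCII characters.
my $d_ascii = ($digit  =~ /\d/a) ? 1 : 0;
my $w_ascii = ($letter =~ /\w/a) ? 1 : 0;
my $s_ascii = ($space  =~ /\s/a) ? 1 : 0;

# \h is not affected by /a, so EM SPACE still counts as horizontal space.
my $h_ascii = ($space =~ /\h/a) ? 1 : 0;

print "default: d=$d_default w=$w_default s=$s_default\n";
print "with /a: d=$d_ascii w=$w_ascii s=$s_ascii h=$h_ascii\n";
```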

I also think it should restrict [[:posix:]], as previously proposed.  I
do think these classes probably shouldn't ever match outside what the
POSIX standard says, but given that they do, it is reasonable for /a to
restrict them as well.
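
A small sketch of the POSIX-class point, again using the behavior of
released Perl (5.14 and later) rather than anything settled in this
thread: without /a the bracketed classes match non-ASCII characters,
and with /a they are limited to the ASCII range.  The sample characters
and variable names are only for illustration.

```perl
use strict;
use warnings;
use 5.014;    # the /a modifier exists from Perl 5.14 onward

my $letter = "\x{0101}";    # LATIN SMALL LETTER A WITH MACRON
my $digit  = "\x{0663}";    # ARABIC-INDIC DIGIT THREE

# By default the POSIX classes extend beyond ASCII.
my $alpha_default = ($letter =~ /[[:alpha:]]/) ? 1 : 0;
my $digit_default = ($digit  =~ /[[:digit:]]/) ? 1 : 0;

# Under /a they match only ASCII characters.
my $alpha_ascii = ($letter =~ /[[:alpha:]]/a) ? 1 : 0;
my $digit_ascii = ($digit  =~ /[[:digit:]]/a) ? 1 : 0;

# ASCII input is unaffected by /a.
my $plain_ascii = ("x" =~ /[[:alpha:]]/a) ? 1 : 0;

print "default: alpha=$alpha_default digit=$digit_default\n";
print "with /a: alpha=$alpha_ascii digit=$digit_ascii ascii_x=$plain_ascii\n";
```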
