
Re: RFC: Restatement of /a regex proposal

December 4, 2010 13:43
On Sat, Dec 04, 2010 at 10:18:19AM -0700, karl williamson wrote:
> I realized as I got further into the design that there were some  
> unstated things about what I'm proposing.  So here is a complete  
> statement, AFAIK:
> Using /a will have the following effects:
> 1) \s, \d, \w will match only the appropriate ASCII characters
> 2) [:posix:] will match only (the appropriate) ASCII characters
> 3) /i of ASCII characters will match only ASCII characters.  eg. the  
> Kelvin sign will not match 'k'
> 4) /i of non-ASCII characters will obey Unicode semantics, eg, a capital  
> and lower case Greek beta will match, as will the Angstrom sign and an A  
> with a circle above.
> 5) \p{} will match in the full Unicode range, so that \p{Nd} will match  
> many more characters than the 10 matched by \d.
> 6) All of the above is true as well on EBCDIC platforms whose native  
> character set is Latin1. ie. under /a they would behave identically as  
> an ASCII platform would.
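
To make points 1, 2 and 5 concrete, here is a minimal sketch. It assumes a perl that accepts the /a modifier (5.14 or later, as far as I know); the variable names and the choice of ARABIC-INDIC DIGIT THREE as the test character are only for illustration.

```perl
use 5.014;
use warnings;

my $ascii_digit  = "7";
my $arabic_digit = "\x{0663}";   # ARABIC-INDIC DIGIT THREE, a member of \p{Nd}

# 1 means "matched", 0 means "did not match".
my %result = (
    d_plain => ($arabic_digit =~ /\A\d\z/             ? 1 : 0),  # Unicode rules
    d_a     => ($arabic_digit =~ /\A\d\z/a            ? 1 : 0),  # point 1
    posix_a => ($arabic_digit =~ /\A[[:digit:]]\z/a   ? 1 : 0),  # point 2
    nd_a    => ($arabic_digit =~ /\A\p{Nd}\z/a        ? 1 : 0),  # point 5
    ascii_a => ($ascii_digit  =~ /\A\d\z/a            ? 1 : 0),  # ASCII digit
);

say "$_ => $result{$_}" for sort keys %result;
```

Under /a only the ASCII digit should satisfy \d and [[:digit:]], while \p{Nd} keeps its full Unicode meaning.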

I'm confused by 3). Considering that the Kelvin sign isn't ASCII, I'm
not sure what you mean by this.
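
For reference, a small sketch of the case that point 3 seems to be about. It assumes perl 5.14 or later. In the implementation that eventually shipped, as far as I know, the ASCII/non-ASCII restriction under /i is requested by doubling the modifier (/aa) rather than by a single /a, so the sketch uses /aai; treat that spelling as an assumption relative to the proposal quoted above.

```perl
use 5.014;
use warnings;

my $kelvin   = "\x{212A}";   # KELVIN SIGN, which case-folds to 'k'
my $angstrom = "\x{212B}";   # ANGSTROM SIGN, which case-folds to U+00E5

# Unicode rules: the Kelvin sign folds to 'k', so /i lets them match.
my $plain = $kelvin =~ /k/i ? 1 : 0;

# /aa: an ASCII character and a non-ASCII character may not match
# each other case-insensitively (point 3).
my $ascii_only = $kelvin =~ /k/aai ? 1 : 0;

# Point 4: two non-ASCII characters still fold with Unicode rules.
my $non_ascii = $angstrom =~ /\x{E5}/aai ? 1 : 0;

say "Kelvin vs k, /i:     $plain";
say "Kelvin vs k, /aai:   $ascii_only";
say "Angstrom vs a-ring:  $non_ascii";
```

Read this way, point 3 is about what an ASCII letter in the pattern is allowed to match, not about the Kelvin sign itself being ASCII.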

And to clarify 1), you mean that:

  \s matches \x09 (CHARACTER TABULATION), \x0A (LINE FEED), 
             \x0C (FORM FEED), \x0D (CARRIAGE RETURN), and
             \x20 (SPACE), with \x0B (LINE TABULATION) not included?

\x0B is a rare enough character that I don't care much either way, but
since it was never included, it probably shouldn't be now.
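
A quick way to check this on a given perl, as a sketch only. It assumes 5.14 or later for the /a and /u modifiers. The \x0B result is printed rather than assumed, since whether \s accepts it depends on the perl version (later releases added it to \s, as far as I know from 5.18 on).

```perl
use 5.014;
use warnings;

# The five characters listed above.
my @listed = ("\x09", "\x0A", "\x0C", "\x0D", "\x20");

# Count how many of them \s accepts under /a.
my $matched = grep { /\A\s\z/a } @listed;

# Non-ASCII white space: accepted with Unicode rules, rejected under /a.
my $nbsp_u = "\xA0" =~ /\s/u ? 1 : 0;
my $nbsp_a = "\xA0" =~ /\s/a ? 1 : 0;

# Version dependent, so only reported.
my $vt_a = "\x0B" =~ /\s/a ? 1 : 0;

say "listed characters matched under /a: $matched of ", scalar @listed;
say "NO-BREAK SPACE under /u: $nbsp_u, under /a: $nbsp_a";
say "\\x0B under /a on this perl: $vt_a";
```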

Does your proposal also say something about locales? Personally, I
think that /a should imply that locales are ignored.

Other than that, I fully endorse the proposal.

