RFC: Restatement of /a regex proposal

karl williamson
December 4, 2010 09:18
As I got further into the design, I realized that parts of what I'm 
proposing had gone unstated.  So here is a complete statement, AFAIK:

Using /a will have the following effects:
1) \s, \d, \w will match only the appropriate ASCII characters
2) [:posix:] will match only (the appropriate) ASCII characters
3) /i of ASCII characters will match only ASCII characters, e.g., the 
Kelvin sign will not match 'k'
4) /i of non-ASCII characters will obey Unicode semantics, e.g., capital 
and lowercase Greek beta will match each other, as will the Angstrom 
sign and an A with a circle above.
5) \p{} will match in the full Unicode range, so that \p{Nd} will match 
many more characters than the 10 matched by \d.
6) All of the above is true as well on EBCDIC platforms whose native 
character set is Latin1, i.e., under /a they would behave identically 
to an ASCII platform.
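
As a rough illustration of items 1 and 3 only, here is a minimal sketch in Python, whose re.ASCII flag restricts \d, \s, \w and case-insensitive matching to ASCII in a broadly similar way. This is an analogy, not the proposed Perl implementation: the constant names and the full() helper are made up for the example, Python's flag also turns off case folding for non-ASCII characters (unlike item 4), and the standard re module has no \p{} escape, so items 4 and 5 are not shown.

```python
import re

# Analogy only: Python's re.ASCII flag stands in for the proposed Perl /a.
# The constant and function names below are illustrative, not from the proposal.
ARABIC_INDIC_FIVE = "\u0665"  # a decimal digit (Nd) outside ASCII
NO_BREAK_SPACE = "\u00a0"     # whitespace outside ASCII
E_ACUTE = "\u00e9"            # a word character outside ASCII
KELVIN_SIGN = "\u212a"        # lowercases to ASCII 'k' under Unicode rules


def full(pattern, text, flags=0):
    """Return True if pattern matches the whole of text."""
    return re.fullmatch(pattern, text, flags) is not None


# Item 1: without the flag, \d, \s and \w use Unicode rules.
unicode_rules = [
    full(r"\d", ARABIC_INDIC_FIVE),
    full(r"\s", NO_BREAK_SPACE),
    full(r"\w", E_ACUTE),
]

# Item 1: with the flag, the same escapes match only ASCII characters.
ascii_rules = [
    full(r"\d", ARABIC_INDIC_FIVE, re.ASCII),
    full(r"\s", NO_BREAK_SPACE, re.ASCII),
    full(r"\w", E_ACUTE, re.ASCII),
]

# Item 3: case-insensitive 'k' against the Kelvin sign.
kelvin_unicode = full("k", KELVIN_SIGN, re.IGNORECASE)
kelvin_ascii = full("k", KELVIN_SIGN, re.IGNORECASE | re.ASCII)

print(unicode_rules)   # [True, True, True]
print(ascii_rules)     # [False, False, False]
print(kelvin_unicode)  # True
print(kelvin_ascii)    # False
```

The point of the comparison is that the same pattern gives different answers for non-ASCII input depending on one flag, which is the kind of switch /a is meant to provide.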

