On Sat, Dec 04, 2010 at 10:18:19AM -0700, karl williamson wrote:
> I realized as I got further into the design that there were some
> unstated things about what I'm proposing. So here is a complete
> statement, AFAIK:
>
> Using /a will have the following effects:
> 1) \s, \d, \w will match only the appropriate ASCII characters
> 2) [:posix:] will match only (the appropriate) ASCII characters
> 3) /i of ASCII characters will match only ASCII characters. eg. the
> Kelvin sign will not match 'k'
> 4) /i of non-ASCII characters will obey Unicode semantics, eg, a capital
> and lower case Greek beta will match, as will the Angstrom sign and an A
> with a circle above.
> 5) \p{} will match in the full Unicode range, so that \p{Nd} will match
> many more characters than the 10 matched by \d.
> 6) All of the above is true as well on EBCDIC platforms whose native
> character set is Latin1. ie. under /a they would behave identically as
> an ASCII platform would.

I'm confused by 3). Considering that the Kelvin sign isn't ASCII, I'm not
sure what you mean by this.

And to clarify 1), you mean that \s matches \x09 (CHARACTER TABULATION),
\x0A (LINE FEED), \x0C (FORM FEED), \x0D (CARRIAGE RETURN), and
\x20 (SPACE), with \x0B (LINE TABULATION) not included? \x0B is a rare
enough character that I don't care much either way, but since it was
never included, it probably shouldn't be now.

Does your proposal also say something about locales? Personally, I think
that /a should imply that locales are ignored.

Other than that, I fully endorse the proposal.

Abigail
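
For anyone who wants to check points 1) and 5) of the quoted proposal, here is a minimal sketch. It assumes a perl that accepts the /a modifier as proposed; on one that does not, the patterns will not compile. The helper name ascii_space_matches is made up for this example. It prints which of the whitespace candidates discussed above \s accepts under /a, rather than asserting an answer for \x0B, and then compares \d with \p{Nd} on a non-ASCII digit.

```perl
use strict;
use warnings;

# Assumption: this perl accepts the /a modifier described in the quoted proposal.

# Hypothetical helper: list which candidate code points \s accepts under /a.
sub ascii_space_matches {
    my @hits;
    for my $cp (0x09 .. 0x0D, 0x20) {
        push @hits, sprintf("\\x%02X", $cp) if chr($cp) =~ /\A\s\z/a;
    }
    return @hits;
}

# ARABIC-INDIC DIGIT THREE: in \p{Nd}, but not one of the ten ASCII digits.
my $digit = "\x{0663}";

print "\\s under /a matches: @{[ ascii_space_matches() ]}\n";
print "\\d under /a:     ", ($digit =~ /\A\d\z/a      ? "match" : "no match"), "\n";
print "\\p{Nd} under /a: ", ($digit =~ /\A\p{Nd}\z/a ? "match" : "no match"), "\n";
```

Whether \x0B shows up in the first line of output is exactly the open question in 1), so the sketch reports it instead of assuming it.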