Karl wrote:

> I realized as I got further into the design that there were some
> unstated things about what I'm proposing. So here is a complete
> statement, AFAIK:

> Using /a will have the following effects:
> 1) \s, \d, \w will match only the appropriate ASCII characters
> 2) [:posix:] will match only (the appropriate) ASCII characters

Your reference to POSIX reminds me that I'm not entirely sure how the
/l or (?l) locale flag quite works out. The /a would override a
use re /l or /u that was in scope, right? I guess the only forbidden
thing would be to try to specify more than one of those in the same
pattern or use re declaration. Is that so, or have I misunderstood the
way /a and /l and /u are envisioned to behave?

> 3) /i of ASCII characters will match only ASCII characters.
> eg. the Kelvin sign will not match 'k'
> 4) /i of non-ASCII characters will obey Unicode semantics, eg, a
> capital and lower case Greek beta will match, as will the Angstrom
> sign and an A with a circle above.
> 5) \p{} will match in the full Unicode range, so that \p{Nd} will
> match many more characters than the 10 matched by \d.
> 6) All of the above is true as well on EBCDIC platforms whose native
> character set is Latin1. ie. under /a they would behave identically
> as an ASCII platform would.

I no longer recall enough about EBCDIC to say anything about it at all.
My last experience may have been thirty years ago with the Sperry
UNIVAC, where we just as often packed up six 6-bit RAD-50 characters
into one 36-bit word. Or maybe that was for the DEC machines? Possibly
both.

Except for the muddying points above that need clarifying, I believe
that all makes good sense, that it is desirable, and probably also that
it is necessary.

I can report that I am unfond of the typed-strings and the
typed-patterns that you need to use in some of the other languages,
especially Python, where things blow up with an exception if you ever
apply the wrong type of pattern to the wrong type of string.
It's very annoying to forever have to hold in your mind which flavor
you are or are not using. It's like all the many kinds of pointers in
C++'s Boost library: no thanks!

Java's (?u) flag in patterns does nothing more than enable Unicode case
matching, and then only in conjunction with (?i), so you usually see it
written (?iu) if they don't use the flags argument to Pattern.compile.
(?iu) makes things like U+017F LATIN SMALL LETTER LONG S match "s" or
"S", and it works both ways, so

    Pattern.compile("(?iu)s").matcher("\u017F").find()

returns true, as does

    Pattern.compile("(?iu)\\u017F").matcher("s").find()

--tom
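
For anyone who wants to try the Java behavior described above, here is a minimal sketch. The class name UnicodeCaseDemo and the helper finds are made up for illustration. Only Pattern.compile, matcher, find, and the CASE_INSENSITIVE and UNICODE_CASE flag constants come from java.util.regex. It contrasts (?iu) with plain (?i), which by default folds case for US-ASCII only.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
class UnicodeCaseDemo {
    // Returns true if the regex matches somewhere in the input.
    static boolean finds(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // Same check, but passing flags through the second argument of Pattern.compile.
    static boolean finds(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        String longS = "\u017F"; // LATIN SMALL LETTER LONG S

        // With (?iu), the long s and "s" match each other in both directions.
        System.out.println(finds("(?iu)s", longS));       // true
        System.out.println(finds("(?iu)\\u017F", "s"));   // true

        // With (?i) alone, case folding is ASCII-only, so the long s is not matched.
        System.out.println(finds("(?i)s", longS));        // false

        // The flags argument is equivalent to the inline (?iu).
        int flags = Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;
        System.out.println(finds("s", flags, longS));     // true
    }
}
```

The inline (?iu) form and the flags-argument form should behave the same. Which one you see in practice is mostly a matter of whether the pattern string is built separately from the compile call.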