2009/10/5 Tom Christiansen <tchrist@perl.com>:
>>\w should be its historical meaning
>
> Careful: wouldn't historical meaning include
> locales, wherein \w would also include (for example)
> é and ç in French, ñ in Spanish, ß in German,
> and ð and þ in Icelandic? And didn't we already
> find that locale-shifting char classes made
> life really hard on the regex engine (at least)?

use locale is in some respects broken by qr//, as it doesn't use regex
flags and instead depends on the context the pattern is compiled within.
So for instance, if you use locale and then have a sub return a qr//
compiled regex, and then use that object alone in a match, then anywhere
you pass it, it will match using the semantics of the locale in effect
when it is matched. If the qr// is inserted in another pattern, the
localeness of the pattern is destroyed. In short, qr// objects compiled
under use locale give different results depending on how they are used.

These regexes are also much slower than ones not compiled under locale,
as they have to do a lot more run-time comparisons to check whether
they match.

> I don't know whether this is harder on it than
> it already suffers under the Unicode vs bytes
> shifts in behavior, but both seem problematic
> to an annoying degree.

Locale regexes are irritating because you can't precompute them. They
are defined to change based on your environment, which can change
between compilation and execution of the regex. So a lot of work that
could otherwise be precomputed gets delayed to inside the regex
matching loop.

> This is why my test program was tricked into
> thinking \s suddenly started matching VT like
> \v does, despite decades of historical precedent.
> I'd forced it into Unicode mode. :(

And this is why we really, really want \w and \s and \d to match the
traditional thing, even if this means requiring people to add something
to older scripts to support the legacy behaviour.
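
For what it's worth, the Unicode-vs-bytes shift is easy to reproduce.
The following is only a minimal sketch, assuming a perl where neither
the unicode_strings feature nor the /a, /u or /l modifiers are in
effect; is_word_char is a hypothetical helper made up for this
illustration, not anything from core.

```perl
use strict;
use warnings;

# Hypothetical helper (not a core function): true if the whole string
# is exactly one character that matches \w.
sub is_word_char {
    my ($str) = @_;
    return $str =~ /\A\w\z/ ? 1 : 0;
}

# U+00E9 (e-acute), stored internally as a single native byte.
my $native = "\xE9";

# The same character, but with the internal representation upgraded
# so that the string's UTF-8 flag is set.
my $upgraded = "\xE9";
utf8::upgrade($upgraded);

# eq compares characters, not internal representation.
printf "strings equal:        %s\n", ($native eq $upgraded ? "yes" : "no");
printf "native   matches \\w:  %d\n", is_word_char($native);
printf "upgraded matches \\w:  %d\n", is_word_char($upgraded);
```

Under those assumptions I would expect the two strings to compare equal
while only the upgraded one matches \w, since the only difference
between them is the internal UTF-8 flag.
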
You can't tell what a pattern does by looking at it; you have to know
the internal bit flags of the string involved.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"