It turns out that Perl has two disparate methods for dealing with locale and utf8.

One is to lose localeness when a scalar is upgraded to utf8, so that whatever character is at ordinal X is suddenly assumed to be the character that Unicode puts at that position. I believe the injunctions in the documentation against mixing localeness and utf8 stem from this broken behavior. For a long time I thought that this was the only method Perl used, and it bothered me as being wrong. Eventually I hit on a better solution, only to discover that in other places Perl uses exactly what I had thought of: treat latin1-range characters as if they were in their locale, even when encoded in utf8, and treat above-latin1 characters as Unicode.

I propose to convert the code that doesn't use this second method to do so. This presents various backwards-compatibility issues, as behavior will change; probably for the better, though.

The biggest change is that no \p{} properties would apply to latin1-range characters. If you think about it, that is how it should be: we don't really know, for example, that 0x41 represents an alphabetic in the locale. One should not be using Unicode properties under locale; instead one should be using the [:posix:] classes or \s, \w, \d. Nor should one be using \h or \v. I propose to output a warning when \p{}, \h, \v, or \R is used under locale; the warning would say that the construct applies only to code points above 255. Similarly, \N{} can't legitimately be used under locale for code points in the latin1 range, and I propose to warn about that as well (a sketch of both warnings is below).

The current behavior is demonstrably broken: /\s/ uses the better approach, while /[\s]/ uses the worse one (demonstrated at the end of this message).

These changes would mean that locale and utf8 could work together, reasonably, for the first time.
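
Here is a sketch of code that the proposal would start warning about. The conditions come from the proposal above; the particular patterns and the warning wording in the comments are my illustrative assumptions, since none of this is implemented yet.

    use strict;
    use warnings;
    use locale;
    use charnames ':full';

    my $str = "caf\x{E9}";       # "cafe" with LATIN SMALL LETTER E WITH ACUTE

    # Each of these would draw the proposed warning that the construct
    # applies only to code points above 255 under locale:
    $str =~ /\p{Alpha}/;         # Unicode property
    $str =~ /\h/;                # horizontal whitespace; likewise \v and \R

    # And this would draw the proposed warning that \N{} under locale
    # is not legitimate for latin1-range code points (U+00E9 is 0xE9):
    $str =~ /\N{LATIN SMALL LETTER E WITH ACUTE}/;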
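
And a minimal demonstration of the /\s/ versus /[\s]/ discrepancy. Whether it reproduces on your machine depends on your Perl version and locale; I picked the C locale, where isspace(0xA0) is false, while 0xA0 (NO-BREAK SPACE) does have Unicode's White_Space property.

    use strict;
    use warnings;
    use locale;
    use POSIX qw(setlocale LC_ALL);

    setlocale(LC_ALL, 'C');      # a locale where 0xA0 is not whitespace

    my $nbsp = chr 0xA0;         # NO-BREAK SPACE: White_Space in Unicode
    utf8::upgrade($nbsp);        # force the utf8 representation

    # With the current inconsistency, /\s/ consults the locale (no match),
    # while /[\s]/ switches to Unicode rules for the utf8 string (match):
    printf "/\\s/   %s\n", $nbsp =~ /\s/   ? "matches" : "does not match";
    printf "/[\\s]/ %s\n", $nbsp =~ /[\s]/ ? "matches" : "does not match";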