I haven't given up on this proposal. To refresh your memory: the proposal is for Perl to check whether the current locale is a UTF-8 one, and if so, to treat strings for LC_CTYPE purposes the way Perl normally treats them, without consulting the actual locale data. This works because UTF-8 is one of Perl's underlying string representations. The original thread was at http://markmail.org/message/q4vorzd2xcxbm43y I reiterated the proposal in the discussion of https://rt.perl.org/rt3/Ticket/Display.html?id=117787 (which this would fix) and got no responses. I have a branch which has it mostly implemented, but the bitrot needs to be cleaned up.

I have a further proposal: to use wcsxfrm(), on machines that have it, for LC_COLLATE. Unicode publishes high-quality POSIX locale definitions, and this would use them, avoiding the need for, and the slowdown of, Unicode::Collate in many cases.

To summarize the proposal: when Perl does a locale-sensitive operation within the scope of 'use locale', it would check whether the locale is a UTF-8 one. If not, it would behave as it currently does. Under a UTF-8 locale, LC_CTYPE operations would behave as if they weren't under 'use locale' at all; LC_CTYPE operations within a UTF-8 locale would thus be indistinguishable from non-locale operations. (This means there's not much to implement, as we are mostly reusing existing code paths.) For LC_COLLATE operations under UTF-8 locales, the wide-character transform would be used on platforms where it is available. This is slower than the existing 8-bit code, but gives much better results; currently things just don't work at all under these locales, as Tom Christiansen has lamented. (Rough sketches of both pieces follow at the end of this message.)

To be clear, Perl has never claimed to support non-8-bit locales, so this is an enhancement. But on Linux, at least, most of our users these days seem to be using these unsupported locales, so it seems right that we should support them, especially as the implementation cost is not high.
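Here, for concreteness, is roughly what the UTF-8 check could look like. This is an illustration only, not code from my branch; the helper name is made up, and a real check would need to be more forgiving, since platforms report the codeset as "UTF-8", "utf8", and other variants:

    #include <langinfo.h>
    #include <string.h>

    /* Illustrative helper, not from the branch: true if the current
     * LC_CTYPE locale's codeset is UTF-8.  Assumes setlocale() has
     * already been called; nl_langinfo(CODESET) is POSIX. */
    static int
    is_utf8_locale(void)
    {
        const char *codeset = nl_langinfo(CODESET);

        return codeset && (   strcmp(codeset, "UTF-8") == 0
                           || strcmp(codeset, "utf8")  == 0);
    }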
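And here is a sketch of the wide-character transform for LC_COLLATE, again illustrative only, with allocation-failure and overflow checks stripped out. It parallels the strxfrm() transform we already use for 8-bit locales, just one level up: convert the UTF-8 bytes to wchar_t with mbstowcs(), then let wcsxfrm() build a key from the locale's collation data:

    #include <stdlib.h>
    #include <wchar.h>

    /* Illustrative, not from the branch.  Returns a malloc()ed key,
     * or NULL on a malformed sequence; two keys compare with wcscmp()
     * in the locale's collation order. */
    static wchar_t *
    collation_key(const char *utf8)
    {
        size_t wide_len = mbstowcs(NULL, utf8, 0); /* wide chars needed */
        wchar_t *wide, *key;
        size_t key_len;

        if (wide_len == (size_t)-1)
            return NULL;                           /* invalid sequence */

        wide = malloc((wide_len + 1) * sizeof *wide);
        mbstowcs(wide, utf8, wide_len + 1);

        key_len = wcsxfrm(NULL, wide, 0);          /* key size needed */
        key = malloc((key_len + 1) * sizeof *key);
        wcsxfrm(key, wide, key_len + 1);

        free(wide);
        return key;
    }

Sorting N strings would then compute each key once and compare keys with wcscmp(), much as we already cache and compare the strxfrm() results for 8-bit locales.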