Karl Williamson wrote:
>It's a lot of work to handle multi-byte locales in general, but Perl
>already knows how to handle Unicode utf8.  This leads to my proposal:  If
>under "use locale", a locale name ends in '.utf8', then Perl treats it
>for purposes of ctype-only as regular Unicode.

This sounds wrong: it'll be a source of double-encoding bugs.
Locale-encoded text input will, for a UTF-8 locale, be a sequence of
octets obeying UTF-8 syntactic rules.  If you treat those octets as
Unicode characters, using Perl's aliasing of octets to characters
U+0000 to U+00FF, then they'll look like very strange character
sequences (with lots of C1 controls), on which case folding (for
example) won't give locale-correct results.  In the other direction,
outputting Unicode text will often not generate correct locale-encoded
output.

We should discourage the use of locale-encoded strings within Perl
space.  We should encourage decoding on input, encoding on output, and
using the native Unicode representation in the middle.  To this end,
there should be a PerlIO layer :locale, which {de,en}codes according
to the locale's preferred encoding.  The locale's encoding may
perfectly well be UTF-8, and in *this* context we can handle it in an
entirely regular manner, on a par with ISO-8859-*.

-zefram
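
To make the double-encoding concern concrete, here is a minimal sketch. It is written in Python rather than Perl, purely for illustration, and the helper names `alias_octets` and `fold_locale_text` are hypothetical. It assumes that "aliasing octets to characters U+0000 to U+00FF" behaves like an ISO-8859-1 decode, and it stands in for the proposed :locale layer with an explicit decode/encode pair; it is not a description of how Perl implements anything.

```python
import locale
from typing import Optional


def alias_octets(octets: bytes) -> str:
    """Treat each octet as the character with the same code point
    (U+0000..U+00FF). This is exactly what an ISO-8859-1 decode does."""
    return octets.decode("latin-1")


def fold_locale_text(octets: bytes, encoding: Optional[str] = None) -> bytes:
    """Hypothetical stand-in for a :locale-style layer: decode on input,
    work on real characters in the middle, encode on output."""
    # Assumption: with no explicit encoding, use the locale's preferred one.
    enc = encoding or locale.getpreferredencoding(False)
    return octets.decode(enc).lower().encode(enc)


# U+00C9 (capital E with acute) as it arrives from a UTF-8 locale.
locale_input = "\u00c9".encode("utf-8")           # b'\xc3\x89'

# Octets aliased to characters: U+00C3 followed by the C1 control U+0089.
aliased = alias_octets(locale_input)
has_c1 = any(0x80 <= ord(c) <= 0x9F for c in aliased)

# Case folding the aliased string changes the lead octet and ignores the rest.
folded_wrong = aliased.lower()                    # U+00E3 U+0089
# Case folding after a proper decode gives the expected character.
folded_right = locale_input.decode("utf-8").lower()   # U+00E9

# Writing the aliased string out as UTF-8 double-encodes it.
double_encoded = aliased.encode("utf-8")          # b'\xc3\x83\xc2\x89'
```

With the decode/encode pair, UTF-8 needs no special treatment: `fold_locale_text(b"\xc3\x89", "utf-8")` and `fold_locale_text(b"\xc9", "iso-8859-1")` go through the same code path and each return the lower-case letter in its own encoding.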