On 06/27/2011 05:01 AM, Zefram wrote:
> Karl Williamson wrote:
>> Currently, under locale, the user is warranting that the strings are
>> correctly encoded in the specified locale.
> ...
>> under utf8 locales, which are currently documented as
>> not working, the regex engine and the casing functions would assume that
>> their strings were properly Unicode-encoded.
>
> So you're proposing that the meaning of "use locale", with respect to the
> expected encoding of strings, be completely different for UTF-8 locales
> from what it is for other locales. I oppose this. If the programmer is
> working with strings in native Unicode form, ey should declare this with
> "use feature 'unicode_strings'" or equivalent, not with "use locale".
>
> -zefram
>

It appears to me that you've got it completely backwards. In all cases, the programmer is warranting that the string is correctly encoded in the specified locale. It's just that UTF-8 locales ARE in native Unicode form. The expected encoding for a UTF-8 locale is its own encoding, just as the expected encoding for any other locale is its own encoding.

I think you're throwing red herrings at this proposal; I don't know how to explain it more clearly.

Right now the programmer has a choice:
1) to manipulate strings properly with those locales by using the unicode_strings feature; or
2) to get proper LC_TIME, etc. handling by using locale.
The programmer cannot currently get both.

To give them the ability to do both, the simplest things to implement are the :locale layer, which converts all I/O so that internally things are native Unicode, and "use locale 'NO_CTYPE'", which divorces LC_CTYPE from the rest of locale handling, so that the other categories remain in effect but native Unicode is used for string operations.
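
To make the string-handling side of that choice concrete, here is a minimal sketch. The sub names are made up for illustration, and pinning LC_CTYPE to "C" is only an assumption to keep the locale case predictable; it uses nothing from the proposal itself (no 'NO_CTYPE', no :locale layer), only pragmas that already exist.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);

# U+00E9 (e with acute) held as a one-character, non-UTF-8-flagged string.
my $e_acute = "\xE9";

# Hypothetical helper: casing under the unicode_strings feature.
sub upper_with_unicode_strings {
    use feature 'unicode_strings';
    # Unicode rules apply regardless of the internal representation.
    return uc $_[0];
}

# Hypothetical helper: casing with neither pragma in scope.
sub upper_with_no_pragma {
    # Only ASCII letters are case-mapped in a non-UTF-8 string here.
    return uc $_[0];
}

# Hypothetical helper: casing under "use locale".
sub upper_with_locale {
    use locale;
    # The result is whatever the LC_CTYPE locale in effect says it is.
    return uc $_[0];
}

# Assumption for the sketch: pin LC_CTYPE so the locale case does not
# depend on the environment the script happens to run in.
setlocale(LC_CTYPE, "C");

printf "unicode_strings: U+%04X\n", ord upper_with_unicode_strings($e_acute);
printf "no pragma:       U+%04X\n", ord upper_with_no_pragma($e_acute);
printf "use locale:      U+%04X\n", ord upper_with_locale($e_acute);
```

On an ASCII-based platform the first call should yield U+00C9 and the second should leave U+00E9 unchanged; the third depends entirely on LC_CTYPE, which is the coupling that 'NO_CTYPE' is meant to remove.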