On Wed, Mar 26, 2014 at 4:59 AM, Aristotle Pagaltzis <pagaltzis@gmx.de> wrote:
> Using a raw handle is fine.
>
> Printing decoded strings to it is not.
>
> You are missing the encoding step.

Thanks for the explanation. I'm probably beyond hope, but I appreciate the effort :-).

Of course no such thing as a "decoded string" actually exists. Every decoding is another encoding. Simply by virtue of being represented in computer memory, strings have to be encoded in some way. Even if each character is represented by a 21-bit integer containing the value of its Unicode code point, that is a form of encoding. Various docs say Perl's internal encoding is UTF-8, but also say not to depend on that. I believed the former but not the latter. My bad.

Moving on to what to do with perlbug for 5.20.

The main reason to specify layers on all the handles in perlbug was to ensure that patches attached with the new -p option come through the wash ok even if they have multiple encodings in them. Using the :raw layer on both input and output seems to accomplish that and I think this part is a keeper. It's probably a misnomer to call it "unicode awareness"; it might be more proper to say we're making perlbug encoding-agnostic.

Somewhat as an afterthought, it seemed like it might be nice if we could handle more than ASCII in the message body as well. We could spell people's names correctly, and pasted-in code samples and output from code samples might actually look as intended.

Somehow I got it into my head that in the case of a prepared report supplied with the -f option (or by having the filename typed in response to a prompt) we could not be encoding-agnostic and would have to know the input encoding and convert it into a specified output encoding. I now think this whole idea was a mistake (even aside from my implementation mistakes) and we should scrap it, at least for now. Guessing the input encoding is the tricky part. I was attempting to use encoding::_get_locale_encoding().
Aside from being a private method of a deprecated pragma, it depends on the locale being set up properly and on whatever program created the file having observed the locale setting. As I understand it, pretty much no program on Windows will do that. On any platform, there is no reason to assume the report file was created on the same system as the one running perlbug. And if the file was created in a text editor, any number of editor defaults and/or user preferences could cause it to be in some encoding other than what the locale specifies.

So I think we should stop pretending that we can reasonably guess the encoding and instead focus on passing things through without mangling them. I have pushed the branch craigb/perlbug_encoding_fixup which takes a stab at this and also a rather blind swing at the CRLF expectations for die-hard users of Notepad.

I have not tested this branch at all and must urgently return to several other neglected obligations, so I'm not sure when I'll be able to. But it's the best I can offer at the moment as an alternative path forward.

P.S. If someone wants to write a robust general-purpose encoding detector and include it in perlbug, please go ahead, but be sure to make it degrade nicely under miniperl when the Encode module is not available.
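
For illustration only, here is a minimal sketch of the pass-through idea discussed above: open both the input and the output handle with the :raw layer and copy bytes without decoding or encoding them. This is not code from perlbug or from the craigb/perlbug_encoding_fixup branch; the helper name copy_raw and its arguments are hypothetical, chosen just for the example.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of perlbug): copy $src to $dst byte for byte.
# Both handles use the :raw layer, so no decoding, encoding, or CRLF
# translation is applied on the way in or on the way out.
sub copy_raw {
    my ($src, $dst) = @_;

    open my $in,  '<:raw', $src or die "Can't read $src: $!";
    open my $out, '>:raw', $dst or die "Can't write $dst: $!";

    # Read fixed-size chunks rather than lines; the data is treated as
    # opaque bytes, so line boundaries carry no meaning here.
    local $/ = \8192;
    while (defined(my $chunk = <$in>)) {
        print {$out} $chunk or die "Write to $dst failed: $!";
    }

    close $out or die "Can't close $dst: $!";
    close $in;
    return;
}
```

Because neither handle interprets the data, a file that mixes, say, Latin-1 and UTF-8 bytes or uses CRLF line endings should come out identical to what went in. The sketch relies only on core Perl and does not need the Encode module.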