On Tue, Mar 25, 2014 at 6:44 PM, Craig A. Berry <craig.a.berry@gmail.com> wrote:

> > When I say the output is a "binary handle", I mean one that only accepts
> > bytes. But the input handle is providing strings of Unicode code points.
> > This mismatch is a bug. There's a decoding layer too many, or there's a
> > missing encoding layer.
> >
> > In practice:
> >
> > * You could input "\x{00E9}" (é), which would be output as "\xE9".
> >
> > * You could input "\x{20AC}" (EURO), which would be output as
> > "\xE2\x82\xAC" and a warning.
> >
> > * You could input "\x{00E9}\x{20AC}" (é EURO), which would be output as
> > "\xC3\xA9\xE2\x82\xAC" and a warning.
>
> Why would they warn when written to a :raw handle?
>

Because the output is now suddenly in mixed encodings, which tends to be
unparsable because you don't know how to decode it.

> >> The primary goal was to
> >> construct a mail message with multiple attachments having potentially
> >> multiple encodings, each potentially different from the encoding of
> >> the message text.
> >
> >
> > And you want to keep them as is? Then you want a binary input handle, and
> > you want to set the appropriate charset header if you don't already.
>
> That's exactly what it does. Everything is read in raw and written out
> raw *except* the message body, which may have been generated by any
> random editor on any platform. For that I attempt to infer the locale
> encoding and read into a handle decoding that specific encoding
> (which may not be working right per Zefram though I haven't confirmed
> that yet).
>
> Since I'm decoding on input, the data at that point have been
> converted to Perl's internal encoding, which as far as I know is UTF-8
> (or a lax variant of it) when it needs to be. Then I write it out raw
> and say in the MIME header that it's UTF-8. Yes, I'm intentionally
> losing the encoding on the output handle. I pretty much have to since
> that handle may get data in multiple unknown encodings.
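The three cases above can be modelled outside Perl to see where the bytes and the warning come from. The sketch below is a Python model, not Perl itself, and assumes a simplified version of what Perl does when a decoded string is printed to a handle with no encoding layer: if every code point is below 0x100 the string is written as single bytes (Latin-1); otherwise Perl writes its internal UTF-8 form and emits a "Wide character in print" warning. The function name `raw_print` is hypothetical.

```python
# Python model (not Perl) of printing a decoded string to a :raw handle.
# Assumption: Perl downgrades the string to one byte per character when
# every code point fits in a byte; otherwise it writes its internal UTF-8
# buffer as-is and warns "Wide character in print".


def raw_print(text):
    """Return (bytes written, whether Perl would warn)."""
    if all(ord(ch) < 0x100 for ch in text):
        # Downgrade succeeds: each character becomes one byte.
        return text.encode("latin-1"), False
    # Downgrade fails: the whole string goes out as UTF-8, with a warning.
    return text.encode("utf-8"), True


for sample in ["\u00e9", "\u20ac", "\u00e9\u20ac"]:
    data, warned = raw_print(sample)
    suffix = " (warning)" if warned else ""
    print(f"{ascii(sample)}: {data.hex(' ')}{suffix}")
```

Note the third case: the same "é" that came out as the single byte E9 on its own comes out as C3 A9 once it shares a string with a wide character, so the byte encoding depends on the content of each individual print.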
>

Decoding on input but not encoding on output will give unpredictable output.

> I also don't specify character encodings in the MIME headers on the
> attachment(s) since there is no way to know what they are. But that's
> not a problem since the main use case is patches created by
> git-format-patch and git seems happy as long as we don't mangle
> anything in transport.
>

Sounds reasonable to me.

Leon
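The point about decoding on input without encoding on output can be sketched the same way. This is again a Python model, not Perl, and it assumes the message body is printed in several pieces and that the MIME header declares UTF-8; `raw_print`, `encoded_print`, `round_trips` and `DECLARED_CHARSET` are hypothetical names. Encoding every piece with the declared charset, as a Perl `:encoding(UTF-8)` output layer would, keeps the body decodable, while the raw path can mix Latin-1 and UTF-8 bytes in one body.

```python
# Python model (not Perl): compare writing decoded text to a raw handle
# with explicitly encoding it to the charset the MIME header declares.

DECLARED_CHARSET = "utf-8"  # assumed charset in the (hypothetical) MIME header


def raw_print(text):
    # Raw-handle model: one byte per character if possible, else UTF-8.
    if all(ord(ch) < 0x100 for ch in text):
        return text.encode("latin-1")
    return text.encode("utf-8")


def encoded_print(text):
    # Every piece is encoded the same way, whatever its code points.
    return text.encode(DECLARED_CHARSET)


def round_trips(writer, chunks):
    """Write each chunk separately, then decode the body as the header claims."""
    body = b"".join(writer(chunk) for chunk in chunks)
    try:
        return body.decode(DECLARED_CHARSET) == "".join(chunks)
    except UnicodeDecodeError:
        return False


chunks = ["caf\u00e9 ", "costs 5\u20ac"]
print("raw:", round_trips(raw_print, chunks))
print("encoded:", round_trips(encoded_print, chunks))
```

In the raw case the first piece is written as Latin-1 (a lone E9 byte) and the second as UTF-8, so a reader trusting the UTF-8 header cannot decode the body; the explicitly encoded version round-trips.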