perl.perl5.porters | Postings from March 2014

Re: Perl 5.20.0 Blockers, 2014-03-24

From:
Eric Brine
Date:
March 25, 2014 13:41
Subject:
Re: Perl 5.20.0 Blockers, 2014-03-24
Message ID:
CALJW-qHpaujwQ82NfEnDHTQqf0rDMdq15B_yVe-RbR-RGTsqYw@mail.gmail.com
On Tue, Mar 25, 2014 at 8:45 AM, Craig A. Berry <craig.a.berry@gmail.com> wrote:

> On Tue, Mar 25, 2014 at 12:14 AM, Eric Brine <ikegami@adaelis.com> wrote:
> > On Mon, Mar 24, 2014 at 12:08 PM, Zefram <zefram@fysh.org> wrote:
> >>
> >> Ricardo Signes wrote:
> >> >3.  "Make perlbug Unicode-aware" broke perlbug on Win32
> >> >    https://rt.perl.org/Ticket/Display.html?id=121277
> >>
> >> I think there's a bug in the Unicode-awareness patch.
> >
> >
> > And I have a third. It's printing decoded text to a binary handle.
> >
> > open(REP, '>:raw', $filename)
> >     or die "Unable to create report file '$filename': $!\n";
> >
> > open(F, "<:$input_encoding", $file)
> >     or die "Unable to read report file from '$file': $!\n";
> > while (<F>) {
> >     print REP $_
> > }
>
> Maybe it's obvious to everyone else, but I'd appreciate it if you
> could explain to me what the bug is.


When I say the output is a "binary handle", I mean one that only accepts
bytes. But the input handle, with its :$input_encoding layer, provides
strings of Unicode code points. This mismatch is the bug: either there's
one decoding layer too many, or an encoding layer is missing on the output.

In practice:

* A line that decodes to "\x{00E9}" (é) is output as "\xE9".

* A line that decodes to "\x{20AC}" (EURO) is output as "\xE2\x82\xAC",
with a "Wide character in print" warning.

* A line that decodes to "\x{00E9}\x{20AC}" (é EURO) is output as
"\xC3\xA9\xE2\x82\xAC", with the same warning.
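To reproduce the three cases, here is a small Perl sketch. The helper name bytes_written is made up for illustration; it prints each string to an in-memory ":raw" handle (the same kind of handle REP is) and reports the bytes written and how many warnings were raised.

```perl
use strict;
use warnings;

# Hypothetical helper: prints $text to an in-memory ":raw" handle, as
# perlbug does with REP, and returns the bytes written plus the number
# of warnings (e.g. "Wide character in print") raised while printing.
sub bytes_written {
    my ($text) = @_;
    my $buf = '';
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };
    open(my $out, '>:raw', \$buf) or die "Unable to open buffer: $!\n";
    print {$out} $text;
    close($out);
    return ($buf, scalar @warnings);
}

# Show code points in, bytes out, and the warning count for each case.
for my $text ("\x{00E9}", "\x{20AC}", "\x{00E9}\x{20AC}") {
    my ($bytes, $n_warnings) = bytes_written($text);
    printf "%-15s => %-15s warnings: %d\n",
        join(' ', map { sprintf 'U+%04X', ord } split //, $text),
        join(' ', map { sprintf '%02X', ord } split //, $bytes),
        $n_warnings;
}
```

It should report E9 with no warning for é alone, then E2 82 AC and C3 A9 E2 82 AC with one warning each: once a string contains a character above U+00FF, the whole string goes out as UTF-8.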

What encoding is expected?

* If you're expecting UTF-8, you have invalid UTF-8 in some cases and
warnings in others.

* If you're expecting iso-8859-1, you have improperly encoded characters in
some cases, and you are warned about it.

* If you're expecting another encoding, you have improperly encoded
characters in some cases, and you are sometimes warned about it.
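Either fix from the start of this reply (drop the decoding layer, or add a matching encoding layer on output) can be sketched in Perl as follows. The helpers copy_bytes and transcode are hypothetical, not perlbug code, and ISO-8859-15 is only an example input encoding.

```perl
use strict;
use warnings;

# Fix A (hypothetical helper): no decoding layer at all, so the bytes of
# the report file are copied unchanged to the ":raw" output handle.
sub copy_bytes {
    my ($from, $to) = @_;    # file names or scalar refs
    open(my $in,  '<:raw', $from) or die "Unable to read '$from': $!\n";
    open(my $out, '>:raw', $to)   or die "Unable to write '$to': $!\n";
    print {$out} $_ while <$in>;
    close($out) or die "Unable to close '$to': $!\n";
}

# Fix B (hypothetical helper): keep the decoding layer on input, but add
# an encoding layer on output so characters are turned back into bytes.
sub transcode {
    my ($from, $input_encoding, $to, $output_encoding) = @_;
    open(my $in,  "<:encoding($input_encoding)",  $from)
        or die "Unable to read '$from': $!\n";
    open(my $out, ">:encoding($output_encoding)", $to)
        or die "Unable to write '$to': $!\n";
    print {$out} $_ while <$in>;
    close($out) or die "Unable to close '$to': $!\n";
}

# "café €" encoded as ISO-8859-15 (0xA4 is the euro sign in that charset).
my $latin9 = "caf\xE9 \xA4\n";

my $copy = '';
copy_bytes(\$latin9, \$copy);                          # same bytes as $latin9

my $utf8 = '';
transcode(\$latin9, 'iso-8859-15', \$utf8, 'UTF-8');   # re-encoded as UTF-8
```

Fix A keeps the file's original encoding; Fix B changes it, so the charset declared for the output must then be the output encoding, not $input_encoding.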

> The primary goal was to
> construct a mail message with multiple attachments having potentially
> multiple encodings, each potentially different from the encoding of
> the message text.


And you want to keep them as is? Then you want a binary input handle, and
you want to set the appropriate charset header if you don't already.
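A minimal sketch of that approach, assuming each attachment is sent as a text/plain MIME part with base64 transfer encoding. attachment_part is a hypothetical helper, not perlbug's actual code, and $charset stands for whatever encoding the user says the file is in. The bytes are read through a ":raw" handle, so nothing is decoded; the encoding is only declared in the header.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

# Hypothetical helper: builds one MIME part whose body is the attachment's
# bytes, unchanged, with its encoding declared in the charset parameter.
# $source may be a file name or a reference to a scalar holding the bytes.
sub attachment_part {
    my ($source, $charset, $name) = @_;
    open(my $in, '<:raw', $source)    # binary input handle: no decoding
        or die "Unable to read attachment '$name': $!\n";
    my $bytes = do { local $/; <$in> };    # slurp the whole file
    close($in);
    # Header lines are joined with "\n" here; a real mailer may want "\r\n".
    return join "\n",
        qq{Content-Type: text/plain; charset="$charset"; name="$name"},
        'Content-Transfer-Encoding: base64',
        qq{Content-Disposition: attachment; filename="$name"},
        '',
        encode_base64($bytes);
}

my $latin1 = "caf\xE9\n";    # "café" encoded as ISO-8859-1
print attachment_part(\$latin1, 'iso-8859-1', 'notes.txt');
```

Base64 is chosen here only because it carries arbitrary bytes safely; the key point is that the part's bytes and its charset header agree.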
