On 04/11/2018 11:17 AM, Ricardo SIGNES (via RT) wrote:
> # New Ticket Created by Ricardo SIGNES
> # Please include the string: [perl #133101]
> # in the subject line of all future correspondence about this issue.
> # <URL: https://rt.perl.org/Ticket/Display.html?id=133101 >
>
>
> Mark Dominus sent me a bug report that he couldn't get perlbug to accept.
>
> -------
>
> This is a bug report.
>
> The attached input file “bad” is a one-line summary of an email message
> whose subject field was malformed. The subject field is encoded in GB-2312
> and its raw bytes are invalid when interpreted as utf8. Let us suppose
> that this data is saved in a file named bad. Now consider the following
> invocations:
>
> 1$ perl -lne 'print if /[ąę]/' bad > /dev/null
> 2$ PERL_UNICODE=39 perl -lne 'print if /[ąę]/' bad > /dev/null
> 3$ cat bad | perl -lne 'print if /[ąę]/' > /dev/null
> 4$ cat bad | PERL_UNICODE=39 perl -lne 'print if /[ąę]/' > /dev/null
> Malformed UTF-8 character (fatal) at -e line 1, <> line 1.
>
> 5$ perl -lne 'print if /ą/' bad > /dev/null
> 6$ PERL_UNICODE=39 perl -lne 'print if /ą/' bad > /dev/null
> 7$ cat bad | perl -lne 'print if /ą/' > /dev/null
> 8$ cat bad | PERL_UNICODE=39 perl -lne 'print if /ą/' > /dev/null

Shouldn't use utf8 be used?

>
> There are at least two anomalies here.
>
> Invocation 4 properly fails. (PERL_UNICODE=39 is equivalent to supplying
> the -CAS flag to Perl.) But invocation 8 is identical, except that the
> pattern is /ą/ instead of /[ąę]/; why doesn't this fail as well?
>
> Invocation 2 is completely identical, except that the data is delivered on
> stdin rather than coming from ARGV. Why doesn't this fail as well? (The
> data itself is identical, as confirmed by cat bad | cmp - bad).
>
> The complete message header is also attached (msg-hdr.txt), and the
> examples above all behave the same when I use it in place of the shorter
> excerpt.
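
A side note on the PERL_UNICODE=39 remark quoted above: here is a minimal sketch of how that number breaks down, assuming the bit values listed in perlrun for the -C switch. The variable names are made up for illustration and are not part of Perl.

```shell
#!/bin/sh
# Bit values as listed in perlrun for -C / PERL_UNICODE.
# The variable names below are hypothetical, chosen for readability.
STDIN_UTF8=1    # I: STDIN is assumed to be in UTF-8
STDOUT_UTF8=2   # O: STDOUT will be in UTF-8
STDERR_UTF8=4   # E: STDERR will be in UTF-8
ARGV_UTF8=32    # A: the @ARGV elements are expected to be UTF-8 strings

# S is shorthand for I + O + E.
S=$((STDIN_UTF8 + STDOUT_UTF8 + STDERR_UTF8))

# -CAS therefore corresponds to S + A.
AS=$((S + ARGV_UTF8))

echo "S=$S AS=$AS"

# The "i" bit (8, UTF-8 as the default PerlIO layer for input streams)
# is not part of this value, so the bitwise AND below yields 0.
echo "i bit set: $((AS & 8))"
```

Note that, per perlrun, A concerns the strings in @ARGV, not the contents of files named there, and S covers only the three standard handles. That may be relevant to the difference between invocations 2 and 4, though I have not checked it against the 5.22 sources.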
>
> This is perl 5, version 22, subversion 1 (v5.22.1) built for
> x86_64-linux-gnu-thread-multi
> (with 60 registered patches, see the attached output of perl -V for more
> detail)
>
> Please cc me on replies, as I do not regularly read this list.