# New Ticket Created by Ricardo SIGNES
# Please include the string: [perl #133101]
# in the subject line of all future correspondence about this issue.
# <URL: https://rt.perl.org/Ticket/Display.html?id=133101 >

Mark Dominus sent me a bug report that he couldn't get perlbug to accept.

-------

This is a bug report.

The attached input file “bad” is a one-line summary of an email message whose subject field was malformed. The subject field is encoded in GB-2312, and its raw bytes are invalid when interpreted as UTF-8.

Let us suppose that this data is saved in a file named bad. Now consider the following invocations:

1$ perl -lne 'print if /[ąę]/' bad > /dev/null
2$ PERL_UNICODE=39 perl -lne 'print if /[ąę]/' bad > /dev/null
3$ cat bad | perl -lne 'print if /[ąę]/' > /dev/null
4$ cat bad | PERL_UNICODE=39 perl -lne 'print if /[ąę]/' > /dev/null
Malformed UTF-8 character (fatal) at -e line 1, <> line 1.

5$ perl -lne 'print if /ą/' bad > /dev/null
6$ PERL_UNICODE=39 perl -lne 'print if /ą/' bad > /dev/null
7$ cat bad | perl -lne 'print if /ą/' > /dev/null
8$ cat bad | PERL_UNICODE=39 perl -lne 'print if /ą/' > /dev/null

There are at least two anomalies here.

Invocation 4 properly fails. (PERL_UNICODE=39 is equivalent to supplying the -CAS flag to Perl.) But invocation 8 is identical, except that the pattern is /ą/ instead of /[ąę]/. Why doesn't it fail as well?

Invocation 2 is also identical to invocation 4, except that the data comes from ARGV rather than being delivered on stdin. Why doesn't it fail as well? (The data itself is identical, as confirmed by cat bad | cmp - bad.)

The complete message header is also attached (msg-hdr.txt), and the examples above all behave the same when I use it in place of the shorter excerpt.

This is perl 5, version 22, subversion 1 (v5.22.1) built for x86_64-linux-gnu-thread-multi (with 60 registered patches; see the attached output of perl -V for more detail).

Please cc me on replies, as I do not regularly read this list.

-- 
rjbs
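The report states that PERL_UNICODE=39 is equivalent to -CAS. As a quick cross-check, here is a small POSIX sh sketch that decodes a numeric -C / PERL_UNICODE value into its letter flags. The bit values (I=1, O=2, E=4, i=8, o=16, A=32, L=64) are taken from the -C table in perlrun; the function name decode_perl_unicode is hypothetical and is not part of Perl.

```sh
#!/bin/sh
# Hypothetical helper: decode a numeric PERL_UNICODE / -C value into letters.
# Bit values follow the -C table in perlrun:
#   I=1 STDIN   O=2 STDOUT   E=4 STDERR   (S = I+O+E = 7)
#   i=8 UTF-8 default layer for input streams
#   o=16 UTF-8 default layer for output streams
#   A=32 @ARGV elements are UTF-8   L=64 conditional on locale
decode_perl_unicode() {
    n=$1
    letters=""
    for pair in I:1 O:2 E:4 i:8 o:16 A:32 L:64; do
        flag=${pair%%:*}   # letter before the colon
        bit=${pair#*:}     # bit value after the colon
        if [ $(( n & bit )) -ne 0 ]; then
            letters="$letters$flag"
        fi
    done
    printf '%s\n' "$letters"
}

decode_perl_unicode 39
```

For 39 this prints IOEA. I, O and E together are what S abbreviates, so the value selects the same settings as -CSA (written -CAS in the report; the letters are simply combined, so their order does not matter). Bit 8 (i), which makes UTF-8 the default PerlIO layer for input streams, is not among them.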