perl.perl5.porters | Postings from April 2018

[perl #133101] Anomalies in handling malformed utf8 input

Ricardo SIGNES
April 11, 2018 17:17
Message ID:
# New Ticket Created by  Ricardo SIGNES 
# Please include the string:  [perl #133101]
# in the subject line of all future correspondence about this issue. 
# <URL: >

Mark Dominus sent me a bug report that he couldn't get perlbug to accept.


This is a bug report.

The attached input file “bad” is a one-line summary of an email message
whose subject field was malformed. The subject field is encoded in GB 2312,
and its raw bytes are invalid when interpreted as utf8.  Let us suppose
that this data is saved in a file named bad.  Now consider the following invocations:

1$ perl -lne 'print if /[ąę]/' bad                       > /dev/null
2$ PERL_UNICODE=39 perl -lne 'print if /[ąę]/' bad       > /dev/null
3$ cat bad | perl -lne 'print if /[ąę]/'                 > /dev/null
4$ cat bad | PERL_UNICODE=39 perl -lne 'print if /[ąę]/' > /dev/null
Malformed UTF-8 character (fatal) at -e line 1, <> line 1.

5$ perl -lne 'print if /ą/' bad                          > /dev/null
6$ PERL_UNICODE=39 perl -lne 'print if /ą/' bad          > /dev/null
7$ cat bad | perl -lne 'print if /ą/'                    > /dev/null
8$ cat bad | PERL_UNICODE=39 perl -lne 'print if /ą/'    > /dev/null

There are at least two anomalies here.

Invocation 4 properly fails.  (PERL_UNICODE=39 is equivalent to supplying
the -CAS flag to Perl.)  But invocation 8 is identical, except that the
pattern is /ą/ instead of /[ąę]/; why doesn't this fail as well?
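
The value 39 comes from the bit values that perlrun documents for the -C letters: A is 32 and S is 7 (I + O + E = 1 + 2 + 4), so -CAS is 32 + 7 = 39. The Perl sketch below, which is not part of the original report, does that arithmetic in both directions. The helper names c_flags_to_number and number_to_letters are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Bit values of the -C letters, as listed in perlrun.
my %bit = (
    I => 1,    # STDIN is assumed to be in UTF-8
    O => 2,    # STDOUT will be in UTF-8
    E => 4,    # STDERR will be in UTF-8
    S => 7,    # I + O + E
    i => 8,    # UTF-8 is the default PerlIO layer for input streams
    o => 16,   # UTF-8 is the default PerlIO layer for output streams
    D => 24,   # i + o
    A => 32,   # @ARGV elements are expected to be UTF-8 strings
    L => 64,   # make the stream flags conditional on the locale variables
);

# Hypothetical helper: OR together the bits for a string of -C letters.
sub c_flags_to_number {
    my ($letters) = @_;
    my $n = 0;
    for my $c (split //, $letters) {
        die "unknown -C letter '$c'\n" unless exists $bit{$c};
        $n |= $bit{$c};
    }
    return $n;
}

# Hypothetical helper: list the single-bit letters set in a PERL_UNICODE value.
sub number_to_letters {
    my ($n) = @_;
    return join '', grep { $n & $bit{$_} } qw(I O E i o A L);
}

print c_flags_to_number('AS'), "\n";    # 39
print number_to_letters(39), "\n";      # IOEA
```

Decoding 39 gives I, O, E and A. The i bit (8), which perlrun describes as making UTF-8 the default PerlIO layer for input streams, is not set.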

Invocation 2 is identical to invocation 4, except that the data comes from
a file named in ARGV rather than being delivered on stdin.  Why doesn't it
fail as well?  (The data itself is identical, as confirmed by cat bad | cmp - bad.)
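
The next sketch is not part of the original report. It offers one way to confirm independently that a file such as bad is not well-formed UTF-8. It reads the file as raw bytes and asks the core Encode module to decode it strictly. The helper name is_valid_utf8 and the script name check-utf8.pl are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: return 1 if $bytes is well-formed (strict) UTF-8, else 0.
# FB_CROAK makes decode() die on the first malformed sequence;
# LEAVE_SRC stops decode() from modifying its input.
sub is_valid_utf8 {
    my ($bytes) = @_;
    return eval {
        Encode::decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
        1;
    } ? 1 : 0;
}

# Check each file named on the command line, reading it with the :raw
# layer so that no PerlIO layer decodes the bytes first.
for my $file (@ARGV) {
    open my $fh, '<:raw', $file or die "open $file: $!\n";
    my $bytes = do { local $/; <$fh> } // '';
    close $fh;
    printf "%s: %s\n", $file,
        is_valid_utf8($bytes) ? 'valid UTF-8' : 'malformed UTF-8';
}
```

Run it as perl check-utf8.pl bad. For comparison, the UTF-8 encoding of ą (bytes C4 85) passes the check. An illustrative GB 2312-style byte string such as C4 E3 BA C3 fails it, because E3 is not a valid continuation byte after C4.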

The complete message header is also attached (msg-hdr.txt), and the
examples above all behave the same when I use it in place of the shorter
file bad.
This is perl 5, version 22, subversion 1 (v5.22.1) built for
(with 60 registered patches, see the attached output of perl -V for more detail)

Please cc me on replies, as I do not regularly read this list.

