
Re: [perl #133101] Anomalies in handling malformed utf8 input

From: Dan Book
Date: April 11, 2018 20:52
Subject: Re: [perl #133101] Anomalies in handling malformed utf8 input
Message ID: CABMkAVWHQTFUCWqyojgAGMTmA2Tyje=R7S3B534QDxMcwj83sQ@mail.gmail.com

On Wed, Apr 11, 2018 at 4:46 PM, Karl Williamson <public@khwilliamson.com>
wrote:

> On 04/11/2018 11:17 AM, Ricardo SIGNES (via RT) wrote:
>
>> # New Ticket Created by  Ricardo SIGNES
>> # Please include the string:  [perl #133101]
>> # in the subject line of all future correspondence about this issue.
>> # <URL: https://rt.perl.org/Ticket/Display.html?id=133101 >
>>
>>
>> Mark Dominus sent me a bug report that he couldn't get perlbug to accept.
>>
>> -------
>>
>> This is a bug report.
>>
>> The attached input file “bad” is a one-line summary of an email message
>> whose subject field was malformed. The subject field is encoded in GB-2312
>> and its raw bytes are invalid when interpreted as utf8.  Let us suppose
>> that this data is saved in a file named bad.  Now consider the following
>> invocations:
>>
>> 1$ perl -lne 'print if /[ąę]/' bad                       > /dev/null
>> 2$ PERL_UNICODE=39 perl -lne 'print if /[ąę]/' bad       > /dev/null
>> 3$ cat bad | perl -lne 'print if /[ąę]/'                 > /dev/null
>> 4$ cat bad | PERL_UNICODE=39 perl -lne 'print if /[ąę]/' > /dev/null
>> Malformed UTF-8 character (fatal) at -e line 1, <> line 1.
>>
>> 5$ perl -lne 'print if /ą/' bad                          > /dev/null
>> 6$ PERL_UNICODE=39 perl -lne 'print if /ą/' bad          > /dev/null
>> 7$ cat bad | perl -lne 'print if /ą/'                    > /dev/null
>> 8$ cat bad | PERL_UNICODE=39 perl -lne 'print if /ą/'    > /dev/null
>>
>
> Shouldn't
>
> use utf8
>
> be used?
>
>
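Yes. -CAS (the switch equivalent of PERL_UNICODE=39) only makes perl treat
@ARGV as UTF-8 and puts :utf8 layers on STDIN/STDOUT/STDERR. It does not
change how the source code is read, so the code still needs `use utf8;` for
the literal ą and ę in the regex to be interpreted as characters.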
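
A minimal sketch of the difference, assuming the script file itself is saved
as UTF-8. The sub names are made up for illustration; `use utf8` is lexically
scoped, so both behaviours can sit side by side in one file:

```perl
#!/usr/bin/perl
# Assumption: this file is saved as UTF-8.
use strict;
use warnings;

# Without `use utf8`, each byte of a source literal is one character,
# so [ąę] is a class of the bytes 0xC4, 0x85 and 0x99.
sub class_as_bytes {
    no utf8;
    my ($str) = @_;
    return $str =~ /[ąę]/ ? 1 : 0;
}

# With `use utf8`, the literal is decoded, so [ąę] is a class of the
# two characters U+0105 and U+0119.
sub class_as_chars {
    use utf8;
    my ($str) = @_;
    return $str =~ /[ąę]/ ? 1 : 0;
}

sub literal_length_bytes { no utf8;  return length "ą" }
sub literal_length_chars { use utf8; return length "ą" }

printf "length of literal: without utf8=%d, with utf8=%d\n",
    literal_length_bytes(), literal_length_chars();
printf "byte 0xC4 matches: without utf8=%d, with utf8=%d\n",
    class_as_bytes("\xC4"), class_as_chars("\xC4");
printf "U+0105 matches:    without utf8=%d, with utf8=%d\n",
    class_as_bytes("\x{105}"), class_as_chars("\x{105}");
```

On the command line the same pragma can be loaded with -Mutf8, e.g.
`perl -Mutf8 -lne 'print if /[ąę]/' bad`.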

-Dan
