perl.perl5.porters | Postings from September 2012

[perl #101384] perldiag does not adequately describe how to avoid malformed UTF8 scalars

From: James E Keenan via RT
Date: September 20, 2012 18:57
Subject: [perl #101384] perldiag does not adequately describe how to avoid malformed UTF8 scalars
Message ID: rt-3.6.HEAD-11172-1348192648-164.101384-15-0@perl.org

On Fri Oct 14 16:06:47 2011, tom christiansen wrote:
> Also, this entry from perldiag lies:
> 
>    Malformed UTF-8 character (%s)
>        (S utf8) (F) Perl detected a string that didn't comply with UTF-8
>        encoding rules, even though it had the UTF8 flag on.
> 
>        One possible cause is that you set the UTF8 flag yourself for
>        data that you thought to be in UTF-8 but it wasn't (it was for
>        example legacy 8-bit data). To guard against this, you can use
>        Encode::decode_utf8.
> 
>        If you use the ":encoding(UTF-8)" PerlIO layer for input, invalid
>        byte sequences are handled gracefully, but if you use ":utf8",
>        the flag is set without validating the data, possibly resulting
>        in this error message.
> 
>        See also "Handling Malformed Data" in Encode.

The text above is quoted from perldiag.
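As a minimal sketch of the guard the perldiag entry recommends (not something proposed in this thread), the data can be read as raw bytes and decoded with Encode in strict mode, which sidesteps the question of what the input layer does or does not validate. The helper name `strict_utf8_decode` and the sample strings are hypothetical. `FB_CROAK` and `LEAVE_SRC` are Encode's documented CHECK flags; `'UTF-8'` (with the hyphen) is Encode's strict encoding name, whereas `Encode::decode_utf8` uses the laxer `'utf8'`.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: decode a byte string as strict UTF-8.
# Returns the character string, or undef if the bytes are malformed.
#   FB_CROAK  - make decode() die on the first malformed sequence
#   LEAVE_SRC - keep decode() from modifying $bytes in place
sub strict_utf8_decode {
    my ($bytes) = @_;
    my $text = eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC) };
    return $text;
}

# Sample inputs (hypothetical): "ete" with acute accents, once as
# UTF-8 and once as legacy Latin-1 (8-bit) bytes.
for my $sample ("\xC3\xA9t\xC3\xA9", "\xE9t\xE9") {
    my $buf = $sample;

    # Read through an in-memory filehandle with the :raw layer, so no
    # decoding (and no UTF8 flag) is applied on input.
    open my $fh, '<:raw', \$buf or die "open: $!";
    my $data = do { local $/; <$fh> };
    close $fh;

    my $text = strict_utf8_decode($data);
    if (defined $text) {
        print "ok: ", length($text), " characters\n";
    } else {
        print "rejected: not valid UTF-8\n";
    }
}
```

Wrapping `decode` in `eval` turns malformed input into an ordinary `undef` return, so the caller decides whether to reject the data, log it, or fall back to a legacy 8-bit decoding, instead of carrying a flagged but malformed scalar forward.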

Tom's comment on the perldiag entry follows:

> 
> That's because using ":encoding(UTF-8)" instead of ":utf8" makes 
> absolutely no difference.  The output and behavior are identical.
> Therefore it does *not*do*you*any*good*, and perldiag is in error.
> 
> Karl, isn't there something about this being some sort of security 
> problem?  Or is it ok because the code point seems to be construed
> as U+0000?
> 

Can anyone comment on these issues?

Thank you very much.
Jim Keenan

---
via perlbug:  queue: perl5 status: new
https://rt.perl.org:443/rt3/Ticket/Display.html?id=101384



nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org