develooper Front page | perl.perl5.porters | Postings from September 2012

[perl #101384] perldiag does not adequately describe how to avoid malformed UTF8 scalars

James E Keenan via RT
September 20, 2012 18:57
On Fri Oct 14 16:06:47 2011, tom christiansen wrote:
> Also, this entry from perldiag lies:
>    Malformed UTF-8 character (%s)
>        (S utf8) (F) Perl detected a string that didn't comply with UTF-8
>        encoding rules, even though it had the UTF8 flag on.
>        One possible cause is that you set the UTF8 flag yourself for
>        data that you thought to be in UTF-8 but it wasn't (it was for
>        example legacy 8-bit data). To guard against this, you can use
>        Encode::decode_utf8.
>        If you use the ":encoding(UTF-8)" PerlIO layer for input, invalid
>        byte sequences are handled gracefully, but if you use ":utf8",
>        the flag is set without validating the data, possibly resulting
>        in this error message.
>        See also "Handling Malformed Data" in Encode.

The quoted entry above is from perldiag.

Below is Tom's comment:
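For reference, the Encode-based guard that the perldiag entry recommends can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses the strict `decode('UTF-8', ...)` form rather than `decode_utf8` (which uses Perl's laxer "utf8" encoding), the helper name `checked_utf8_decode` is hypothetical, and it does not address Tom's point about the `:utf8` and `:encoding(UTF-8)` layers.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode raw octets as strict UTF-8.
# FB_CROAK makes decode() die on a malformed sequence instead of
# substituting U+FFFD; LEAVE_SRC keeps the caller's octets unmodified.
sub checked_utf8_decode {
    my ($octets) = @_;
    my $chars = eval {
        decode('UTF-8', $octets, Encode::FB_CROAK | Encode::LEAVE_SRC);
    };
    return $chars;    # undef if the input was not valid UTF-8
}

my $good = "na\xC3\xAFve";    # "naive" with i-diaeresis, encoded as UTF-8
my $bad  = "na\xEFve";        # the same word as legacy Latin-1 bytes

for my $input ($good, $bad) {
    my $text = checked_utf8_decode($input);
    if (defined $text) {
        printf "valid: %d characters\n", length $text;
    }
    else {
        print "rejected malformed UTF-8\n";
    }
}
```

Here the UTF-8 input decodes to a 5-character string, while the Latin-1 byte 0xEF followed by "v" is not a valid UTF-8 sequence and is rejected rather than being flagged as UTF-8 unchecked.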

> That's because using ":encoding(UTF-8)" instead of ":utf8" makes 
> absolutely no difference.  The output and behavior are identical.
> Therefore it does *not*do*you*any*good*, and perldiag is in error.
> Karl, isn't there something about this being some sort of security 
> problem?  Or is it ok because the code point seems to be construed
> as U+0000?

Can anyone comment on these issues?

Thank you very much.
Jim Keenan

via perlbug:  queue: perl5 status: new
