perl.perl5.porters | Postings from October 2011

[perl #101384] Re: seeking on bytes causes broken perl strings

From: tchrist1
Date: October 14, 2011 16:07
Subject: [perl #101384] Re: seeking on bytes causes broken perl strings
Message ID: rt-3.6.HEAD-31297-1318633607-7.101384-75-0@perl.org
# New Ticket Created by  tchrist1 
# Please include the string:  [perl #101384]
# in the subject line of all future correspondence about this issue. 
# <URL: https://rt.perl.org:443/rt3/Ticket/Display.html?id=101384 >


Also, this entry from perldiag lies:

   Malformed UTF-8 character (%s)
       (S utf8) (F) Perl detected a string that didn't comply with UTF-8
       encoding rules, even though it had the UTF8 flag on.

       One possible cause is that you set the UTF8 flag yourself for
       data that you thought to be in UTF-8 but it wasn't (it was for
       example legacy 8-bit data). To guard against this, you can use
       Encode::decode_utf8.

       If you use the ":encoding(UTF-8)" PerlIO layer for input, invalid
       byte sequences are handled gracefully, but if you use ":utf8",
       the flag is set without validating the data, possibly resulting
       in this error message.

       See also "Handling Malformed Data" in Encode.

That's because using ":encoding(UTF-8)" instead of ":utf8" makes
absolutely no difference here: the output and behavior are identical.
Therefore it does *not*do*you*any*good*, and perldiag is in error.
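The report itself contains no code. What follows is a minimal sketch, not taken from the ticket, of the failure mode named in the subject line. It assumes an in-memory filehandle opened with ":raw" as a stand-in for a real file, and the helper name decoded_length_after_seek is hypothetical. Seeking to a byte offset that falls inside a multi-byte character leaves a lone UTF-8 continuation byte at the read position. Strict decoding through Encode, the other guard perldiag mentions, reports this as malformed instead of producing a broken string. The sketch does not exercise the ":utf8" or ":encoding(UTF-8)" layers, so it says nothing about whether those two behave differently.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: open an in-memory :raw handle on $bytes, seek to a
# byte offset, read everything after it, and strictly decode it as UTF-8.
# Returns the number of characters decoded, or undef if the bytes at that
# offset are not well-formed UTF-8.
sub decoded_length_after_seek {
    my ($bytes, $offset) = @_;
    open my $fh, '<:raw', \$bytes or die "open: $!";
    seek $fh, $offset, 0 or die "seek: $!";
    my $buf = '';
    read $fh, $buf, length $bytes;
    close $fh;
    # FB_CROAK makes decode() die on malformed input;
    # LEAVE_SRC keeps decode() from modifying $buf.
    my $text = eval { decode('UTF-8', $buf, FB_CROAK | LEAVE_SRC) };
    return defined $text ? length $text : undef;
}

# "cafe" with an acute e, as UTF-8 bytes: c a f 0xC3 0xA9
# (5 bytes, 4 characters).
my $bytes = "caf\xC3\xA9";

for my $offset (0, 3, 4) {
    my $n = decoded_length_after_seek($bytes, $offset);
    printf "offset %d: %s\n", $offset,
        defined $n ? "$n char(s)" : "malformed UTF-8";
}
```

Offsets 0 and 3 start on character boundaries and decode to 4 and 1 characters. Offset 4 lands on the continuation byte 0xA9 and is reported as malformed.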

Karl, isn't there something about this being some sort of security 
problem?  Or is it ok because the code point seems to be construed
as U+0000?

--tom
