[perl #82772] utf8::decode doesn't always work on an in-memory file buffer

From: Father Chrysostomos via RT
Date: January 20, 2012 08:41
Subject: [perl #82772] utf8::decode doesn't always work on an in-memory file buffer
Message ID: rt-3.6.HEAD-14510-1327077686-1441.82772-15-0@perl.org

On Tue Jan 25 03:39:02 2011, markusl@bluearc.com wrote:
> 
> This is a bug report for perl from markusl@bluearc.com,
> generated with the help of perlbug 1.39 running under perl 5.10.1.
> 
> I have a program that reads its own text, prints it to an in-memory
>    file, closes the in-memory file, and prints the resulting buffer.
>    The program ought to print its own text.  However, there's a
>    complication.  The text of the program contains UTF-8 characters,
>    and the in-memory file is opened with ':encoding(utf8)'.  As a
>    result, the buffer contains a string of bytes (not characters), and
>    I have to call utf8::decode before printing it.  That much is fine.
> 
> However, utf8::decode sometimes fails.  If I print the resulting
>    buffer, it's doubly UTF-8-encoded.
> 
> [~/perl]$ ./utf8-decode-failure
> Died at ./utf8-decode-failure line 14, <$fh_in> line 19.
> [~/perl]$ ./utf8-decode-failure
> Died at ./utf8-decode-failure line 14, <$fh_in> line 19.
> [~/perl]$ ./utf8-decode-failure
> #!/usr/bin/perl
> use v5.10;
> use warnings;
> use strict;
> 
> use open 'encoding(utf8)';
> use open ':std';
> 
> open my $fh_out, '>:encoding(utf8)', \ my $buffer;
> open my $fh_in,  '<:encoding(utf8)', $0;
> print {$fh_out} <$fh_in>;
> close $fh_out;
> # $buffer .= '';
> utf8::decode $buffer or die;
> print $buffer;
> 
> __DATA__
> �b�d�fgh�jklmn�pqrst�vwxyz�
> 
> [~/perl]$ ./utf8-decode-failure
> #!/usr/bin/perl
> use v5.10;
> use warnings;
> use strict;
> 
> use open 'encoding(utf8)';
> use open ':std';
> 
> open my $fh_out, '>:encoding(utf8)', \ my $buffer;
> open my $fh_in,  '<:encoding(utf8)', $0;
> print {$fh_out} <$fh_in>;
> close $fh_out;
> # $buffer .= '';
> utf8::decode $buffer or die;
> print $buffer;
> 
> __DATA__
> �b�d�fgh�jklmn�pqrst�vwxyz�
> 
> [~/perl]$ ./utf8-decode-failure
> Died at ./utf8-decode-failure line 14, <$fh_in> line 19.
> [~/perl]$
> 
> If I uncomment the statement near the end, the program always works.

As discussed in ticket #108398, PerlIO::scalar was not adding a trailing
null to the string buffer, whereas $buffer .= '' does add a trailing null.

This has been fixed in commit 8af8844435.

But sv_utf8_decode, the internal function that implements utf8::decode,
still reads past the end of the string, and has done so since commit
67e989fb5490.

I’ll fix that as soon as 5.15.7 is out.
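
For anyone stuck on a perl without that fix, here is a minimal sketch of
the workaround from the original report. It writes text to an in-memory
file through :encoding(utf8), appends an empty string to the buffer, and
only then calls utf8::decode. The helper name roundtrip_via_scalar and the
sample text are made up for illustration; nothing here is taken from the
perl source, and on a fixed perl the append should simply be harmless.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of any module): round-trip a character
# string through an in-memory file opened with :encoding(utf8), then
# decode the resulting bytes back into characters.
sub roundtrip_via_scalar {
    my ($text) = @_;

    open my $fh_out, '>:encoding(utf8)', \ my $buffer
        or die "cannot open in-memory file: $!";
    print {$fh_out} $text;
    close $fh_out or die "close failed: $!";

    # Workaround from the report: appending an empty string makes perl
    # put a trailing null after the buffer contents, so utf8::decode
    # does not run into whatever happens to follow the string.
    $buffer .= '';

    utf8::decode($buffer) or die "buffer is not valid UTF-8";
    return $buffer;
}

# Sample text with non-ASCII characters (an assumption for this sketch).
my $text = "caf\x{e9} \x{263a}\n";
my $copy = roundtrip_via_scalar($text);
print $copy eq $text ? "round trip ok\n" : "round trip mismatch\n";
```

The comparison is done on the decoded characters rather than by printing
them, so the sketch does not depend on the encoding of STDOUT.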

-- 

Father Chrysostomos


---
via perlbug:  queue: perl5 status: open
https://rt.perl.org:443/rt3/Ticket/Display.html?id=82772
