[perl #82772] utf8::decode doesn't always work on an in-memory file buffer
From: Father Chrysostomos via RT
Date: January 20, 2012 08:41
Subject: [perl #82772] utf8::decode doesn't always work on an in-memory file buffer
Message ID: rt-3.6.HEAD-14510-1327077686-1441.82772-15-0@perl.org
On Tue Jan 25 03:39:02 2011, markusl@bluearc.com wrote:
>
> This is a bug report for perl from markusl@bluearc.com,
> generated with the help of perlbug 1.39 running under perl 5.10.1.
>
> I have a program that reads its own text, prints it to an in-memory
> file, closes the in-memory file, and prints the resulting buffer.
> The program ought to print its own text. However, there's a
> complication. The text of the program contains UTF-8 characters,
> and the in-memory file is opened with ':encoding(utf8)'. As a
> result, the buffer contains a string of bytes (not characters), and
> I have to call utf8::decode before printing it. That much is fine.
>
> However, utf8::decode sometimes fails. If I print the resulting
> buffer, it's doubly UTF-8-encoded.
>
> [~/perl]$ ./utf8-decode-failure
> Died at ./utf8-decode-failure line 14, <$fh_in> line 19.
> [~/perl]$ ./utf8-decode-failure
> Died at ./utf8-decode-failure line 14, <$fh_in> line 19.
> [~/perl]$ ./utf8-decode-failure
> #!/usr/bin/perl
> use v5.10;
> use warnings;
> use strict;
>
> use open 'encoding(utf8)';
> use open ':std';
>
> open my $fh_out, '>:encoding(utf8)', \ my $buffer;
> open my $fh_in, '<:encoding(utf8)', $0;
> print {$fh_out} <$fh_in>;
> close $fh_out;
> # $buffer .= '';
> utf8::decode $buffer or die;
> print $buffer;
>
> __DATA__
> �b�d�fgh�jklmn�pqrst�vwxyz�
>
> [~/perl]$ ./utf8-decode-failure
> #!/usr/bin/perl
> use v5.10;
> use warnings;
> use strict;
>
> use open 'encoding(utf8)';
> use open ':std';
>
> open my $fh_out, '>:encoding(utf8)', \ my $buffer;
> open my $fh_in, '<:encoding(utf8)', $0;
> print {$fh_out} <$fh_in>;
> close $fh_out;
> # $buffer .= '';
> utf8::decode $buffer or die;
> print $buffer;
>
> __DATA__
> �b�d�fgh�jklmn�pqrst�vwxyz�
>
> [~/perl]$ ./utf8-decode-failure
> Died at ./utf8-decode-failure line 14, <$fh_in> line 19.
> [~/perl]$
>
> If I uncomment the statement near the end, the program always works.
As discussed in ticket #108398, PerlIO::scalar was not adding a trailing
null to the string buffer, whereas $buffer .= '' does add one.
That has been fixed in commit 8af8844435.

However, sv_utf8_decode, the internal function that implements
utf8::decode, still reads past the end of the string, and has done so
since commit 67e989fb5490. I’ll fix that as soon as 5.15.7 is out.
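
For anyone stuck on an affected perl in the meantime, here is a minimal sketch of the workaround from the original report. The sample text in $text is made up for illustration; the rest follows the reporter's script. On a perl that already has the PerlIO::scalar fix, the append should simply be a harmless no-op.

```perl
use strict;
use warnings;

# Hypothetical sample text; any string containing non-ASCII
# characters would do.
my $text = "caf\x{e9} \x{263a}\n";

# Write through an encoding layer into an in-memory file, so that
# $buffer ends up holding UTF-8 encoded bytes rather than characters.
open my $fh_out, '>:encoding(UTF-8)', \my $buffer
    or die "Cannot open in-memory file: $!";
print {$fh_out} $text;
close $fh_out or die "Cannot close in-memory file: $!";

# Workaround from the report: appending an empty string gives the
# buffer the trailing null that PerlIO::scalar left out.
$buffer .= '';

# Turn the bytes back into characters, in place.
utf8::decode($buffer) or die "Buffer is not valid UTF-8";

# Print only an ASCII status line, to avoid "Wide character" warnings.
print $buffer eq $text ? "round trip ok\n" : "round trip failed\n";
```

Comparing the decoded buffer with the original text, rather than printing it, keeps the check independent of whatever layers happen to be on STDOUT.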
--
Father Chrysostomos
---
via perlbug: queue: perl5 status: open
https://rt.perl.org:443/rt3/Ticket/Display.html?id=82772