develooper Front page | perl.perl5.porters | Postings from August 2021

Re: "use v5.36.0" should imply UTF-8 encoded source

Felipe Gasper
August 3, 2021 13:26

> On Aug 3, 2021, at 5:01 AM, Salvador Fandiño <> wrote:
> What we need is to add proper support for the transparent translation of data between the internal representation and the outside encoding everywhere.

For that to work, Perl needs *some* stronger means of distinguishing text from binary. Ideally, IMO, that should be in the SV, but maybe other approaches would be better. It doesn’t do to assume every string is text; lots of Perl code does I/O on raw bytes, by design.
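
To make the text-vs-binary point concrete, here is a minimal sketch using the core Encode module. The variable names are only for illustration. It shows that a decoded string and its encoded bytes are both plain scalars, so Perl raises no complaint when already-encoded bytes are encoded a second time:

```perl
use strict;
use warnings;
use utf8;                               # string literals below are decoded text
use Encode qw(encode);

my $text  = "¡Hola!";                   # 6 characters
my $bytes = encode('UTF-8', $text);     # 7 bytes: "¡" becomes 0xC2 0xA1

# Both scalars are just strings; nothing marks $bytes as "already encoded",
# so encoding it again succeeds silently and yields double-encoded data.
my $twice = encode('UTF-8', $bytes);    # 9 bytes: 0xC2 and 0xA1 each become two bytes

printf "text=%d bytes=%d twice=%d\n",
    length $text, length $bytes, length $twice;
```

The caller has to remember which scalar holds which kind of data; that bookkeeping is what a stronger text/binary distinction would take over.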

> And that means:
> a) Adding this translation feature to all the builtins doing IO
> b) Adding a mechanism so that the developer can configure it (for instance, set filesystem encoding).
> c) Infer sane defaults from the environment (utf8 has been the default encoding in most Linux/Unix systems for the last two decades, but Perl still expects latin1 from STDIO!)
> And regarding (c), that's also why most of your examples above work. Your terminal sends and expects utf-8 encoded data from perl but perl expects/sends latin1. It just happens that in your examples it is consistently wrong, but there are myriads of other cases where it isn't.

It sounds like you want STDIN/STDOUT/STDERR and @ARGV to be decoded by default, in addition to the source code. TBH that makes more sense to me than the present proposal since it would preserve parity of character encoding for simple programs, though I think it would still sow confusion since other filehandles will remain binary by default:

use utf8;
pipe my $r, my $w;
print {$w} "¡Hola, mundo!";        # oops! I just made mojibake, but nothing told me.
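
For comparison, a sketch of how the same pipe behaves today if the handles are given an explicit encoding layer. This assumes the other end of the pipe wants UTF-8; it is one way to avoid the mojibake, not part of any proposal in this thread:

```perl
use strict;
use warnings;
use utf8;                               # source literals are decoded as UTF-8

pipe my $r, my $w or die "pipe: $!";

# Assumption: UTF-8 on the wire. Encode on write, decode on read.
binmode $w, ':encoding(UTF-8)' or die "binmode: $!";
binmode $r, ':encoding(UTF-8)' or die "binmode: $!";

my $text = "¡Hola, mundo!";
print {$w} $text;
close $w or die "close: $!";            # flush and signal EOF to the reader

my $got = do { local $/; <$r> };        # slurp everything from the read end
close $r;

# The same 13 characters come back after the encode/decode round trip.
print $got eq $text ? "round trip ok\n" : "mismatch\n";
```

Without the two binmode calls, the "¡" goes out as the single byte 0xA1, which is not valid UTF-8, and no warning is emitted because the character fits in one byte.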
