perl.perl5.porters | Postings from August 2021

Re: "use v5.36.0" should imply UTF-8 encoded source

Aaron Priven
August 6, 2021 23:34
> On Aug 4, 2021, at 4:33 PM, Dan Book wrote:
> They are "text by default" in the ASCII sense, not the Unicode sense. The :crlf layer is enabled by default on Windows and translates CR LF to LF, but there is no default translation of bytes to characters. So you need to use binmode or :raw to make a filehandle binary-compatible on Windows, but you also need to apply an :encoding layer if you want to read/write characters instead of bytes.

I don’t think it’s true that there’s no default treatment of bytes as characters. By default, perl treats each byte as the Latin-1 character with the same code point. So if you open a file without an encoding layer, read some data, and then print it to a filehandle opened with an encoding layer, that layer will assume the data is Latin-1 text and encode those characters accordingly.
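
To make that concrete, here is a minimal sketch of the round trip. It assumes an ASCII-based platform and uses in-memory filehandles in place of real files; the variable names are only for illustration.

```perl
use strict;
use warnings;

# "é" encoded as UTF-8 is the two bytes C3 A9.
my $utf8_bytes = "\xC3\xA9";

# Read the bytes through a handle with no encoding layer
# (an in-memory file stands in for a real file here).
open(my $in, '<', \$utf8_bytes) or die "open for read: $!";
my $data = <$in>;
close $in;

# Perl now holds two characters, U+00C3 and U+00A9, not one "é".
my $char_count = length $data;

# Print the same string through a UTF-8 encoding layer.
my $out_buf = '';
open(my $out, '>:encoding(UTF-8)', \$out_buf) or die "open for write: $!";
print {$out} $data;
close $out;

# Each of the two characters is encoded separately: C3 83 C2 A9.
my $hex = join ' ', map { sprintf '%02X', ord } split //, $out_buf;
print "$char_count characters in, bytes out: $hex\n";
```

The output layer has no way to know the input was already UTF-8, so it encodes U+00C3 and U+00A9 as two separate characters, which is the usual double-encoding result.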

So a filehandle returned by open is, in perl, effectively a text filehandle using the Latin-1 encoding, unless a layer or binmode is applied. It wouldn’t be unreasonable to decide that, in some future version of perl, such a filehandle would be treated as a text filehandle using the UTF-8 encoding instead.

The problem, of course, is that on some but not all operating systems the text filehandle returned by open can be used as a binary filehandle without loss. Conceptually it’s a text filehandle, the meaning in the perl language is that it’s a text filehandle, but people misuse it as a binary one because there’s no actual breakage, as long as it’s only running on those operating systems.
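
A rough sketch of where the breakage shows up: the same binary file is read once through a :crlf layer and once with :raw. The explicit :crlf layer is my stand-in for the Windows default Dan describes, so the difference is visible on any platform; the payload and the temporary file are assumptions for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A binary payload that happens to contain a CR LF pair.
my $payload = "\x00\x0D\x0A\xFF";

# Write it out untranslated.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} $payload;
close $tmp_fh or die "close: $!";

# Read through a :crlf layer, as a default Windows handle would.
open(my $text_in, '<:crlf', $path) or die "open (text): $!";
my $as_text = do { local $/; <$text_in> };
close $text_in;

# Read again as an explicitly binary handle.
open(my $bin_in, '<:raw', $path) or die "open (binary): $!";
my $as_binary = do { local $/; <$bin_in> };
close $bin_in;

# The text-mode read folds CR LF into LF and loses a byte.
printf "text-mode read: %d bytes, binary read: %d bytes\n",
    length($as_text), length($as_binary);
```

On a system where open applies no translating layer by default, leaving out the :raw would give the same four bytes, which is why the misuse goes unnoticed there.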

So I understand that in practice, it might well be more trouble than it’s worth to have “use 5.036” or even “use v7” make perl default to text filehandles using the UTF-8 encoding, instead of defaulting to text filehandles using the Latin-1 encoding. But I think it’s worth considering.

Aaron Priven