develooper Front page | perl.perl5.porters | Postings from July 2021

Re: "use v5.36.0" should imply UTF-8 encoded source

Felipe Gasper
July 30, 2021 16:56

> On Jul 30, 2021, at 10:45 AM, Ricardo Signes wrote:
> Porters,
> I propose that "use v5.36.0" should imply that the source code is, subsequently, UTF-8 encoded.
> Currently, I advise the following boilerplate:
> use v5.34.0;
> use warnings;
> use utf8;
> We're on the cusp of merging warnings in.  Next, we merge in utf8.  This shouldn't break existing programs, only programs that opt to change behavior by adding v5.36.0.

FWIW, I think this will regress Perl’s usability.

Probably the worst part about character encoding in Perl is that nothing indicates when you’ve over-encoded or under-encoded. But, at the very least everything right now is consistent by default: source code is parsed as bytes (“Latin-1”), and I/O happens as bytes. Thus, a “minimal-effort” approach to writing Perl will at least minimize the odds of encoding mismatches: you only run into trouble if you explicitly decode/encode.

If `use v5.36` is to disrupt that consistency by making source code UTF-8-decoded but *leaving* I/O as bytes, this seems likely to add another “shin-bumper” to use of Perl that doesn’t happen in languages that type byte strings differently from text strings.

So quick-and-simple things like `print "é"` will now, in “modern” Perl, break, with no indication of where/why until a human being comes along, notices the problem, and puts in the time to debug it.
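A minimal sketch of that mismatch, under a few assumptions: the source file is saved as UTF-8, the output handle has no encoding layer, and `utf8::decode` stands in for what `use utf8` does to the literal (so the example does not depend on how this file itself is encoded). The helper `bytes_written` is hypothetical, named only for this illustration.

```perl
use strict;
use warnings;

# The two bytes a UTF-8 encoded source file holds for the literal "é".
# Without "use utf8", Perl sees the literal as exactly these two bytes.
my $as_bytes = "\xC3\xA9";

# Stand-in for the same literal under "use utf8": one character, U+00E9.
my $as_chars = $as_bytes;
utf8::decode($as_chars);

# Hypothetical helper: print a string to an in-memory handle that has no
# encoding layer (byte I/O) and return what was written, as hex.
sub bytes_written {
    my ($string) = @_;
    my $buffer = '';
    open my $fh, '>:raw', \$buffer or die "open: $!";
    print {$fh} $string;
    close $fh;
    return unpack 'H*', $buffer;
}

printf "no utf8:  length %d, writes %s\n",
    length($as_bytes), bytes_written($as_bytes);   # length 2, writes c3a9
printf "use utf8: length %d, writes %s\n",
    length($as_chars), bytes_written($as_chars);   # length 1, writes e9
```

In the first case the bytes pass through untouched and a UTF-8 terminal shows "é". In the second, the lone byte `e9` is not valid UTF-8, and Perl emits no warning because the character fits in a single byte.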

It’s going to be particularly problematic with stuff like `mkdir "épée"` because now we’re *really* expecting the SvPV bug--where we give the raw PV to the kernel/OS--to stick around.
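To illustrate why handing the raw PV to the OS is fragile, here is a rough sketch that avoids touching the filesystem. It assumes only core Perl; the `bytes` module is used purely to inspect internal storage, not as a recommended practice. The same four-character string can be stored internally in two ways, and Perl treats the two as equal even though their underlying buffers differ.

```perl
use strict;
use warnings;
use bytes ();   # load bytes::length without enabling byte semantics

# "épée" as four characters, written with escapes so the example does not
# depend on the encoding of this file.
my $name = "\x{E9}p\x{E9}e";

# Same string, two internal representations.
my $downgraded = $name;
utf8::downgrade($downgraded);   # one byte per character internally
my $upgraded = $name;
utf8::upgrade($upgraded);       # UTF-8 encoded internally

printf "equal as strings: %s\n", ($downgraded eq $upgraded ? 'yes' : 'no');
printf "characters: %d vs %d\n", length($downgraded), length($upgraded);
printf "internal bytes: %d vs %d\n",
    bytes::length($downgraded), bytes::length($upgraded);
```

The strings compare equal and both have four characters, yet the internal buffers are 4 and 6 bytes long. Any call that passes the internal buffer straight through would therefore see two different names for what Perl considers one string.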

UTF-8 decoding by default is a fine idea, but until Perl can tell me the difference between a byte string and a character string, I think the change would yield more harm than good.
