Re: "use v5.36.0" should imply ASCII source

Graham Knop
August 6, 2021 18:04
I think this might be reasonable, but I'm not certain about how we
would want it to interact with Pod.

Pod is often interleaved with code, and is more likely to include names,
and thus non-ASCII characters. The way to declare Pod content as UTF-8 is
separate from how it is declared for the source. Should users be
required to include both a 'use utf8;' and '=encoding UTF-8' if they
want to include non-ASCII characters in their documentation, even if
their code is pure ASCII?
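
To make the question concrete, here is a minimal sketch of the kind of
file in question: the code is pure ASCII, and the only non-ASCII
character is in the Pod, which is declared with '=encoding UTF-8'. The
sub name and the author name are hypothetical, chosen only for
illustration. Today this runs without 'use utf8;' because perl skips
Pod when compiling; whether the proposed stricture would reject it is
exactly the open question.

```perl
use strict;
use warnings;

# The code itself is pure ASCII, so it has no need for 'use utf8;'.
# greet() is a hypothetical helper, used only for illustration.
sub greet {
    my ($name) = @_;
    return "Hello, $name";
}

print greet("porters"), "\n";

# The Pod below holds the only non-ASCII character in the file.
# Its encoding is declared for Pod tools with '=encoding', which is
# independent of the utf8 pragma. The author name is made up.

=encoding UTF-8

=head1 AUTHOR

Renée Example

=cut
```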

On Fri, Aug 6, 2021 at 5:23 PM Ricardo Signes wrote:
> Porters,
> I recently posted the suggestion that "use v5.36.0" should imply "use utf8", which led to a pretty large thread in which Felipe Gasper repeatedly said "This is going to make things worse, not better."  I spent a lot of time grumbling about this to myself, figuring out exactly how to rebut this, and then deciding that I tentatively, partly, agreed with him.
> We want each improvement to be a ratcheting up in language usability, when possible, rather than "we made things worse so we could make them better."  At present, because we don't (and can't) know whether a string is text or bytes, we don't (and can't) automatically encode it when it hits a bytestream.  We also don't know reliably whether a given output handle is already expecting to do that encoding for us.
> I am 100% certain that adding "use utf8" to the feature bundle would be better for me, but I already have a pretty strong grasp of the I/O model of Perl.  I'm not sure it's better enough for everybody.
> At the PSC, we had a long talk about this, and another proposal was made:
> We introduce a new stricture, which I'll call "source_encoding".  Under "use strict 'source_encoding'", the compiler will raise an exception when the source contains non-ASCII content unless the utf8 pragma is in effect.  The error raised can drive the programmer to documentation explaining the various trade-offs.  That is: you can turn on utf8 and deal with how this affects your I/O, or you can disable the stricture, or you can restate your non-ASCII content as ASCII by using escaping constructs.
> I'm not sure this is an improvement, but I think it is.  This prevents the "I forgot to add utf8 and so only discovered after runtime that I have doubly-encoded my output" bug.
> --
> rjbs
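
For readers who have not hit the bug mentioned in the quoted message,
the following is a minimal sketch of the double encoding and of the
three remedies the message lists. It assumes the file is saved as
UTF-8, and it uses Encode::encode to stand in for what an
':encoding(UTF-8)' output layer would do. The variable names and the
hex() helper are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Without 'use utf8', perl reads the literal as the raw bytes of the
# source file: two bytes, 0xC3 0xA9, seen as two separate characters.
my $raw = "é";

# With 'use utf8' in scope, the same literal is one character, U+00E9.
my $decoded = do { use utf8; "é" };

# Restated as ASCII with an escape: also one character, U+00E9.
my $escaped = "\x{e9}";

# Render a byte string as space-separated hex (illustrative helper).
sub hex_bytes { join ' ', map { sprintf '%02X', ord } split //, $_[0] }

# Encoding the raw form encodes each of its two bytes again.
my $double     = encode('UTF-8', $raw);
my $single     = encode('UTF-8', $decoded);
my $from_ascii = encode('UTF-8', $escaped);

print "no utf8:  ", hex_bytes($double),     "\n";   # C3 83 C2 A9
print "use utf8: ", hex_bytes($single),     "\n";   # C3 A9
print "escaped:  ", hex_bytes($from_ascii), "\n";   # C3 A9
```

The first line of output is the doubly-encoded form; the other two are
the correct UTF-8 encoding of a single U+00E9.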
