perl.perl5.porters | Postings from August 2021

Re: "use v5.36.0" should imply ASCII source

Felipe Gasper
August 8, 2021 12:17

> On Aug 8, 2021, at 7:15 AM, Andreas K. Huettel wrote:
>> At the PSC, we had a long talk about this, and another proposal was made:
>> We introduce a new stricture, which I'll call "source_encoding".  Under "use strict 'source_encoding'", the compiler will raise an exception when the source contains non-ASCII content unless the utf8 pragma is in effect.  The error raised can drive the programmer to documentation explaining the various trade-offs.  That is: you can turn on utf8 and deal with how this affects your I/O, or you can disable the stricture, or you can restate your non-ASCII content as ASCII by using escaping constructs.
> This somehow feels like a step backwards.
> Nearly every modern Linux installation uses a unicode locale by default nowadays, I haven't come across a text file in latin1 (or similar) encoding for months...

Nearly every modern programming language also differentiates between text and binary. Alas, Perl doesn’t do this.

The language’s maintainers feel (reasonably, I think) that text in source code should be decoded. The fact that “é” in UTF-8 Perl source code is two characters (i.e., code points) by default is weird and counterintuitive. The problem is that auto-decoding imposes a requirement to encode manually on output, which is *really* weird/counterintuitive: it would “subtly invalidate” a simple “hello, world” implementation in “modern” Perl, and that breakage would only “bite” when code points above 127 are involved, which is weirder and more counterintuitive still.
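To make that concrete, here is a minimal sketch of the two behaviours. It assumes the file is saved as UTF-8 with “é” as the single precomposed code point U+00E9; the variable names are mine, purely for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Assumption: this file is saved as UTF-8, with "é" stored as the
# precomposed code point U+00E9 (bytes 0xC3 0xA9).

# Default behaviour: the literal is taken byte by byte.
my $raw = "é";

# Under the (lexically scoped) utf8 pragma, the literal is decoded.
my $decoded = do { use utf8; "é" };

printf "without utf8: %d\n", length($raw);      # 2
printf "with utf8:    %d\n", length($decoded);  # 1

# The decoded string must be encoded by hand to get UTF-8 bytes back.
my $bytes = encode('UTF-8', $decoded);
printf "re-encoded:   %d\n", length($bytes);    # 2
print  "round trip ok\n" if $bytes eq $raw;
```

Printing $decoded directly, with no encoding layer on the filehandle, would emit the single byte 0xE9 rather than UTF-8, and with no warning, since the code point is below 256. That is the “subtle” part.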

So, it’s a mess. The best fix here would be to teach Perl to track which strings are decoded and which aren’t. Perl would gain copiously therefrom, but it’s not easy to do. For now it’s at least reasonable to require, in “modern” Perl, that either:

a) Source code remain all-ASCII.

b) Perl’s auto-decoding mode be enabled (explicitly, via the utf8 pragma).
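Roughly, the two options look like this in practice. This is only a sketch: it uses nothing from the proposed stricture itself, it assumes the file is saved as UTF-8 for the (b) line, and the variable names are illustrative.

```perl
use strict;
use warnings;

# (a) All-ASCII source: spell the character with an escape.
my $escaped = "caf\x{e9}";     # four code points, last one U+00E9
my $named   = "caf\N{U+E9}";   # same string, different escape

# (a) All-ASCII source, byte-oriented: spell out the UTF-8 bytes.
my $octets  = "caf\xC3\xA9";   # five bytes, never decoded

# (b) Non-ASCII source with decoding switched on explicitly.
# Assumption: the file is saved as UTF-8.
my $literal = do { use utf8; "café" };

print "escape matches literal\n" if $escaped eq $literal;
printf "%d %d %d\n", length($escaped), length($named), length($octets);  # 4 4 5
```

The byte-oriented spelling is the one that keeps today’s default semantics while staying all-ASCII.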

This will require that folks like myself, who desire “modernity” but for whom Perl’s status quo is actually useful and desirable (because $work almost never cares about strings’ Unicode content), find some workaround, but at least it’s a conspicuous change that won’t “surprise” anyone.
