develooper Front page | perl.perl5.porters | Postings from July 2021

Re: "use v5.36.0" should imply UTF-8 encoded source

Darren Duncan
July 31, 2021 01:34
On 2021-07-30 7:45 a.m., Ricardo Signes wrote:
> I propose that "use v5.36.0" should imply that the source code is, subsequently, 
> UTF-8 encoded.
> We're on the cusp of merging warnings in.  Next, we merge in utf8.  This 
> shouldn't break existing programs, only programs that opt to change behavior by 
> adding v5.36.0.
> With that, the boilerplate could be:
> use v5.36.0;
> This doesn't need to load, and could just alter $^H, but: whatever.

This gets a +1 from me.

In theory this could be a problem if the source file isn't actually UTF-8 
encoded and someone adding that new boilerplate didn't realize this particular 

One thing we could do to help mitigate this is to have Perl, upon seeing that 
boilerplate, strictly verify that the source file is indeed valid UTF-8 and 
die with a parsing error if it is not.
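
To make the idea concrete, here is a minimal sketch of what such a check 
amounts to, written in user-level Perl with the core Encode module.  It is 
not how perl itself would implement it inside the parser, and the helper 
names is_strict_utf8 and verify_source_file are hypothetical, invented only 
for this illustration.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: true if the byte string is well-formed UTF-8.
sub is_strict_utf8 {
    my ($bytes) = @_;    # work on a copy; decode with CHECK may modify its input
    # 'UTF-8' (with the hyphen) is Encode's strict variant, and FB_CROAK
    # makes decode die on malformed input instead of substituting U+FFFD.
    return eval { Encode::decode('UTF-8', $bytes, Encode::FB_CROAK); 1 } ? 1 : 0;
}

# Hypothetical helper: slurp a source file as raw bytes and die if the
# bytes are not valid UTF-8.
sub verify_source_file {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $bytes = do { local $/; <$fh> };
    close $fh;
    is_strict_utf8($bytes)
        or die "Source file $path is not valid UTF-8\n";
    return 1;
}
```

A Latin-1 encoded "café" (a lone 0xE9 byte) fails this check, while the 
UTF-8 encoded form (0xC3 0xA9) passes.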

I don't know if "use utf8;" is already strict like that, or whether it instead 
uses substitution characters or something, but "use v5.36.0;" can be.

On 2021-07-30 11:46 a.m., Felipe Gasper wrote:
> Changing it so that the (“modern”) default is to decode strings as UTF-8 but still output them as bytes seems likely to introduce lots of confusion, which will either a) discourage adoption of “use v5.36”, or b) discourage use of Perl at all:

I don't see a problem here, especially if my strict mode proposal is used.

I see the encoding in which a program's source is interpreted as completely 
separate from, and unrelated to, the encoding of other filehandle operations.

It seems entirely appropriate for the source to be taken as UTF-8 while other 
filehandles still default to bytes.
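
A small sketch of that separation, using today's "use utf8;" to stand in for 
what the proposed boilerplate would imply.  It assumes this snippet is itself 
saved as UTF-8; the variable names are only illustrative.

```perl
use strict;
use warnings;
use utf8;          # string literals in this file are decoded from UTF-8
use Encode ();

my $word  = "café";          # the source is decoded, so this is 4 characters
my $chars = length $word;    # counts characters

# Output is a separate, explicit step: encode to bytes before writing to
# a handle that has no encoding layer.
my $octets = Encode::encode('UTF-8', $word);
my $bytes  = length $octets; # counts bytes; the accented letter takes two

binmode STDOUT;              # STDOUT stays a plain byte stream
print "$chars characters, $bytes bytes\n";
```

This prints "4 characters, 5 bytes": the source-level string is characters, 
and nothing becomes bytes until the program encodes it for output.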

-- Darren Duncan
