develooper Front page | perl.perl5.porters | Postings from August 2021

Re: "use v5.36.0" should imply UTF-8 encoded source

From: Felipe Gasper
Date: August 2, 2021 11:13
Subject: Re: "use v5.36.0" should imply UTF-8 encoded source
Message ID: F80E292E-CD12-4E1C-AB25-B1B741FAAA7A@felipegasper.com

> On Aug 2, 2021, at 1:11 AM, Yuki Kimoto <kimoto.yuki@gmail.com> wrote:
> 
> 
> 2021-8-1 9:54 Felipe Gasper <felipe@felipegasper.com> wrote:
> 
> Another way to look at it: the content of the parsed strings actually differs between the two:
> 
> my $x = do { no utf8; "éé" };
> my $y = do { use utf8; "éé" };
> 
> 
> Felipe
> 
> I have a question.  
> 
> I think the problem is which is the better default in 2021 for general application users.
> 
> The existing code is "no utf8" so it won't break.
> 
> In the new code, the generally recommended way is
> 
>   use strict;
>   use warnings;
>   use utf8;

Recommended by whom? I generally don’t `use utf8`, and $work actually forbids it. The consistency of the status quo (i.e., everything is a byte string until something explicitly decodes it) far outweighs whatever value I’d get from having `length "é"` return 1 rather than 2.
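A minimal Perl sketch of the difference described above, assuming the file itself is saved as UTF-8: the same literal is two bytes without `use utf8`, one character with it, and one character again once the byte string is explicitly decoded with the core Encode module.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumption: this source file is saved as UTF-8, so "é" is the bytes C3 A9.

# Without `use utf8` (the default), the literal stays a two-byte string.
my $bytes = do { no utf8; "é" };

# With `use utf8` in this lexical scope, the literal is decoded to U+00E9.
my $chars = do { use utf8; "é" };

printf "no utf8:  length %d\n", length $bytes;    # 2
printf "use utf8: length %d\n", length $chars;    # 1

# Status-quo style: keep bytes, and decode explicitly where characters are needed.
my $decoded = decode('UTF-8', $bytes);
printf "decoded:  length %d\n", length $decoded;  # 1
```

Under the status quo, the explicit `decode` call is the single point where bytes become characters, which is the consistency the paragraph above is defending.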

> If a user needs the old behavior, they need to write
> 
>   use v5.xx;
>   no utf8;
> 
> Are you clearly aware that this is a change of default, not a change to the internal representation?

Yup, I know that this would only affect code that does `use 5.36`, or `perl -E` on the command line. The former would, by definition, be new code, and the latter is inherently unstable (it enables every optional feature of whichever perl runs it), so the fact that this changes the default behaviour is not a problem per se.

The problem is that the feature bundles, by definition, represent Perl at its ostensible best, its most modern. This particular proposal would make `perl -E'say "¡Hola, mundo!"'` print mojibake. That seems undesirable in the extreme: no other major language introduces that complexity for such a trivial task, and if one did, it would at least indicate what went wrong instead of failing silently, as Perl does.
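A sketch of where the mojibake comes from, assuming a UTF-8-encoded source file, a UTF-8 terminal, and no `-C` switch or `PERL_UNICODE` setting: once `use utf8` decodes the literal, "¡" is the single character U+00A1, and a handle without an encoding layer writes it as the lone byte 0xA1, which is not valid UTF-8. Writing to in-memory handles makes the emitted bytes visible.

```perl
use strict;
use warnings;

# Assumption: this file is saved as UTF-8, as if typed into a UTF-8 terminal.
my $greeting = do { use utf8; "¡Hola, mundo!" };   # "¡" is now one character, U+00A1

# No encoding layer (like STDOUT by default): characters below U+0100 are
# written as single Latin-1 bytes, so "¡" becomes the byte A1.
open my $raw, '>', \my $raw_out or die $!;
print {$raw} $greeting;
close $raw;

# Same string through an explicit UTF-8 layer: "¡" becomes the bytes C2 A1.
open my $enc, '>:encoding(UTF-8)', \my $enc_out or die $!;
print {$enc} $greeting;
close $enc;

printf "raw:     %s\n", unpack('H*', substr($raw_out, 0, 2));  # a148
printf "encoded: %s\n", unpack('H*', substr($enc_out, 0, 3));  # c2a148
```

Adding `binmode STDOUT, ':encoding(UTF-8)'` (or `-CS` on the command line) produces the second result on STDOUT; because every character here is below U+0100, omitting it triggers no "Wide character" warning, so the failure is silent.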

That said: if the goal is mainly to allow non-ASCII identifiers (e.g., `sub épée { … }`), could a variant of utf8.pm be created that decodes only identifiers (sub names and the like) and leaves string literals undecoded? *That* would seem a reasonable improvement on the status quo.
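The identifier-only variant of utf8.pm suggested above is hypothetical. As a rough approximation with the current pragma, a sketch might enable `use utf8` so the non-ASCII sub name compiles, then explicitly re-encode string literals to keep the byte-string convention. The names `épée` and `touché` are purely illustrative.

```perl
use strict;
use warnings;
use utf8;                 # decodes identifiers AND string literals in this file
use Encode qw(encode);

# Assumption: this file is saved as UTF-8.
# A non-ASCII sub name compiles because `use utf8` is in effect.
sub épée { return 'touché' }

# String literals are now character strings...
my $chars = épée();

# ...so code that expects byte strings must encode them back explicitly.
my $bytes = encode('UTF-8', $chars);

printf "%d characters, %d bytes\n", length $chars, length $bytes;  # 6 characters, 7 bytes
```

The extra `encode` call on every literal is exactly the friction an identifier-only pragma would remove.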

-F

