Re: "use v5.36.0" should imply ASCII source

From: H.Merijn Brand
Date: August 16, 2021 13:28
Subject: Re: "use v5.36.0" should imply ASCII source
Message ID: 20210816152746.09dc60d5@pc09

On Mon, 16 Aug 2021 08:51:30 -0400, Felipe Gasper
<felipe@felipegasper.com> wrote:

> > On Aug 16, 2021, at 8:00 AM, Graham Knop <haarg@haarg.org> wrote:
> > 
> > After thinking about this again, I had another idea.
> > 
> > The reason implying 'use utf8' is a problem is because of the impact
> > it has on string semantics. Maybe we can just have it not impact
> > string semantics. Make 'use v5.36.0;' decode the source as UTF-8,
> > but store string literals as byte strings rather than characters.
> > The strings would still be required to be UTF-8 encoded, but would
> > be stored with the utf8 flag off. This would allow using UTF-8
> > encoded content in comments, Pod, or even in function names, but
> > would not create the confusion with strings and IO.  
> 
> I thought of this sometime back, but more in the context of adding
> flexibility to utf8.pm:
> 
> {
>     use utf8 decode => 'no_strings';    # What Graham envisions
>     my $foo = "é";                      # 2 code points
> }
> 
> {
>     use utf8 decode => 'all';    # status quo
>     my $foo = "é";               # 1 code point
> }
> 
> I personally would think decode=no_strings could be added to the
> feature bundle with little trouble. The use case for leaving strings
> undecoded doesn’t seem to apply for things besides strings.
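
For reference, the "2 code points" versus "1 code point" distinction can already be observed in current Perl. The following is only a sketch of today's behaviour, not of the proposed `decode => ...` syntax (which does not exist): it assumes the file is saved as UTF-8, and it emulates the suggested `no_strings` result by hand with `utf8::encode`. The variable names are made up for illustration.

```perl
use strict;
use warnings;

my ($raw, $decoded);
{
    no utf8;            # literal is kept as the raw UTF-8 bytes from the file
    $raw = "é";
}
{
    use utf8;           # status quo: literal is decoded into characters
    $decoded = "é";
}

# Emulate by hand what 'no_strings' would yield: a byte string, utf8 flag off.
# (Assumption: this is only an approximation of the idea in this thread.)
my $emulated = $decoded;
utf8::encode($emulated);

printf "no utf8:   length=%d\n", length $raw;
printf "use utf8:  length=%d\n", length $decoded;
printf "emulated:  length=%d flag=%s\n",
    length $emulated, utf8::is_utf8($emulated) ? "on" : "off";
```

With a UTF-8 source file, the first and third strings hold the same two bytes, while the `use utf8` string holds one character.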

In that vein, to ease porting from older ISO-8859-1 encoded source files,
a third option could convert the source on the fly:

{   use utf8 decode => 'no_strings';    # What Graham envisions
    my $foo = "é";                      # 2 code points
    }
{   use utf8 decode => 'all';    # status quo
    my $foo = "é";               # 1 code point
    }
{   use utf8 convert => "utf-8";        # or convert ISO => "UTF-8"
    my $foo = "é";                      # This ISO-8859-1 é will be upgraded to UTF-8
    }                                   # 1 code point
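
What the hypothetical `convert` option would do to a literal can be approximated today at run time with the core Encode module. This is a sketch only: the `convert` syntax above is a proposal, and here the ISO-8859-1 byte is written as the escape `\xE9` so the example does not depend on how the file itself is encoded.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# An "é" as it would appear in an ISO-8859-1 encoded source file: one byte.
my $latin1_bytes = "\xE9";

# Decode to a character string: one code point, U+00E9.
my $chars = decode('ISO-8859-1', $latin1_bytes);

# Encoding that character as UTF-8 gives the two bytes C3 A9.
my $utf8_bytes = encode('UTF-8', $chars);

printf "characters: %d (U+%04X)\n", length $chars, ord $chars;
printf "UTF-8 bytes: %s\n", join ' ', map { sprintf '%02X', ord } split //, $utf8_bytes;
```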

If well-documented and completely lexical, the path forward is
extremely easy and fast, and it will prompt coders to modernize their
code for 2021 and beyond. Note that a lot of software was written at a
time when editors did not have a clue about multibyte encodings and
(Windows) people still used Alt-234 and the like to enter diacritics.

> -F

-- 
H.Merijn Brand  https://tux.nl   Perl Monger   http://amsterdam.pm.org/
using perl5.00307 .. 5.33        porting perl5 on HP-UX, AIX, and Linux
https://tux.nl/email.html http://qa.perl.org https://www.test-smoke.org