Re: "use v5.36.0" should imply UTF-8 encoded source

From: Leon Timmermans
Date: July 30, 2021 17:48
Subject: Re: "use v5.36.0" should imply UTF-8 encoded source
Message ID: CAHhgV8iubi+Fgv8F92AnGy1sB7z81JPY+4wfUchF6=58xDjH0A@mail.gmail.com

On Fri, Jul 30, 2021 at 6:56 PM Felipe Gasper <felipe@felipegasper.com>
wrote:

> FWIW, I think this will regress Perl’s usability.
>
> Probably the worst part about character encoding in Perl is that nothing
> indicates when you’ve over-encoded or under-encoded. But, at the very least
> everything right now is consistent by default: source code is parsed as
> bytes (“Latin-1”), and I/O happens as bytes. Thus, a “minimal-effort”
> approach to writing Perl will at least minimize the odds of encoding
> mismatches: you only run into trouble if you explicitly decode/encode.
>
> If `use v5.36` is to disrupt that consistency by making source code
> UTF-8-decoded but *leaving* I/O as bytes, this seems likely to add another
> “shin-bumper” to use of Perl that doesn’t happen in languages that type
> byte strings differently from text strings.
>
> So quick-and-simple things like `print "é"` will now, in “modern” Perl,
> break, with no indication of where/why until a human being comes along,
> notices the problem, and puts in the time to debug it.
>

It doesn't actually break. PerlIO will try to downgrade that for a
non-:utf8 handle, or upgrade for a :utf8 handle.
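
A minimal sketch of that behaviour, assuming `use utf8` stands in for what `use v5.36` would imply under the proposal. The in-memory filehandles and the variable names are only there so the written bytes can be inspected; they are not part of the original discussion.

```perl
use strict;
use warnings;
use utf8;    # source is decoded as UTF-8, so "é" is one character, U+00E9

my $s = "é";

# Handle without an encoding layer: the string is downgraded to bytes.
open my $raw, '>', \my $raw_buf or die "open: $!";
print {$raw} $s;
close $raw;

# Handle with the :utf8 layer: the string is written as UTF-8.
open my $enc, '>:utf8', \my $enc_buf or die "open: $!";
print {$enc} $s;
close $enc;

printf "plain handle: %vX\n", $raw_buf;    # E9
printf ":utf8 handle: %vX\n", $enc_buf;    # C3.A9
```

Either way the character survives; only a code point above 0xFF on a handle without an encoding layer would trigger the "Wide character" warning.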


> It’s going to be particularly problematic with stuff like `mkdir "épée"`
> because now we’re *really* expecting the SvPV bug--where we give the raw PV
> to the kernel/OS--to stick around.
>

That problem exists with or without this change. That said, I don't think
I've ever seen a hard-coded non-ASCII path in a program, so I don't think
this is much of an issue.
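
To illustrate the SvPV issue mentioned above, here is a rough sketch that touches no files. It assumes that what the OS receives is the string's internal buffer, as the quoted text describes. `$down` and `$up` are hypothetical names, and `bytes::length` is used only to peek at the internal storage.

```perl
use strict;
use warnings;
use bytes ();    # load bytes::length without enabling the pragma

my $down = "\xE9p\xE9e";    # "épée", stored one byte per character
my $up   = $down;
utf8::upgrade($up);         # same characters, stored internally as UTF-8

print "equal as strings\n" if $down eq $up;
printf "internal lengths: %d vs %d\n",
    bytes::length($down), bytes::length($up);    # 4 vs 6

# Encoding explicitly before a syscall gives the same bytes for both.
my ($e1, $e2) = ($down, $up);
utf8::encode($_) for $e1, $e2;
print "same bytes after explicit encoding\n" if $e1 eq $e2;
```

Two strings that compare equal can hand different bytes to a call such as `mkdir`, which is why the problem is independent of how the source was decoded.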

Leon
