> On Jul 30, 2021, at 1:48 PM, Leon Timmermans <fawaka@gmail.com> wrote:
>
> On Fri, Jul 30, 2021 at 6:56 PM Felipe Gasper <felipe@felipegasper.com> wrote:
> FWIW, I think this will regress Perl’s usability.
>
> Probably the worst part about character encoding in Perl is that nothing indicates when you’ve over-encoded or under-encoded. But, at the very least everything right now is consistent by default: source code is parsed as bytes (“Latin-1”), and I/O happens as bytes. Thus, a “minimal-effort” approach to writing Perl will at least minimize the odds of encoding mismatches: you only run into trouble if you explicitly decode/encode.
>
> If `use v5.36` is to disrupt that consistency by making source code UTF-8-decoded but *leaving* I/O as bytes, this seems likely to add another “shin-bumper” to use of Perl that doesn’t happen in languages that type byte strings differently from text strings.
>
> So quick-and-simple things like `print "é"` will now, in “modern” Perl, break, with no indication of where/why until a human being comes along, notices the problem, and puts in the time to debug it.
>
> It doesn't actually break. PerlIO will try to downgrade that for a non-:utf8 handle, or upgrade for a :utf8 handle.

It’ll downgrade it, but it won’t encode it, so you’ll get mojibake:

> perl -Mutf8 -e'print "é"'
�

> It’s going to be particularly problematic with stuff like `mkdir "épée"` because now we’re *really* expecting the SvPV bug--where we give the raw PV to the kernel/OS--to stick around.
>
> That problem exists with or without this change. That said, I don't think I've ever seen a hard-coded non-ascii path in a program, I don't think this is much of an issue.

The problem exists, yes, but this change will make the bug that much more painful to fix.

I would wager that folks using Perl in the context of non-Latin languages (Cyrillic, CJK, &c.) will be more likely to hard-code non-ASCII paths. I personally mostly do it for testing.
And, of course, the problem pertains not just to filesystem paths, but to any string we give to the kernel (e.g., args to exec()).

-F