On Fri, Jul 30, 2021 at 2:28 PM Leon Timmermans <fawaka@gmail.com> wrote:

> On Fri, Jul 30, 2021 at 7:56 PM Felipe Gasper <felipe@felipegasper.com>
> wrote:
>
>> > On Jul 30, 2021, at 1:48 PM, Leon Timmermans <fawaka@gmail.com> wrote:
>> >
>> > On Fri, Jul 30, 2021 at 6:56 PM Felipe Gasper <felipe@felipegasper.com>
>> > wrote:
>> > FWIW, I think this will regress Perl’s usability.
>> >
>> > Probably the worst part about character encoding in Perl is that
>> > nothing indicates when you’ve over-encoded or under-encoded. But, at the
>> > very least everything right now is consistent by default: source code is
>> > parsed as bytes (“Latin-1”), and I/O happens as bytes. Thus, a
>> > “minimal-effort” approach to writing Perl will at least minimize the odds
>> > of encoding mismatches: you only run into trouble if you explicitly
>> > decode/encode.
>> >
>> > If `use v5.36` is to disrupt that consistency by making source code
>> > UTF-8-decoded but *leaving* I/O as bytes, this seems likely to add another
>> > “shin-bumper” to use of Perl that doesn’t happen in languages that type
>> > byte strings differently from text strings.
>> >
>> > So quick-and-simple things like `print "é"` will now, in “modern” Perl,
>> > break, with no indication of where/why until a human being comes along,
>> > notices the problem, and puts in the time to debug it.
>> >
>> > It doesn't actually break. PerlIO will try to downgrade that for a
>> > non-:utf8 handle, or upgrade for a :utf8 handle.
>>
>> It’ll downgrade it, but it won’t encode it, so you’ll get mojibake:
>>
>> > perl -Mutf8 -e'print "é"'
>> �
>
> It will print mojibake as well if the script is latin-1 encoded. It's
> mojibake because the terminal is utf-8, but the IO handle is latin1.

The difference is the orders of magnitude between the number of people who
would accidentally run a latin1 script on a utf8 terminal and the number who
would run a utf8 script on a utf8 terminal with "use utf8" and not understand
that they have to encode the output.

-Dan