perl.perl5.porters | Postings from July 2017

RFC: deprecate implicit-encoding "Wide character in print" behaviour

From: Zefram
Date: July 22, 2017 01:05
Subject: RFC: deprecate implicit-encoding "Wide character in print" behaviour
Message ID: 20170722010516.GF9383@fysh.org

Currently, when an attempt is made to print a non-Latin-1 character to
a byte-oriented stream, the entire string containing that character
is output nevertheless, implicitly UTF-8 encoded.  I think we should
deprecate this behaviour, ultimately making this situation signal an
exception.
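
As a rough illustration of what the exception could look like, the
existing warning can already be made fatal.  This is only a sketch: it
assumes the "Wide character in print" warning sits in the `utf8' warnings
category, and it uses an in-memory filehandle as a stand-in for a
byte-oriented stream.  The second half shows the explicit alternative, an
encoding layer on the handle.

```perl
use strict;
use warnings;
use warnings FATAL => 'utf8';   # promote "Wide character in ..." to an exception

# Byte-oriented in-memory handle, standing in for a file or socket.
my $buf = '';
open(my $fh, '>', \$buf) or die "open: $!";

# Printing a non-Latin-1 character now dies instead of warning.
my $ok  = eval { print {$fh} "\x{263a}"; 1 };
my $err = $ok ? '' : $@;
print "print died: $err" unless $ok;
close($fh);

# With an explicit encoding layer the same print is well defined.
my $out = '';
open(my $enc_fh, '>:encoding(UTF-8)', \$out) or die "open: $!";
print {$enc_fh} "\x{e9}\x{263a}";
close($enc_fh);
```

With the layer in place every string is encoded the same way, whether or
not it happens to contain a non-Latin-1 character.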

The current behaviour cannot be used to get consistent encoding from a
Perl program.  UTF-8 encoding is of course often useful, but because the
implicit encoding is only triggered if there's actually a non-Latin-1
character in the string, the actual encoding of the output varies in
a significant manner between non-ASCII Latin-1 strings and non-Latin-1
strings.  So different program runs produce incompatibly-encoded output.
In fact, because the encoding decision is made per argument, the output
from a single program run, and even a single print statement, can be
self-inconsistent.  This is not useful behaviour.
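
The per-argument inconsistency can be seen with a short script.  This is
a sketch only: `printed_bytes' is a hypothetical helper written for this
example, and the in-memory filehandle stands in for any byte-oriented
stream.  The warning is silenced so that only the output bytes are shown.

```perl
use strict;
use warnings;
no warnings 'utf8';   # silence "Wide character in print" for this demo

# Hypothetical helper: print the arguments to a byte-oriented in-memory
# handle and return the bytes that were written, as hex.
sub printed_bytes {
    my @args = @_;
    my $buf = '';
    open(my $fh, '>', \$buf) or die "open: $!";
    print {$fh} @args;
    close($fh);
    return join ' ', map { sprintf '%02x', ord } split //, $buf;
}

my $latin1 = "\x{e9}";           # U+00E9 only: fits in Latin-1
my $wide   = "\x{e9}\x{263a}";   # also contains U+263A: non-Latin-1

print printed_bytes($latin1), "\n";          # e9
print printed_bytes($wide), "\n";            # c3 a9 e2 98 ba
print printed_bytes($latin1, $wide), "\n";   # e9 c3 a9 e2 98 ba
```

The same character U+00E9 comes out as the single byte e9 in one argument
and as the UTF-8 sequence c3 a9 in the next, within one print statement.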

The inconsistency is misleading.  It adds to the unavoidable difficulty
around the aliasing of bytes and Latin-1 characters.
We've had at least one bug report from a user who was very confused about
encoding because encoded and decoded forms of a string would print out
identically for him.  (The `bug' was that encoded and decoded strings
got concatenated, causing them to be output with consistent encoding
that made them appear different.)
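
A sketch of how that confusion can arise, not a reconstruction of the
actual bug report: `printed_bytes' is a hypothetical helper, and the
in-memory filehandle again stands in for a byte-oriented stream.

```perl
use strict;
use warnings;
no warnings 'utf8';   # silence "Wide character in print" for this demo
use Encode qw(encode);

# Hypothetical helper: return the bytes that print writes to a
# byte-oriented in-memory handle, as hex.
sub printed_bytes {
    my @args = @_;
    my $buf = '';
    open(my $fh, '>', \$buf) or die "open: $!";
    print {$fh} @args;
    close($fh);
    return join ' ', map { sprintf '%02x', ord } split //, $buf;
}

my $decoded = "\x{263a}";                  # one character, U+263A
my $encoded = encode('UTF-8', $decoded);   # three bytes: e2 98 ba

# Printed separately, the two forms are indistinguishable.
print printed_bytes($decoded), "\n";       # e2 98 ba
print printed_bytes($encoded), "\n";       # e2 98 ba

# Concatenated, the encoded bytes are treated as Latin-1 characters
# and the whole string is encoded together.
print printed_bytes($decoded . $encoded), "\n";
# e2 98 ba c3 a2 c2 98 c2 ba
```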

-zefram
