develooper Front page | perl.perl5.porters | Postings from July 2017

RFC: deprecate implicit-encoding "Wide character in print" behaviour

July 22, 2017 01:05
Currently, when an attempt is made to print a non-Latin-1 character to
a byte-oriented stream, the entire string containing that character
is output nevertheless, implicitly UTF-8 encoded.  I think we should
deprecate this behaviour, ultimately making this situation signal an
error.

The current behaviour cannot be used to get consistent encoding from a
Perl program.  UTF-8 encoding is of course often useful, but because the
implicit encoding is only triggered if there's actually a non-Latin-1
character in the string, the actual encoding of the output varies in
a significant manner between non-ASCII Latin-1 strings and non-Latin-1
strings.  So different program runs produce incompatibly-encoded output.
In fact, because the encoding decision is made per argument, the output
from a single program run, and even a single print statement, can be
self-inconsistent.  This is not useful behaviour.
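
A minimal sketch of the per-argument inconsistency described above.
The helper name printed_bytes is made up for illustration, and an
in-memory filehandle with no encoding layer stands in for a
byte-oriented stream; neither is part of the proposal itself.

```perl
use strict;
use warnings;

# Hypothetical helper: return the raw bytes that print writes to a
# handle that has no :encoding layer (an in-memory file here).
sub printed_bytes {
    my @args = @_;
    my $buf = '';
    open(my $fh, '>', \$buf) or die "cannot open in-memory handle: $!";
    {
        no warnings 'utf8';    # silence "Wide character in print"
        local $, = undef;      # no output field separator
        local $\ = undef;      # no output record separator
        print {$fh} @args;
    }
    close($fh) or die "cannot close in-memory handle: $!";
    return $buf;
}

my $latin1 = "\x{e9}";      # e-acute: non-ASCII, but fits in Latin-1
my $wide   = "\x{263a}";    # WHITE SMILING FACE: does not fit in Latin-1

# Latin-1-only string: written as the single byte E9.
printf "%s\n", unpack("H*", printed_bytes($latin1));           # e9

# One string containing a wide character: all of it is UTF-8 encoded.
printf "%s\n", unpack("H*", printed_bytes($latin1 . $wide));   # c3a9e298ba

# Same characters as two arguments: the decision is made per argument,
# so a single print emits a Latin-1 byte followed by UTF-8 bytes.
printf "%s\n", unpack("H*", printed_bytes($latin1, $wide));    # e9e298ba
```

The e-acute comes out as E9 in one case and as C3 A9 in another, even
though the program asked for the same character each time.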

The inconsistency is misleading.  It compounds the unavoidable
difficulty around the aliasing of bytes and Latin-1 characters.
We've had at least one bug report from a user who was very confused about
encoding because encoded and decoded forms of a string would print out
identically for him.  (The `bug' was that encoded and decoded strings
got concatenated, causing them to be output with consistent encoding
that made them appear different.)
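
The kind of confusion in that report can be reproduced with a sketch
along these lines.  The helper name printed_hex and the choice of
character are assumptions made for illustration; they are not taken
from the actual bug report.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: hex dump of the bytes that print writes to a
# handle with no :encoding layer (an in-memory file here).
sub printed_hex {
    my @args = @_;
    my $buf = '';
    open(my $fh, '>', \$buf) or die "cannot open in-memory handle: $!";
    {
        no warnings 'utf8';    # silence "Wide character in print"
        local $, = undef;
        local $\ = undef;
        print {$fh} @args;
    }
    close($fh) or die "cannot close in-memory handle: $!";
    return unpack("H*", $buf);
}

my $decoded = "\x{263a}";                   # one character
my $encoded = encode('UTF-8', $decoded);    # three bytes: E2 98 BA

# Printed on their own, the two forms are indistinguishable.
print printed_hex($decoded), "\n";          # e298ba
print printed_hex($encoded), "\n";          # e298ba

# Concatenated, the encoded bytes are treated as Latin-1 characters and
# the whole string is UTF-8 encoded, so the encoded half is now
# double-encoded and the two halves look different.
print printed_hex($encoded . $decoded), "\n";   # c3a2c298c2bae298ba
```

Only the concatenated case reveals that the two strings were never the
same thing, which is why the consistent output looked like the bug.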

