> On Jan 5, 2020, at 7:27 PM, Zefram via perl5-porters <perl5-porters@perl.org> wrote:
>
> Felipe Gasper wrote:
>> The workflow you're describing--considering a non-decode as
>> equivalent to decoding as Latin-1--violates the workflow that
>> `perlunitut` prescribes.
>
> No, it doesn't, precisely because Perl doesn't distinguish between
> a string of octets and a string of Latin-1 characters.

As I wrote earlier, I daresay few, if any, who are new to Perl and read `perlunitut` would think in this way. The document specifically says in “I/O flow” that, if the input is not binary, “you should decode it”. It even shows an example decode() of Latin-1. I think nearly anyone who comes to this problem afresh would think the document means that strings encoded in Latin-1 should be explicitly decoded before being handled as text.

If the intent truly is that forgoing an explicit decode with Latin-1 encoded binary is just as valid and encouraged of a workflow as an explicit decode, it would be nice if `perlunitut` were updated to make that clearer from the get-go. I’d offer to do it, but I’m still not sure that my mental model of all of this is what’s intended.

> This doesn't
> only happen with input streams that are in their entirety Latin-1 or
> ASCII characters; it is also common for strings of such characters to
> be extracted or decoded from larger file formats without setting the
> SvUTF8 flag. It is also normal for string literals in Perl source to
> be treated as character strings without any explicit decoding phase,
> and those produce non-SvUTF8 strings wherever possible.

Most string operations work perfectly well on undecoded strings. I myself rarely use decode/encode unless I have to interact with JSON.

>
>> What I propose ("strictstrings") is an opt-in mode of operation
>
> It's not feasible to opt into this mode, because strings cross module
> boundaries all the time, and in all sorts of roles. Any type distinction
> attached to strings will be lost by innocuous operations performed by
> unaware modules, and the behaviour of modules with respect to the type
> distinction would quickly become an API backcompat issue preventing
> modules acquiring the type distinction. This is completely unlike "use
> strict", which affects the interpretation of bits of code that are by
> definition completely localised within a single module.

A fair amount of existing Perl would not work with “strictstrings” mode, to be sure. But since the proposed mode would merely introduce new failure states, nothing that _does_ work with it would break without it, so couldn’t any existing code be rectified? The new failure states may also expose subtle encoding bugs in existing code.

-F
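
P.S. For anyone following along, here is a minimal sketch of the equivalence under discussion, as I understand it. The sample strings and variable names are made up for illustration; only `Encode::decode` and the built-in `eq` and `length` are real. It compares an undecoded Latin-1 string with its explicitly decoded form, and then does the same for UTF-8 input, where skipping the decode is visibly not equivalent.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample data: "café" as Latin-1 octets and as UTF-8 octets.
my $latin1_octets = "caf\xE9";
my $utf8_octets   = "caf\xC3\xA9";

# Decoding as Latin-1 maps each octet to the code point of the same number,
# so the decoded string compares equal to the undecoded one.
my $latin1_text       = decode('ISO-8859-1', $latin1_octets);
my $same_as_undecoded = $latin1_text eq $latin1_octets ? 1 : 0;

# UTF-8 input differs: undecoded, the two-octet sequence for U+00E9 counts
# as two characters; decoded, it is one.
my $utf8_text        = decode('UTF-8', $utf8_octets);
my $undecoded_length = length $utf8_octets;   # 5
my $decoded_length   = length $utf8_text;     # 4

print "Latin-1 decoded eq undecoded: $same_as_undecoded\n";
print "UTF-8 lengths (undecoded/decoded): $undecoded_length/$decoded_length\n";
```

If this matches the intended model, then the Latin-1 decode in `perlunitut` changes nothing that `eq` or `length` can observe, which is the part I think the document could state outright.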