Felipe Gasper wrote:
>The workflow you're describing--considering a non-decode as
>equivalent to decoding as Latin-1--violates the workflow that
>`perlunitut` prescribes.

No, it doesn't, precisely because Perl doesn't distinguish between a
string of octets and a string of Latin-1 characters. This doesn't only
happen with input streams that are in their entirety Latin-1 or ASCII
characters; it is also common for strings of such characters to be
extracted or decoded from larger file formats without setting the
SvUTF8 flag. It is also normal for string literals in Perl source to
be treated as character strings without any explicit decoding phase,
and those produce non-SvUTF8 strings wherever possible.

>What I propose ("strictstrings") is an opt-in mode of operation

It's not feasible to opt into this mode, because strings cross module
boundaries all the time, and in all sorts of roles. Any type
distinction attached to strings will be lost by innocuous operations
performed by unaware modules, and the behaviour of modules with
respect to the type distinction would quickly become an API backcompat
issue preventing modules acquiring the type distinction. This is
completely unlike "use strict", which affects the interpretation of
bits of code that are by definition completely localised within a
single module.

>Sereal does indeed distinguish explicitly between character and octet strings:

That page shows the distinction between "binary/latin1" and "utf8"
*representations* of strings. The "binary/latin1" label makes clear
that that representation can be used for both character and octet
strings, and all the higher-level semantics that use strings refer to
the data type as "string", making no distinction between
representations. It's fairly clear from this that Sereal doesn't make
any type distinction between character strings and octet strings; it
aliases them just as Perl does.

-zefram
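
For readers who want to see the aliasing concretely, here is a minimal sketch. It is not part of the original message. It assumes a stock perl with the core Encode module, and the sample octet 0xE9 and all variable names are illustrative only. It compares an undecoded string with its Latin-1 decoding, then with a copy whose internal representation has been upgraded.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample data: a one-element string holding the value 0xE9.
# Perl does not record whether this is "an octet" or "the character U+00E9".
my $octets = "\xE9";

# Explicitly decode a copy as Latin-1 (ISO-8859-1).
my $copy    = $octets;
my $decoded = decode('ISO-8859-1', $copy);

# Change only the internal representation of another copy
# (this sets the SvUTF8 flag; the string's value is meant to stay the same).
my $upgraded = $octets;
utf8::upgrade($upgraded);

# Perl-level comparisons look at the string's elements, not at the flag.
my $decode_is_noop  = ($octets eq $decoded)  ? 1 : 0;
my $upgrade_is_noop = ($octets eq $upgraded) ? 1 : 0;

printf "undecoded eq decoded:  %d\n", $decode_is_noop;
printf "original  eq upgraded: %d\n", $upgrade_is_noop;
printf "SvUTF8 before/after upgrade: %d/%d\n",
    (utf8::is_utf8($octets)   ? 1 : 0),
    (utf8::is_utf8($upgraded) ? 1 : 0);
```

The point of the sketch is only that `eq` and `length` cannot tell the three strings apart, while `utf8::is_utf8` reports a difference in representation rather than in type.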