
Re: “strict” strings?

From: Felipe Gasper
Date: January 6, 2020 03:07
Subject: Re: “strict” strings?
Message ID: 2E81FBE0-F6BD-41E0-90D6-96590E6CF037@felipegasper.com


> On Jan 5, 2020, at 7:27 PM, Zefram via perl5-porters <perl5-porters@perl.org> wrote:
> 
> Felipe Gasper wrote:
>> The workflow you're describing--considering a non-decode as
>> equivalent to decoding as Latin-1--violates the workflow that
>> `perlunitut` prescribes.
> 
> No, it doesn't, precisely because Perl doesn't distinguish between
> a string of octets and a string of Latin-1 characters.

As I wrote earlier, I daresay few, if any, who are new to Perl and read `perlunitut` would think in this way. The document specifically says in “I/O flow” that, if the input is not binary, “you should decode it”. It even shows an example decode() of Latin-1. I think nearly anyone who comes to this problem afresh would think the document means that strings encoded in Latin-1 should be explicitly decoded before being handled as text.
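For concreteness, here is a minimal sketch of the decode/process/encode flow as a newcomer would likely read it from `perlunitut`. The sample string and variable names are illustrative assumptions, not taken from the document itself.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Octets as they might arrive from a Latin-1 encoded source ("café").
# Illustrative input only.
my $octets = "caf\xE9";

# 1. Decode the input into a character string.
my $text = decode('ISO-8859-1', $octets);

# 2. Work with it as text (here we only count characters).
my $chars = length $text;

# 3. Encode again on the way out, in this sketch as UTF-8.
my $out = encode('UTF-8', $text);

printf "%d characters in, %d octets out\n", $chars, length $out;
# prints "4 characters in, 5 octets out"
```

The character count matches the input octet count because Latin-1 maps each octet to one character; only the UTF-8 output grows, since é takes two octets there.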

If the intent truly is that skipping the explicit decode of Latin-1 encoded input is just as valid and encouraged a workflow as decoding explicitly, it would be nice if `perlunitut` were updated to make that clear from the get-go. I’d offer to do it, but I’m still not sure that my mental model of all of this is what’s intended.

>  This doesn't
> only happen with input streams that are in their entirety Latin-1 or
> ASCII characters; it is also common for strings of such characters to
> be extracted or decoded from larger file formats without setting the
> SvUTF8 flag.  It is also normal for string literals in Perl source to
> be treated as character strings without any explicit decoding phase,
> and those produce non-SvUTF8 strings wherever possible.

Most string operations work perfectly well on undecoded strings. I myself rarely use decode/encode unless I have to interact with JSON.
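A small sketch of both points, assuming the core JSON::PP module as a stand-in for whichever JSON library is in use (the thread does not name one). The first half shows that the SvUTF8 flag does not change how a string compares; the second shows the JSON boundary, where octets and characters do have to be kept apart.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json encode_json);

my $plain    = "caf\xE9";   # undecoded, non-SvUTF8 string
my $upgraded = $plain;
utf8::upgrade($upgraded);   # same characters, SvUTF8 flag now set

# The internal flag differs, but string operations see the same value.
print "flag differs\n"  if utf8::is_utf8($upgraded) && !utf8::is_utf8($plain);
print "strings equal\n" if $plain eq $upgraded;
print "lengths equal\n" if length($plain) == length($upgraded);

# decode_json takes UTF-8 octets and returns character strings.
my $data = decode_json(qq({"name":"caf\xC3\xA9"}));
print "decoded name matches\n" if $data->{name} eq $plain;

# encode_json goes the other way: characters in, UTF-8 octets out.
my $json = encode_json({ name => $plain });
```

All four messages print. `$json` holds the same UTF-8 octets that were fed to `decode_json`, so é occupies two octets there but one character in `$plain`.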

> 
>> What I propose ("strictstrings") is an opt-in mode of operation
> 
> It's not feasible to opt into this mode, because strings cross module
> boundaries all the time, and in all sorts of roles.  Any type distinction
> attached to strings will be lost by innocuous operations performed by
> unaware modules, and the behaviour of modules with respect to the type
> distinction would quickly become an API backcompat issue preventing
> modules acquiring the type distinction.  This is completely unlike "use
> strict", which affects the interpretation of bits of code that are by
> definition completely localised within a single module.

A fair amount of existing Perl would not work with “strictstrings” mode, to be sure. But since the proposed mode would merely introduce new failure states, nothing that _does_ work with it would break without it, so couldn’t any existing code be rectified?

The new failure states may also expose subtle encoding bugs in existing code.

-F