
Re: “strict” strings?

From:
André Warnier (tomcat/perl)
Date:
January 5, 2020 20:23
Subject:
Re: “strict” strings?
Message ID:
824e95f8-fc2c-0383-63ef-2a2e9d65ca76@ice-sa.com
On 05.01.2020 19:05, Felipe Gasper wrote:
> 
>> On Jan 5, 2020, at 10:07 AM, Zefram <zefram@fysh.org> wrote:
>>
>> Felipe Gasper wrote:
>>> Is there any supported text-decode operation, as per the
>>> input-decode-work-encode-output workflow described in `perlunitut`,
>>> that doesn't set that flag?
>>
>> Yes.  Anyone who knows they're decoding Latin-1 is free to do it by *not*
>> calling any decoding function.  In general, anything that decodes to
>> any subset of the Latin-1 character repertoire is free to represent its
>> result in the internal Latin-1 encoding rather than the internal UTF-8.
>> Plenty of code that never generates non-Latin-1 characters does yield
>> downgraded results.
> 
> The workflow you’re describing--considering a non-decode as equivalent to decoding as Latin-1--violates the workflow that `perlunitut` prescribes. At least, I submit that *most* people who read `perlunitut` would think that document inconsistent with what you’re saying. Moreover, the “implicit” Latin-1 decode you describe mismatches how Perl implements an explicit Latin-1 decode (i.e., adds the UTF8 flag).
> 
> What I propose (“strictstrings”) is an opt-in mode of operation where Perl no longer would attempt to interpret un-decode()d strings as Latin-1. Everything that handles strings as text would have to explicitly decode/encode. Perl would thus more naturally interact with Python, JavaScript, Sereal (see below), CBOR, WebSocket, and whatever other popular technologies nowadays distinguish explicitly between text and binary.
> 
> Like “use strict”, it wouldn’t break any existing code since it would be opt-in.
> 
[...]

Hi.
I am unqualified to contribute to the underlying technical pros and cons of the subject 
matter.
But as a heavy Perl user, one who has over the years developed many applications in 
Perl and who has also "persuaded" quite a few young programmers to learn and use it, 
let me just say that I tend to agree with Felipe's suggestion, if it is at all possible 
to implement.
The main reason is that Perl 5's handling of Unicode/non-Unicode string encodings, 
although quite clever and powerful, is in practice maybe the biggest stumbling block 
for new (and not so new) Perl programmers, who constantly and repeatedly "get caught" by 
this aspect of the language.  So anything that goes in the direction of making this 
clearer and of catching the misuse of encoded/non-encoded strings would, I believe, be a 
very useful feature.
