Re: "strict" strings?

From: Zefram via perl5-porters
Date: January 6, 2020 00:27
Subject: Re: "strict" strings?
Message ID: 20200106002706.e4o4r32rgwbrbihs@fysh.org

Felipe Gasper wrote:
>The workflow you're describing--considering a non-decode as
>equivalent to decoding as Latin-1--violates the workflow that
>`perlunitut` prescribes.

No, it doesn't, precisely because Perl doesn't distinguish between
a string of octets and a string of Latin-1 characters.  Using undecoded
data as characters doesn't only happen with input streams that are in
their entirety Latin-1 or ASCII; it is also common for such strings to
be extracted or decoded from larger file formats without setting the
SvUTF8 flag.  It is also normal for string literals in Perl source to
be treated as character strings without any explicit decoding phase,
and those produce non-SvUTF8 strings wherever possible.
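
The aliasing described above can be observed from plain Perl. The following is a minimal sketch, not code from the thread: the variable names are illustrative, and it assumes the core Encode module and the always-available utf8::upgrade and utf8::is_utf8 functions.

```perl
use strict;
use warnings;
use Encode qw(decode);

# A literal with a \x escape below 0x100 is stored without the SvUTF8 flag.
# It can be read as four octets or as the Latin-1 characters c, a, f, e-acute.
my $octets = "caf\xe9";

# Explicitly decode the same octets as Latin-1.
my $chars = decode('ISO-8859-1', $octets);

# Perl compares string values, not representations, so these are equal.
print "undecoded and decoded strings are equal\n" if $octets eq $chars;
printf "lengths: %d and %d\n", length($octets), length($chars);

# Forcing the upgraded (SvUTF8) representation does not change the value.
my $upgraded = $octets;
utf8::upgrade($upgraded);
printf "flag on original: %d, flag on upgraded copy: %d\n",
    (utf8::is_utf8($octets)   ? 1 : 0),
    (utf8::is_utf8($upgraded) ? 1 : 0);
print "upgraded copy is still equal\n" if $upgraded eq $octets;
```

Nothing at the language level tells the three strings apart; only the internal flag differs, and ordinary operators such as eq and length ignore it.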

>What I propose ("strictstrings") is an opt-in mode of operation

It's not feasible to opt into this mode, because strings cross module
boundaries all the time, and in all sorts of roles.  Any type distinction
attached to strings will be lost by innocuous operations performed by
unaware modules, and the behaviour of modules with respect to the type
distinction would quickly become an API backcompat issue preventing
modules acquiring the type distinction.  This is completely unlike "use
strict", which affects the interpretation of bits of code that are by
definition completely localised within a single module.
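
As a rough illustration of how string metadata drifts when a string passes through unaware code, the sketch below uses the existing SvUTF8 flag as a stand-in for a hypothetical type tag. The passthrough function is invented for this example and stands for any module that handles the string without knowing about the tag.

```perl
use strict;
use warnings;

# Hypothetical helper in a module that is unaware of any string type tag.
# It returns the same string value that it was given.
sub passthrough {
    my ($s) = @_;
    my $tmp = $s . "\x{263a}";  # appending a wide character upgrades the string
    chop $tmp;                  # removing it again does not downgrade the string
    return $tmp;
}

my $in  = "caf\xe9";            # non-SvUTF8 literal
my $out = passthrough($in);

# The value is unchanged, but the internal flag is not what the caller passed in.
print "same value\n" if $in eq $out;
printf "flag before: %d, flag after: %d\n",
    (utf8::is_utf8($in)  ? 1 : 0),
    (utf8::is_utf8($out) ? 1 : 0);
```

The caller has no way to prevent this short of auditing every module the string passes through, which is the contrast with "use strict" drawn above.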

>Sereal does indeed distinguish explicitly between character and octet strings:

That page shows the distinction between "binary/latin1" and "utf8"
*representations* of strings.  The "binary/latin1" label makes clear that
that representation can be used for both character and octet strings, and
all the higher-level semantics that use strings refer to the data type as
"string", making no distinction between representations.  It's fairly
clear from this that Sereal doesn't make any type distinction between
character strings and octet strings; it aliases them just as Perl does.

-zefram
