Re: “strict” strings?
From: Zefram via perl5-porters
January 7, 2020 03:06
Message ID: email@example.com

Felipe Gasper wrote:
>In such a context it's important (for Go's sake) to distinguish blobs
>from text, so it's more than a matter of convenience, right?

Any kind of serialisation always produces this kind of impedance mismatch
when communicating between environments that have different type systems.
The communicating parties have to reach some accommodation on which type
distinctions matter in the communication and which don't, and the party
with less native type distinction very often has to go to some extra
effort to respect the agreed type distinctions. Perl, having aliased
quite a range of basic types, is often in that position. Look at how
I don't know enough about Go to comment on Perl<->Go or Sereal<->Go

>Sereal appears to intend to solve that problem by using the UTF8 flag.

It appears not, from the format spec and the Perl modules. It does
not give any impression of making the type distinction that you have
attributed to it.
Although any particular type mismatch can in theory be resolved by
changing one environment's type system to match the other, you're never
going to get all languages to agree on one type system, so there are
always still going to be mismatches. Furthermore, any such type system
change introduces exactly the same kind of type system mismatch between
code written for the old and new versions of the language that changed.
Those mismatches, between modules that need to work relatively tightly
together, matter much more than the mismatches between more loosely
coupled environments. So on the whole it's a fool's errand to try to
eliminate these problems around serialisation.
Even if it were worth tackling these serialisation issues by changing
languages' type systems, character string versus octet string is only one
of many common type distinctions that Perl lacks. If you want to argue
for a change to Perl on the basis of serialisation issues, you'd have a
better case arguing for a wholesale change to the type system of either
some other prominent language or some prominent serialisation format.

>Perl offers *no* reliable way to output reliable blobs vs. text

Not true, even for serialisation formats where it's an issue. (As far
as I can see this isn't an issue at all with respect to Sereal (as
noted above), but CBOR does make the distinction, and so do some
other serialisation formats.) Perl is a Turing-complete programming
language, and perfectly capable of making a distinction in its output
that isn't built into its internal type system. One merely has to
feed the serialiser some explicit type information in addition to the
actual data. So changing the type system is far from the only way to
address this serialisation issue.
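
To make the point about explicit type information concrete, here is a minimal sketch. The names as_blob, as_text and encode_short_string are hypothetical and belong to no real module. The only borrowed detail is the CBOR initial-byte layout (major type 2 for byte strings, major type 3 for text strings), and the toy encoder handles only payloads shorter than 24 bytes.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical wrappers: the caller states the intended wire type,
# because the Perl scalar alone does not carry that information.
sub as_blob { my ($octets) = @_; return { type => 'blob', value => $octets } }
sub as_text { my ($chars)  = @_; return { type => 'text', value => $chars } }

# Toy encoder for short strings only (payload under 24 bytes).
# Initial byte = (major type << 5) | length, as in CBOR.
sub encode_short_string {
    my ($item) = @_;
    my $value = $item->{value};           # work on a copy
    my ($major, $payload);
    if ($item->{type} eq 'blob') {
        $major   = 2;                     # byte string
        $payload = $value;
        utf8::downgrade($payload);        # dies if any character is above 0xFF
    }
    else {
        $major   = 3;                     # text string, sent as UTF-8
        $payload = encode('UTF-8', $value);
    }
    die "payload too long for this sketch\n" if length($payload) >= 24;
    return chr(($major << 5) | length($payload)) . $payload;
}

# The same one-character Perl string, serialised two different ways.
my $blob = encode_short_string(as_blob("\xE9"));   # bytes 41 E9
my $text = encode_short_string(as_text("\xE9"));   # bytes 62 C3 A9
printf "%s\n%s\n", unpack('H*', $blob), unpack('H*', $text);
```

The distinction lives entirely in the wrapper supplied by the caller, so the scalar's internal representation never has to be consulted.
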
As to the merits of introducing this type distinction into Perl, it
would be quite unPerlish. Perl's model is that operations decide what
type their operands are to be treated as, and everything gets coerced.
*Everything.* It's not just octet strings that get treated as character
strings, so do numbers, undef, globs, references to aggregates, and
references to blessed objects (for which the class may, but doesn't
have to, define its own stringification logic). Would "strictstrings"
prevent coercions from non-string types? What about coercions *to*
non-string types? If you're going to attack the coercion paradigm,
please take a holistic view.
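
The breadth of coercion described above can be seen in a short sketch. The Point and Plain classes and the coercion_demo function are hypothetical names made up for illustration. Everything else is ordinary core Perl behaviour.

```perl
use strict;
use warnings;

{
    # Hypothetical class that defines its own stringification logic.
    package Point;
    use overload '""' => sub { my ($self) = @_; "($self->{x}, $self->{y})" };
    sub new { my ($class, %args) = @_; return bless {%args}, $class }
}

sub coercion_demo {
    my $aref  = [1, 2, 3];
    my $plain = bless {}, 'Plain';             # blessed, no overloading
    my $point = Point->new(x => 1, y => 2);
    my $undef;

    no warnings 'uninitialized';               # undef coerces to "" (normally with a warning)
    return {
        number_as_string => "n=" . 42,             # "n=42"
        string_as_number => "10" + 5,              # 15
        undef_as_string  => "[" . $undef . "]",    # "[]"
        glob_as_string   => "" . *STDOUT,          # "*main::STDOUT"
        ref_as_string    => "$aref",               # "ARRAY(0x...)"
        plain_object     => "$plain",              # "Plain=HASH(0x...)"
        overloaded       => "$point",              # "(1, 2)"
    };
}

my $demo = coercion_demo();
print "$_ => $demo->{$_}\n" for sort keys %$demo;
```

In each case the operator picks the type and the operand is coerced to fit, with no complaint beyond the optional warning for undef.
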
If one wanted to introduce a distinction between character strings and
octet strings, the time to do it would have been some time around Perl
3.0, when Perl claimed to become binary-clean (and became nearly so in
practice). That opportunity has long passed. There was also arguably
an opportunity around Perl 5.6, when the first attempt at handling
the full Unicode repertoire was made, though this would have been much
more disruptive. That opportunity, too, has long passed.
But even if you're determined to argue for the introduction of this
type distinction, tying it to the existing SvUTF8 flag is misconceived.
The semantics of SvUTF8 are already well established to be purely about
representation, not about semantic type. If we were to introduce any
type distinction, it would take the form of new flags, not overloading
the meaning of the existing flag. Separate these concerns.
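
That the flag concerns representation rather than semantic type can be checked with a small sketch using only the core utf8:: functions. The variable names are arbitrary.

```perl
use strict;
use warnings;

my $native = "caf\xE9";
utf8::downgrade($native);      # one byte per character internally, flag off

my $upgraded = $native;
utf8::upgrade($upgraded);      # same characters, stored internally as UTF-8, flag on

printf "flag: %d vs %d\n",
    (utf8::is_utf8($native)   ? 1 : 0),
    (utf8::is_utf8($upgraded) ? 1 : 0);
printf "equal: %s, lengths: %d and %d\n",
    ($native eq $upgraded ? "yes" : "no"),
    length($native), length($upgraded);
```

The two scalars differ in the flag yet compare equal and have the same length, so nothing at the language level marks one as text and the other as octets.
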