
Re: on changing perl's behavior

From:
Nicholas Clark
Date:
April 7, 2021 13:27
Subject:
Re: on changing perl's behavior
Message ID:
20210407132655.GD16703@etla.org
On Tue, Mar 30, 2021 at 09:54:48AM +0200, Christian Walde wrote:
> On Tue, 30 Mar 2021 05:07:37 +0200, Ricardo Signes <perl.p5p@rjbs.manxome.org> wrote:
> > This one I think I can only state as a trade-off per se:  Spending time on maintaining long-discouraged behaviors is good only to the extent that they can't be deprecated and removed instead.  How do we know whether we can deprecate and remove some behavior?  Well, that's largely a function of all of the above.
> Have more reading and thinking to do, but:
> 
> As far as i am informed this one is a red herring.
> 
> To my knowledge both DaveM and LeoNerd have opined that deprecating old behaviors gains little to none for the code Perl actually has at the moment.

I'm neither of the above but yes, generally, existing behaviour doesn't
"get in the way". In that

* most things that folks are interested in (and are achievable) are better
  syntax for existing fundamental operations
* (obscure) existing runtime behaviour rarely gets in the way of this

(So I think even for try/catch which is more than just compile time, there
isn't much in the existing runtime that gets in the way of implementing it -
it's mostly figuring out syntax and semantics, and then *extending* the
existing runtime to support it.)

Existing expectations for emergent behaviour *do* get in the way of
refactoring things - but these are rarely about syntax, or at least
documented syntax. It's hard to refactor things without discovering that
two (or more) subtle implementation details interact to create runtime
behaviour that wasn't documented or tested, but it turns out something
relies on it. That last part seemingly is "Hyrum's Law":

    With a sufficient number of users of an API,
    it does not matter what you promise in the contract:
    all observable behaviors of your system
    will be depended on by somebody.



Anyway, the substantive part of this message was meant to be the tale of
three punctuation variables $#, $* and $[, and how removing them had
different implications. All from memory:

    $#      The output format for printed numbers.  This variable is a
            half-hearted attempt to emulate awk's OFMT variable.  There are
            times, however, when awk and Perl have differing notions of
            what counts as numeric.  The initial value is "%.ng", where n
            is the value of the macro DBL_DIG from your system's float.h.
            This is different from awk's default OFMT setting of "%.6g", so
            you need to set $# explicitly to get awk's value.  (Mnemonic: #
            is the number sign.)


So the problem with $# is that (like all the rest) it's a global, and its
value can be changed at runtime. (Yes, technically not $[ since 5.000)

And this ability to be changed at runtime interferes with caching the string
values of scalars with floating point values - the caching is part of the
scalar, whereas the "correct" formatting is for the current scope.
(Strictly dynamic scope, but making it lexical scope wouldn't have changed
this).
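To make the caching conflict concrete, here is a minimal sketch in C. It is not the code from sv.c: `struct scalar`, `scalar_to_string` and `output_format` are hypothetical names, and "%.15g" merely stands in for the DBL_DIG-based default. It shows that once the string form is cached inside the scalar, a later change to the global format is silently ignored for that scalar.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for $#: a process-wide format that can change at any time.
 * (Assumption: "%.15g" plays the role of the DBL_DIG-based default.) */
static const char *output_format = "%.15g";

/* Hypothetical scalar that caches its string form next to the number. */
struct scalar {
    double num;
    char   str[64];
    bool   str_valid;   /* true once str holds a formatted copy of num */
};

/* Return the string form, formatting only on first use. */
static const char *scalar_to_string(struct scalar *sv)
{
    assert(sv != NULL);
    if (!sv->str_valid) {
        snprintf(sv->str, sizeof sv->str, output_format, sv->num);
        sv->str_valid = true;
    }
    /* The cache belongs to the scalar, so a later change to
     * output_format is not reflected here. */
    return sv->str;
}
```

A scalar stringified before the format changes keeps its old text, while a fresh scalar with the same number picks up the new format. The only ways out are to drop the cache or to check the global on every stringification.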

So it was deprecated, and eventually removed, and all was good. I think
that this reduced a bit of complexity in just one place in sv.c.

Oh pants, except for numeric locales, which have the same basic problem -
your formatting of 3.14 might be "3,14" in some places (or some calls)
depending on your locale.

So in the end, this wasn't a win.


    $*      Set to a non-zero integer value to do multi-line matching
            within a string, 0 (or undefined) to tell Perl that it can
            assume that strings contain a single line, for the purpose of
            optimizing pattern matches.  Pattern matches on strings
            containing multiple newlines can produce confusing results when $*
            is 0 or undefined. Default is undefined.  (Mnemonic: * matches
            multiple things.) This variable influences the interpretation
            of only "^" and "$". A literal newline can be searched for even
            when "$* == 0".

            Use of $* is deprecated in modern Perl, supplanted by the "/s"
            and "/m" modifiers on pattern matching.

            Assigning a non-numerical value to $* triggers a warning (and
            makes $* act if "$* == 0"), while assigning a numerical value


As featured in Ricardo's example.

Again, dynamic, needs to be checked everywhere at runtime - qr// compiles
regexps that can be passed to different scopes, hence unlike the /s and /m
modifiers one can't just attach it to the compiled representation and be
done.
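As a rough illustration of why the check can't be folded into the compiled form, here is a sketch in C. It is not the regexp engine's code: `struct compiled_pattern`, `caret_matches` and `legacy_multiline` are hypothetical names. The /m-style flag is fixed when the pattern is compiled, but the global standing in for $* has to be read on every match attempt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for $*: may change between compiling a pattern and running it. */
static int legacy_multiline = 0;

/* Hypothetical compiled pattern: the /m flag is decided at compile time. */
struct compiled_pattern {
    bool flag_m;
};

/* Does "^" match at byte offset pos of the NUL-terminated string s? */
static bool caret_matches(const struct compiled_pattern *re,
                          const char *s, size_t pos)
{
    assert(re != NULL && s != NULL);
    if (pos == 0)
        return true;    /* start of string always matches */
    /* flag_m alone could be baked into the compiled form; the global
     * cannot, so it is consulted on every call. */
    bool multiline = re->flag_m || legacy_multiline != 0;
    return multiline && s[pos - 1] == '\n';
}
```

The same compiled pattern gives different answers for the same string depending on the global's value when the match runs, which is why every site that interprets "^" or "$" needs the extra check.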

So removing this removed several small instances of code.


    $[      The index of the first element in an array, and of the first
            character in a substring.  Default is 0, but you could
            theoretically set it to 1 to make Perl behave more like awk (or
            Fortran) when subscripting and when evaluating the index() and
            substr() functions.  (Mnemonic: [ begins subscripts.)

            As of release 5 of Perl, assignment to $[ is treated as a
            compiler directive, and cannot influence the behavior of any other
            file.  (That's why you can only assign compile-time constants
            to it.)  Its use is highly discouraged.

            Note that, unlike other compile-time directives (such as
            strict), assignment to $[ can be seen from outer lexical scopes
            in the same file.  However, you can use local() on it to
            strictly bind its value to a lexical block.


OK, not strictly dynamic, but that's not the problem.

*This* one hides a bunch of "fun". Given that substring and array offsets
can be expressed as negative values, how does "1" interact with that?

That was actually reasonably well defined. The real fun was *implementing*
it. The naïve approach is something like `$len + 1 + abs($offset)`.

Problem was that's fine in Perl where values are floating point*, but in C
this needs to be done with integers. The lengths and offsets probably already
use the largest integer type on the system, and so "largest value plus one"
can't be represented.
Hence (IIRC) there was a security bug about this, and then a lot of care
was taken by someone (sorry, I forget whom) to re-write the implementation
to avoid overflow in the corner cases.

And even after this and a lot of staring at it by several others it was
still "no obvious bugs" and not "obviously no bugs".

So getting rid of $[ meant that all this code could go.


So the point that I'd like to make is that there are *some* things that
might be removed to simplify the internals and reduce the support burden,
but *most* existing things don't get in the way, at least at runtime.

Nicholas Clark

* or fake it like they are
