perl.perl5.porters | Postings from September 2011

Re: Perl 5.16 and Beyond.

Nicholas Clark
September 19, 2011 03:50
Re: Perl 5.16 and Beyond.

I like the plan. But the devil will be in the details.

It's a complex trade-off between short-, medium- and long-term
maintainability, and I don't think that this approach has been taken
by any comparable project.

On Tue, Sep 13, 2011 at 01:23:17PM -0600, Karl Williamson wrote:
> On 09/12/2011 10:28 AM, Jesse Vincent wrote:
> > If there is no "use v5.xx" line at the top of the code, the runtime
> > should act as it did on v5.14 without a use v5.14 line.
> Does that mean that without a 'use' line that the Unicode version will 
> be the one that is in 5.14?

Based on the default of

    If there is no "use v5.xx" line at the top of the code, the runtime
    should act as it did on v5.14 without a use v5.14 line.

then yes, I'm assuming that the intent of "runtime should act" is that
the behaviour of Unicode should also be the same.

There is a further section:

    * New versions of Perl 5 should not break your existing software
    * Backward compatibility must not stop Perl 5 from evolving

Pay particular attention to "should" and "must" there.

On Wed, Sep 14, 2011 at 09:36:17AM -0500, Dave Rolsky wrote:
> On Wed, 14 Sep 2011, Dave Mitchell wrote:
> > I think the devil is in the detail. If, while running under 5.20,
> >
> >    use v5.16;
> >
> > is just about equivalent to
> >
> >    no feature 'list of features added in 5.18, 5.20';
> I think that realistically, it will have to be somewhere between this and 
> "acts exactly like 5.16 in all ways".
> I think each backwards incompatible change will probably need to be 
> considered on its own as a candidate for this policy.
> I think there are several categories of backwards incompatible changes.
> For bug fixes, I think we may just ask people to bite the bullet and 
> accept the change. It seems unrealistic for anyone to expect us to provide 
> infinite bugwards compatibility.

I don't think we get *any* meaningful choice on this.

The aim is to be able to slim the core distribution, and make our
maintenance easier. But everything we permit to have divergent
implementations causes growth, and makes our code maintenance harder.

Our size and effort scale roughly linearly with "features".

So to stand still on size and complexity we have to run very hard on
finding other savings. That's before we consider our ability to actually
add things.
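Dave's "no feature" approximation quoted above can be sketched as data. This is only an illustration (the feature lists below are made up, not the real bundles in feature.pm):

```perl
use strict;
use warnings;

# Hypothetical per-version feature bundles:
my %bundle = (
    '5.16' => [qw(say state unicode_strings)],
    '5.18' => [qw(say state unicode_strings lexical_subs)],
);

# Under a v5.18 interpreter, "use v5.16" would enable exactly the 5.16
# set, implicitly disabling anything added later (here, lexical_subs):
my %enabled = map { $_ => 1 } @{ $bundle{'5.16'} };

print $enabled{say}          ? "say on\n"          : "say off\n";
print $enabled{lexical_subs} ? "lexical_subs on\n" : "lexical_subs off\n";
```

The open question is how far beyond this feature-set bookkeeping "acts exactly like 5.16" has to go.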

> What about things like updating the Unicode database? That's not a bug 
> fix. But does it really make sense to ship every version of Unicode that 
> we've ever documented as support? Isn't getting a new version of this 
> database one of the reasons to upgrade? I think this is where things get 
> sticky.

In theory we can. In practice, I don't think we can.
We don't properly have *one* Unicode implementation working yet.
We don't have the infrastructure to support more than one - we'd have to
write it. It would definitely bring benefits, but relative to the effort
needed to actually fix the real bugs we still have, it's certainly a
distraction. I don't think that we have enough knowledge and time
[even with money no object] to do both before v5.16, and without unlimited
money, ever. So that means everyone gets the same Unicode version, which
forces an either-or choice between "stick on what we have" and "upgrade
everyone, bug-and-feature alike". This is one of those short/medium/long
term trade-offs.

Short term, keeping v5.16 on Unicode version 6.0.0 isn't really a problem
while we work out which approach is viable.

Medium term, keeping v5.18 on Unicode version 6.0.0 is bad, but if it buys
a long term of flexibility that's good. Medium term pain for long term gain.
But if we don't think it's do-able, then it's not worth the indefinite and
increasing medium-term pain of holding the Unicode version, only to change
our mind later, abandon the approach, and make a big jump. We'd have denied
people Unicode improvements in the short term, and produced the pain of a
big jump even for people tracking yearly releases.

Tom has been explaining Unicode's stability guarantees. I think that we
are going to have to rely on these, and assume that changes that the
Unicode Consortium themselves make to existing behaviour are bug fixes.

> I also wonder if there will be a sunset on maintenance of these old 
> features. At what point might we consider removing a smartmatch 
> implementation from core? How many implementations would we be willing to 
> maintain?

I don't think that we can credibly consider *removing* any implementation,
without making a mockery of the whole *point* of the policy. The elevator
pitch of the policy is that if your code says "use v5.18;" then it will
keep working indefinitely.

This means that code saying "use v5.18" is allowed to rely on v5.18
features, and that (effectively) v5.20, v5.22 etc behave for it as new
non-binary compatible stable releases. Which means that they *aren't*
allowed to add new warnings. So how do we then decide to deprecate
something, let alone remove it, if we aren't allowed new warnings?

Even if we do allow new warnings in these stable releases, we'd effectively
end up having to maintain $n different forks of the language *in one
codebase*, because each version would need to track which features are
deprecated in which subversions, and which have been removed.

Which iteration of use v5.18 was sir asking for when sir typed that?

And the elevator pitch gains the small print "oh, but not really. When we
said indefinitely, and implied forever, actually we meant that in five to
seven years we might remove *some* of the features you've been using. So
you can't really rely on any of them."

To make this work, I believe we have to think in timescales of 5 or 10 years
of active changes to the language. Which means potentially 5 to 10 divergent
implementations of some things. What's the cost? How do we support this?

Questions I'm asking myself are:

1) Are modules shipped in the core covered by the guarantee about what
   v5.18 means?

   For example, in RT #72506 I propose a change to a not-really-supportable
   corner case feature of warnings, which I don't think anyone is relying on.
   But if that's considered a feature, not a bug fix, does that mean that
   we need to start shipping $n+1 copies of the warnings module?

2) What happens about invasive changes to the internals?
   For example, Chip's proposals for minimal copying will be visible to some
   code (particularly XS code making too many assumptions).
   They can't be restricted lexically.

   His types proposal possibly *can*, but it makes it way more complex, and
   doing this might actually introduce more bugs (or at least surprises)
   than it solves. [For example, passing a data structure into some other
   code that currently acts as v5.14 and reads values [with caching] would
   mean keeping current flags behaviour. Outer code is written to expect
   this - that a value "becomes" string or numeric. Then that inner code is
   tweaked, and use v5.18 is added. At which point the flags behaviour has
   to change. Only this would be visible to any calling code that happened
   to be relying on it. Hence adding v5.18 alone in one place might break
   other code.]

   Does the desire to minimise "use v5.18" breakage mean that any such
   structural changes to the underlying VM have to go through the exception
   process? If so, that's going to stifle if not kill improvements.

3) Are we going to assume that all undocumented, warning and error behaviour
   should stay the same?

   For example, Claes has worked on the todo item of accepting 0o42 as octal.
   This needs oct() to accept this format. Currently oct() only documents
   what it *does* accept as valid. So, the short-term-easy solution would be
   that oct() in the scope of v5.16 onwards accepts 0o42, and earlier
   scopes do not.

   But how is this implemented?
   As far as the C code goes, the obvious "clean" implementation takes about
   5 lines, adding a feature test.

   Except that this would be a test on the lexical scope of the caller.
   How does that fit with
   a) the desire to take references to builtins as if they are functions?
   b) the desire to be able to introspect builtins using %CORE:: ?

   The "clean" implementation doesn't fit sanely with being able to take a
   reference. It would provide a reference to a function whose behaviour
   changes depending on the calling scope. Whereas what's needed for sanity
   is for the function's behaviour to be consistent with the builtin's
   behaviour at the lexical scope where the reference is taken.

   This makes for a more complicated implementation - does one copy the code
   for pp_oct out into a second place (and fix bugs in two places), have
   conditional code compiled twice, or store state with the function reference?

   Also, how does it work with introspection? Will there be both
   $CORE::{'oct'} and $CORE::{'oct516'}? Or will there just be $CORE::{'oct'},
   but the value that Perl space sees differs depending on lexical scope?

On Mon, Sep 12, 2011 at 12:28:47PM -0400, Jesse Vincent wrote:

> Standing still is not an option. Perl's internals, syntax and semantics
> have seen some much-needed improvements in the past few years. There are
> many additional changes we can't make because they may damage too much
> legacy code.

But I read this as you saying that we should continue to make changes
that improve the internals, the syntax and the semantics.

Which I agree with. But for this plan to work, it needs to be sustainable.
We need to still be able to do these things in 5 years' and 10 years' time.

Which means I think we need to judge any changes on the basis of: does this
pay off in 5 years? In 10 years? Ever?

We only have finite effort available to us. Time we spend now refactoring
delays bug fixing and other improvements. If time spent now, discounted at
10% per annum, will never actually pay off, it's not worth doing.

> It is my strong preference that features granted R&R (removal and
> reinstatement) be implemented as modules, so as not to bloat the runtime
> when they're not needed. This isn't a pipe dream. Classic::Perl already
> does this for a few features removed in 5.10 and 5.12;
> If it's not possible to reinstate a feature we've removed with existing
> APIs, we'll need to look at the cost of removing the feature vs simply
> disabling it in the presence of a new-enough v5.xx declaration.

But not everything can be fully implemented as a module. For example, the
parser isn't pluggable. I don't know if it ever could be (fully), but it
certainly isn't *yet*.

Removing $[ meant that the parser code could actually get simpler. I can't
find the figure, but I think that chromatic measured the proportion of
the grammar needed to deal with the legacy 'do subroutine' syntax, and it
was large enough to be justifiable as a simplification worth the effort of
making, *if it's actually removed*. But we don't have a parser able to
do that *and* have it be re-instated via a module. So if we wanted to
remove it from the language with v5.16, we'd actually complicate the
parser and increase the maintenance burden, because we'd add more
conditional code to the core.

So the R&R policy comes with a cost - it makes some subset of the
deprecated features simply not worth removing, because it's now more
costly than keeping them. I suspect that this is one.

On the other hand, FORMATs keep being given as an example of something that
many people would like not to be there. As the "default is what 5.14 did",
FORMATs can't go away. Switching them off in the parser achieves most of the
"visible language simplification" goals of not having FORMATs:

i)  less to teach, less to understand
ii) permitting re-use of $- as the start of syntax, instead of a scalar

whilst saving the coding effort of re-implementing them.

But is actually removing them a good trade off?

i)  The existing implementation is pretty stable and doesn't get in the 
    way of much
ii) removing them means some mix of
    a) *adding* a lot of hooks to let the existing C code work "outside"
       the core
    b) re-implementing a little used feature in new code, along with the
       resulting cost of having (and fixing) all the bugs this creates

So in this case, I think that it's worth the cost of making them
conditionally disabled, but it's not worth the cost of trying to purge
them from the core implementation, because over the future lifetime of
Perl 5, I think that we'll spend more effort than we save.

> For cases where we _can't_ implement R&R, I think we need to move to a two
> year deprecation cycle, so as to have as minimal an impact as possible on
> users who are upgrading.

Yes, this makes sense. Minimum 2 years, preferably longer.

But I'm not sure how we communicate clearly "this deprecated feature is
merely deprecated" vs "this deprecated feature is doomed". Particularly
if we change our mind about what sort of deprecated something is.

> It's time for us to start extracting parts of what has traditionally been
> considered the "language" part of Perl 5 into CPANable modules. To do this
> successfully, it is imperative that _nothing_ appear out of the ordinary
> to code that expects to use those features. If code doesn't declare use
> v5.16, it should still get the 5.14ish environment it would expect.

By which you mean that the policy should change. To date, new "things" have
been allowed if previously they were syntax errors. Henceforth, no language
changes, even those that are backwards compatible, should appear unless
asked for?

This does (mostly) remove the "problem" that one can write and test against
a newer perl interpreter, and not realise that one is relying on a new
feature. On balance, I think that this is better.
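say already works this way, and shows what "only when asked for" looks like in practice:

```perl
use strict;
use warnings;

my $out = '';
{
    use feature 'say';              # the new syntax must be requested...
    open my $fh, '>', \$out or die;
    say $fh "enabled here";         # ...and is only available in this scope
}
# Outside the block, say() is not enabled, so code that never asked
# for it cannot accidentally come to depend on it.
print $out;
```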

However, it's never going to be a substitute for actually testing, as bugs
will be fixed, and I don't think that all behaviour can be hidden. Likewise
code written on the older interpreter and running under C<use v5.12;> etc is
going to encounter "artifacts from the future". For example, if we're able
to move from throwing core exceptions as strings to
objects-that-stringify-the-old-way, then some existing code is going to spot
the difference, and may change behaviour in surprising ways.

I think we'd be on a hiding to nothing having a policy that we explicitly
hide examples of "future" such as this, because it will take progressively
more effort, introduce bodges that make future improvements more costly, and
eventually we'll hit something we can't conceal thoroughly, at which point
what gives? The policy, or the improvement? So I don't think we should try
to "guarantee" any "perfection" better than the level we've managed in the
17 years to date.

> Once language features are modularized, it also becomes _possible_ to
> maintain and improve them without requiring a full Perl upgrade.

On the other hand, maintaining something against multiple perl versions is
harder than just being in the core. As Zefram recently found out with Carp,
when he rolled it up as a CPAN distribution.

This might actually cost us more than it saves. We're great at modules.
But how many modules are XS? How many of those run on more than "both
kinds of operating system"?

> I don't know what we'll extract or when we'll extract it, but there are a
> number of language features that seem like they might make sense to make
> pluggable: Formats, SysV IPC functions, Socket IO functions, Unix user
> information functions, Unix network information functions and Process and
> process group functions. Jesse Luehrs has already built us a first version
> of an extraction, modularization and replacement of smartmatch.

This makes sense. We already have some infrastructure to do this. The code
for $! was generalised, and is now also used to implement %+ and %-.

dbmopen and glob are both actually implemented as modules.
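The glob case is easy to see from Perl space: the builtin quietly loads File::Glob to do the work, and it shows up in %INC after a glob() call.

```perl
use strict;
use warnings;

# The glob builtin is implemented by the File::Glob module, which perl
# loads behind the scenes.
my @files = glob('*');
print exists $INC{'File/Glob.pm'} ? "File::Glob loaded\n" : "not loaded\n";
```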

It should be possible to move more out. However, it's not a panacea. I'd
estimate that the total size savings for the interpreter binary will be no
more than 10%. Right now, comparing microperl [pretty much all of the above
missing] versus perl on the same platform [x86_64, gcc -Os, -DNO_MATHOMS]:

-rwxr-xr-x  1 nick  admin  1154360 18 Sep 10:49 microperl
-rwxr-xr-x  1 nick  admin  1258448 18 Sep 10:50 perl

$ perl -le 'print 1154360/1258448'

i.e. microperl is still about 92% of the size of the full perl binary.
And we need to be very careful about how we autoload - eg, don't push all
the socket builtins out into Socket. It's tempting to do this (obviously
when autoloading one does not have Socket import anything), but it's a trap,
because it will mean that in future people write use v5.12 code which they
test and which works, but which will break on a "real" v5.12, because they
didn't realise that they'd assumed that Socket would be loaded.
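The fix for code that has fallen into that trap is to state the dependency explicitly, rather than rely on the interpreter having loaded Socket as a side effect:

```perl
use strict;
use warnings;

# Explicitly loading Socket (even with an empty import list) is what
# makes the fully-qualified call below reliable; under an autoloading
# scheme it might *happen* to work without this line, masking the bug.
use Socket ();

my $packed = Socket::inet_aton('127.0.0.1');
my @octets = unpack('C4', $packed);
print "@octets\n";   # 127 0 0 1
```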

> * TL;DR
> 	- New versions of Perl 5 should not break your existing software
> 	- Backward compatibility must not stop Perl 5 from evolving
> 	- From 'use v5.16' forward, Perl should start treating 'use v5.x'
> 	  statements as "try to give me a Perl that looks like v5.x" rather
> 	  than "give me at least v5.x"
> 	- We're awesome at modules. Where possible, we should be
> 	  modularizing core features.

We're trying to do something which I think no other dynamic language has
done before - try to support more than one "version" at runtime from the
same codebase. To the best of my knowledge, no Python implementation can
support even close versions simultaneously, such as 2.6 and 2.7 or 3.1 and
3.2 together.

In 5 or 10 years, is the hope that Perl 5 is free to evolve as far from
v5.14 as Python 3 is from Python 2? Because even the current proposal for
supporting Python 3 in PyPy is an "either/or", not a "concurrently":

[estimate $70,000, and that's not for a complete transition - that's for
augmenting the codebase to be able to provide a Python3 VM/interpreter to
the end user. The VM is still implemented in Python2. See
and the reply. I wonder who likes PyPy as much as likes Perl?]

I think to make this plan work sustainably, we're going to have to
conceptually split

* Perl 5 VM
* compile time (lexer/parser/opcode generator)
* runtime builtins (whether they are "built in", or in a module)

so that the Unicode version, copying semantics, etc, are properties of the
VM, and the VM may change in newer releases.

Likewise introspection is a function of the VM, and older code may see
something new. For example, if we're able to switch to providing NFC for
symbol tables, then existing code will not be hidden from this.

And older code can't be isolated from things arriving "from the future",
such as exceptions thrown to it, or the nuances of objects returned from
code it calls.

The parser is built atop the VM, and converts language source code to
opcodes according to the relevant "grammar" for the version requested,
using the correct "builtin"s for that version of the language, loading
them as necessary.

The runtime functions as documented for the version requested. But it
provides the same level of change/non-change as currently XS code can
across perl (release) versions.

Nicholas Clark
