Re: VCS for Perl 5, refocus please
From: Nicholas Clark
July 20, 2007 10:17
Message ID: 20070720171717.GT20876@plum.flirble.org
On Thu, Jul 12, 2007 at 02:15:29PM +0200, Rafael Garcia-Suarez wrote:
> The current discussion on VCS goes nowhere, because it focuses on
> technicalities rather than on the practical side of the situation.
> What we want, technically, is:
> 1. something that does branching and merging at least as well as
> 2. something in which we can import the complete Perforce repository,
> including branching metadata. (and not forgetting metaconfig.)
> Side note. I see lots of technical arguments going on about VCSes on the
> list, but none that are actually critical to the proper operation of a
> repository for Perl 5. (Remember that the development of Perl 5 is not
> distributed. Everything in the core is tightly interdependent. We're not
> Linux or X.org. We don't have lots of public branches, nor the need for.
> Neither have we to deploy often something based on the trunk. We don't
> need git. Private branches are sufficient.)
> Actually, the proper technical arguments, that seem to be forgotten,
> are: do all clients properly handle binary files (think about
> uupacktool.pl) and CRLF translation? Do they know about the executable
> permissions on files; or, do they clobber or retain mtimes over
> checkouts; or, do they clutter the working copy with horrible hidden
> files or directories? That's the kind of stuff you think about, when you
> have to produce a portable tarball.
I think that it's not been completely documented, but the current workflow of
perl development has two distinct parts:
Development perl, where multiple people commit changes, their own and third
party, to the VCS head.
Maintenance perl, where one person cherry picks (sets of) (parts of) changes
applied to the VCS head, and merges those across to the branch.
[More than one maintenance branch can exist, in which case substitute 'head'
with 'the next more recent maintenance branch']
The repository is central and available from well known public servers.
The people involved in the development jobs change, but the public servers
endure. The people committing to the VCS neither administer the machine, nor
use it for general development.
The parts are distinct. When I'm wearing my "maint" hat I want to work online
against the master repository. I'm sufficiently paranoid that I don't want
to trust any sort of mirror/sync from the master maint to the "public"
repository. I don't desire to do merging to maint offline.
Whereas when I've been working on "blead" it's been different.
To answer Yves' question, the first big thing that I did was the IV/NV
code. At that time (2001) I'd used SCCS, RCS and CVS, and wasn't
particularly sold on the benefits of local version control, so was more than
happy to deal with files locally, and merely send patches upstream, which
(usually) Jarkko applied.
When I was working on the SV structure re-arranging, I had a (work) laptop,
svk had appeared, and clkao was promoting it. (Well, at least, I noticed
him and it. It helped that he was sitting opposite me).
At the time either Robert or clkao (I forget) had a svn mirror of Perforce
head, and I synced my local svk repository to that, and then experimented
with various things in local branches.
However I never thought to want to import my change history back to perforce -
effectively I sent myself patches. But in some ways this was also reasonable -
not a huge amount of information was "lost" because most of my effort was
going sideways (down failed approaches) rather than forwards (sequential,
One thing that was definitely useful was that I didn't have to put any effort
into getting perl into a local distributed system. Or tracking it. At the
points where upstream mirrors failed, I didn't update my svk. And work I
did was on my perforce checkout on my laptop, and checked in at the end of
the train journey.
To my mind there are two halves to the VCS discussion
1: The repository
2: The client
The client and the repository may well be on the same machine. But might not.
[pinching shamelessly from (at least) Yves, Schwern and John Peacock]
1: Must be reliable.
2: Must have a viable future. (Meaning it must be well supported).
3: Must be able to handle a project the size of Perl reasonably efficiently.
4: Should be open source.
5: Should not require tight client/server version coupling (ie. a 3 year old
client should be able to talk to an up-to-date server).
6: Should be something we have a local expert for.
7: Should be capable of having a remote central server as a master repository
1: Must import the full version history from the Perforce repository back to
change 1. On all the branches. 
There are times when we need to inspect the code back a long way.
2: Must have some way of pulling out byte perfect copies of each perforce
branch by perforce revision number. 
This does not need to be a computationally cheap operation, if that helps.
3: Must import perforce metadata about changes not yet integrated between
trunk and active branches. (Which I think currently is just blead and 5.8,
but might be the two 5.6 branches too)
4: Must import perforce metadata about file datestamps and executable files.
5: Must support partial integration of changes between branches, at the file
level. This is a requirement, not a nice-to-have, brought about by how
perl development is done. 
6: Tagging of the existing perforce changes on import must not preclude their
partial merging. (as described in 5)
If we don't do all of the above, we irrecoverably lose history.
If we do the above, we can migrate to a better repository format in future,
if circumstances change.
The "seriously nice to have" about "distributed" systems is that change
history can be imported, rather than just patches.
7: Should be able to import change history. 
8: Does not need to be portable, if it is capable of client/server operation
and portable clients exist for it.
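
One way to check the "byte perfect" requirement above after a trial import is to export the same branch at the same revision from both systems and compare file digests. The following is only a minimal sketch under assumptions: the two directory arguments are hypothetical local exports (one from Perforce, one from the candidate repository), and the function names are invented for illustration.

```python
import hashlib
import os


def tree_digests(root):
    """Map each file's path (relative to root) to the SHA-256 of its raw bytes."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            # Binary mode, so no newline translation hides a CRLF difference.
            with open(full, "rb") as fh:
                digests[rel] = hashlib.sha256(fh.read()).hexdigest()
    return digests


def compare_exports(perforce_root, candidate_root):
    """Return (missing, extra, different) relative paths between two exports.

    perforce_root and candidate_root are hypothetical directories holding
    an export of the same branch at the same revision from each system.
    """
    ref = tree_digests(perforce_root)
    new = tree_digests(candidate_root)
    missing = sorted(set(ref) - set(new))
    extra = sorted(set(new) - set(ref))
    different = sorted(p for p in set(ref) & set(new) if ref[p] != new[p])
    return missing, extra, different
```

Three empty lists would mean the two exports match byte for byte. This only compares file contents; datestamps and executable bits would need a separate check.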
1: Must run on all platforms that we have committers using, specifically
Win32, Linux, *BSD, Solaris, AIX, HP/UX. I'm not sure of the situation on
VMS currently with Perforce. Irix and Tru64 would be nice to have.
2: Must be capable of cherry picking merges at the file level. 
3: Must be capable of accessing the not-yet-merged metadata, such that there
is no behaviour difference between merging a change committed pre-migration
and a change post-migration
It is not a concern what language(s) the client or server are written in,
provided that they meet the requirements.
If we don't actually demand open source software, and don't actually
demand that we can import revision history rather than just patches,
then up-to-date git or svn mirrors tracking blead from perforce will
meet the "must" requirements, and most of the "should"s.
0: I've done merging from blead to maint on at least 4 machines. The "master"
machine from which the release tarballs have been rolled has changed once.
It may well change again. None are my machine, and none are capable of
serving out that master repository. In turn, I don't want it to be a
requirement that either one has to have a shell account on the master
repository server, or that that server is actually only a mirror.
1: Further back is nice to have, but not having it we are no worse than we are
currently. Keeping the existing perforce repository live as a way to dig
back into the past that we didn't import is not sufficient. We would end
up needing to write tools to annotate files cross system, and we would need
to continue to be locked into the requirement to have a licence and a host
2: onto a case sensitive, case preserving file system on an operating system
where the text file line ending is a single newline
3: As John Peacock notes at some points "binary" files have been checked into
4: Certainly there are times when I've messed up and committed to 2 branches
at the same time. I don't see a need to pull out the exact //depot/...
at a particular perforce revision, with all branches perfect. Not having
this requirement might make things easier.
5: Perforce records other things. I don't think we have any need for its data
about which files are checked out read/write. I also don't think we actually
need to track what is "text" and what is "binary" - the CR/LF fixups for
release are done with a script
6: As best I can tell from using the command line interface, Perforce tracks
what has(n't) been integrated only as the tuple (change number, file)
Whilst it gives stats in terms of diff hunks merged/not merged, I can't
see a way to get it to record conflict resolution or merge cherry picking
within a file. That would be a nice to have.
7: I believe that this gets us "distributed" version control.
8: The requirement crops up for at least 2 scenarios:
a: Some style of changes is applied across all of blead, including to dual
life modules. Patches are sent upstream to those module maintainers.
The changes to core can be merged to maintenance as soon as the pumpking
desires. But policy and sanity dictate that the changes to the dual life
modules aren't merged to a stable release until they are also in a
released upstream version
b: i) Some part of blead is changed in a complex way
ii) Some style of changes is applied across all of blead including that
part. A lot of Andy's work was like this
iii) A bug fix is made to an area that has changes from (ii)
To merge (iii) requires merging some parts of (ii), but not all.
But if (i) might be merge-able after a longer review period, then when
it is merged it is important still to be able to easily merge the
remaining parts of (ii)
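
To make the file-level bookkeeping in notes 6 and 8 concrete, here is a minimal sketch that models "what has been integrated" as a set of (change number, file) tuples and walks through scenario (b). It is a toy model of the behaviour described above, not Perforce's actual implementation; the class name, the change numbers and the choice of file names are all assumptions made up for illustration.

```python
class IntegrationLog:
    """Tracks which (change number, file) pairs have been merged to a branch."""

    def __init__(self, changes):
        # changes: dict mapping change number -> iterable of file paths touched
        self.changes = {num: set(files) for num, files in changes.items()}
        self.integrated = set()  # set of (change number, file) tuples

    def integrate(self, change, files=None):
        """Merge all of a change, or only the named files from it."""
        touched = self.changes[change]
        chosen = touched if files is None else set(files)
        unknown = chosen - touched
        if unknown:
            raise ValueError(f"change {change} does not touch {sorted(unknown)}")
        self.integrated.update((change, f) for f in chosen)

    def pending(self, change):
        """Files of a change that are not yet merged, in sorted order."""
        return sorted(f for f in self.changes[change]
                      if (change, f) not in self.integrated)


# Hypothetical blead history for scenario (b):
#   101 = (i)   complex change to one area
#   102 = (ii)  style change across everything, including that area
#   103 = (iii) bug fix in an area touched by (ii)
log = IntegrationLog({
    101: ["sv.c"],
    102: ["sv.c", "pp.c", "op.c"],
    103: ["pp.c"],
})

log.integrate(102, files=["pp.c"])  # only the part of (ii) that (iii) needs
log.integrate(103)                  # the bug fix itself

print(log.pending(102))  # ['op.c', 'sv.c'] remain mergeable later
print(log.pending(101))  # ['sv.c'] still awaiting review
```

The point of the sketch is that after the partial merge the remaining parts of (ii) are still listed as pending, so they can be merged once (i) has been reviewed. Tracking at a finer grain than whole files (the "nice to have" in note 6) would need a richer key than this tuple.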