Front page | perl.perl5.porters |
Postings from March 2022
managing AUTHORS and .mailmap from git commit history mostly automatically.
From: demerphq
Date: March 1, 2022 15:52
Subject: managing AUTHORS and .mailmap from git commit history mostly automatically.
Message ID: CANgJU+WWEQb8-hO4OFyKFaJJ=ZRxOmwTvW_qpLzW0jzK8G6b5w@mail.gmail.com
Hi All,
I have pushed a PR: https://github.com/Perl/perl5/pull/19477 which
adds a tool which automatically updates and maintains AUTHORS and
.mailmap based on the commit history in the repo.
It is still a work in progress. I pushed without MANIFEST changes, for
instance, and I haven't checked to see how nicely it plays with
Porting/checkAUTHORS.pl. But it has taken me a lot of time and effort,
so I thought I would push it and get some feedback before I do more
work on it.
The basic idea is to use AUTHORS, .mailmap, and git log as the sole
source of truth about our developers, and to make it possible to update
both files with a simple execution of:
Porting/UpdateAuthors.pl
Unlike checkAUTHORS.pl, it does not contain its own database of
information; it is all in either AUTHORS itself or .mailmap.
As part of this I added a dozen or so people to AUTHORS who were not
listed previously, and I added and reviewed about 1500 mailmap
entries. I also fixed about a half-dozen to a dozen AUTHORS entries to
reflect the individual's preference for their email address, generally
by contacting them directly via email, but sometimes by applying
personal knowledge, or even just typing their email address into
gmail.
It turns out our .mailmap file was a mess. I think people had
misunderstood the format, and there were people who had entries like this:
A Name <email1> Other Name <email2>
A Name <email2> Another Name <email1>
along with variants thereof. There were also entries where AUTHORS
showed one thing and the preferred data showed the other. It was
obvious to me that people were confused about the format, and were
at times putting things in backwards. To be clear, the format for the
file is that the data on the left is the *preferred form*; this is the
form that git will show. The data on the right is the "other form";
this is the data that will be changed.
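To make the left/right distinction concrete, here is a rough Python sketch of
how one .mailmap line splits into its preferred and "other" halves. This is
not code from the PR; the names MailmapEntry and parse_mailmap_line are made
up for illustration, and it only handles whole-line '#' comments and lines
with one or two "Name <email>" pairs.

```python
import re
from typing import NamedTuple, Optional


class MailmapEntry(NamedTuple):
    """One .mailmap line: the left side is preferred, the right side is replaced."""
    preferred_name: Optional[str]
    preferred_email: Optional[str]
    other_name: Optional[str]
    other_email: str


# An optional name followed by an email in angle brackets.
_PAIR = re.compile(r"\s*([^<>]*?)\s*<([^<>]*)>")


def parse_mailmap_line(line: str) -> Optional[MailmapEntry]:
    """Parse a single line; return None for blank lines and '#' comment lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    pairs = _PAIR.findall(line)
    if len(pairs) == 1:
        # Simple form: "Preferred Name <email used in the commit>".
        (name, email), = pairs
        return MailmapEntry(name or None, None, None, email)
    if len(pairs) == 2:
        # Full form: preferred identity on the left, commit identity on the right.
        (p_name, p_email), (o_name, o_email) = pairs
        return MailmapEntry(p_name or None, p_email, o_name or None, o_email)
    raise ValueError(f"unrecognised .mailmap line: {line!r}")
```

Fed the line "A Name <email1> Other Name <email2>", this reports "A Name" and
"email1" as the preferred form and "Other Name" and "email2" as the form that
gets rewritten.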
The idea of my script and the patch generally is that we keep the
preferred form and authors in sync, and that we list each and every
email ever used by someone (including weird stuff like RT entries and
whatnot) mapping to their preferred form. Thus after my efforts the
file contains about 1700 entries or so.
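As a rough illustration of the "keep the preferred form and AUTHORS in sync"
idea, the sketch below maps commit identities to their preferred form and
reports preferred forms that AUTHORS does not list. It is a Python
approximation, not the logic of Porting/UpdateAuthors.pl: the function names
and the example.com identities are invented, and the lookup is simplified to
an exact name plus case-insensitive email match.

```python
from typing import Dict, List, Sequence, Set, Tuple

Identity = Tuple[str, str]          # (name, email)
Mapping = Tuple[Identity, Identity]  # (preferred, other)


def _key(identity: Identity) -> Identity:
    """Normalise an identity for lookup: emails compare case-insensitively."""
    name, email = identity
    return (name, email.lower())


def build_lookup(entries: Sequence[Mapping]) -> Dict[Identity, Identity]:
    """Index every 'other' identity by the preferred identity it maps to."""
    return {_key(other): preferred for preferred, other in entries}


def canonicalize(identity: Identity, lookup: Dict[Identity, Identity]) -> Identity:
    """Return the preferred form of a commit identity, or the identity unchanged."""
    return lookup.get(_key(identity), identity)


def preferred_missing_from_authors(
    entries: Sequence[Mapping], authors: Set[Identity]
) -> List[Identity]:
    """List preferred identities that have no matching AUTHORS entry."""
    return sorted({preferred for preferred, _ in entries if preferred not in authors})


# Hypothetical data, for illustration only.
entries = [
    (("A Name", "a@example.com"), ("Other Name", "old@example.com")),
    (("A Name", "a@example.com"), ("A Name", "a@rt.example.org")),
    (("B Person", "b@example.com"), ("bp", "b@host.example.com")),
]
authors = {("A Name", "a@example.com")}
lookup = build_lookup(entries)
```

With this data, canonicalize(("Other Name", "OLD@example.com"), lookup) gives
the "A Name" identity, and preferred_missing_from_authors(entries, authors)
flags "B Person" as someone who still needs an AUTHORS entry.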
There are some subtleties. For instance, in some cases we don't want to
show an email address, or we don't have an email address for someone
who has provided changes to the project, but we do have their name.
There is also a very small number of commits where we simply don't have
any name and the email is ambiguous and gives us no information about
who created the commit. Presumably we won't have these cases in the
future but to make this work I needed to deal with it all.
The idea is that you can just run the script and at the end everything
will be updated based on the data in the commit history and life will
be good.
This has actually been a much bigger can of worms than I expected. Humans
are messy, and they do weird stuff like create commits with strange
author data. For instance, Paul Evans has commits in the history with
three different forms. The .mailmap now includes an entry for all
three mapping them to the name and email that was in AUTHORS. (I have
11.)
As part of this I found that we had not listed a fair number of people
in either file. For instance Alexey Borzenkov was not listed. I
discovered his name from his email by typing it into gmail (it's a
gmail address) and google helpfully told me. Another interesting case
is Ævar Arnfjörð Bjarmason, who has 4 listings in .mailmap, some of
which are a munged version of his name and an email of
<perlbug-followup@perl.org>.
With the mailmap changes I have made, all of those weird commits will
now show his correct name and preferred email.
In some cases we do not want to show an email for someone, perhaps
because they are deceased. In this case I set up mappings from the
emails they used in the past to their name at <unknown>. Such entries
are included in AUTHORS but we do not show the email. So if you want
to hide someone's email then we will do it consistently, by simply
setting their preferred email to <unknown>.
In a small number of cases we have commits where the only source info
is a random looking email address from the host they were working on.
In those cases the mailmap mapping is set to have an "unknown" name.
Those entries are not included in AUTHORS at all.
The tools should not ever remove someone from AUTHORS.
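The three rules above (hidden emails, unknown names, never removing anyone)
can be sketched roughly as follows. This is a Python illustration, not the
tool's actual code: the helper names are invented, and both the exact
spelling of the "unknown" markers and the "Name <email>" line layout are
assumptions made for the example.

```python
from typing import Iterable, List, Optional, Tuple

UNKNOWN_EMAIL = "unknown"  # assumed marker: preferred email written as <unknown>
UNKNOWN_NAME = "unknown"   # assumed marker: preferred name for unidentifiable commits


def authors_line(name: str, email: str) -> Optional[str]:
    """Render one AUTHORS line, or None if the identity must not be listed."""
    if name.strip().lower() == UNKNOWN_NAME:
        return None                      # nobody identifiable: leave out entirely
    if email.strip().lower() == UNKNOWN_EMAIL:
        return name                      # listed, but the email stays hidden
    return f"{name} <{email}>"


def update_authors(
    existing_lines: Iterable[str], preferred: Iterable[Tuple[str, str]]
) -> List[str]:
    """Append new entries; existing lines are always kept, never removed."""
    result = list(existing_lines)
    seen = set(result)
    for name, email in preferred:
        line = authors_line(name, email)
        if line is not None and line not in seen:
            result.append(line)
            seen.add(line)
    return result
```

Because update_authors only ever appends, someone already in AUTHORS stays
there even if none of the preferred identities passed in match their line.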
Review and comments appreciated. It is possible there are last minute
bugs I created while cleaning up the code just now. Please let me
know.
cheers,
Yves
--
perl -Mre=debug -e "/just|another|perl|hacker/"