Re: A draft proposal for UUIDs
From: Ron Savage
August 17, 2011 00:31
Message ID: firstname.lastname@example.org
On Tue, 2011-08-16 at 21:11 -0400, Stephen Woodbridge wrote:
> On 8/16/2011 7:14 PM, Ron Savage wrote:
> > Hi Steve
> > On Tue, 2011-08-16 at 09:48 -0400, Stephen Woodbridge wrote:
> > [snip]
> >>> Now, if some interface code allows the user to create INDIs, say, then
> >>> they have to be flagged as having a different UUID. Or do they?
> >>> If the original UUID belonged to the source, then yes, since the new
> >>> INDIs are coming from a different source.
> >> I think this is the correct answer. The UUID belongs to the source of
> >> the import action that created the data when the import did not have
> >> UUIDs of its own.
> >> Adding a UUID for the import action would then allow all the data to be
> >> later purged if it needed to be, so there might be value in adding a UUID
> >> to the import even if the imported data already has UUIDs.
> > I think we're getting things clearer now.
> > To summarize: For a given db, each source which contributes records must
> > be separately identified by a UUID, with that UUID attached (somehow) to
> > each record imported.
> > That means various types of reports:
> > o Pick 2 UUIDs and process (e.g. compare, update, export, delete) just
> > the records belonging to those UUIDs.
> > o Pick 2 UUIDs and flag records such that data with (from) UUID #1 is
> > deemed more reliable than data with UUID #2. Clearly both datasets are
> > preserved.
> > o Pick 1 UUID and process (e.g. update, export, delete, ...) just the
> > records belonging to it.
> > o Many other possibilities ...
> > Good stuff!
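
A minimal sketch of the summary above, assuming an in-memory list stands in for the db and that every import action gets one fresh UUID. The names Record, import_records, records_from and purge are hypothetical, invented here for illustration; they are not part of any existing GEDCOM module.

```python
import uuid
from dataclasses import dataclass


@dataclass
class Record:
    tag: str          # GEDCOM record type, e.g. "INDI"
    name: str         # illustrative payload only
    source_uuid: str  # UUID of the import that contributed this record


def import_records(db, rows):
    """Append rows to db, stamping each with one fresh per-import UUID."""
    source = str(uuid.uuid4())
    db.extend(Record(tag, name, source) for tag, name in rows)
    return source


def records_from(db, source):
    """Select just the records that belong to one source UUID."""
    return [r for r in db if r.source_uuid == source]


def purge(db, source):
    """Delete, in place, every record contributed by one source UUID."""
    db[:] = [r for r in db if r.source_uuid != source]


db = []
joe = import_records(db, [("INDI", "John Smith"), ("INDI", "Mary Smith")])
ann = import_records(db, [("INDI", "Ann Jones")])
purge(db, joe)  # only the records from the second import remain
```

Comparing, exporting or ranking two sources by reliability would start from the same records_from selection.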
> Hi Ron,
> Yes exactly, but I'm not sure that it should be limited to sources because
> from a single source you might have some good data and some bad data. So
> it is fine to say this is more reliable than that, but you also need to
> be able to say this item is flat out wrong.
> Obviously you can go crazy with this and tag everything with a UUID, but here
> are the things I think are most important:
> 1. INDI, FAMI, and NOTE records, and individual items attached to these
> 2. actions performed on these or on the database
> Oh! I just realized something: we are mixing two different needs here.
> 1. INDI, FAMI, SOUR and NOTE records need a UUID that does not change;
> this is a persistent object identifier. In a given system INDI::UUID=27
> should always get me the same INDI regardless.
> 2. For history and object version tracking, so you can merge or re-merge
> a data set, you need a version number that gets incremented every time
> the object gets changed. So say I import Joe's GEDCOM and merge it with
> my file in January and then in August I get an update. I can ignore all
> the UUIDs from Joe that have the same version as in the new import;
> only UUIDs that I do not have, or that have new versions, need to be merged.
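
The January/August update in point 2 can be sketched as below, assuming each data set is reduced to a mapping from record UUID to version number. The function records_to_merge and the sample UUID strings are hypothetical. A version that merely differs is treated as new here, which is a simplification.

```python
def records_to_merge(mine, incoming):
    """Return the incoming UUIDs that are unknown to me or whose version differs.

    Both arguments map a record UUID to its version number. UUIDs whose
    version matches what I already hold are skipped.
    """
    return sorted(u for u, v in incoming.items() if mine.get(u) != v)


# What I hold after the January merge, and what arrives in August.
mine = {"uuid-a": 1, "uuid-b": 3}
update = {"uuid-a": 1, "uuid-b": 4, "uuid-c": 1}

changed = records_to_merge(mine, update)  # "uuid-a" is unchanged and skipped
```

This only decides which records need attention. It does not resolve the case where the same object was edited in two systems.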
> So I think we have two separate needs here that should not get merged, to
> avoid confusion: Objects need UUIDs and Actions (add, import, edit,
> delete, etc.) cause version changes to Objects. Is a version just another
> UUID? If an object like an INDI or INDI::BIRT is in two separate systems
> and is edited in both systems, you would not want them to be able to have
> the same version number.
> So a possible use case: I create an INDI in system A and it has UUID=x,
> and this is exported to a GEDCOM and imported into another system B. I
> assume it retains its UUID=x but also has some additional information
> attached to it recording that it was imported. Now the BIRT record is
> added/updated separately in both systems. Later I import
> system B back into system A.
> I'm just trying to think this through. There are obviously a lot of
> additional nuances that can be put on this, like each BIRT record could
> reference a SOUR record that would have a UUID and later identical or
> similar SOUR records could be merged keeping both UUIDs.
> Maybe this is getting to be overkill? Is anyone else following this
> thread? I see these things as being a significant aid to managing data
> and merging and updating data in an automated way. But then again maybe
> no one else cares.
We can't possibly design a mechanism for fiddling UUIDs in order to
emulate a version control system such as git. That's utterly futile.
So, we need to design UUIDs to serve whatever purpose people need which
can't be provided by git/etc.
(I'll probably answer your email's points separately. I still have to
think about the non-version control aspects of UUIDs :-).