Re: Validation of genealogy data
From: Ron Savage
August 6, 2011 21:29
Message ID: firstname.lastname@example.org
On Sat, 2011-08-06 at 22:44 -0400, Stephen Woodbridge wrote:
> Hi Ron,
> I think you have summed this up correctly. Using a UID or UUID and
Yeah. I just made up the acronym UID, but of course a UUID would be just
> collecting a stream of them probably does the job. I think that one
> subtle point is that given an object identified by a UID it can change
> over time as information is added to it. I think version tracking on the
> object is a good idea. That would allow you to merge an object at some
> version and later deal with updating that object from its source to a
> new version.
> Overall, it sounds like you have the ideas and it sounds like it could
> be a very enabling tool.
Now I see the /real/ problem.
My email client has plenty of buttons with icons on them, but needs a
new one which, when clicked, scans the text of the email looking for
'ideas' and auto-generates the source code of the implementation...
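
Short of that button, the UID-plus-version idea quoted above can be sketched by hand. This is only a minimal illustration in Python, not code from this thread or from Gedcom.pm; the names `SourceRef`, `Individual`, `trail` and `merge` are hypothetical, and a random UUID stands in for the UID.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SourceRef:
    """Pointer back to one source record, at the version that was merged."""
    uid: str
    version: int


@dataclass
class Individual:
    indi: str                 # GEDCOM xref such as "I2921"; may be renumbered
    name: str
    uid: str = field(default_factory=lambda: str(uuid.uuid4()))  # never reused
    version: int = 1          # bumped whenever information is added
    trail: set = field(default_factory=set)  # SourceRefs merged into this record

    def update(self, name):
        """Change the record and note that it is now a newer version."""
        self.name = name
        self.version += 1


def merge(a, b, new_indi):
    """Build a new record (as some merges do) that remembers both sources."""
    merged = Individual(indi=new_indi, name=a.name)
    merged.trail = a.trail | b.trail | {
        SourceRef(a.uid, a.version),
        SourceRef(b.uid, b.version),
    }
    return merged
```

The INDI number of the merged record can be anything; the trail is what allows backtracking to the two source records, and the stored version shows which state of each source was merged.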
> On 8/6/2011 7:30 PM, Ron Savage wrote:
> > Hi Steve
> > On Sat, 2011-08-06 at 11:32 -0400, Stephen Woodbridge wrote:
> >> On 8/6/2011 2:28 AM, Ron Savage wrote:
> >>> Hi Steve
> >>> On Sat, 2011-08-06 at 00:39 -0400, Stephen Woodbridge wrote:
> >>>> On 8/5/2011 9:04 PM, Darren Duncan wrote:
> >>>>> Ron Savage wrote:
> >>>> http://swoodbridge.com/family/Woodbridge/index.php?indi=I2921
> >>>> I keep all the data in Family Tree Maker, export that to a GEDCOM, then
> >>>> load it using a Gedcom.pm script into a PostgreSQL database and serve the
> >>>> pages via php. The photos are integrated by a separate web app that
> >>>> allows loading, editing and linking them to the genealogy tables in the
> >>>> database.
> >>>> A really big requirement is persistent IDs for individuals. I have to be
> >>>> very careful to not do anything that might renumber them.
> >>> Noted.
> >>> Is there some specific action with programs we've mentioned which does
> >>> renumber them?
> >> Well the obvious one is a renumber command ;), but merging files and
> >> merging individuals sometimes creates a new individual and then copies
> >> the data from the two merged ones into the new one, which gives the new
> >> one a new number.
> > OK. This is what I wanted spelled out. Saves me having to make baseless
> > assumptions :-).
> > Yes, I see the difficulty. Thinking aloud...
> > Let's say we try to solve this by giving each INDI a unique # (UID)
> > besides the # in the INDI statement itself.
> > Renumbering changes INDI # but not UID.
> > When combining data from 2 parties, INDI can be set to anything, but we
> > haven't solved the problem, since the question now is: Which UID is
> > definitive? A: Neither. We've gained nothing. Right? But read on...
> > Or is it enough to record a trail of UIDs, so a set of UIDs can be
> > attached to the final INDI? This allows backtracking from the combined
> > data to the 2 source data sets (by ignoring the actual value of INDI,
> > and working off the UID). Would that suffice?
> > No. See below.
> >> But from a more general point of view and talking about versioning of
> >> data, if 100 people create an INDI record for the same person in
> >> separate research projects and later some of them merge their research
> >> at various points in time it would be nice to know if my INDI includes
> >> one or more of those other INDIs, and it might be nice to know at what
> >> version those INDI(s) got merged into my work.
> > I think this is just the above, extended such that each assertion about
> > an individual, not just each individual, owns a set of UIDs. Make sense?
> >> I suppose one way of thinking about this would be like SVN or GIT source
> >> code repository, where files were INDIs or FACTs and there exists a
> >> link-like item that connects facts to INDIs or "LINK"s and INDIs to other
> >> INDIs. I'm not suggesting this as a technical design but as a way of
> >> thinking about the problem of revisioning and history.
> > It might be tempting to use git, but I think not:
> > o With git (etc) the end result is, for each assertion, 1 definitive
> > value (after a merge), but with a history managed by git to enable
> > tracing of where that value came from, i.e. what the alternative values
> > were at the time(s) of the merge(s).
> > o My feeling from the discussion so far is that what's wanted is to
> > carry forward all versions of the assertion, in parallel so to speak.
> > I really think this means a set of UIDs per assertion, with the UIDs'
> > purposes being pointers back in time to the multiple sources leading to
> > the 'current' state.
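
One way to picture "a set of UIDs per assertion" is a table keyed by (field, value), where each entry holds the UIDs of the sources making that assertion. The Python sketch below is only an illustration under that assumption; `combine`, `values_for` and the field name "BIRT.DATE" are hypothetical and not taken from any existing module.

```python
from collections import defaultdict


def combine(*datasets):
    """Merge assertion tables without choosing a winner.

    Each dataset maps (field, value) -> set of source UIDs. Conflicting
    values stay side by side; identical values pool their UIDs.
    """
    combined = defaultdict(set)
    for dataset in datasets:
        for assertion, uids in dataset.items():
            combined[assertion] |= uids   # updates the new set, not the input
    return dict(combined)


def values_for(combined, field_name):
    """All values asserted for one field, each with the UIDs backing it."""
    return {
        value: uids
        for (name, value), uids in combined.items()
        if name == field_name
    }
```

Unlike a git-style merge, nothing is discarded here: every version of an assertion is carried forward, and its UIDs point back to the sources it came from.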