Re: Validation of genealogy data
From: Ron Savage
August 5, 2011 17:43
Message ID: firstname.lastname@example.org
On Fri, 2011-08-05 at 16:33 -0400, Stephen Woodbridge wrote:
> On 8/5/2011 2:13 PM, Darren Duncan wrote:
> > Mikkel Eriksen wrote:
> >> A big thing is to differentiate between "errors" and "warnings".
> >> Most things should be warnings, only literally impossible things
> >> should be errors. Thus, a parent having a child at age 10 should be a
> >> warning (even younger parents have been recorded after all), but a
> >> parent being younger than their child, that should be an error.
> >> Also, such age checks should only matter for biological children, I've
> >> gotten warnings from software because a step mother was "too old" to
> >> have a child at that age (no matter that she didn't actually have that
> >> child).
> > I would be much looser than that, even to the point that it is nearly
> > impossible to have importing errors.
> > And here's one of the main ideas from my own genealogy database project.
> > A good database needs to be able to store and organize contradictory
> > information, such as a parent being younger than their child, rather
> > than throwing up its hands and saying it is an error.
> > As you probably know, in real life there can be many sources for related
> > subjects, and it is often the case that they may contradict each other.
> > We need to be able to record all sources and what they say, even if they
> > don't agree.
> > A related main idea of my own project is:
> > Conceive that the database is not recording actual facts, but rather
> > assertions or statements. We are never completely sure that something is
> > true or false, but rather that just there is agreement or not. So the
> > database is not saying "this is true", but rather it says "X says Y and
> > W says V, and so we (the database) say M is N", where some of those
> > letters may be the same.
> > -- Darren Duncan
> Absolutely, one of my most frustrating Genealogical mis-adventures was
> adding some data from source A, then later deleting it and adding different
> data from source B, etc., from 3-4 sources, until I finally figured out
> that one of the early sources was flat wrong and many people had used
> and referenced that source. So in my notes I also documented all the BAD
> sources and tree information to document the fiasco and to avoid changes
> in the future.
OK. So we're saying we should be able to store multiple versions of any
assertion, since - at least at first - it's all 'just data'.
Then, we should be able to report on various matters regarding this
data, such as the error/warning (of type 'inconsistent') when someone
says today (Aug 6th) is a Sunday, when in fact it's a Saturday, since
they may have meant Aug 7th.
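
As a rough illustration of that kind of report, here is a minimal Python sketch of a weekday consistency check. The names `Finding` and `check_weekday` are made up for this example and do not come from any existing GEDCOM module. It assumes the claim is stored as a calendar date plus a weekday name, and it reports a mismatch as a warning of type 'inconsistent' rather than rejecting the data.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


@dataclass
class Finding:
    """Hypothetical report record; nothing is rejected, only flagged."""
    severity: str  # "warning" or "error"
    kind: str      # e.g. "inconsistent"
    message: str


def check_weekday(claimed_date: date, claimed_weekday: str) -> Optional[Finding]:
    """Return a Finding if the weekday name does not match the date."""
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    actual = WEEKDAYS[claimed_date.weekday()]
    if actual.lower() == claimed_weekday.strip().lower():
        return None

    # Suggest an adjacent date that does fall on the claimed weekday,
    # since an off-by-one date is a likely cause.
    hints = []
    for offset in (-1, 1):
        neighbour = claimed_date + timedelta(days=offset)
        if WEEKDAYS[neighbour.weekday()].lower() == claimed_weekday.strip().lower():
            hints.append(neighbour.isoformat())

    message = f"{claimed_date.isoformat()} is a {actual}, not a {claimed_weekday}"
    if hints:
        message += "; perhaps " + " or ".join(hints) + " was meant"
    return Finding(severity="warning", kind="inconsistent", message=message)


# 6 Aug 2011 was a Saturday, so a claim of "Sunday" is flagged.
finding = check_weekday(date(2011, 8, 6), "Sunday")
if finding is not None:
    print(f"{finding.severity} ({finding.kind}): {finding.message}")
```

The original claim is left untouched; the check only produces a report entry that the user can act on or ignore.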
To me this suggests that all the data be preserved indefinitely, even
after the user (or the program at the user's behest) has somehow chosen
one version to be promoted as the definitive version (not necessarily
making it correct), even though future research could change the status
of that version. This in turn suggests the need for a multi-valued flag
attachable to each version indicating the researcher's degree of faith in
the veracity of that version.
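
To make the idea concrete, here is a small Python sketch of one way such a structure could look. Everything in it (the `Confidence` levels, the `Version` and `Assertion` classes, the sample sources and years) is invented for illustration and is not taken from any existing program. The assumption is that versions are only ever appended, never deleted, and that promotion is just a pointer which can be moved later.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Confidence(Enum):
    """Hypothetical multi-valued flag: the researcher's degree of faith."""
    DISPROVEN = "disproven"
    DOUBTFUL = "doubtful"
    UNASSESSED = "unassessed"
    PLAUSIBLE = "plausible"
    WELL_SUPPORTED = "well supported"


@dataclass
class Version:
    source: str   # who says it
    value: str    # what they say
    confidence: Confidence = Confidence.UNASSESSED


@dataclass
class Assertion:
    subject: str
    attribute: str
    versions: List[Version] = field(default_factory=list)
    promoted: Optional[int] = None  # index of the current definitive version

    def add(self, source: str, value: str,
            confidence: Confidence = Confidence.UNASSESSED) -> int:
        """Record another version and return its index; nothing is removed."""
        self.versions.append(Version(source, value, confidence))
        return len(self.versions) - 1

    def promote(self, index: int) -> None:
        """Mark one version as definitive without discarding the others."""
        if not 0 <= index < len(self.versions):
            raise IndexError(f"no version with index {index}")
        self.promoted = index

    def definitive(self) -> Optional[Version]:
        return None if self.promoted is None else self.versions[self.promoted]


# Made-up example data: two sources disagree about a birth year.
birth = Assertion(subject="I1", attribute="birth year")
a = birth.add("Source A", "1850")
b = birth.add("Source B", "1852", Confidence.PLAUSIBLE)
birth.promote(a)

# Later research discredits Source A: re-flag it and promote Source B,
# while keeping Source A on record as a known bad source.
birth.versions[a].confidence = Confidence.DISPROVEN
birth.promote(b)
```

Keeping the flag on each version, rather than on the assertion as a whole, is what lets a bad source stay documented after a different version has been promoted.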
Do any current programs support such labelling of assertions?