perl.gedcom
Postings from August 2011
Re: How about Gedcom2 for a new namespace?
From: Mikkel Eide Eriksen
August 4, 2011 22:27
Message ID: 43AEE6BB-F7AB-4745-8682-B360C02A534D@gmail.com
On 05/08/2011, at 02.05, Ron Savage wrote:
> Hi Mikkel
> On Thu, 2011-08-04 at 22:44 +0200, Mikkel Eide Eriksen wrote:
>> Hi Ron,
>> Speaking of genealogy formats, I'm working some on a completely source-based format: http://carthag.github.com/sourcemarkup/ (don't mind the ugly color scheme, it was just a random one I chose).
> I'm glad to know you're doing this sort of work, but I have a complex
> set of reactions to it:
> o Isn't Cocoa an Apple-proprietary software thing? This implies anyone
> trying to use data in this format outside the Apple cocoon must have a
> separate set of code to import and manipulate the data. Who's going to
> write that? Only someone who chooses to support your format.
There are a number of open source projects that could help here, such as GNUstep and cocotron.org. Mostly, though, I'm writing it in Cocoa because I'm a little fed up with Mac genealogy software and want to try my hand at writing a free, open app myself.
> o XML definitely handles nested data, so it can certainly be used as a
> communication format between users, but is the idea to keep the data in
> XML at all times? This requires an XML parser (which is a big topic I
> don't wish to pursue), and either using something like XML::Twig to
> access small parts of the file, or storing all the XML in a DOM-based
> structure, which normally takes up 100 (sic) times the space of the file
> itself (another big topic). This in turn leads to a discussion of speed
> of access for practical web-based display, and hence deployment under
> web servers such as Starman so that the code never exits, meaning the
> slow startup costs for the XML processor, etc, are avoided.
> o I'll say again I understand XML has its uses, but its proponents are
> still trying to live down the XML fanaticism of the early days, when
> every little thing was put in an XML file (the format of choice for
> control freaks :-), which required a huge parser to be fired up just to
> read even a 3 line file. As always, it's up to the proponents to support
> their suggestions, rather than choosing it first and then afterwards
> claiming it's appropriate. That applies to my suggestions too :-(!.
The format is not for internal storage but for exchanging data between apps/users/services. So the extra cost of an XML parser (which isn't so bad these days) would only be incurred when actually exporting or importing data from whatever internal format is used. In my app, I'm using Core Data to handle the data, which provides referential integrity, predicate-based fetches, etc. Someone else could use Postgres, or Perl hashes, or whatever they want.
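To make the import/export point concrete, here is a minimal Python sketch (not part of sourcemarkup itself) of reading an exchange file with the standard library's `xml.etree.ElementTree.iterparse`. It streams `<source>` elements into plain dicts and clears each element once its data has been copied out, which speaks to Ron's concern about DOM memory use. The element and attribute names (`sources`, `source`, `quality`, `title`) are hypothetical, not the actual format.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical exchange file; element and attribute names are illustrative only.
SAMPLE = b"""<sources>
  <source id="s1" quality="primary"><title>Ribe parish register, births 1850</title></source>
  <source id="s2" quality="secondary"><title>Marriage index</title></source>
</sources>"""


def import_sources(stream):
    """Stream <source> elements from a file-like object into plain dicts."""
    store = {}
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "source":
            # Copy what we need into the internal representation...
            store[elem.get("id")] = {
                "quality": elem.get("quality"),
                "title": elem.findtext("title"),
            }
            # ...then discard the element's children and text to keep
            # memory use low while the rest of the file is parsed.
            elem.clear()
    return store


if __name__ == "__main__":
    for source_id, info in import_sources(io.BytesIO(SAMPLE)).items():
        print(source_id, info["quality"], info["title"])
```

Once imported, the dicts could just as well be written to Core Data, Postgres or Perl hashes; the XML parser is only involved at the boundary.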
Since the format is transcribed source text with a layer of semantic metadata on top, it lends itself to a markup language, hence my choice of XML. Strictly speaking, XML is only needed for the transcriptions themselves. It might be possible to use a mixed format where only the transcription was XML and the "external" info (source quality, cross-references, etc.) was in some other format, but in my opinion that would needlessly complicate things.
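As a rough illustration of "transcribed text with semantic metadata on top", the sketch below keeps a transcription readable as prose while inline elements tag the facts, and an importer pulls those facts out with each one tied to its source. The `<transcription>`, `<person>` and `<place>` elements, their attributes, and the sample names are invented for this example; the real sourcemarkup schema may look quite different.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the transcription remains readable prose, and inline
# elements tag the facts it contains. All names here are illustrative.
TRANSCRIPTION = """<transcription source="s1">On the 4th of August 1850 was born
<person id="p1" role="child">Hans Jensen</person>, son of
<person id="p2" role="father">Jens Hansen</person>, at
<place>Ribe</place>.</transcription>"""


def extract_facts(xml_text):
    """Return the plain transcription and a list of facts, each tied to its source."""
    root = ET.fromstring(xml_text)
    source_id = root.get("source")
    facts = []
    for person in root.iter("person"):
        facts.append({"source": source_id, "person": person.get("id"),
                      "role": person.get("role"), "name": person.text})
    for place in root.iter("place"):
        facts.append({"source": source_id, "place": place.text})
    # itertext() yields text and tails in document order, i.e. the
    # original transcription with the markup stripped out.
    plain = "".join(root.itertext())
    return plain, facts


if __name__ == "__main__":
    text, facts = extract_facts(TRANSCRIPTION)
    print(text)
    for fact in facts:
        print(fact)
```

Because every fact carries the `source` attribute of the enclosing transcription, nothing can enter the tree without pointing back to the text it came from.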
> o So why go your own way anyway? Why not join the - very interesting -
> Better Gedcom group? I do thank you for the reference. We should all
> think about how that group and we Perl users can interact.
I've actually requested access to the BetterGedcom wiki (which I only discovered after starting my own thing), but have not yet had time to follow up on that. It seems like there are a lot of people out there fed up with the idiosyncrasies and shortcomings of the gedcom format. Hopefully we can create something extraordinary :-)
>> The idea is to use transcribed sources and mark them up with all the info they contain, so as to force all information to be traceable to an actual source. From this data, it should be possible to build family trees, data sheets for individuals, etc. It is still very much a work in progress: at this point it is just an idea and a very fluid definition of what I want it to be able to do. The site has two unrelated examples, a birth (source1, recorded as prose) and a marriage (source2, recorded in a table).
> It's good you've directly focused on one of the major issues - how to
> handle textual material.
> I should say I have a strong suspicion an ideal solution (if there is
> one!) will end up being:
> o Have basic info (individuals, families, and hence relationships
> [i/f/r]) in a db (i.e. such as Postgres). This means rapid access in a
> viewport-like way so as to display a fragment of a family tree in a web
> page, and
> o Have all other material in either (potentially huge) text/binary
> fields in the db, or even in external files, all accessible via the
> i/f/r records.
This is internal storage, which hasn't concerned me as much. I'm much more interested in the exchange of gedcom data and how to accomplish it in a clean and unambiguous manner. That said, splitting the data along those lines seems sound.
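The i/f/r split Ron describes can be sketched with Python's built-in `sqlite3` standing in for Postgres. Individuals, families and child links are small rows suited to quick "viewport" queries, while bulky transcriptions sit in a separate `document` table that points back to an individual. The schema, table names and sample people are hypothetical.

```python
import sqlite3

# sqlite3 stands in for Postgres here; table and column names are hypothetical.
SCHEMA = """
CREATE TABLE individual (id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE family (id TEXT PRIMARY KEY,
                     husband TEXT REFERENCES individual(id),
                     wife TEXT REFERENCES individual(id));
CREATE TABLE child (family TEXT REFERENCES family(id),
                    individual TEXT REFERENCES individual(id));
-- Bulky material lives apart from the i/f/r rows but points back to them.
CREATE TABLE document (id INTEGER PRIMARY KEY,
                       individual TEXT REFERENCES individual(id),
                       body TEXT);
"""


def demo_db():
    """Build a tiny in-memory database with one family and one document."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO individual VALUES (?, ?)",
                     [("I1", "Jens Hansen"), ("I2", "Maren Nielsdatter"),
                      ("I3", "Hans Jensen")])
    conn.execute("INSERT INTO family VALUES ('F1', 'I1', 'I2')")
    conn.execute("INSERT INTO child VALUES ('F1', 'I3')")
    conn.execute("INSERT INTO document (individual, body) VALUES (?, ?)",
                 ("I3", "Full transcribed text of the birth entry"))
    return conn


def children_of(conn, person_id):
    """A small 'viewport' query that touches only the i/f/r tables."""
    return conn.execute(
        """SELECT i.id, i.name FROM family f
           JOIN child c ON c.family = f.id
           JOIN individual i ON i.id = c.individual
           WHERE f.husband = ? OR f.wife = ?
           ORDER BY i.id""",
        (person_id, person_id),
    ).fetchall()


if __name__ == "__main__":
    print(children_of(demo_db(), "I1"))  # [('I3', 'Hans Jensen')]
```

Drawing a fragment of the tree never reads the `document` table; the full text is only fetched when someone asks for it.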
>> Obviously it would be impossible to generate this data from a gedcom file, but it would be possible to (lossily) export from this format to gedcom.
> Sure, but the whole point of my current attempt to stir people into
> responding is to think outside the 'Gedcom-ordained square', and to
> focus on what's needed, not what was defined in the past.
Agreed, but even if/when we do come up with the be-all and end-all genealogy file format of the future, there will still be a transitional period during which people need to interact with gedcom :-)
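During that transitional period, the lossy export mentioned above could be as simple as the following sketch. It emits GEDCOM record lines (an INDI record with NAME and BIRT, plus a SOUR record and a citation pointing to it) from facts already extracted from the markup. Anything without a GEDCOM slot, such as the transcription text or a person's role, is simply dropped. The input dict layout is hypothetical, and the output omits the HEAD record that a complete GEDCOM file requires.

```python
def to_gedcom(people, source_id, source_title):
    """Emit GEDCOM record lines from extracted facts (lossy by design).

    Only names, birth dates and a source pointer survive; any other keys,
    such as a transcription or a role, are ignored.
    """
    lines = [f"0 @{source_id}@ SOUR", f"1 TITL {source_title}"]
    for p in people:
        lines.append(f"0 @{p['id']}@ INDI")
        lines.append(f"1 NAME {p['given']} /{p['surname']}/")
        if "birth_date" in p:
            lines += ["1 BIRT", f"2 DATE {p['birth_date']}",
                      f"2 SOUR @{source_id}@"]
    lines.append("0 TRLR")
    return "\n".join(lines)


if __name__ == "__main__":
    hans = {"id": "I3", "given": "Hans", "surname": "Jensen",
            "birth_date": "4 AUG 1850", "role": "child",
            "transcription": "On the 4th of August 1850 was born Hans Jensen"}
    print(to_gedcom([hans], "S1", "Ribe parish register, births 1850"))
```

Going the other way is not possible: the exported file no longer holds the transcription, so the original markup cannot be reconstructed from it.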
>> Additionally, this might also interest you, I came across it last month: http://bettergedcom.wikispaces.com/
> Yes, indeed. Probably they're way ahead of me on this matter... I'd
> better lie low until I study their material.
Let me know what you think!
> Ron Savage
> Ph: 0421 920 622