On Sun, Aug 5, 2012 at 3:20 AM, Aristotle Pagaltzis <pagaltzis@gmx.de> wrote:
> That is a weakness in the scheme I outlined. Lines need to be fixed
> length for the unpack sleight of hand to work, which means adding a perl

(Without actually reading your code) my gut reaction is "don't do
unpack sleight of hand" then.

Store lines in sorted module name order like this:

  "$name $json\n"   # where $json has no newlines in it

Then use Search::Dict to bisect to the right line, split the line into
two fields, decode the JSON part into a hash and dig into it as needed.

The bisection means it will be faster than reading every line and
loading the whole thing into memory anyway. Only in the rarest of cases
do you need to load the whole file -- most uses are still "corelist Foo",
which benefits hugely from bisection.

And if the JSON data for each module is in some delta format and only
changes when a module is updated, a line only gets touched when it
actually changes in a release, so the diff is annoying (a very long
line) but not horrible (every line changing).

Why JSON vs some other ASCII-delimited format? Because it allows
structured data instead of just fields. (Plus anyone with JSON::XS
installed gets a speed boost for free.)

-- David
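[A sketch of the lookup described above. The file name, sample records, and
the lookup() helper are hypothetical; JSON::PP stands in for JSON::XS so the
example runs on a stock perl. It writes a tiny sorted "$name $json" file,
then uses Search::Dict's look() to bisect to the matching line.]

```perl
use strict;
use warnings;
use Search::Dict;                       # exports look() by default
use JSON::PP qw(encode_json decode_json);
use File::Temp qw(tempfile);

# Build a tiny demo file of "$name $json" lines, sorted by module name.
# (Stand-in for the real data file; the records here are made up.)
my ($out, $path) = tempfile();
print {$out} $_->[0] . ' ' . encode_json( $_->[1] ) . "\n" for (
    [ 'CPAN'     => { first => '5.004'    } ],
    [ 'Carp'     => { first => '5'        } ],
    [ 'Storable' => { first => '5.007003' } ],
);
close $out;

# Bisect to the line for $module, split it into the two fields,
# and decode the JSON part into a hash.
sub lookup {
    my ( $file, $module ) = @_;
    open my $in, '<', $file or die "open $file: $!";
    look( $in, $module, 0, 0 ) >= 0 or return;   # binary search to >= $module
    my $line = <$in>;
    return unless defined $line;
    my ( $name, $json ) = split / /, $line, 2;
    return unless $name eq $module;              # exact match only
    return decode_json($json);
}

my $info = lookup( $path, 'Carp' );
print $info->{first}, "\n";                      # prints "5"
```

Because look() only seeks the filehandle, the cost is O(log n) reads no
matter how many modules the file grows to, and a miss falls out naturally
when the line it lands on doesn't start with the requested name.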