On Sun, Jul 29, 2012 at 11:08 AM, Chris 'BinGOs' Williams <chris@bingosnet.co.uk> wrote:
> All the data is compressed and uuencoded.
>
> This does have an effect on performance.

The real win might be to get to a format that can be searched on demand, so we don't load the whole thing into memory just to check one module.

Ideally, we'd pivot everything to make the module name the primary key and have a data structure per module holding the per-perl-release data. I think it might be possible to adapt some of my Search::Dict wrangling from the Paris QA hackathon to do it. E.g. have a data file with one "$module::name $json_data\n" line per module. Then use Search::Dict to find the module's line in the data file and decode only the JSON part of that line. That would answer 99% of the questions people ask with corelist. For the handful of users who need the full data per perl, the time cost of loading it all up should be bearable.

N.B. I'm not planning on doing that work, but if someone is motivated, it's another way to do it.

Eliminating the repeated module names from the file would probably yield a substantial size reduction. Delta representation could be added on a per-module basis as well, of course.

--
David
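A minimal sketch of the "one line per module" idea, written in Python rather than Perl purely for illustration. The helpers write_index, look and lookup, and the {perl_version: module_version} JSON layout, are hypothetical; look() only mimics the binary-search-over-a-sorted-file approach behind Perl's Search::Dict and is not its API.

```python
import json
import os
import tempfile


def write_index(path, data):
    """Write one "Module::Name <json>" line per module, sorted bytewise so
    the file can be binary-searched. `data` maps module name ->
    {perl_version: module_version} (a hypothetical layout)."""
    lines = sorted(
        f"{name} {json.dumps(per_perl, sort_keys=True)}\n"
        for name, per_perl in data.items()
    )
    with open(path, "w", encoding="ascii", newline="\n") as fh:
        fh.writelines(lines)


def _line_start(fh, pos):
    """Offset of the first line that starts at or after byte `pos`."""
    if pos == 0:
        return 0
    fh.seek(pos - 1)
    fh.readline()  # consume through the next newline
    return fh.tell()


def look(fh, key):
    """Position `fh` (opened in binary mode) at the first line that sorts
    >= `key`, using O(log n) seeks instead of reading the whole file."""
    key_b = key.encode("ascii")
    fh.seek(0, os.SEEK_END)
    lo, hi = 0, fh.tell()
    while lo < hi:
        mid = (lo + hi) // 2
        fh.seek(_line_start(fh, mid))
        line = fh.readline()
        if line and line < key_b:
            lo = mid + 1  # target line starts after this probe
        else:
            hi = mid
    pos = _line_start(fh, lo)
    fh.seek(pos)
    return pos


def lookup(path, module):
    """Return the per-perl data for `module`, or None if it is absent.
    Only the one matching line is decoded as JSON."""
    with open(path, "rb") as fh:
        look(fh, module)
        line = fh.readline().decode("ascii")
    name, _, payload = line.rstrip("\n").partition(" ")
    return json.loads(payload) if name == module else None


if __name__ == "__main__":
    # Placeholder module names and versions, not real corelist data.
    sample = {
        "Alpha": {"5.010000": "1.00"},
        "Alpha::Beta": {"5.010000": "0.50", "5.012000": "0.60"},
        "Gamma": {"5.012000": "2.00"},
    }
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "corelist.idx")
        write_index(path, sample)
        print(lookup(path, "Alpha::Beta"))
        print(lookup(path, "Missing::Module"))
```

Because each query touches only O(log n) lines and decodes a single JSON payload, the common "which perl shipped module X?" question never loads the full dataset; a caller wanting everything per perl would instead scan and decode every line.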