On 4 August 2012 21:29, Aristotle Pagaltzis <pagaltzis@gmx.de> wrote:

[snip details of space separated table format]

> But the vast expanses of whitespace gzip-compress to peanuts (<25Kb).
> Do we have gunzip in core?

Yes, see the approach BinGOs mentioned, code here:

https://github.com/bingos/module-snorelist/blob/master/Data.PL

(Note this is uuencoding the gzipped data, so a tiny bit larger, but
I'm not sure that's really needed.)

[snip details of searching data file]

> So we’re looking at <1MB in memory (incl. all overheads), a pittance on
> disk, near zero load time, most parsing work done in a few heavy-weight
> builtins with almost no looping in Perl code, and equally fast access to
> the data by either axis, with no spin-up key index generation for either
> of them.

Putting a tie interface on top of that might not be that nice. I do
wonder if the next step is to make a cleaner API and deprecate the
access via hash "API" as mentioned earlier in the thread.

> Will a patch be accepted if I try this and find the results live up to
> the promise? Did I miss any reason why this is a bad idea?

Sounds sane in general (not my call to accept patches though).

With this approach I assume generation will be involved somewhere and
it's worth thinking about that aspect:

* Where the generation will take place. (First thought is a .PL file
  run as part of the core perl build process and Module-CoreList build
  process -- in that case we wouldn't reduce the size of those
  distributions, but may reduce the size of built packages.)

* What the diff for a release manager will look like. (Presumably the
  data can be in a nicer format than a table with very long lines if
  it's generated from something else.)

* How this fits in with scripts in Porting, etc. (e.g. test_porting at
  the moment will act as a sanity test on CoreList.pm due to the strict
  version check there; it may be worth having an explicit test for the
  data consistency.)

David
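
For reference, a minimal sketch of the gzip (plus optional uuencode) round trip discussed above, using only modules that ship with perl. This is not the code from Data.PL; the table layout, module names and version numbers are all made up for illustration, and the real column format from the snipped part of the thread may well differ.

```perl
use strict;
use warnings;
use IO::Compress::Gzip     qw(gzip   $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Hypothetical space-padded table: one row per module, one column per
# perl release. Names and versions here are invented sample data.
my $table = join '', map { sprintf "%-40s %-12s %-12s\n", @$_ } (
    [ 'Module',         '5.014000', '5.016000' ],
    [ 'Example::Alpha', '1.00',     '1.01'     ],
    [ 'Example::Beta',  '0.50',     '0.50'     ],
);

# Generation side: roughly what a .PL script might do at build time.
gzip \$table => \my $gz or die "gzip failed: $GzipError";
my $uu = pack 'u', $gz;    # optional: uuencode so the blob is plain ASCII

# Load side: roughly what the module might do when it needs the data.
my $gz_back = unpack 'u', $uu;
gunzip \$gz_back => \my $restored or die "gunzip failed: $GunzipError";

# Compare the three sizes; the uuencoded form is the gzipped form plus
# the encoding overhead mentioned in the note about Data.PL.
printf "raw %d bytes, gzipped %d bytes, uuencoded %d bytes\n",
    length $table, length $gz, length $uu;
```

If the uuencode step is dropped, the gzipped bytes would have to live somewhere binary-safe (a separate file rather than a __DATA__ section in a text-mode source file, say), which is part of the "where does generation happen" question.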