develooper Front page | perl.module-authors | Postings from March 2023


Bob Rogers
March 7, 2023 03:58
Message ID:
   I am currently finishing up a hack to extract objects from a
spreadsheet, tentatively called Spreadsheet::Table::Extract.  Here's the

    =head1 SYNOPSIS

	# say we have a checkbook with five columns that we want to
	# turn into a set of Checkbook::Entry objects:
	my $slot_from_heading = {
		# column      => object slot name
		'number'      => 'check_no',
		'date'        => 'date_days',
		'description' => 'description',
		'descr.'	  => 'description',	# synonym
		'payment'     => 'payment_value',
		'deposit'     => 'deposit_value'

	use Spreadsheet::Table::Extract;
	use Spreadsheet::Read;
	use Checkbook::Entry;	# for example

	my $book = ReadData('some-spreadsheet.xlsx');
	my $sheet1 = $book->[1];
	my @tables = Spreadsheet::Table::Extract->find_tables
	    ($sheet, $slot_from_heading,
	     validate_object => Checkbook::Entry->can('validate'));
	# assume we have only one table; if there are multiple heading
	# rows, there could be several.
	my @entries = $checkbook_table->extract($tables[0]);
	say "have ", scalar(@entries), " checkbook transactions.";

    =head1 DESCRIPTION

    C<Spreadsheet::Table::Extract> identifies subsets of spreadsheets
    and/or CSV files read with C<Spreadsheet::Read> that are conceptually
    equivalent to database tables.  Each such table is found by looking
    for its heading row(s) in the spreadsheet, and results in a
    C<Spreadsheet::Table::Extract> instance.  The instance can then
    extract the table into an arrayref of objects of a designated object
    class.  Multiple tables of different types can be combined side by
    side or above one another on the same sheet.

    The full process goes like this:


    =item 1.

    Call L</find_tables> to extract table definitions from a sheet,
    passing it a hashref that tells it how to identify the heading row.
    Do this once for each combination of sheet and table type.

    =item 2.

    Call L</make_disjoint> on all tables to adjust their boundaries for
    potential overlap.

    =item 3.

    Call L</extract> on each table to produce a list of objects of the
    desired application class.


(If you want to see it all, it's at .)

   The thing is, I'm not completely happy with the name.  It extracts
tables from spreadsheets, sure enough, but "extract" is more of a verb
than a noun, and the key method makes instances of itself that represent
table-like subsets of the spreadsheet.  Which are in fact extracts, but
it might be more accurate to make those Spreadsheet::Table instances.
But that seems overly grandiose for such a small module (it's 550 lines
all told).  Spreadsheet::ExtractedTable?  Suggestions?

					-- Bob Rogers Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at | Group listing | About