ANNOUNCE - Set::Relation version 0.6.0 for Perl 5

From: Darren Duncan
Date: February 10, 2009 00:28
Subject: ANNOUNCE - Set::Relation version 0.6.0 for Perl 5
Message ID: 49913AA3.1030108@darrenduncan.net

All,

I am pleased to announce the first (widely announced, and the 9th actual)
release of Set::Relation, the official/unembraced version 0.6.0 for Perl 5, on
CPAN.  You can see it now, with nicely HTMLized documentation, at:

   http://search.cpan.org/dist/Set-Relation/

A short summary description with synopsis code is further below in this message.

While new, Set::Relation is effectively done (enough for a first major version),
with a full feature set and with everything fully documented in POD, and you can
start actually using it now.  That said, this module is officially in alpha
release status so you should take caution with it.  While its API is unlikely to
change much, and the code appears correct, a lot of it has not yet actually been
executed, and the current test suite is almost empty.  The module will probably
work now, but parts of it might break.  See further below if you'd like to help out
with this module's future development.

Also expected in the near future, though not today, is a corresponding version
for Perl 6, which was intended from day one.

The official discussion forums for Set::Relation are currently just the
email-based ones listed at http://mm.darrenduncan.net/mailman/listinfo and labeled
'muldis-db'; the FORUMS pod section in Relation.pm itself also lists these.  Any
protracted discussion following this announcement would ideally take place
there, so it is easy to find aggregate information resulting from said
discussions.  As for replying in other forums, use your discretion as usual.  No
official IRC forums for Set::Relation or other Muldis database-related things
exist yet, though in the near future I expect to get one set up on perl.org
or freenode.org; preferably it would be a logged channel.

--------

Set::Relation provides a simple Perl-native facility for an application to
organize and process information using the relational model of data, without
having to employ a separate DBMS, and without having to employ a whole separate
sub-language (such as Muldis Rosetta does).  Rather, it is integrated a lot more
into the Perl way of doing things, and you use it much like a Perl array or
hash, or like some other third-party Set:: modules available for Perl.  This is
a standalone Perl 5 object class that represents a Muldis D quasi-relation
value, and its methods implement all the Muldis D relational operators.

A simple working example:

     use Set::Relation;

     my $r1 = Set::Relation->new( [ [ 'x', 'y' ], [
         [ 4, 7 ],
         [ 3, 2 ],
     ] ] );

     my $r2 = Set::Relation->new( [
         { 'y' => 5, 'z' => 6 },
         { 'y' => 2, 'z' => 1 },
         { 'y' => 2, 'z' => 4 },
     ] );

     my $r3 = $r1->join( $r2 );

     my $r3_as_nfmt_perl = $r3->members();
     my $r3_as_ofmt_perl = $r3->members( 1 );

     # Then $r3_as_nfmt_perl contains:
     # [
     #     { 'x' => 3, 'y' => 2, 'z' => 1 },
     #     { 'x' => 3, 'y' => 2, 'z' => 4 },
     # ]
     # And $r3_as_ofmt_perl contains:
     # [ [ 'x', 'y', 'z' ], [
     #     [ 3, 2, 1 ],
     #     [ 3, 2, 4 ],
     # ] ]
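
To make clear why only the tuples with y = 2 survive the join above, here is a minimal plain-Perl sketch of what a natural join computes over lists of hash refs.  The natural_join helper is hypothetical and written only for illustration; it is not part of Set::Relation and does not reflect how the module is implemented.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Set::Relation: natural join of two
# lists of hash refs, matching on every attribute name they share.
sub natural_join {
    my ($left, $right) = @_;
    return [] if !@{$left} || !@{$right};

    # Attribute names common to both headings (read from the first tuples).
    my @common = grep { exists $right->[0]{$_} } keys %{ $left->[0] };

    my @result;
    for my $l (@{$left}) {
        for my $r (@{$right}) {
            # Skip the pair if any common attribute disagrees.
            next if grep { $l->{$_} ne $r->{$_} } @common;
            # Otherwise merge the two tuples into one wider tuple.
            push @result, { %{$l}, %{$r} };
        }
    }
    return \@result;
}

my $joined = natural_join(
    [ { x => 4, y => 7 }, { x => 3, y => 2 } ],
    [ { y => 5, z => 6 }, { y => 2, z => 1 }, { y => 2, z => 4 } ],
);

# Prints "x=3 y=2 z=1" and then "x=3 y=2 z=4".
printf "x=%d y=%d z=%d\n", @{$_}{qw(x y z)} for @{$joined};
```

The nested loop is quadratic and assumes every tuple in a list has the same attributes; it is meant to show the semantics, not to be fast.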

This is the initial complement of public routines; besides the "new" constructor
submethod, there are these 68 object methods: "clone", "export_for_new",
"has_frozen_identity", "freeze_identity", "which", "members", "heading", "body",
"slice", "attr", "evacuate", "insert", "delete", "degree", "is_nullary",
"has_attrs", "attr_names", "cardinality", "is_empty", "is_member", "empty",
"insertion", "deletion", "rename", "projection", "cmpl_projection", "wrap",
"cmpl_wrap", "unwrap", "group", "cmpl_group", "ungroup", "transitive_closure",
"restriction", "restriction_and_cmpl", "cmpl_restriction", "extension",
"static_extension", "map", "summary", "is_identical", "is_subset",
"is_proper_subset", "is_disjoint", "union", "exclusion", "intersection",
"difference", "semidifference", "semijoin_and_diff", "semijoin", "join",
"product", "quotient", "composition", "join_with_group", "rank", "limit",
"substitution", "static_substitution", "subst_in_restr",
"static_subst_in_restr", "subst_in_semijoin", "static_subst_in_semijoin",
"outer_join_with_group", "outer_join_with_undefs",
"outer_join_with_static_exten", "outer_join_with_exten".

It is important to note that practically anything you can do in a SQL SELECT
(and in various other kinds of SQL), for any vendor of DBMS, you can do with the
Set::Relation routines (and ordinary Perl).  In the short term a "how do I" kind
of FAQ or tutorial will be made, but it doesn't exist yet; meanwhile you should
be able to figure it out using the routines' reference documentation.  For
example:

   1. The "SELECT ... FROM $foo" query portion is handled by any of
      [projection, extension, rename, map, substitution, etc].
   2. The "WHERE" and "HAVING" clauses are handled by [restriction,
      semijoin, semidifference, etc], which includes "IN" and "NOT IN".
   3. The "GROUP BY" is handled by [group, cmpl_group, etc].
   4. Aggregation operators combined with "GROUP BY" are handled by
      [summary, etc].
   5. Ranking, sorting and quota queries like "RANK", "ORDER BY" and
      "LIMIT" are handled by [rank, limit, etc].
   6. Inner joins are handled by [join, product, intersection, etc].
   7. Outer joins are handled by the various [outer_join_*, etc].
   8. Union, intersection, difference, etc are handled by the methods of
      the same names.
   9. "COUNT(*)" is handled by [cardinality].
  10. Recursive queries are handled by [transitive_closure, etc].
  11. Sub-queries are supported everywhere simply as the normal way of
      doing things.
  12. Other features like relational divide, composition, etc are given
      by [quotient, composition, etc].
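
As a rough illustration of that mapping, the sketch below reuses $r1 and $r2 from the working example and pairs two SQL queries with method chains.  The "new", "join" and "members" calls are as shown earlier; passing an array ref of attribute names to "projection", and calling "cardinality" with no arguments, are my assumptions here, so check the reference documentation for the exact signatures.

```perl
use strict;
use warnings;
use Set::Relation;

my $r1 = Set::Relation->new( [ [ 'x', 'y' ], [
    [ 4, 7 ],
    [ 3, 2 ],
] ] );

my $r2 = Set::Relation->new( [
    { 'y' => 5, 'z' => 6 },
    { 'y' => 2, 'z' => 1 },
    { 'y' => 2, 'z' => 4 },
] );

# SQL: SELECT DISTINCT x FROM r1 NATURAL JOIN r2
# Assumption: projection() takes an array ref of attribute names.
my $xs = $r1->join( $r2 )->projection( [ 'x' ] );

# SQL: SELECT COUNT(*) FROM r1 NATURAL JOIN r2
# Assumption: cardinality() needs no arguments.
my $count = $r1->join( $r2 )->cardinality();

print "tuples in join: $count\n";
print "tuples after projecting onto x: ", $xs->cardinality(), "\n";
```

Each method returns a new relation, so a sub-query is just another method call whose result is passed along or chained.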

Set::Relation is a generic tool and can be widely applied.  It has been
developed according to a rigorously thought out API and behaviour specification,
and it should be easy to learn, to install and use, and to extend.  But in the
short term at least, this module is still assumed to be very un-optimized for
its conceptually low level task of data crunching, and you may want to avoid it
if your top concern is execution (CPU, RAM, etc) performance.  Set::Relation is
best used in situations where you either want to just get some correct solution
up and working quickly (conserving developer time), such as because it is a
prototype or proof of concept, or where your data set is relatively small, or
where your task is one that is less time sensitive like a batch process.  Some
suggested uses for Set::Relation include applying it to help with: flat file
processing, SQL generation, database APIs, testing database related code,
teaching databases, and general list or set operations.  See
http://search.cpan.org/dist/Set-Relation/lib/Set/Relation.pm#Appropriate_Uses_For_Set::Relation
for more details.  Set::Relation's performance will be improved over time so
some of these issues should go away later, or the sibling project
Muldis::Rosetta (still under construction) will have much better performance
anyway due to its greater complexity to address such matters.

Set::Relation requires Perl 5.8.1+, Moose 0.68+, version.pm 0.74+,
namespace::clean 0.09+, and List::MoreUtils 0.22+; it has no other direct
external dependencies.  This module is pure Perl and a single file.  It is now
maintained in a Git repository; see http://utsl.gen.nz/gitweb/?p=Set-Relation or
the distribution's README file.

If you like Set::Relation, either as it is now or as you see it becoming, and
you would like to help improve it, I welcome any and all kinds of assistance
you would like to offer.

Probably the greatest help people could give me is to supply test files that
confirm correct behaviour and expose current or regression bugs; other Set::
modules or database-related modules may be an inspiration for copying/adapting
tests from.

I would also like to build up a set of usage examples and basic tutorials, meant
to answer questions of the sort "how do I do this?".  For example, within the
context of a relational database represented as a Perl Hash whose elements are
Set::Relation objects representing SQL/etc tables/relvars, I would like a number
of brief problem descriptions.  Each would provide an example database schema and
data (multiple questions/examples can share the same schema/data), say first
in a sentence what a query is trying to find out, and then give example SQL/etc to
do it; for each example I/we would then supply Perl code for how to do the same
thing with Set::Relation, so that we have a side-by-side comparison.
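
A minimal sketch of what one such entry might look like follows.  The suppliers/shipments schema and its data are made up for illustration, and I am assuming that "semijoin" and "semidifference" each take a single relation argument and match on the attributes the two relations have in common (here, just sno).

```perl
use strict;
use warnings;
use Set::Relation;

# Hypothetical database: a Perl hash whose values are relations.
my %db = (
    suppliers => Set::Relation->new( [
        { 'sno' => 'S1', 'sname' => 'Smith' },
        { 'sno' => 'S2', 'sname' => 'Jones' },
    ] ),
    shipments => Set::Relation->new( [
        { 'sno' => 'S1', 'pno' => 'P1', 'qty' => 300 },
    ] ),
);

# Question: which suppliers have at least one shipment?
# SQL: SELECT * FROM suppliers
#      WHERE sno IN (SELECT sno FROM shipments)
my $active = $db{suppliers}->semijoin( $db{shipments} );

# Question: which suppliers have no shipments at all?
# SQL: SELECT * FROM suppliers
#      WHERE sno NOT IN (SELECT sno FROM shipments)
my $idle = $db{suppliers}->semidifference( $db{shipments} );

print "suppliers with shipments: ", $active->cardinality(), "\n";
print "suppliers without shipments: ", $idle->cardinality(), "\n";
```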

Otherwise, I invite feedback on all aspects of the module's design,
implementation, and documentation.  For example, what sorts of changes do you
suggest to the criteria Set::Relation uses to determine whether 2 arbitrary Perl
values are to be considered identical or not (that's a big one)?  What sorts of
typical module serialization hooks should I or should I not be using as object
identifiers?  Is the documentation structured the best way it could be?  Is the
module making as much use of Moose's features as it can be, or making as much
use of the lesser known power features of Perl 5 itself as it should be?  Do you
think details of the module's API or semantics should change, such as to better
integrate it into typical or best practice ways of using Perl?  What additional
prior art such as other Perl modules should I be looking at, either that
Set::Relation should use as a dependency, or that it should copy/adapt
functionality or techniques from?  How are you applying, or would you consider
applying, Set::Relation to your work and what changes if any might help you
adopt it more easily?  Do you propose different internal syntax for the module's
code, or propose a different factoring of the code?  Can you suggest a better
way to package the module; eg would you propose an alternative to the simple
Makefile.PL?  Do you propose a particular structure for the test suite? What
about examples and tutorials; how might those best be organized and what sorts
of things should they contain?  What can you suggest for helping performance?
And then there's Perl 6; do you have suggestions for particular Perl 6 features
that should be exploited for Set::Relation's Perl 6 native version?  Or do you
have ideas for the Perl 6 language itself to adapt distinct Set::Relation
features into Perl 6 itself as if a relation were just another generic
collection type (which it is)?

Note that the work done on Set::Relation and in improving it and testing it will
later feed back into implementing Muldis::Rosetta, whose design overlaps.  It is
very helpful to me if Set::Relation can be made the best it can be, as soon as
possible, so as to make said feedback more timely.

Thank you and have a good day. -- Darren Duncan





