develooper Front page | perl.perl6.language | Postings from July 2005

Re: DBI v2 - The Plan and How You Can Help

Thread Next
From:
Darren Duncan
Date:
July 4, 2005 18:31
Subject:
Re: DBI v2 - The Plan and How You Can Help
Message ID:
p06230900beef5db21767@[192.168.1.101]
Tim et al,

Following are some ideas I have for the new DBI, that were thought 
about greatly as I was both working on Rosetta/SQL::Routine and 
writing Perl 6 under Pugs.  These are all language-independent and 
should be implemented at the Parrot-DBI level for all Parrot-hosted 
languages to take advantage of, rather than just in the Perl 6 
specific additions.  I believe in them strongly enough that they are 
in the core of how Rosetta et al operates (partly released, partly 
pending).

0. There were a lot of good ideas in other people's replies to this 
topic and I won't repeat them here, for the most part.

1. Always use distinct functions/methods to separate the declaration 
and destruction of a resource handle / object from any of its 
activities.  With a database connection handle, both the 
open/connect() and close/disconnect() are $dbh methods; the $dbh 
itself is created separately, such as with a DBI.new_connection() 
function.  With a statement handle, the prepare() is also a $sth 
method like with execute() et al; the $sth itself is created 
separately, such as with a $dbh.new_statement() method.  If new 
handle types are created, such as a separate one for cursors, they 
would likewise be declared and used separately.

With this separation, you can re-use the resource handles more 
easily, and you don't have to re-supply static descriptive 
configuration details each time you use it, but rather only when the 
handle is declared.  At the very least, such static details for a 
connection handle include what DBI implementor/driver module to use; 
as well, these details include what database product is being used, 
and locating details for the database, whether internet address or 
local service name or on-disk file name and so on.  This can 
optionally include the authorization identifier / user name and 
password, or those details can be provided at open() time instead if 
they are likely to be variable.

2. Always separate out any usage stages that can be performed apart 
from the database itself.  This allows an application to do those 
stages more efficiently, consuming fewer resources of both itself and 
the database.

For example, a pre-forked Apache process can declare all of the 
database and statement handles that it plans to use, and do as much 
of the prepare()-type work that can be done internally as possible, 
prior to forking; all of that work can be done just once, saving CPU, 
and only one instance of it consumes RAM.  All actual invocations of 
a database, the open()/connect() and execute() happen after forking, 
and at that point all of the database-involving work is consolidated.

Or even when you have a single process, most of the work you have to 
do, including any SQL generation et al, can be more easily be 
pre-performed and the results cached for multiple later uses.  Some 
DBI wrappers may do a lot of work with SQL generation et al and be 
slow, but if this work is mainly preparatory, they can still be used 
in a high-speed environment as that work tends to only need doing 
once.  Most of the prep work of a DBI wrapper can be done effectively 
prior to ever opening the database connection.

3. Redefine prepare() and execute() such that the first is expressly 
for activities that can be done apart from a database (and hence can 
also be done for a connection handle that is closed at the time) 
while all activities that require database interaction are deferred 
to the second.

Under this new scheme, when a database has native prepared statements 
support that you want to leverage, the database will be invoked to 
prepare said statements the first time you run execute(), and then 
the result of this is cached by DBI or the driver for all subsequent 
execute() to use.  In that case, any input errors detected by the 
database will be thrown at execute() time regardless of their nature; 
only input errors detected by the DBD module itself would be thrown 
at prepare() time.  (Note that module-caught input errors are much 
more likely when the module itself is handling SQL in AST form, 
whereas database-caught input errors are much more likely when SQL is 
always maintained in the program as string form.)  Note also that the 
deferal to execute() time of error detection is what tends to happen 
already with any databases that don't have native prepared statement 
support or for whom the DBI driver doesn't use them; these won't be 
affected by the official definition change.

Now I realize that it may be critically important for an application 
to know at prepare() time about statically-determinable errors, such 
as mal-formed SQL syntax, where error detection is handled just by 
the database.  For their benefit, the prepare()+execute() duality 
could be broken up into more methods, either all used in sequence or 
some alternately to each other, so users get their errors when they 
want them.  But regardless of the solution, it should permit for all 
database-independent preparation to be separated out.

4. All host parameters should be named (like ":foo") rather than 
positional (like "?"), meeting with the SQL:2003 standard.  The named 
format is a lot easier to use and flexible, making programmers a lot 
less error prone, more powerful, and particularly more resource 
efficient when the same parameter is conceptually used multiple times 
in a SQL statement (it only has to be bound once).  If anyone wants 
to use positional format, it could easily be emulated on top of this. 
Or, if native positional support is still important, then it should 
be a parallel option that can be used at the same time as named in 
any particular SQL statement.  See the native API of SQLite 3 for one 
example that (I believe) supports both in parallel.  This also means 
that execute() et al should take arguments in a hash rather than an 
array.

5. All details used to construct a connection handle should be 
completely decomposed rather than shoved into an ungainly "data 
source".  Examples of what should be distinct (not all being 
applicable at once) are: 1. the DBI driver module to use; 2. the 
internet server IP address or domain name and port; 3. the locally 
defined server device socket; 4. the locally defined service (eg, 
ODBC or SQL*Net) name; 5. the file system file name; 6. the file 
system directory name; 7. some other detail if any for fully in-RAM 
databases; 8. the authorization identifier / user name; 9. the 
password; 10. some other authorization credential, or channel 
encryption details, or whatever else; 11. what kind of database or 
what database product is being used, if known.  If the DBI driver 
talks to a client-configurable DBI proxy server, then, it should be 
possible to nest a set of the above settings (eg, as a hash-ref) as 
one part of the main settings given to the proxy client.

6. DBI drivers should always be specified by users with their actual 
package name, such as 'DBD::SQLite', and not some alternate or 
abbreviated version that either leaves the 'DBD::' out or is spelled 
differently.  Similarly, the DBI driver loader should simply try to 
load exactly the driver name it is given, without munging of any 
type.  This approach is a lot more simple, flexible and lacks the 
cludges of the current DBI.  DBI driver implementers can also name 
their module anything they want, and don't have to name it 'DBD::*'. 
A DBI driver should not have to conform to anything except a specific 
API by which it is called, which includes its behaviour upon 
initialization, invocation, and destruction.

7. Error conditions should *always* be thrown as exceptions by DBI; 
no exception thrown means that the request succeeded, even if its 
result was nothing/undef.  This is a lot simpler to implement or use 
than any alternative.  If people don't like that, then some wrapper 
should be employed to block the exceptions.  Or, if it is really 
important to have a non-exception alternative, then that should be an 
alternative, with thrown exceptions being the default behaviour.

8. Split off the proxy server/client stuff into a separate 
distribution; they are conceptually add-ons anyway and could benefit 
from independent development.  Split off any SQL parser utilities 
(eg, SQL::Nano, SQL::Statement) into a separate distribution, since 
only a small fraction of potential drivers would use them, and they 
are better off to just require them separately.  Split off all 
bundled DBI drivers (DBD::File, etc) into separate distributions, 
unless they exist soley to provide an example of how to make a DBI 
driver and are not actually useful in themselves.  The DBI 
distribution should focus simply on defining an interface, and let 
anything that will help with implementing the drivers to be optional 
and separate.

9. As Sam Vilain suggested, prepare() type methods should accept both 
SQL strings and any type of object as input, so that drivers have the 
option to directly accept AST forms; particularly useful when the 
drivers themselves would otherwise have to parse the SQL into an AST 
anyway.

And now ...

Here's an example of some things that implementing some of the above 
suggestions will let an application do (code may not compile as is):

   method init($self) {
     $self.db = DBI.new_connection( driver => 'DBD::SQLite', host => 'test' );

     my $sth1 = $self.db.new_statement(
       "select * from baz where abc = :bar or def = :bar" );
     $sth1.prepare();
     my $sth2 = $self.db.new_statement(
       "insert into baz (abc, def) values (:p_abc, :p_def)" );
     $sth2.prepare();

     $self.routines = (
       'get_all_baz' => -> ($bar) {
         $sth1.execute( { bar => $bar } );
         return $sth1.fetch_all_hashref();
       },
       'add_one_baz' => -> ($abc, $def) {
         $sth2.execute( { p_abc => $abc, p_def => $def } );
       },
    );
   }

   method main($self) {
     try {
       $self.db.open( user => 'jane', pass => 'k34l5jr' );

       try {
         $self.routines.{'add_one_baz'}.('hello','world');

         my $results = $self.routines.{'get_all_baz'}.('world');

         my $sth3 = $self.db.new_statement(
           "delete from baz where def = :foo" );
         $sth3.prepare();
         $sth3.execute( { foo => 'blarch' } );
       };
       $! and say "dag nabit!";

       $self.db.close();
     };
     $! and say "dog gone!";
   }

In the above example, only main() actually invokes a database; init() 
does load the DBI driver, though.  You can also invoke main() as many 
times as you want, and you can run init() prior to forking without 
trouble.

What I've said in this email is not exhaustive and I may add or amend 
items later; but, its a good start.  Feedback is welcome of course.

Thank you. -- Darren Duncan

Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About