[PEAK] DB type conversion (for inputs to execute() and callproc())
Phillip J. Eby
pje at telecommunity.com
Tue Dec 16 18:33:44 EST 2003
Currently, peak.storage.SQL supports automatic type conversion on all data
*received from* the database. It can do this because, for a given database
type there is arguably one "best" Python type to render that value as, for
the given application.
However, it does not currently support type conversion for values that are
being *sent to* the database. One must manually render the values as the
correct datatype (using e.g. 'db.API.DateFromTicks()'), and then either
convert them to strings and embed them in SQL, or supply them as parameters
to the cursor.execute() method.
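As a rough sketch of that status quo (the driver module here is a stand-in; real drivers supply 'DateFromTicks()' per the DBAPI spec, and the table in the commented query is made up):

```python
import datetime

# Stand-in for a DBAPI driver module; real drivers (e.g. DCOracle2)
# provide a module-level DateFromTicks() per the DBAPI spec.
class FakeDriverAPI:
    @staticmethod
    def DateFromTicks(ticks):
        return datetime.date.fromtimestamp(ticks)

# Manual conversion: the application must know the right constructor
# and call it itself before handing the value to cursor.execute().
ticks = 1071616424  # seconds since the epoch
db_date = FakeDriverAPI.DateFromTicks(ticks)
# cursor.execute("UPDATE orders SET shipped=? WHERE id=?", (db_date, 42))
```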
But for stored procedures, this will be inconvenient at best. One of the
goals for stored procedures is that they should be able to accept
application types (such as the Python 2.3 'datetime' types included with
PEAK) and convert them to the necessary database type(s).
Therefore, it would seem a conversion mechanism is necessary. Here are the
scenarios I think it might need to cover:
* convert this Python object to "something that works with the DB" in a
general sense (i.e., the specific DB target type is unknown)
* convert to a DBAPI-level type, such as NUMBER, STRING, BINARY, etc.
* convert to a specific DB native type such as NUMBER(10,2) or some such.
The first item is critical. The second may be needed to eliminate
conversion ambiguities for stored procedure arguments. The third item
seems like a YAGNI, especially since correct implementation could be
complex. (And I didn't even list supporting per-connection configuration
for conversions, since the assumption here is that a given Python type
should have a single unambiguous representation for a given DB backend.)
So, what I propose is to create a 'types' attribute on DB connections, such
that e.g. 'db.types.NUMBER' would name the protocol one adapts to in order
to get a "number" value from some Python type. In addition to the DBAPI
types supported by that driver, there would also be an '_ANY_' protocol.
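A minimal pure-Python sketch of the proposed shape (the class names and URI scheme are invented for illustration; the real version would use PyProtocols 'protocolForURI()' objects, not these stand-ins):

```python
# Illustrative stand-in for protocols.protocolForURI() objects.
class _DBTypeProtocol:
    def __init__(self, uri):
        self.uri = uri
    def __repr__(self):
        return "<dbType %s>" % self.uri

class _TypesNamespace:
    """What 'db.types' might look like for a given driver."""
    def __init__(self, driver, dbapi_types):
        # one protocol object per DBAPI-level type the driver supports,
        # plus a catch-all '_ANY_' protocol
        for name in dbapi_types + ["_ANY_"]:
            setattr(self, name,
                    _DBTypeProtocol("db-types://%s/%s" % (driver, name)))

types = _TypesNamespace("DCOracle2", ["NUMBER", "STRING", "BINARY", "DATETIME"])
```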
These protocols would be implemented as 'protocols.protocolForURI()'
objects within a driver-specific namespace. For databases with multiple
drivers (like Oracle and PostgreSQL), I'm not sure whether their type
protocols will overlap. If they do, we may create a URI namespace for the
"generic" DB, and then have implied protocols for the specific drivers.
Anyway, by using protocolForURI(), it means that application code can
declare adapters for specific backends without those backends needing to be
installed. That is, if I want to declare an adapter from my application's
date class to "DCOracle2 datetime", I can do so by using the URI for
"DCOracle2 datetime", whether DCOracle2 is installed or not. (Of course,
you could use the 'whenImported()' function to set up a hook to do this,
too, but I think that's more awkward.)
In order to simplify the declaration process, I'll probably have
peak.storage.SQL export a utility function like
'storage.dbType(driver,type="_ANY_")' that returns the right
protocolForURI() object. Thus, one would declare one's adapter as
providing e.g. 'storage.dbType("DCOracle2","DATETIME")' to declare an
adapter to DCOracle2's DATETIME DBAPI type. And, the 'types' attribute of
a db connection would simply set its attributes to the dbType() objects for
the connection's driver.
Finally, functions that wanted to convert a Python object to a suitable
object for the database type would use 'adapt(anObject,
db.types.DBAPITYPENAME)' to get an object suitable for passing as an
execute() or callproc() parameter value.
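An end-to-end sketch of that call site, again with invented stand-ins (a dict-based 'adapt()' rather than PyProtocols' real one, and a string conversion chosen arbitrarily as the "suitable" target value):

```python
import datetime

# (python type, protocol name) -> converter; stands in for the real
# adapter registry behind protocols.adapt().
_ADAPTERS = {}

def adapt(obj, protocol_name):
    converter = _ADAPTERS.get((type(obj), protocol_name))
    if converter is None:
        raise TypeError("can't adapt %r to %s" % (obj, protocol_name))
    return converter(obj)

# Declare once: Python datetime.datetime -> driver DATETIME value.
_ADAPTERS[(datetime.datetime, "DATETIME")] = lambda dt: dt.isoformat(" ")

# At the call site, instead of converting by hand:
value = adapt(datetime.datetime(2003, 12, 16, 18, 33, 44), "DATETIME")
# 'value' is now suitable for passing as an execute()/callproc() parameter
```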
Comments, anyone?