[PEAK] DB type conversion (for inputs to execute() and callproc())

Tue Dec 16 20:09:56 EST 2003

At 05:48 PM 12/16/03 -0600, Ian Bicking wrote:
>A difficult case to consider: supporting boolean columns, for Postgres, 
>MySQL, and Python, where each has very different notions of boolean.
>MySQL is probably just TINYINT, using 0 and 1 (there's no formal type).

Similar for Sybase.

>   Postgres has a BOOLEAN column, that accepts 't'/'f', or '1'/'0', but 
> not 1/0 (just to mess with your head).

[rapid head shaking and insane-person noises]  Yeeowch!

I think the safest thing to do here would be to use a model.Enumeration 
type to represent booleans; the easiest way to do this would be using the 
already-supplied model.Boolean type.  This would then be unambiguously 
convertible to any database's weird boolean types.

On the other hand, this presents an interesting problem, in that there may 
not be a single unambiguous rendering of model.Boolean for backends that 
don't possess a boolean type.  At that point, we're talking about a schema 
dependency.  Hmmm.

>   Python of course has a looser notion of booleans, with many possible 
> true and false values.

For specific conversion to a boolean type, this doesn't really matter.  One 
would just define something like:

declareAdapter( lambda ob,proto: ob and 't' or 'f',
     provides = [storage.dbType('pgdb','BOOLEAN')],
     forTypes = [object]
)

And this would be called for any Python object type without a 
more-specifically defined adaptation.

>So declaring a column in your model to be boolean can effect several 
>layers, which should be accounted for.  And all for such a seemingly 
>simple type.  Important, though -- having worked with code that used 't' 
>and 'f' for booleans (because of database interaction), I can say that 
>such situations are best to be avoided.

Bleah.  AFAICT, the only really messy bit, though, is mapping a Python type 
that might have multiple possible representations in the target DB, when 
the target protocol isn't specifically known.  For example, if you don't 
have a 'db.types.BOOLEAN' for the driver, and/or don't know that 'BOOLEAN' 
is the target type in the DB.

I think that that could be addressed, though, by allowing there to be 
"private" types for a backend, which could be set up as implied-by other types.

Hm.  Actually, you just made me realize that the idea of using DBAPI types 
for this is wrong.  What we *really* want is true driver types.  For 
example, the idea of Sybase's "tinyint" is a meaningful dbType, but 
"number" really isn't.  For procedure wrappers and the like, we should 
adapt to the type the DB really calls for.  Thus, if we pass a boolean to a 
procedure that takes a tinyint, we will have an appropriate conversion.

Thanks for the feedback, Ian.  You just saved me some dead-end development, 
even if it does now open up issues of precision and scale for numeric 
types, as well as size limits for string types.  (Yuck.)  Probably I'll 
punt on those for now.  I do rather wonder if the CORBA typecode facility 
in peak.model.datatypes would be useful here, and if so how to integrate it 
with PyProtocols (which it predates).  Maybe we need a 
'model.protocolForTypeCode()' mechanism to handle such things.  OTOH, it 
would likely just magnify the complexity of dealing with each DB's types.

Okay, here's what I think we'll have to do.  Each database will need a 
published mapping for how it's going to map built-in types for that 
database, and an explanation of how it will map user-defined-types from the 
underlying database (if it has any introspective capability).  In each 
case, the mapping is to a URI namespace, managed by storage.dbType().  I'm 
going to leave the issue of string length and numeric precision open for 
the moment, until I've had a chance to think it out further.  For now, 
we'll assume that no such types exist, and types such as VARCHAR(40) will 
be treated as simply VARCHAR by the drivers for conversion purposes.  We'll 
then extend the mechanism for parameterized types at a later stage.