[PEAK] The O-R mapping layers
Phillip J. Eby
pje at telecommunity.com
Mon Nov 10 13:39:10 EST 2003
Physical Layer
--------------
The lowest layer of the O-R mapping system will consist of a "database"
component, with subcomponents representing tables, or properly speaking,
relation variables, since they may actually be views or queries.
A "database" in this context does not mean a database connection. A
database component may contain (encapsulate) connections to more than one
physical database, be they LDAP or SQL or something else altogether. The
actual connection objects are irrelevant outside of the db component,
because all usage of the database is through reference to its "tables".
These tables will probably be something like 'storage.SQLTable' and
'storage.LDAPTable', derived from the existing peak.query.AbstractRV class,
once peak.query has gone through a bit more evolution. In particular,
they'll need to evolve methods such as:
* __iter__ (so that you can do e.g. 'for row in db.someTable:')
* makeQuery() (to create a callable that then returns an iterable; this is
so that queries can be pre-compiled and then cached as methods of the DB or
of other objects. Dynamic SQL generation involves quite a few calls, so we
want to be able to reuse them without regenerating them all the time.)
* insert(), delete(), and update() methods of some sort, to allow data
manipulation.
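To make the shape of that interface concrete, here's a toy sketch of a relvar backed by an in-memory list of row dicts. The class and method bodies are purely illustrative (this is not the actual peak.query API), but it shows the iteration, pre-compiled query, and DML methods described above:

```python
# Illustrative only -- not the real peak.query table API.

class MemoryTable:
    """A toy relvar backed by an in-memory list of row dicts."""

    def __init__(self, rows):
        self._rows = [dict(r) for r in rows]

    def __iter__(self):
        # supports 'for row in db.someTable:'
        return iter(self._rows)

    def makeQuery(self, where):
        # Pre-compile a filter into a reusable callable that returns
        # an iterable each time it's invoked, so the (notionally
        # expensive) query construction happens only once.
        def query():
            return [r for r in self._rows if where(r)]
        return query

    def insert(self, row):
        self._rows.append(dict(row))

    def delete(self, where):
        self._rows = [r for r in self._rows if not where(r)]

    def update(self, where, **changes):
        for r in self._rows:
            if where(r):
                r.update(changes)

emps = MemoryTable([{'id': 1, 'name': 'bob'}, {'id': 2, 'name': 'alice'}])
by_name = emps.makeQuery(lambda r: r['name'] == 'alice')  # cached query
emps.update(lambda r: r['id'] == 1, name='robert')
```

The cached 'by_name' callable re-evaluates against the table's current rows each time it's called, which is the point of makeQuery(): build once, run many times.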
When combined with the existing '(where=..., join=..., etc.)' capability of
peak.query relvars, some per-backend function mapping capabilities, and
good ol' peak.binding, it should be possible to easily create a DB
component for any given application's database needs. Indeed, it should be
relatively straightforward within this layer to mask minor backend-specific
schema differences (such as the different date/time arithmetic functions
supplied by Sybase vs. Oracle, for example, or columns that have to be
renamed due to different reserved words on different backends).
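As a rough illustration of that function-mapping idea (the names and table here are invented, not part of PEAK), a per-backend map can hide dialect differences behind one logical function name:

```python
# Invented example: mask backend-specific date arithmetic behind
# a single logical function name.

FUNCTION_MAP = {
    'oracle': {'add_days': lambda col, n: "%s + %d" % (col, n)},
    'sybase': {'add_days': lambda col, n: "dateadd(day, %d, %s)" % (n, col)},
}

def render(backend, func, *args):
    # Look up the logical function for this backend and render SQL text.
    return FUNCTION_MAP[backend][func](*args)
```

So the same logical expression renders as 'hire_date + 30' on Oracle but 'dateadd(day, 30, hire_date)' on Sybase, without the application-level schema knowing the difference.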
All in all, this will give us a physical layer whose capabilities and
simplicity of setup should either match or exceed those of most other
existing "SQL mapping" libraries for Python. However, ours will also work
with LDAP, will allow constructing arbitrary joins and aggregate queries,
and allow abstraction of functions and data type conversion across multiple
backends --even at the same time. And, to top it all off, it'll be
workable with any data source that can be rendered as rows and columns of
atomic data values according to some stable schema.
This alone will be a useful thing to have. But then we'll add...
The Mapping Layer
-----------------
The mapping layer is responsible for defining the relationship between a
peak.model type or feature, and a pair of projections over relvars provided
by a database component. Okay, now let me explain that in English...
A mapping component will have a binding to a database component, and more
importantly, it will have bindings to various tables (relvars) provided by
the database component. Then, for each feature (attribute or method) of a
type that maps to the database, the mapping component must provide three
things:
1. The relvar (table, join, or other query construct) where the feature is
found
2. A projection (ordered collection of column names) over the relvar,
expressing where to find the type's primary key within the relvar.
3. A projection over the relvar expressing where the feature may be found.
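Those three items could be grouped as a simple structure, something like the hypothetical sketch below (the names are mine, not PEAK's). The payoff of keeping the primary-key projection explicit is visible in same_source(): two features can share one join exactly when their relvar and key projection match.

```python
# Hypothetical structure for the three mapping items above.
from collections import namedtuple

FeatureMap = namedtuple('FeatureMap', 'relvar pk_projection projection')

name_map = FeatureMap('employees', ('emp_id',), ('emp_name',))
dept_map = FeatureMap('employees', ('emp_id',), ('dept_id',))

def same_source(a, b):
    # Two features can reuse one relvar+key when both parts match.
    return (a.relvar, a.pk_projection) == (b.relvar, b.pk_projection)
```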
Since relvars can contain arbitrary joins, aggregates, expressions, etc.,
this is an extremely general mapping mechanism. You might think, by the
way, that we could leave out the primary key projection and simply track
each table's primary key, but it really can't work that way. We also can't
eliminate the projections by simply performing a projection on the relvar,
because we don't want to join a table repeatedly (once for each feature
that appears there). Thus, we need to be able to know when two features
are based on the same relvar and primary-key projection, in order to reuse
that relvar+key.
For each type, we'll also need this relvar+projections data, intended to
represent how to find canonical instances of the type. That is, if we are
trying to find "employees", where do we query for a list of their primary
keys? This information is needed to form the root of conceptual queries, but
is also needed if one wishes to obtain information from a particular
subtype of some general type in a database.
It's possible that each mapping component will be intended for a specific
object type, rather than being just a schema-wide mapping. That would
probably offer more opportunity for component reuse. So, a given mapping
class would just list the features of the peak.model type (probably as
bindings), defining the relvar+projection lists. Probably there would also
be bindings to reference the relvars, and a class attribute or two to
represent primary key projections, to make it easier for the other bindings
to do their thing. That is, most features would probably look something like:
aFeature = binding.Make(lambda self:
    PathSegment(self.someTable, self.pk, ('a_column',))
)
(A relvar+projection makes up a "path segment" in a conceptual query.) In
addition to the raw mapping data, it's likely that a mapping component
would also indicate what features (if any) should be lazily loaded when the
object is loaded from the database. This is important for e.g. avoiding
loading CLOBs or BLOBs unnecessarily, and would also allow the easy
construction of a default query for the object.
Indeed, it might be possible for the base class of these mapping components
to provide bindings for query methods (see 'makeQuery' in the physical
layer notes) that would provide "get" and "find" operations on the object's
"default state" query.
At this point it begins to sound as though the mapping components are in
fact data managers, and I can see that caching policy perhaps should live
with the mapping components. But I'm not wholly convinced that DM's and
mapping components are the same thing, nor of what shape DM's will
metamorphose into by the time this design (and its implementation) are done.
Conceptual Queries
------------------
The conceptual query layer isn't really a component layer like the
others. Instead, it will be services provided by peak.model classes, like
our old 'Employee.where()' example. The final query syntax will actually
resemble our original proposed syntax, with keyword arguments being a
shortcut to express feature information, if they are being "and"ed
together. IOW,
Employee.where(foo=bar, baz=spam, __as__='emp')
is a shortcut for the following lower-level query expression:
DEFVAR('emp', True,
    TYPE(Employee,
        AND(
            FEATURE('foo', EQ(bar)),
            FEATURE('baz', EQ(spam))
        )
    )
)
(or something similar). Anyway, it should be possible to convert the
resulting object into an executable query, perhaps by passing it to the
container that holds all the mapping components. Actually, it should also
be possible to convert it into just a relvar, where it might then be
possible to do insert/update/delete operations, depending on the nature of
the underlying query.
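The kwargs-to-query-expression shortcut can be sketched in a few lines. This is a minimal toy, assuming the DEFVAR/TYPE/AND/FEATURE/EQ constructors are simple node builders (here just tuples); the real objects will surely be richer:

```python
# Toy node constructors standing in for the real query-expression
# objects; each just builds a tagged tuple.
def EQ(value):                return ('EQ', value)
def FEATURE(name, cond):      return ('FEATURE', name, cond)
def AND(*conds):              return ('AND',) + conds
def TYPE(klass, cond):        return ('TYPE', klass, cond)
def DEFVAR(name, flag, body): return ('DEFVAR', name, flag, body)

class Employee:
    @classmethod
    def where(cls, __as__=None, **kw):
        # Each keyword becomes a FEATURE(..., EQ(...)) term; multiple
        # keywords are "and"ed together, as described above.
        conds = [FEATURE(k, EQ(v)) for k, v in sorted(kw.items())]
        cond = conds[0] if len(conds) == 1 else AND(*conds)
        return DEFVAR(__as__, True, TYPE(cls, cond))

q = Employee.where(foo='bar', baz='spam', __as__='emp')
```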
And that's it. Those are our layers. The higher we go in the layers, the
less precisely we know how they'll work, because they depend on things that
are still uncertain in the lower layers. But, as we move forward these
things should be able to be sorted out pretty clearly.
I caution you, however, that if you currently depend on the existing "data
manager" framework, you should be aware that it may change significantly by
the time this O-R mapping system is complete. That's because we're likely
to move away from the notion of "DM as mapper to arbitrary data source" and
more towards "DM as a user of relvar(s)". Given the new tools for relvar
manipulation, I believe we'll wind up in a place where such manipulation is
something you do outside the DM, which will simply receive a ready-to-use
data structure. And, if you are mapping to other kinds of external systems
(e.g. Ulrich's IMAP work), you'll simply work on exposing those systems as
relvars. That is, table-like constructs of atomic values, that can
potentially be joined with other data sources.