[PEAK] ORM with the Trellis
Phillip J. Eby
pje at telecommunity.com
Mon Feb 18 19:07:21 EST 2008
So, this post is a brain dump of my current plan(s) for developing
object-relational mapping support for the Trellis.
Basically, the idea is that you will write simple rules to transform
objects into flat records, akin to what one might use in a relational
database. Other rules, in turn, will map back from the flat records
into objects.
Of course, anything other than flat values or simple 1:1 references
will be slightly more complex, requiring the use of virtual data
structures that translate operations on the collections into set
operations on record sets.
(Being trellis-based, of course, these virtual structures should only
need to be implemented once, and made to work with any underlying
trellisized collection of records.)
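Just to make that concrete, here's roughly the kind of
flattening/rebuilding rule pair I have in mind (Person, PersonRecord,
record_for, and person_from are all made-up names for illustration;
the real thing would of course be expressed as Trellis rules):

from collections import namedtuple

# Hypothetical domain object; in practice this would be a Trellis
# component.
class Person(object):
    def __init__(self, id, name, parent=None):
        self.id, self.name, self.parent = id, name, parent

# The flat, relational-style form of a Person:
PersonRecord = namedtuple('PersonRecord', 'id parent_id name')

def record_for(person):
    """Flatten a Person; the 1:1 parent reference becomes a plain
    foreign-key-style value."""
    parent_id = person.parent.id if person.parent is not None else None
    return PersonRecord(id=person.id, parent_id=parent_id,
                        name=person.name)

def person_from(record, lookup):
    """Rebuild a Person from its record, resolving parent_id back
    into an object via a lookup function (e.g. an identity map)."""
    parent = (lookup(record.parent_id)
              if record.parent_id is not None else None)
    return Person(record.id, record.name, parent)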
However, this "object" layer is actually the least "interesting"
part of what needs to be done, because it's really
nothing more than an API facade, wrapping a record-oriented API with
an object-oriented one. And the trellis should make that pretty
straightforward.
So, the more interesting parts, in my view, have to do with
developing the record- and recordset-oriented APIs, and the
lower-level mappings of these APIs onto storage back-ends.
My plan is for the record-level APIs to be based on the tuple
relational calculus (TRC), with extensions for grouping and the
like. The TRC is also the basis of SQL, and it should be relatively
straightforward to turn TRC specifications into SQL queries.
Equally important, the TRC is also a good intermediate form for
translating object-based list comprehensions. List comprehensions
can be considered a superset of the domain relational calculus (DRC),
a close neighbor of the TRC.
The idea is to be able to take a string like:
[(p, c) for p in Person for c in p.children]
And turn it into a TRC representation like:
c = ResultVar('c')
p = ResultVar('p')
PersonRecord(c, parent=p)
And from there to SQL like:
select person.parent_id as p, person.id as c
from Person
And then turn the results from that query back into an iterable of
pairs of Person objects.
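To sketch just that last step (using sqlite3 directly, with
get_person standing in for whatever identity-map lookup the object
layer ends up providing):

import sqlite3

def person_pairs(conn, get_person):
    """Run the generated SQL and turn each row back into a
    (parent, child) pair of Person objects."""
    sql = "select person.parent_id as p, person.id as c from person"
    for p_id, c_id in conn.execute(sql):
        yield get_person(p_id), get_person(c_id)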
Ideally, it should be possible to use such comprehension expressions
to define parameterized queries and "views".
Indeed, the TRC layer should make it possible to have writable views,
expressed as TRC expressions. This would let you keep your
object-record mappings the same, regardless of your ultimate back-end
storage schema, since a TRC expression like PersonRecord may in
practice expand out to a join of multiple tables (for example).
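In rough, made-up terms, a parameterized query might look something
like this (ParamQuery and its comprehension-string convention are
invented purely for illustration):

class ParamQuery(object):
    """A comprehension-style query plus named parameters; a back end
    would compile .expr into SQL or in-memory filtering at run time."""
    def __init__(self, expr, **params):
        self.expr, self.params = expr, params

    def __call__(self, **params):
        # Binding parameters yields a new, more specific query.
        bound = dict(self.params, **params)
        return ParamQuery(self.expr, **bound)

# A parameterized query over the schema from the earlier example:
children_of = ParamQuery(
    "[c for p in Person for c in p.children if p.name == parent_name]"
)
adas_children = children_of(parent_name='Ada')

# A "view" would be the same kind of expression, registered under a
# name so that other queries (and, for writable views, updates) can
# treat it as if it were a base record type -- even if Person itself
# expands to a join of several tables in the back-end schema.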
However, the TRC layer in and of itself will not really "do" anything
much. It will primarily be a means of representing (and
algebraically manipulating) records, queries, and the associated
schemas. It will be the responsibility of a "back end" layer to
actually map TRC expressions into recordsets, and to support the
addition, deletion, or modification of records.
So, it looks like there will be at least three segments to the ORM
roadmap: an object-record layer, the TRC layer, and the "back
end". The TRC layer will probably depend heavily on PEAK-Rules,
which is approaching its first alpha release. The back end will
depend on Contextual (in order to offer storage services).
The TRC layer will also use Contextual, in that I want records and
queries to offer direct operations that then map to back-end
operations. In other words, it should be possible to do adds and
updates using "+RecordType(value, value, ...)" and deletes using
"-RecordType(somekey=someValue)". It should also be possible
to iterate over TRC queries. All of these operations will be
implemented in the TRC layer by delegation to context-specific
back-end services, so that you can do things like:
with someDB:
    +Foo(...)
    -Bar(...)
    # etc.
to make changes in a specific database.
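Just to illustrate the delegation idea (the real implementation
would go through Contextual services, not a module-level stack like
this):

_active = []        # back ends activated by enclosing 'with' blocks

class Record(object):
    """Base for TRC record types: unary + and - delegate to whatever
    back end the surrounding 'with someDB:' block activated."""
    def __init__(self, **fields):
        self.fields = fields
    def __pos__(self):                  # +RecordType(...)  -> add
        _active[-1].add(self)
    def __neg__(self):                  # -RecordType(...)  -> delete
        _active[-1].delete(self)

class Backend(object):
    """Back ends act as context managers, so 'with someDB:' simply
    makes someDB the current target for +/- and iteration."""
    def __enter__(self):
        _active.append(self)
        return self
    def __exit__(self, *exc):
        _active.pop()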
There will also need to be a default, in-memory back-end service for
these operations. The memory back-end will be the first back-end implemented.
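Continuing that sketch, a first cut at the memory back-end needn't
be much more than per-record-type lists plus keyword-constraint
matching (again, made-up names, not a design commitment):

class MemoryBackend(Backend):
    """Toy in-memory back end: per-record-type lists of field dicts,
    with deletes and queries done by constraint matching."""
    def __init__(self):
        self.tables = {}                    # record type -> rows
    def add(self, record):
        self.tables.setdefault(type(record), []).append(record.fields)
    def delete(self, record):
        # -Bar(somekey=someValue) removes every row matching the
        # constraints given in the record's fields
        rows = self.tables.get(type(record), [])
        self.tables[type(record)] = [
            r for r in rows
            if not all(r.get(k) == v for k, v in record.fields.items())
        ]
    def rows(self, record_type, **constraints):
        for r in self.tables.get(record_type, []):
            if all(r.get(k) == v for k, v in constraints.items()):
                yield r

# e.g., reusing Record from the sketch above:
#   class Task(Record): pass
#   someDB = MemoryBackend()
#   with someDB:
#       +Task(title='write TRC layer')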
Once there's a TRC layer and a memory back-end, it will then make
sense to begin work on the object-record mapping layer, although
there will probably be some points at which I'll need to do some
preliminary design on that layer in order to drive decisions being
made at other levels. Next, mapping object-based comprehension
queries to TRC queries will get us a complete ORM... albeit a
memory-based one!
At that point, work on transactions, sessions, and the like will need
to take place, as well as mapping to an SQL-based backend. SQLite
will be the initial mapping target.
There is some possibility of using existing SQL libraries (such as
SQLAlchemy) as backends for the TRC layer. I haven't investigated
this very much at this point, but it would likely be
helpful. Unfortunately, it appears at first glance that SQLAlchemy
is based on the relational algebra, rather than either of the
relational calculi, which makes it less suitable (IMO) as a direct
target for automated translations, especially of one schema to another.
It's one of those cases where a product is "infuriatingly good" --
i.e., so potentially useful that the remaining barrier(s) to using it
are quite frustrating. It's frankly quite tempting to just glue the
Trellis directly to SQLAlchemy as closely as possible, and be done
with it. There'd be no in-memory support (except by using SQLite in
memory mode, I suppose), but it'd certainly be a much less complex project.
However, the major disadvantage to doing that is that Chandler's
sharing framework depends on EIM -- a record-oriented type
system. So we will want to be able to define EIM records and use
them as a basis for queries and other types of operations. And the
TRC layer in the design above is basically a fancier version of EIM,
that allows querying and add/remove/update operations.
So, the catch is, we want to define a mapping from objects to records
once and only once, so that we get both storage and sharing for the
price of one schema definition. And I don't think that SQLAlchemy's
schema information is sufficient for what we need in EIM.
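To illustrate the "define once, use twice" goal (none of these names
are EIM's or the planned TRC layer's; they're purely illustrative),
a single record schema might be projected into both a storage
definition and a sharing-side flat record:

from collections import namedtuple

Field = namedtuple('Field', 'name type')

PERSON_SCHEMA = (Field('id', int), Field('parent_id', int),
                 Field('name', str))

def as_create_table(table, schema):
    """Project the record schema into a storage-side DDL statement."""
    sqltype = {int: 'INTEGER', str: 'TEXT'}
    cols = ', '.join('%s %s' % (f.name, sqltype[f.type])
                     for f in schema)
    return 'CREATE TABLE %s (%s)' % (table, cols)

def as_sharing_record(schema, obj):
    """Project the same schema into a flat, EIM-style record for
    sharing: just the field values, in schema order."""
    return tuple(getattr(obj, f.name) for f in schema)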
That means that a TRC layer pretty much has to exist, even if it
isn't much more than a wrapper over SQLAlchemy's schema definition
system. And, because it needs to be able to transform records
directly, it almost certainly isn't going to be able to rely on
SQLAlchemy for schema translation.
However, it may be that I can simply not bother with implementing an
in-memory backend, at least in the short run. It might make as much
sense, or more, to use SQLAlchemy from the start, with an SQLite
memory db for testing. EIM, after all, is based on simple records
and record sets, not complex queries. On the other hand, Chandler's
existing EIM library does have to do some limited in-memory
indexing... so that aspect of things still has to be implemented.
Some preliminary conclusions:
* An EIM-like TRC layer appears to be a must; SQLAlchemy doesn't have
one, and it's needed for Chandler sharing. In addition, databases
aren't the only source of records that Trellis applications will want
to work with. (For example, I've been thinking about writing a GUI
to control my VOIP conferencing system in real-time, displaying
events as people go on/off line and speak, etc.)
* SQLAlchemy looks very promising as a back-end, and even potentially
as a front-end mapper. It supports inheritance, polymorphism,
lazy/eager loading, multi-object rows, cascades, and just about every
other SQL feature I can think of.
So, next steps should probably be working on the design of the TRC
layer, and investigation of (and/or stealing ideas from) SQLAlchemy.