[TransWarp] PEAK persistence based on ZODB4, continued...
Phillip J. Eby
pje at telecommunity.com
Fri Jun 28 22:41:43 EDT 2002
It's interesting to think of a PEAK persistence service as a single
object... let's call it a "Jar", since objects' _p_jar pointer will
usually reference it.
A jar needs to be able to handle various types of keys, and perform various
sorts of mappings between keys and the referenced objects, not to mention
vice versa. Specifically, a jar must:
* Given a key, return an object
* Given an object, return a key of a requested type (for saving in a
foreign key reference).
* Given an object, retrieve a collection of objects referencing it along a
given association. (Probably implemented as a kind of key that retrieves a
persistent collection object, whose state is then loaded by a query.)
* Track "dirty" objects per underlying storage mechanism and flush them
when a query is issued against that storage (or at least pass on the event
information to the storages to manage themselves), and register them with
the current transaction, if appropriate.
* Implement the ZODB4 Persistence.IPersistentDataManager protocol for any
of its objects that call upon it (which mostly means the immediately
preceding item, dirtiness management).
* Implement the ZODB4 Transaction.IDataManager protocol on behalf of any of
its objects that are registered with a transaction (presumably by
delegating the responsibility right back to them).
In order to make the jar mechanism generic, it's necessary to delegate all
of these responsibilities back to other objects. The IDataManager "commit"
responsibility is quite straightforward; it can be delegated to a _p_save()
mechanism on the objects themselves, and "abort" can be implemented in the
standard way for ZODB, i.e. deactivating the objects. The yucky part is
dealing with aborting committed objects upon tpc_abort, but that can be managed.
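A rough sketch of that commit/abort delegation might look like the following. The class name SketchJar and the dirty list are made up for illustration, and this is not the actual ZODB4 IDataManager signature; the objects are only assumed to offer _p_save() and a ZODB-style _p_deactivate(). The tpc_abort case is deliberately left out.

```python
class SketchJar:
    """Hypothetical jar that delegates commit/abort to its dirty objects."""

    def __init__(self):
        self.dirty = []  # objects registered as modified in this transaction

    def register(self, obj):
        # Track each modified object once, compared by identity.
        if not any(obj is seen for seen in self.dirty):
            self.dirty.append(obj)

    def commit(self):
        # "commit" is handed straight back to the objects themselves.
        for obj in self.dirty:
            obj._p_save()
        self.dirty.clear()

    def abort(self):
        # "abort" deactivates the objects so stale state is discarded.
        for obj in self.dirty:
            obj._p_deactivate()
        self.dirty.clear()
```

Undoing objects that were already saved when a later tpc_abort arrives would need extra bookkeeping that this sketch does not attempt.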
The entire IPersistentDataManager interface can also be delegated back to
methods on the persistent objects themselves. If it turns out later that
we get more flexibility by having some kind of marshalling object do these
things instead, well, the design will change at that point. Presumably
we'll know this long before we get to writing actual code.
This leaves the jar's responsibility as mostly maintaining an oid->object
cache, and doing a lot of type-based dispatching between objects providing
storage services and objects providing mapping services.
The two primary services we're after are key-to-class (and perhaps
marshaller) and object-to-key mappers. It seems logical that the latter
could be a responsibility of the object itself, since that's where the most
contextual information is.
So that leaves key-to-class/key-to-marshaller. There are two kinds of
keys, identity (primary) keys and alternate keys. Alternate keys are
searched in an underlying DB in order to identify the primary key, which is
then looked up to retrieve the desired object. Primary keys are looked up
in cache, or in the DB, or not at all, depending on the situation. (DB
lookup is necessary to determine the class for the "ghost" persistent
object in case of keys which can refer to more than one class. No DB
lookup is needed for collection/query keys or for keys which can refer to
only one class. Note that whenever the DB lookup actually does take place,
we'll need some kind of ReferenceError that can be raised on an attempt to
access a non-existent object.)
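To make the primary key rules concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the names PrimaryKeyJar, Ghost and JarReferenceError, the "class" column, and the plain dict standing in for the DB.

```python
class JarReferenceError(LookupError):
    """Hypothetical error: the key does not name any stored object."""


class Ghost:
    """Placeholder whose class is known but whose state is not yet loaded."""

    def __init__(self, key, klass):
        self.key = key
        self.klass = klass
        self.loaded = False


class PrimaryKeyJar:
    def __init__(self, rows, classes, single_class=None):
        self.rows = rows                  # stand-in for the DB: key -> row dict
        self.classes = classes            # class name -> class
        self.single_class = single_class  # set when keys map to only one class
        self.cache = {}
        self.db_lookups = 0               # counted only to show when the DB is hit

    def __getitem__(self, key):
        if key in self.cache:             # 1. cache hit, no DB access
            return self.cache[key]
        if self.single_class is not None:
            klass = self.single_class     # 2. class is implied, no DB access
        else:
            self.db_lookups += 1          # 3. polymorphic key, must ask the DB
            row = self.rows.get(key)
            if row is None:
                raise JarReferenceError(key)
            klass = self.classes[row["class"]]
        ghost = self.cache[key] = Ghost(key, klass)
        return ghost
```

Note that in the single-class case a missing row would only be noticed later, when the ghost's state is actually loaded.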
Obviously, the lookup mechanism is heavily dependent upon the type of key -
indeed it's so tightly tied that one wonders if we should say that the key
type itself is responsible for the lookup! This could be seen as akin to
the Index objects in Andy Dustman's SqlDict package.
So we could view the master jar as containing a set of "indexes", each
representing a type of key, each accessible as a mapping from key data to
persistent objects. Hm. Actually, it's beginning to seem as though
there's no point in having the master jar, except as a container/context
for these "indexes".
Okay, so maybe the "indexes" should actually be "jars", and the "jar"
should be a "shelf". :) Each jar is _p_jar for its contents, has its own
cache, and implements both the transactional and persistence management
protocols. There will be a few stereotypical kinds of jars:
* Query/collection jars - the setstate() method will first ask any jars
containing objects the query is based on, to flush any dirty objects. Then
it'll load the data by creating an appropriate iterator or lazy list that
maps the query results over to objects from the correct jar based on loaded
keys. The register() method will raise an exception, since queries aren't
directly modifiable. These jars also won't need to implement transaction
behavior, except in order to possibly invalidate loaded query objects at
transaction end.
* Primary key jars - the straightforward case in general, but lots of
options/details in particular! Cache may be of whatever duration is
appropriate. Transactional behavior is required.
* Alternate key jars - lookup in the DB, then refer to another jar for the
dirty work. No transactional or persistent behavior as such, this is just
a redirection mechanism. Can keep a cache, but it should map to the
primary key, not the objects themselves, and it should probably be short-lived.
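As a sketch of the two less obvious stereotypes, the query jar and the alternate key jar could look roughly like this. The constructor arguments, the flush() method on source jars, and the run_query and find_primary_key callables are all hypothetical stand-ins, not part of any existing API.

```python
class QueryJar:
    """Jar for read-only query/collection objects."""

    def __init__(self, source_jars, target_jar, run_query):
        self.source_jars = source_jars  # jars whose objects the query is based on
        self.target_jar = target_jar    # jar that turns result keys into objects
        self.run_query = run_query      # hypothetical callable yielding primary keys

    def setstate(self, query):
        # Flush pending writes first so the query sees current data.
        for jar in self.source_jars:
            jar.flush()
        # Lazily map each result key to an object from the right jar.
        query.results = (self.target_jar[key] for key in self.run_query())

    def register(self, query):
        raise TypeError("query objects are not directly modifiable")


class AlternateKeyJar:
    """Pure redirection: alternate key -> primary key -> primary key jar."""

    def __init__(self, find_primary_key, primary_jar):
        self.find_primary_key = find_primary_key  # DB search, returns None if absent
        self.primary_jar = primary_jar
        self.key_cache = {}  # alternate key -> primary key, never the objects

    def __getitem__(self, alt_key):
        if alt_key not in self.key_cache:
            primary_key = self.find_primary_key(alt_key)
            if primary_key is None:
                raise KeyError(alt_key)
            self.key_cache[alt_key] = primary_key
        return self.primary_jar[self.key_cache[alt_key]]

    def clear(self):
        # The cache should be short-lived, e.g. cleared at transaction end.
        self.key_cache.clear()
```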
Jars will all need to implement some kind of "getKeyFor(object)" protocol
so that key references can be saved. For primary key jars, it should be
able to notice when an object is a new instance of the correct type that
has not been saved in the jar yet, and do the necessary work to create and
save it immediately (thus ensuring referential integrity if it's an RDBMS).
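One possible shape for getKeyFor() on a primary key jar is sketched below. The Record base class, the _p_oid attribute (borrowed from ZODB naming), the rows dict standing in for a table, and the counter-based key generation are assumptions for illustration only.

```python
import itertools


class Record:
    """Hypothetical base class; a new instance belongs to no jar yet."""
    _p_jar = None
    _p_oid = None


class PrimaryKeyJar:
    def __init__(self, klass):
        self.klass = klass
        self.rows = {}                      # stand-in for the DB table
        self._next_id = itertools.count(1)  # stand-in for DB key generation

    def getKeyFor(self, obj):
        if obj._p_jar is self:
            return obj._p_oid               # already stored here
        if isinstance(obj, self.klass) and obj._p_jar is None:
            # New instance of the right type: save it now, so that the
            # key we hand out refers to a row that really exists.
            oid = next(self._next_id)
            self.rows[oid] = dict(vars(obj))
            obj._p_jar, obj._p_oid = self, oid
            return oid
        raise TypeError("object does not belong in this jar")
```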
Another issue that needs to be addressed is partial loading of state, and
re-use of partially loaded data. For example, queries should not cause
objects' state to be read from the DB twice, especially not on a
one-record-at-a-time basis. This might be addressable using yet another
kind of jar: a state jar, which given a full or partial state, can identify
and load the object.
For an example of a state jar, let's take the WarpCORE "objects" table. It
contains some basic info like the primary key and object class. A state
jar could extract the primary key, check its cache, and if the object isn't
there, construct a ghost using the class info. If enough information is in
the state to fully load the object, it could do so.
The "enough information" is a bit tricky, especially as to how it's
specified what constitutes "enough", and even what kind of state is being used.
For objects based on WarpCORE, data interfaces can be treated as
attributes, and lazily loaded based on primary key, with only the "object"
data interface loaded in the main object. To do this, the Features would
have to each delegate to the correct data interface object, which means a
lot of metadata specification, or else some kind of mapping in the
Element's class, derived from jar-provided information about available
interfaces.
(Just to clarify, a WarpCORE data interface is effectively a table or view,
joined by primary key to "parent class" data interface(s). Each row can be
thought of as "adapting" a WarpCORE object to a particular interface, by
providing the fields of that interface. So, each of these "adapters" could
be attributes of an Element, and the Element's Features could store or
retrieve state (in the form of individual field values) from them.)
Using data interface objects is rather handy because a query can just hand
off anything it loads to the jars for the interfaces being queried, and the
"real" objects will automatically sync up. (This suggests, however, that
query objects need a way to work as pure records, without doing any object
or state loading, because otherwise a lot of cache memory could get chewed
up unnecessarily during processing. There's also a certain question in my
mind regarding the ability of different DBAPI implementations to handle
multiple open cursors without choking...)
So, it sounds as though primary key jars should have an
"objectFromState(state)" method that bypasses the key machinery, for use by
queries and alternate key jars (since looking up the alternate key will
presumably look up and return a state). No need for state jars as a
separate mechanism.
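A minimal sketch of objectFromState() follows. How to decide what counts as "enough" state is still open; here it is simply assumed to be a fixed set of required field names, and the state is assumed to be a dict. StateObject, key_field and required_fields are illustrative names.

```python
class StateObject:
    def __init__(self, key):
        self.key = key
        self.state = None  # None means "ghost": identity known, state not loaded


class PrimaryKeyJar:
    def __init__(self, key_field, required_fields):
        self.key_field = key_field
        self.required_fields = set(required_fields)
        self.cache = {}

    def objectFromState(self, state):
        # Find or create the object without going through key lookup.
        key = state[self.key_field]
        obj = self.cache.get(key)
        if obj is None:
            obj = self.cache[key] = StateObject(key)
        # Fill in a ghost only if the state we were handed is complete.
        if obj.state is None and self.required_fields.issubset(state):
            obj.state = {name: state[name] for name in self.required_fields}
        return obj
```

A query or alternate key jar that already holds a fetched row can pass it in directly, so the same record is not read from the DB a second time.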
Hm. So what would a WarpCORE "shelf" look like? Presumably a primary key
jar for each WarpCORE interface. Alternate key jars for all alternate keys
in the database. Most of the primary key jars would load simple primitive
"state" objects, except for base interfaces like "wc_object" and
"wc_event", which would know how to look up classes from the base interface
data in order to create their ghosts. The classes would need to know what
sub-interfaces they supported, in order to request ghosts for them. These
ghosts would act as though they had default field values, and would only
get saved if explicitly modified. The Elements wouldn't have any state of
their own; it'd be the interface state objects that got saved.
Probably there'd be two distinct primary key jars for base types: one to
retrieve the Element, and one for the base interface's state. The Element
jar would just ask the base interface jar for its object, pull out the
necessary metadata to determine the element class, and then create an
Element populated with state objects from all the right
interfaces. Interestingly, these Elements could be cached pretty well
indefinitely, since they don't contain any state that isn't managed by
something else.
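The Element jar described above might be sketched like this. The jars are modelled as plain mappings, the "class" field in the base state and the interfaces attribute on Element classes are assumptions, and the default base interface name is taken from the "wc_object" example.

```python
class Element:
    interfaces = ()  # names of the sub-interfaces this class supports

    def __init__(self, key, states):
        self.key = key
        self.states = states  # interface name -> state object


class ElementJar:
    def __init__(self, base_jar, interface_jars, classes, base_name="wc_object"):
        self.base_jar = base_jar              # key -> base interface state
        self.interface_jars = interface_jars  # interface name -> jar
        self.classes = classes                # class name -> Element subclass
        self.base_name = base_name
        self.cache = {}  # Elements hold no state of their own, so keep them

    def __getitem__(self, key):
        if key in self.cache:
            return self.cache[key]
        base = self.base_jar[key]
        klass = self.classes[base["class"]]   # metadata picks the class
        states = {self.base_name: base}
        for name in klass.interfaces:
            states[name] = self.interface_jars[name][key]
        element = self.cache[key] = klass(key, states)
        return element
```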
Could we now do a cross-DB Element? Let's see. Load a base state from
LDAP, and try to look up a corresponding base state from WarpCORE. If the
WarpCORE part isn't found, create a new instance of the appropriate state
class for the WarpCORE part, with suitable default data, its _p_jar
pointing to the right WarpCORE jar, and put it in the Element's slot for
WarpCORE state. If anything gets stored in the WarpCORE state, it'll get
saved back to the DB (generating an OID if needed). If anything is changed
in the LDAP state, it'll likewise get saved.
Okay, now do it the other way... load the base state from WarpCORE, then
extra state from LDAP afterwards... Yep, that works too. So when a
WarpCORE object references a stub for an external DB, *pow*, we have
transparent locate-and-load.
Hm. Definitely sounds like we're on to something here. The only messy
part is the delegation of features to attributes of state attributes of the
element. This probably will require subclassing at the dispatch layer to
specify the delegation rules, but it might be doable with some kind of
mapping mechanism. The trick is how to do it without adding more layers of
calls to the feature operations...
Aha! I think I have it. The Smalltalk ValueModel pattern. Load a state
as a dictionary of objects, each with get/set methods. They can be "cells"
that contain a value, or "delegated cells" that know an attribute name on a
state object they reference. The features look for objects in __dict__ and
call the get/set methods on them instead of storing directly in
__dict__. Alternatively, they could check if an item in __dict__ was a
getter-setter instance, and either work directly on the __dict__ or on the
getter-setter, as appropriate. One could then interchangeably use the same
Model framework classes with any persistence mechanism, as it would be the
responsibility of the jar to populate their state dictionary.
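Here is a small sketch of the cell idea, written with modern Python descriptors standing in for Features. Cell, DelegatedCell, Feature, Person and LdapState are illustrative names, and this takes the second variant, where a Feature checks whether the __dict__ entry is a getter-setter.

```python
class Cell:
    """Holds a value directly."""

    def __init__(self, value=None):
        self.value = value

    def get(self):
        return self.value

    def set(self, value):
        self.value = value


class DelegatedCell:
    """Reads and writes a named attribute on some other state object."""

    def __init__(self, target, attr):
        self.target = target
        self.attr = attr

    def get(self):
        return getattr(self.target, self.attr)

    def set(self, value):
        setattr(self.target, self.attr, value)


class Feature:
    """Data descriptor: goes through a cell if one is in __dict__."""

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        try:
            item = obj.__dict__[self.name]
        except KeyError:
            raise AttributeError(self.name) from None
        return item.get() if isinstance(item, (Cell, DelegatedCell)) else item

    def __set__(self, obj, value):
        item = obj.__dict__.get(self.name)
        if isinstance(item, (Cell, DelegatedCell)):
            item.set(value)
        else:
            obj.__dict__[self.name] = value


class Person:
    name = Feature()
    email = Feature()


class LdapState:
    mail = "ann@example.com"


# A jar would populate the state dictionary; done by hand here.
state = LdapState()
person = Person()
person.__dict__.update(name=Cell("Ann"), email=DelegatedCell(state, "mail"))
person.email = "ann@example.org"  # lands on the state object, not on person
```

The Person class itself knows nothing about where its values live, which is the point: only the contents of __dict__ change from one persistence mechanism to another.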
Now the jars can manage the mapping rules from feature to states and
substates, which means it might be possible to assemble those rules from
metadata provided by individual jars, and register them in the "shelf" for
each Element class.
Whew. This is a lot of stuff to implement, but it mostly sounds like the
tasks can be separated from one another to some extent. And after enough
practice, it should be possible to define metaclasses that make table (or
WarpCORE interface) definitions straightforward, and arrange them such that
one can easily subclass a "shelf" and add more "jars" to it, or
extend/replace the class factories used by the Element jars.
I can hardly believe it... I think I've just defined a world-class
business object persistence framework, supporting polymorphism,
multi-database objects, and mappings to arbitrary legacy schemas, complete
with tunable and replaceable caching policies.
So I guess I'll stop here for tonight, before I find something wrong with
any of the ideas. :)
Perhaps tomorrow I'll try to more rigorously define the "aspects" of a
jar's behavior (e.g. state mapping, caching, transaction support, etc.) and
work out how to parameterize them with strategy objects or something like that.