[TransWarp] Basic "storage jar" design
Phillip J. Eby
pje at telecommunity.com
Sat Jun 29 16:34:33 EDT 2002
Abstract "Storage Jar"
This is a basic design for an abstract implementation of the "storage jar"
concept for PEAK/ZODB4. It can be used as the basis for either a primary
key-driven object jar, or a query jar, with appropriate method
overrides. "Alternate key" jars won't have much use for this as a base
class, since they don't manage object states but just offer a convenient
front-end to retrieving an object from its primary key jar (possibly using
the preloadState() mechanism described below).
(Note: 'state' as used herein refers to an argument of __setstate__() or
the return value of __getstate__(), as used by ZODB and Python's
pickle. That is, an arbitrary object which represents the persistable
"state" of another, more complex object. Its meaning and implementation
are only relevant to the object jar maintaining it, and any query jars
whose queries are based on it.)
Jar API Methods
* __getitem__ -- Called by the application to retrieve objects, and by load()
operations of other jars to get ghost references. Implementation: check the
cache and return the object if found. Otherwise, create ob = self.ghost(oid), set
ob._p_oid = oid; ob._p_jar = self; self.cache[oid] = ob, and return ob.
* preloadState(oid, state) -- Called by query objects and "alternate key"
jars to prevent re-querying the DB for state information that they've
already retrieved. The caller will have to keep a reference to the returned
object 'ob' until it has retrieved the object that it actually wants
(assuming 'ob' isn't it), so that 'ob' won't be dropped from the jar's cache
(since the cache might be weak-reference based). Implementation: identical to
__getitem__, except that self.ghost(oid, state) is called instead of
self.ghost(oid).
* oidFor(ob) -- Called by save() operations of other jars to get foreign
key values for objects referenced in their states. Implementation: if
ob._p_jar is self, return ob._p_oid, unless _p_oid is None, in which case
save the object using oid = ob._p_oid = self.new(ob), and return the
oid. If the _p_jar is NOT self, return self.thunk(ob) to try to translate
the reference or create a stub.
* flush() -- Called by multi-row query jars that are about to issue a query
against states managed by this jar, to ensure that any changed objects are
written to the backing DB, thus preventing queries against stale
data. This method simply walks the jar's "dirty" set and calls
self.commit() on the objects, which will write them back and also ensure
that they'll be invalidated if the transaction later aborts. This is
similar to using subtransaction commits in ZPatterns, but it will happen
transparently at the application level, and it doesn't actually issue a
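
The three lookup methods above (__getitem__, preloadState and oidFor) could be sketched roughly as follows. This is only an illustration in present-day Python 3, not the PEAK code: the Persistent stand-in, the SketchJar name, the _activate() helper and the choice of weakref.WeakValueDictionary for the cache are all assumptions made for the example.

```python
import weakref


class Persistent:
    """Hypothetical stand-in for a persistent base class (not ZODB's)."""
    _p_oid = None
    _p_jar = None

    def __setstate__(self, state):
        # In this sketch a 'state' is simply a dict of attributes.
        self.__dict__.update(state)


class SketchJar:
    """Illustrative jar; names and cache policy are assumptions."""

    def __init__(self):
        # Weak cache: an entry disappears once nobody else holds the object.
        self.cache = weakref.WeakValueDictionary()

    def ghost(self, oid, state=None):
        # A real jar would pick the class from the oid or the state.
        ob = Persistent()
        if state is not None:
            ob.__setstate__(state)
        return ob

    def _activate(self, oid, state=None):
        # Shared body of __getitem__ and preloadState (hypothetical helper).
        ob = self.cache.get(oid)
        if ob is None:
            ob = self.ghost(oid, state)
            ob._p_oid = oid
            ob._p_jar = self
            self.cache[oid] = ob
        return ob

    def __getitem__(self, oid):
        return self._activate(oid)

    def preloadState(self, oid, state):
        return self._activate(oid, state)

    def new(self, ob):
        raise NotImplementedError("abstract in this sketch")

    def thunk(self, ob):
        # Default behaviour described in the text: refuse foreign objects.
        raise TypeError("cannot reference objects from another jar")

    def oidFor(self, ob):
        if ob._p_jar is self:
            if ob._p_oid is None:
                ob._p_oid = self.new(ob)
            return ob._p_oid
        return self.thunk(ob)
```

Because the cache here is weak, a caller of preloadState() has to hold on to the returned object for as long as it wants the preloaded state to stay available, which is the caveat noted in the preloadState description.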
Abstract Methods and Attributes
(to be redefined as needed in concrete subclasses of AbstractJar)
* ghost(oid, state=None) -- given an oid and optional state, return a ghost
(empty instance) of the correct class. If 'state' is supplied, load it
into the object with ob.__setstate__() before returning it. Note that if
'state' is needed to determine the correct class, but it isn't supplied,
your implementation can always call self.load(oid) first, examine the
state, then create the class instance and stick the state in it. It's not
a ghost at that point, but what else can you do if you need the state? The
reason this method *must* accept an optional state, even if it doesn't need
it, is so that multi-row queries and alternate key lookups can provide
their results to preloadState(), preventing a re-retrieval of the same data
from the underlying DB.
* new(ob) -- save new object 'ob' and return its oid (by generating it or
extracting it from state)
* save(ob) -- given an object, save it
* load(oid) -- given an oid, return a state; it is explicitly allowed to
throw an exception if the oid is invalid or the state is non-existent in
the source DB. Some jars, however, may wish to treat their backing store
as "infinite" and simply return a default state for not-found oids.
* cache -- a Once binding that creates a cache with an appropriate retention
policy, e.g. a cache which deactivates all its contents when a transaction
finishes, or a simple weak reference cache.
* thunk(ob) -- create and save an external DB reference stub for 'ob', then
return its oid, or if ob._p_jar is actually part of the same database,
translate its oid to this jar's corresponding oid. The default
implementation of this method raises an exception to indicate that
references to objects in other jars can't be converted or stubbed. Most
jars probably won't override the default, either.
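
To make the abstract methods more concrete, here is a rough sketch of what a subclass might supply, using an in-memory dict as the "backing DB". Everything in it is an assumption for illustration: the Record class, the DictBackedJar and InfiniteJar names, the dict-shaped state, and the sequential oid generation. It is written for Python 3 and omits the AbstractJar base class itself.

```python
import itertools


class Record:
    """Hypothetical persistent class whose state is a plain dict."""
    _p_oid = None
    _p_jar = None

    def __getstate__(self):
        # Leave the jar's bookkeeping attributes out of the saved state.
        return {k: v for k, v in self.__dict__.items()
                if not k.startswith("_p_")}

    def __setstate__(self, state):
        self.__dict__.update(state)


class DictBackedJar:
    """Assumed backing store: a dict mapping oid -> state."""

    def __init__(self):
        self.db = {}
        self._oids = itertools.count(1)

    def load(self, oid):
        # Raises KeyError if the oid does not exist in the store.
        return dict(self.db[oid])

    def ghost(self, oid, state=None):
        ob = Record()
        if state is not None:
            ob.__setstate__(state)
        return ob

    def new(self, ob):
        # Generate an oid, write the state, and hand the oid back.
        oid = next(self._oids)
        self.db[oid] = ob.__getstate__()
        return oid

    def save(self, ob):
        self.db[ob._p_oid] = ob.__getstate__()


class InfiniteJar(DictBackedJar):
    """Variant that treats the store as 'infinite'."""

    def load(self, oid):
        # Return a default (empty) state for not-found oids.
        return dict(self.db.get(oid, {}))
```

The two load() variants correspond to the two policies mentioned above: failing on an unknown oid, or quietly returning a default state.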
(implementations supplied by AbstractJar, similar in nature to the ones in
the TW.Database.DataModel.Database class)
* tpc_begin(txn), tpc_vote(txn) -- "pass".
* commit(ob,txn=None) -- If object's change flag isn't set, just
return. Otherwise, add it to the 'committed' set, remove it from the
'dirty' set, and reset its 'changed' flag, after calling self.save() to
save it. (Unless _p_oid is None, in which case do oid = ob._p_oid =
self.new(ob); self.cache[oid]=ob instead of calling self.save().)
* abort(ob,txn) -- deactivate the object, then call self.tpc_abort()
(because we may have pre-committed some objects during a flush() call).
* tpc_finish(txn) -- clear the 'committed' set ('dirty' should be empty).
* tpc_abort(txn) -- deactivate everything in the 'committed' and 'dirty'
sets, then clear them both.
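
The interplay between flush(), commit() and the abort/finish methods might look something like the sketch below. It is a simplified Python 3 illustration under several assumptions: the change flag is taken to be a _p_changed attribute, deactivation is modelled by a hypothetical _p_deactivate() that just drops the loaded attributes, and the backing store is a dict. The Ob and TxnJarSketch names are invented for the example.

```python
class Ob:
    """Hypothetical persistent object with a simple change flag."""
    _p_oid = None
    _p_jar = None
    _p_changed = False

    def _p_deactivate(self):
        # Drop loaded state so the object is effectively a ghost again.
        for name in [n for n in self.__dict__ if not n.startswith("_p_")]:
            del self.__dict__[name]
        self._p_changed = False


def _state(ob):
    # Assumed state format: the object's non-bookkeeping attributes.
    return {k: v for k, v in vars(ob).items() if not k.startswith("_p_")}


class TxnJarSketch:
    def __init__(self):
        self.cache = {}
        self.dirty = {}        # id(ob) -> ob
        self.committed = {}    # id(ob) -> ob
        self.db = {}           # stand-in for the backing DB
        self._last_oid = 0

    def new(self, ob):
        self._last_oid += 1
        self.db[self._last_oid] = _state(ob)
        return self._last_oid

    def save(self, ob):
        self.db[ob._p_oid] = _state(ob)

    def register(self, ob):
        self.dirty[id(ob)] = ob

    def commit(self, ob, txn=None):
        if not ob._p_changed:
            return
        if ob._p_oid is None:
            oid = ob._p_oid = self.new(ob)
            self.cache[oid] = ob
        else:
            self.save(ob)
        self.committed[id(ob)] = ob
        self.dirty.pop(id(ob), None)
        ob._p_changed = False

    def flush(self):
        # Copy the values first: commit() mutates self.dirty.
        for ob in list(self.dirty.values()):
            self.commit(ob)

    def tpc_begin(self, txn):
        pass

    def tpc_vote(self, txn):
        pass

    def tpc_finish(self, txn):
        self.committed.clear()

    def tpc_abort(self, txn=None):
        for ob in list(self.committed.values()) + list(self.dirty.values()):
            ob._p_deactivate()
        self.committed.clear()
        self.dirty.clear()

    def abort(self, ob, txn):
        ob._p_deactivate()
        self.tpc_abort(txn)
```

Note that objects written early by flush() stay in the 'committed' set until the transaction ends, which is what lets tpc_abort() invalidate them if the transaction fails afterwards.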
(implementations supplied by AbstractJar)
* setstate(ob) -- ob.__setstate__(self.load(ob._p_oid)).
* register(ob) -- register the object with the transaction, and add it to
the 'dirty' set.
* mtime(ob) -- "pass", but subclasses could override. Not much point in
doing so, however, because if you want to do "edit conflict" checking, you
can just do it in the save() method, which can then throw an exception if
the state to be saved is based on an out-of-date version of data in the
The 'dirty' and 'committed' sets will probably be implemented as
dictionaries mapping from id(ob) -> ob, set up as binding.Once attributes.
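
As a small illustration of the id(ob) -> ob idea, a set keyed by object identity could be wrapped up as below. The IdentitySet name and its methods are hypothetical, and the sketch (Python 3) leaves out the binding.Once machinery entirely.

```python
class IdentitySet:
    """Hypothetical set keyed by id(ob) rather than by hash/equality."""

    def __init__(self):
        self._items = {}  # id(ob) -> ob

    def add(self, ob):
        # Storing ob as the value keeps it alive, so its id() cannot be
        # reused by another object while it is still in the set.
        self._items[id(ob)] = ob

    def discard(self, ob):
        self._items.pop(id(ob), None)

    def clear(self):
        self._items.clear()

    def __contains__(self, ob):
        return id(ob) in self._items

    def __iter__(self):
        # Iterate over a snapshot so callers may add/discard while looping.
        return iter(list(self._items.values()))

    def __len__(self):
        return len(self._items)
```

Keying on identity means membership does not depend on the objects being hashable or on how they define equality, so two distinct objects with equal state are tracked separately.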
=== fin ===
Whew! I think that ought to take care of 95%+ of the boilerplate code that
I can think of right now, while providing all the required interfaces,
including what's needed for queries and alternate key jars to help storage
jars avoid re-loading of state that's already available, and what's needed
to let query jars ensure that their searches are always against up-to-date
data in the transaction.
This design complies with all of the requirements I posted a few days ago,
minus one "nice to have" and the "intensional state" requirement that I've
since dropped. The idea of using mutable cache keys was "inherited" from
the TW.Database.DataModel design, and is not really relevant here. Roche'
Compaan's confusion at my confusion helped clear up some of the confusion. :)
I imagine that query jars and alternate key jars will probably have their
own boilerplate, with the former being a subclass of AbstractJar and the
latter being a different base class. Also, there will probably be SQLJars
and LDAPJars with default implementations for ghost/new/save/etc.,
replacing them with other more-specific abstract methods to be
overridden. And finally WarpCORE jars, which should be 100%
metadata-driven in normal use.
I love the smell of storage jars in the morning... they smell like
persistence. :) Only it's afternoon now, hours after I started working on
this, so I think I'll go do something else now before my wrists give out. :)