[TransWarp] Basic "storage jar" design

Sat Jun 29 16:34:33 EDT 2002

Abstract "Storage Jar"
======================

This is a basic design for an abstract implementation of the "storage jar" 
concept for PEAK/ZODB4.  It can be used as the basis for either a primary 
key-driven object jar, or a query jar, with appropriate method 
overrides.  "Alternate key" jars won't have much use for this as a base 
class, since they don't manage object states but just offer a convenient 
front-end to retrieving an object from its primary key jar (possibly using 
the preloadState() mechanism described below).

(Note: 'state' as used herein refers to an argument of __setstate__() or 
the return value of __getstate__(), as used by ZODB and Python's 
pickle.  That is, an arbitrary object which represents the persistable 
"state" of another, more complex object.  Its meaning and implementation 
are only relevant to the object jar maintaining it, and any query jars 
whose queries are based on it.)

Jar API Methods
---------------

* __getitem__(oid) -- Called by the application to retrieve objects, and by 
load() operations of other jars to get ghost references.  Implementation: 
check the cache and return the object if found.  Otherwise get 
ob = self.ghost(oid), then do ob._p_oid = oid; ob._p_jar = self; 
self.cache[oid] = ob before returning ob.

* preloadState(oid, state) -- Called by query objects and "alternate key" 
jars to prevent re-querying the DB for state information that they've 
already retrieved.  It returns the preloaded object, 'ob'.  The caller will 
have to keep a reference to 'ob' until it has retrieved the object that it 
actually wants (assuming 'ob' isn't it), so that 'ob' won't be dropped from 
the jar's cache (since the cache might be weak-reference based).  
Implementation: identical to __getitem__, except that self.ghost(oid, state) 
is called instead of self.ghost(oid).

* oidFor(ob) -- Called by save() operations of other jars to get foreign 
key values for objects referenced in their states.  Implementation: if 
ob._p_jar is self, return ob._p_oid, unless _p_oid is None, in which case 
save the object using oid = ob._p_oid = self.new(ob), and return the 
oid.  If the _p_jar is NOT self, return self.thunk(ob) to try to translate 
the reference or create a stub.

* flush() -- Called by multi-row query jars that are about to issue a query 
against states managed by this jar, to ensure that any changed objects are 
written to the backing DB, thus preventing queries against stale 
data.  This method simply walks the jar's "dirty" set and calls 
self.commit() on the objects, which will write them back and also ensure 
that they'll be invalidated if the transaction later aborts.  This is 
similar to using subtransaction commits in ZPatterns, but it will happen 
transparently at the application level, and it doesn't actually issue a 
subtransaction commit.
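
A minimal sketch of the four API methods above, in modern Python.  The 
class names (Persistent, SketchJar), the _activate() helper, and the 
'flushed' list are made up for illustration, and ghost(), new(), thunk() 
and commit() are trivial stand-ins for the real things described in the 
later sections:

```python
import weakref


class Persistent:
    """Hypothetical stand-in for a ZODB-style persistent object."""
    _p_oid = None
    _p_jar = None

    def __setstate__(self, state):
        self.__dict__.update(state)


class SketchJar:
    """Hypothetical jar showing only the four API methods."""

    def __init__(self):
        # A weak cache may drop objects nobody else references.
        self.cache = weakref.WeakValueDictionary()
        self.dirty = {}        # id(ob) -> ob
        self.flushed = []      # oids written by flush(), for illustration
        self._next_oid = 0

    # --- stand-ins for methods described in later sections ---
    def ghost(self, oid, state=None):
        ob = Persistent()
        if state is not None:
            ob.__setstate__(state)
        return ob

    def new(self, ob):
        self._next_oid += 1
        return self._next_oid

    def thunk(self, ob):
        raise TypeError("cannot reference objects from another jar")

    def commit(self, ob, txn=None):
        self.flushed.append(ob._p_oid)
        self.dirty.pop(id(ob), None)

    # --- the API methods ---
    def _activate(self, oid, state=None):
        # Shared body of __getitem__ and preloadState.
        ob = self.cache.get(oid)
        if ob is None:
            ob = self.ghost(oid, state)
            ob._p_oid = oid
            ob._p_jar = self
            self.cache[oid] = ob
        return ob

    def __getitem__(self, oid):
        return self._activate(oid)

    def preloadState(self, oid, state):
        return self._activate(oid, state)

    def oidFor(self, ob):
        if ob._p_jar is not self:
            return self.thunk(ob)
        if ob._p_oid is None:
            ob._p_oid = self.new(ob)
        return ob._p_oid

    def flush(self):
        # Copy the values first, since commit() mutates the dirty set.
        for ob in list(self.dirty.values()):
            self.commit(ob)
```

Note that a query jar calling preloadState() has to hang on to the return 
value; with the weak cache used here, an unreferenced preloaded object 
could disappear before anyone asks for it via __getitem__.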

Abstract Methods and Attributes
-------------------------------

(to be redefined as needed in concrete subclasses of AbstractJar)

* ghost(oid, state=None) -- given an oid and optional state, return a ghost 
(empty instance) of the correct class.  If 'state' is supplied, load it 
into the object with ob.__setstate__() before returning it.  Note that if 
'state' is needed to determine the correct class, but it isn't supplied, 
your implementation can always call self.load(oid) first, examine the 
state, then create the class instance and stick the state in it.  It's not 
a ghost at that point, but what else can you do if you need the state?  The 
reason this method *must* accept an optional state, even if it doesn't need 
it, is so that multi-row queries and alternate key lookups can provide 
their results to preloadState(), preventing a re-retrieval of the same data 
from the underlying DB.

* new(ob) -- save new object 'ob' and return its oid (by generating it or 
extracting it from state)

* save(ob) -- given an existing object (one that already has an oid), write 
its current state to the backing DB

* load(oid) -- given an oid, return a state; it is explicitly allowed to 
throw an exception if the oid is invalid or the state is non-existent in 
the source DB.  Some jars, however, may wish to treat their backing store 
as "infinite" and simply return a default state for not-found oids.

* cache -- a Once binding to create a cache with an appropriate retention 
policy, e.g. a cache which deactivates all of its contents when a 
transaction finishes, or a simple weak reference cache.

* thunk(ob) -- create and save an external DB reference stub for 'ob', then 
return its oid, or if ob._p_jar is actually part of the same database, 
translate its oid to this jar's corresponding oid.  The default 
implementation of this method raises an exception to indicate that 
references to objects in other jars can't be converted or stubbed.  Most 
jars probably won't override the default, either.
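
To make the abstract methods concrete, here is a rough sketch of what a 
subclass might supply, using an in-memory dict as the "backing DB".  
Record, DictJar, and the 'db' attribute are hypothetical names, the state 
is assumed to be a plain dict, and the 'cache' binding is left out:

```python
class Record:
    """Hypothetical persistent class; its state is a plain dict."""
    _p_oid = None
    _p_jar = None

    def __getstate__(self):
        # Leave out the persistence bookkeeping attributes.
        return {k: v for k, v in self.__dict__.items()
                if not k.startswith("_p_")}

    def __setstate__(self, state):
        self.__dict__.update(state)


class DictJar:
    """Hypothetical jar whose backing store is a dict of oid -> state."""

    def __init__(self):
        self.db = {}
        self._next_oid = 0

    def ghost(self, oid, state=None):
        # Only one class here, so the state isn't needed to pick it.
        ob = Record()
        if state is not None:
            ob.__setstate__(state)
        return ob

    def new(self, ob):
        # Generate the oid; other jars might extract it from the state.
        self._next_oid += 1
        self.db[self._next_oid] = ob.__getstate__()
        return self._next_oid

    def save(self, ob):
        self.db[ob._p_oid] = ob.__getstate__()

    def load(self, oid):
        # Raises KeyError for an unknown oid, which the design allows.
        return dict(self.db[oid])

    def thunk(self, ob):
        # The default behavior: no cross-jar references.
        raise TypeError("references to objects in other jars "
                        "can't be converted or stubbed")
```

A jar that wanted to treat its store as "infinite" would presumably have 
load() return a default state instead of letting the KeyError escape.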

Transaction.IDataManager Methods
--------------------------------

(implementations supplied by AbstractJar, similar in nature to the ones in 
the TW.Database.DataModel.Database class)

* tpc_begin(txn), tpc_vote(txn) -- "pass".

* commit(ob,txn=None) -- If the object's 'changed' flag isn't set, just 
return.  Otherwise, call self.save() to save it, then add it to the 
'committed' set, remove it from the 'dirty' set, and reset its 'changed' 
flag.  (If _p_oid is None, do oid = ob._p_oid = self.new(ob); 
self.cache[oid]=ob instead of calling self.save().)

* abort(ob,txn) -- deactivate the object, then call self.tpc_abort(txn) 
(because we may have pre-committed some objects during a flush() call).

* tpc_finish(txn) -- clear the 'committed' set ('dirty' should be empty).

* tpc_abort(txn) -- deactivate everything in the 'committed' and 'dirty' 
sets, then clear them both.
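
The bookkeeping above could look roughly like this.  It's only a sketch: 
Ob and TxnJar are invented names, the 'loaded' flag and the 'saved' list 
exist purely so the effects are visible, and the _p_deactivate() defined 
here is a simplified stand-in rather than the real persistence machinery:

```python
class Ob:
    """Hypothetical persistent object with the flags the text mentions."""

    def __init__(self, oid=None):
        self._p_oid = oid
        self._p_changed = False
        self.loaded = True

    def _p_deactivate(self):
        # Simplified stand-in: turn the object back into a ghost.
        self.loaded = False
        self._p_changed = False


class TxnJar:
    def __init__(self):
        self.cache = {}
        self.dirty = {}        # id(ob) -> ob
        self.committed = {}    # id(ob) -> ob
        self.saved = []        # oids passed to save(), for illustration
        self._next_oid = 100

    def new(self, ob):
        self._next_oid += 1
        return self._next_oid

    def save(self, ob):
        self.saved.append(ob._p_oid)

    def tpc_begin(self, txn):
        pass

    def tpc_vote(self, txn):
        pass

    def commit(self, ob, txn=None):
        if not ob._p_changed:
            return
        if ob._p_oid is None:
            oid = ob._p_oid = self.new(ob)
            self.cache[oid] = ob
        else:
            self.save(ob)
        self.committed[id(ob)] = ob
        self.dirty.pop(id(ob), None)
        ob._p_changed = False

    def abort(self, ob, txn):
        ob._p_deactivate()
        self.tpc_abort(txn)

    def tpc_finish(self, txn):
        self.committed.clear()

    def tpc_abort(self, txn):
        # Anything written or still pending may no longer match the DB.
        for ob in list(self.committed.values()) + list(self.dirty.values()):
            ob._p_deactivate()
        self.committed.clear()
        self.dirty.clear()
```

Keeping the 'committed' set around until tpc_finish() is what lets 
objects written early by flush() still be invalidated on abort.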

Persistence.IPersistentDataManager Methods
------------------------------------------

(implementations supplied by AbstractJar)

* setstate(ob) -- ob.__setstate__(self.load(ob._p_oid)).

* register(ob) -- register the object with the transaction, and add it to 
the 'dirty' set.

* mtime(ob) -- "pass", but subclasses could override.  Not much point in 
doing so, however, because if you want to do "edit conflict" checking, you 
can just do it in the save() method, which can then throw an exception if 
the state to be saved is based on an out-of-date version of data in the 
underlying DB.
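
Here's a small sketch of these three methods, plus the kind of "edit 
conflict" check in save() that the mtime() note alludes to.  Everything 
named here is an assumption for the sake of the example: FakeTxn, Ob, 
PJar, ConflictError, and especially the 'version' field carried in the 
state, which is just one possible way to detect out-of-date data:

```python
class ConflictError(Exception):
    """Hypothetical error for a save based on out-of-date state."""


class FakeTxn:
    """Hypothetical transaction that just remembers who registered."""

    def __init__(self):
        self.registered = []

    def register(self, ob):
        self.registered.append(ob)


class Ob:
    _p_oid = None

    def __setstate__(self, state):
        self.__dict__.update(state)


class PJar:
    def __init__(self, txn, states):
        self.txn = txn
        self.states = states   # oid -> state dict, our pretend DB
        self.dirty = {}        # id(ob) -> ob

    def load(self, oid):
        return self.states[oid]

    def setstate(self, ob):
        ob.__setstate__(self.load(ob._p_oid))

    def register(self, ob):
        self.txn.register(ob)
        self.dirty[id(ob)] = ob

    def mtime(self, ob):
        pass

    def save(self, ob):
        # Assumed scheme: the state carries a version counter.
        stored = self.states[ob._p_oid]
        if stored["version"] != ob.version:
            raise ConflictError(ob._p_oid)
        ob.version += 1
        self.states[ob._p_oid] = {"name": ob.name, "version": ob.version}
```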

Miscellaneous
-------------

The 'dirty' and 'committed' sets will probably be implemented as 
dictionaries mapping from id(ob) -> ob, set up as binding.Once attributes.
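
As a rough illustration of the id(ob) -> ob idea: keying on id(ob) means 
the sets work even for objects that aren't hashable, and the dictionary 
value keeps the object alive so its id stays valid.  (That's one plausible 
motivation, not necessarily the only one.)  In this sketch 
functools.cached_property stands in for binding.Once as a 
compute-on-first-use attribute; that substitution, and the names 
Unhashable, SetsJar, mark_dirty and mark_committed, are assumptions:

```python
from functools import cached_property


class Unhashable:
    """Hypothetical object that can't be used as a dict key or set member."""
    __hash__ = None


class SetsJar:
    @cached_property
    def dirty(self):
        # Created the first time it's touched, then reused.
        return {}

    @cached_property
    def committed(self):
        return {}

    def mark_dirty(self, ob):
        self.dirty[id(ob)] = ob

    def mark_committed(self, ob):
        self.committed[id(ob)] = self.dirty.pop(id(ob))
```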

=== fin ===

Whew!  I think that ought to take care of 95%+ of the boilerplate code that 
I can think of right now, while providing all the required interfaces, 
including what's needed for queries and alternate key jars to help storage 
jars avoid re-loading of state that's already available, and what's needed 
to let query jars ensure that their searches are always against up-to-date 
data in the transaction.

This design complies with all of the requirements I posted a few days ago, 
minus one "nice to have" and the "intensional state" requirement that I've 
since dropped.  The idea of using mutable cache keys was "inherited" from 
the TW.Database.DataModel design, and not really relevant here.  Roche' 
Compaans' confusion at my confusion helped clear up some of the confusion.  :)

I imagine that query jars and alternate key jars will probably have their 
own boilerplate, with the former being a subclass of AbstractJar and the 
latter being a different base class.  Also, there will probably be SQLJars 
and LDAPJars with default implementations for ghost/new/save/etc., 
replacing them with other more-specific abstract methods to be 
overridden.  And finally WarpCORE jars, which should be 100% 
metadata-driven in normal use.

I love the smell of storage jars in the morning...  they smell like 
persistence.  :)  Only it's afternoon now, hours after I started working on 
this, so I think I'll go do something else now before my wrists give out.  :)