Versioned Storage Re: [TransWarp] Basic "storage jar" design

Tue Jul 9 15:09:01 EDT 2002

What are your thoughts on versioning at the storage level?  Do you have
very many actual requirements for versioning business objects?

I've been working with hyperdocuments (XML and XLink kind of stuff) the
past few years and have gotten pretty interested in versioning linked
information objects with the ability to flexibly manage the
configurations of those objects. It's my opinion that stuff like
workflow builds on lifecycle capabilities and lifecycle builds on
versioning capabilities.  While I think these are incredibly interesting
technical problems, I haven't yet determined if there is a strong
business need for solutions based on these ideas.

The versioning model that I've written up [1] and extended to
hyperdocuments [2] is designed to enable policy-driven version lookup
based on the context in which a reference is resolved. Objects would
associate with each other not through raw oids alone, but through a
resourceId plus other data that specifies how to resolve the correct
version.

An easy example would be 1) show me the latest public version of my
website, 2) show me what my website publicly contained on March 30th,
and 3) show me Eliot's draft branch of the website on March 30th. In all
of these cases I want the hyperlinks between objects to resolve to the
versions of resources that were effective at the right point in time on
the specified "branch".
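
To make those three lookups concrete, here is a minimal sketch, assuming
a flat in-memory list of versions tagged with a branch name and an
effective date. The names Version, VersionStore, and resolve are made up
for illustration and are not taken from SnapCM. A fuller model would
probably let a branch fall back to its parent branch; this sketch does
not.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Version:
    resource_id: str
    branch: str
    effective: date   # date this version became effective on its branch
    content: str


class VersionStore:
    """Hypothetical in-memory store of versions (illustrative only)."""

    def __init__(self):
        self._versions = []

    def add(self, version):
        self._versions.append(version)

    def resolve(self, resource_id, branch="public", as_of=None):
        """Newest version on `branch` effective on or before `as_of`.

        as_of=None means "latest".
        """
        candidates = [
            v for v in self._versions
            if v.resource_id == resource_id
            and v.branch == branch
            and (as_of is None or v.effective <= as_of)
        ]
        if not candidates:
            raise KeyError((resource_id, branch, as_of))
        return max(candidates, key=lambda v: v.effective)


store = VersionStore()
store.add(Version("home", "public", date(2002, 3, 1), "public v1"))
store.add(Version("home", "public", date(2002, 4, 15), "public v2"))
store.add(Version("home", "eliot-draft", date(2002, 3, 20), "Eliot's draft"))

# 1) latest public version
latest_public = store.resolve("home")
# 2) what was public on March 30th
public_march_30 = store.resolve("home", as_of=date(2002, 3, 30))
# 3) Eliot's draft branch on March 30th
eliot_march_30 = store.resolve("home", branch="eliot-draft",
                               as_of=date(2002, 3, 30))
```

The same resourceId ("home") yields three different versions depending
only on the (branch, date) context, which is the behavior I want
hyperlinks to have.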

I've been trying to keep up with what you've been writing to the list (I
really like your "Image Streaming" thought process, but it's kind of
like drinking from a fire hose!). As for what it would mean for a Jar
API, I think it could be as simple as:
1) allowing the oid of an object to contain (resourceId,
resolutionPolicy=None, *args). I think you would already allow this, but
I'm being explicit. Here the *args would be policy-dependent data, like
a specific versionId. Examples of resolutionPolicy include OnSnapshot
and Fixed. (See the papers for details.)

2) and some "versioning" Jars that can act as facades for the series of
Snapshots and Branches from a Versioning Storage. A "versioning" Jar
could then be set to some particular timestamp and branch so that it
always returns versions relative to that context. This could yield
read-only content from those snapshots. A "versioning" Jar could also
point to the HEAD of some branch and provide read-write access. (This is
pretty much what my team called a "Sandbox", but we haven't written up
anything about it yet. It also does conflict detection and supports
merging.)
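
A rough sketch of both points, assuming snapshots are stored per branch
as (timestamp, {resourceId: state}) pairs. VersioningStorage,
VersioningJar, and snapshot_at are hypothetical names, and the policy
behavior shown here (Fixed pins a versionId passed in *args, OnSnapshot
resolves against the jar's context) is only my shorthand for
illustration; the papers define the real semantics. Only the read-only
case is shown.

```python
from datetime import date


class VersioningStorage:
    """Hypothetical storage holding snapshots per branch."""

    def __init__(self):
        self.snapshots = {}   # branch -> list of (timestamp, {resourceId: state})
        self.versions = {}    # versionId -> state, used by the "Fixed" policy

    def add_snapshot(self, branch, timestamp, states):
        entries = self.snapshots.setdefault(branch, [])
        entries.append((timestamp, dict(states)))
        entries.sort(key=lambda pair: pair[0])   # keep oldest first

    def snapshot_at(self, branch, timestamp):
        """Latest snapshot on `branch` taken at or before `timestamp`."""
        found = None
        for ts, states in self.snapshots.get(branch, []):
            if ts <= timestamp:
                found = states
        if found is None:
            raise KeyError((branch, timestamp))
        return found


class VersioningJar:
    """Read-only facade pinned to one (branch, timestamp) context."""

    def __init__(self, storage, branch, timestamp):
        self.storage = storage
        self.branch = branch
        self.timestamp = timestamp

    def __getitem__(self, oid):
        # Accept a bare resourceId or (resourceId, resolutionPolicy, *args).
        if not isinstance(oid, tuple):
            oid = (oid,)
        resource_id = oid[0]
        policy = oid[1] if len(oid) > 1 else None
        args = oid[2:]
        if policy == "Fixed":
            # Assumption: args[0] is a specific versionId.
            return self.storage.versions[args[0]]
        # None and "OnSnapshot" both resolve against this jar's context.
        snapshot = self.storage.snapshot_at(self.branch, self.timestamp)
        return snapshot[resource_id]


storage = VersioningStorage()
storage.versions["v7"] = "logo, version 7"
storage.add_snapshot("public", date(2002, 3, 1),
                     {"home": "home as of March 1", "logo": "logo v6"})
storage.add_snapshot("public", date(2002, 4, 15),
                     {"home": "home as of April 15", "logo": "logo v8"})

jar = VersioningJar(storage, "public", date(2002, 3, 30))
home = jar["home"]                       # resolved via the jar's context
logo = jar[("logo", "OnSnapshot")]       # same context, explicit policy
pinned = jar[("logo", "Fixed", "v7")]    # ignores the context entirely
```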

This is pretty much a raw dump of my ideas based on your mailings, so
take it with a grain of salt.  ;-)

[1] SnapCM: Versioning Object Model.  This paper defines, in UML and
OCL, an object model that precisely describes branches, snapshots, and
version link resolution behavior.  (Very abstract model, just the
concepts).
http://www.isogen.com/papers/snapCM/index.html
http://www.isogen.com/papers/snapCM.pdf

[2] Versioned Hyperdocuments: Support for Lifecycle Models. This paper
extends the SnapCM model to Documents and Hyperlinking. It includes a
much more complete narrative of how/why/when kind of stuff.
http://www.isogen.com/papers/versioned-hyperdocuments/index.html
http://www.isogen.com/papers/versioned-hyperdocuments.pdf

John Heintz

ps - the system we built on top of the ZODB isn't available anywhere.
We've asked about at least open sourcing it but have received no
response. ;-(

On Sat, 2002-06-29 at 15:34, Phillip J. Eby wrote:
> Abstract "Storage Jar"
> ======================
> 
> This is a basic design for an abstract implementation of the "storage jar" 
> concept for PEAK/ZODB4.  It can be used as the basis for either a primary 
> key-driven object jar, or a query jar, with appropriate method 
> overrides.  "Alternate key" jars won't have much use for this as a base 
> class, since they don't manage object states but just offer a convenient 
> front-end to retrieving an object from its primary key jar (possibly using 
> the preloadState() mechanism described below).
> 
> (Note: 'state' as used herein refers to an argument of __setstate__() or 
> the return value of __getstate__(), as used by ZODB and Python's 
> pickle.  That is, an arbitrary object which represents the persistable 
> "state" of another, more complex object.  Its meaning and implementation 
> are only relevant to the object jar maintaining it, and any query jars 
> whose queries are based on it.)
> 
> 
> Jar API Methods
> ---------------
> 
> * __getitem__ -- Called by application to retrieve objects, and by load() 
> operations of other jars to get ghost references.  Implementation: check 
> cache, return object if found.  Otherwise get ob = self.ghost(oid); and do 
> ob._p_oid = oid; ob._p_jar = self; self.cache[oid] = ob before returning ob.
> 
> * preloadState(oid, state) -- Called by query objects and "alternate key" 
> jars to prevent re-querying the DB for state information that they've 
> already retrieved.  The caller will have to keep a reference to 'ob' until 
> it has retrieved the object that it actually wants (assuming 'ob' isn't 
> it), so that 'ob' won't be dropped from the jar's cache (since the cache 
> might be weak-reference based).  Implementation: identical to __getitem__, 
> except that self.ghost(oid, state) is called instead of self.ghost(oid).
> 
> * oidFor(ob) -- Called by save() operations of other jars to get foreign 
> key values for objects referenced in their states.  Implementation: if 
> ob._p_jar is self, return ob._p_oid, unless _p_oid is None, in which case 
> save the object using oid = ob._p_oid = self.new(ob), and return the 
> oid.  If the _p_jar is NOT self, return self.thunk(ob) to try to translate 
> the reference or create a stub.
> 
> * flush() -- Called by multi-row query jars that are about to issue a query 
> against states managed by this jar, to ensure that any changed objects are 
> written to the backing DB, thus preventing queries against stale 
> data.  This method simply walks the jar's "dirty" set and calls 
> self.commit() on the objects, which will write them back and also ensure 
> that they'll be invalidated if the transaction later aborts.  This is 
> similar to using subtransaction commits in ZPatterns, but it will happen 
> transparently at the application level, and it doesn't actually issue a 
> subtransaction commit.
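
The lookup side of the jar API quoted above (__getitem__, preloadState,
oidFor, plus setstate from further down) might look roughly like this.
It is a sketch under assumptions, not PEAK or ZODB code: Record stands
in for a persistent class, DictJar is a hypothetical concrete jar backed
by a plain dict, and the cache is an ordinary dict rather than a
weak-reference cache.

```python
class Record:
    """Stand-in for a persistent object (not ZODB's Persistent)."""
    _p_oid = None
    _p_jar = None

    def __getstate__(self):
        # Persistable state excludes the _p_* bookkeeping attributes.
        return {k: v for k, v in vars(self).items() if not k.startswith("_p_")}

    def __setstate__(self, state):
        self.__dict__.update(state)


class AbstractJar:
    def __init__(self):
        self.cache = {}

    def __getitem__(self, oid):
        return self._get(oid)

    def preloadState(self, oid, state):
        # Same as __getitem__, but hands already-fetched state to ghost().
        return self._get(oid, state)

    def _get(self, oid, state=None):
        ob = self.cache.get(oid)
        if ob is None:
            ob = self.ghost(oid, state)
            ob._p_oid = oid
            ob._p_jar = self
            self.cache[oid] = ob
        return ob

    def oidFor(self, ob):
        if ob._p_jar is self:
            if ob._p_oid is None:
                ob._p_oid = self.new(ob)   # first save assigns the oid
            return ob._p_oid
        return self.thunk(ob)

    def setstate(self, ob):
        ob.__setstate__(self.load(ob._p_oid))

    def thunk(self, ob):
        raise TypeError("cannot reference objects from other jars")


class DictJar(AbstractJar):
    """Hypothetical concrete jar; `rows` plays the backing DB."""

    def __init__(self, rows):
        super().__init__()
        self.rows = rows
        self.loads = 0        # counts trips to the "DB"

    def ghost(self, oid, state=None):
        ob = Record()
        if state is not None:
            ob.__setstate__(state)
        return ob

    def load(self, oid):
        self.loads += 1
        return self.rows[oid]

    def new(self, ob):
        oid = max(self.rows, default=0) + 1
        self.rows[oid] = ob.__getstate__()
        return oid


jar = DictJar({1: {"name": "alice"}, 2: {"name": "bob"}})
ghost = jar[1]                                  # ghost, nothing loaded yet
same = jar[1]                                   # cache hit
pre = jar.preloadState(2, {"name": "bob"})      # state came from a query
loads_before = jar.loads
jar.setstate(ghost)                             # ghost is filled on demand

fresh = Record()
fresh.name = "carol"
fresh._p_jar = jar
new_oid = jar.oidFor(fresh)                     # saved via new()
```

Note that preloadState() never touches load(), which is the point of
requiring ghost() to accept an optional state.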
> 
> 
> Abstract Methods and Attributes
> -------------------------------
> 
> (to be redefined as needed in concrete subclasses of AbstractJar)
> 
> * ghost(oid, state=None) -- given an oid and optional state, return a ghost 
> (empty instance) of the correct class.  If 'state' is supplied, load it 
> into the object with ob.__setstate__() before returning it.  Note that if 
> 'state' is needed to determine the correct class, but it isn't supplied, 
> your implementation can always call self.load(oid) first, examine the 
> state, then create the class instance and stick the state in it.  It's not 
> a ghost at that point, but what else can you do if you need the state?  The 
> reason this method *must* accept an optional state, even if it doesn't need 
> it, is so that multi-row queries and alternate key lookups can provide 
> their results to preloadState(), preventing a re-retrieval of the same data 
> from the underlying DB.
> 
> * new(ob) -- save new object 'ob' and return its oid (by generating it or 
> extracting it from state)
> 
> * save(ob) -- given an object, save it
> 
> * load(oid) -- given an oid, return a state; it is explicitly allowed to 
> throw an exception if the oid is invalid or the state is non-existent in 
> the source DB.  Some jars, however, may wish to treat their backing store 
> as "infinite" and simply return a default state for not-found oids.
> 
> * cache -- a Once binding to create a cache with appropriate retention 
> policy, e.g. a cache which deactivates all contents when a transaction is 
> finished, or a simple weak reference cache.
> 
> * thunk(ob) -- create and save an external DB reference stub for 'ob', then 
> return its oid, or if ob._p_jar is actually part of the same database, 
> translate its oid to this jar's corresponding oid.  The default 
> implementation of this method raises an exception to indicate that 
> references to objects in other jars can't be converted or stubbed.  Most 
> jars probably won't override the default, either.
> 
> 
> Transaction.IDataManager Methods
> --------------------------------
> 
> (implementations supplied by AbstractJar, similar in nature to the ones in 
> the TW.Database.DataModel.Database class)
> 
> * tpc_begin(txn), tpc_vote(txn) -- "pass".
> 
> * commit(ob,txn=None) -- If object's change flag isn't set, just 
> return.  Otherwise, add it to the 'committed' set, remove it from the 
> 'dirty' set, and reset its 'changed' flag, after calling self.save() to 
> save it.  (Unless _p_oid is None, in which case do oid = ob._p_oid = 
> self.new(ob); self.cache[oid]=ob instead of calling self.save().)
> 
> * abort(ob,txn) -- deactivate the object, then call self.tpc_abort() 
> (because we may have pre-committed some objects during a flush() call).
> 
> * tpc_finish(txn) -- clear the 'committed' set ('dirty' should be empty).
> 
> * tpc_abort(txn) -- deactivate everything in the 'committed' and 'dirty' 
> sets, then clear them both.
> 
> 
> Persistence.IPersistentDataManager Methods
> ------------------------------------------
> 
> (implementations supplied by AbstractJar)
> 
> * setstate(ob) -- ob.__setstate__(self.load(ob._p_oid)).
> 
> * register(ob) -- register the object with the transaction, and add it to 
> the 'dirty' set.
> 
> * mtime(ob) -- "pass", but subclasses could override.  Not much point in 
> doing so, however, because if you want to do "edit conflict" checking, you 
> can just do it in the save() method, which can then throw an exception if 
> the state to be saved is based on an out-of-date version of data in the 
> underlying DB.
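
The "edit conflict" check in save() mentioned above could be as small as
a version counter stored next to each row. This is only a sketch of that
idea: ConflictError and VersionCheckedJar are hypothetical names, and
save() takes (oid, state) here rather than an object, purely to keep the
example short.

```python
class ConflictError(Exception):
    """Raised when the stored row changed after the state was loaded."""


class VersionCheckedJar:
    def __init__(self):
        self.rows = {}   # oid -> (version, data); stands in for the DB

    def load(self, oid):
        version, data = self.rows[oid]
        # The loaded state remembers which version it was based on.
        return {"version": version, **data}

    def save(self, oid, state):
        current_version, _ = self.rows.get(oid, (0, None))
        if state["version"] != current_version:
            raise ConflictError(oid)
        data = {k: v for k, v in state.items() if k != "version"}
        self.rows[oid] = (current_version + 1, data)


jar = VersionCheckedJar()
jar.rows[1] = (1, {"title": "draft"})

first = jar.load(1)
second = jar.load(1)          # both edits start from version 1

first["title"] = "edited by first"
jar.save(1, first)            # succeeds, row moves to version 2

second["title"] = "edited by second"
try:
    jar.save(1, second)       # based on stale version 1
    conflict = False
except ConflictError:
    conflict = True
```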
> 
> 
> Miscellaneous
> -------------
> 
> The 'dirty' and 'committed' sets will probably be implemented as 
> dictionaries mapping from id(ob) -> ob, set up as binding.Once attributes.
> 
> 
> === fin ===
> 
> Whew!  I think that ought to take care of 95%+ of the boilerplate code that 
> I can think of right now, while providing all the required interfaces, 
> including what's needed for queries and alternate key jars to help storage 
> jars avoid re-loading of state that's already available, and what's needed 
> to let query jars ensure that their searches are always against up-to-date 
> data in the transaction.
> 
> This design complies with all of the requirements I posted a few days ago, 
> minus one "nice to have" and the "intensional state" requirement that I've 
> since dropped.  The idea of using mutable cache keys was "inherited" from 
> the TW.Database.DataModel design, and not really relevant here.  Roche' 
> Compaans' confusion at my confusion helped clear up some of the confusion.  :)
> 
> I imagine that query jars and alternate key jars will probably have their 
> own boilerplate, with the former being a subclass of AbstractJar and the 
> latter being a different base class.  Also, there will probably be SQLJars 
> and LDAPJars with default implementations for ghost/new/save/etc., 
> replacing them with other more-specific abstract methods to be 
> overridden.  And finally WarpCORE jars, which should be 100% 
> metadata-driven in normal use.
> 
> I love the smell of storage jars in the morning...  they smell like 
> persistence.  :)  Only it's afternoon now, hours after I started working on 
> this, so I think I'll go do something else now before my wrists give out.  :)
> 
-- 
John D. Heintz | Senior Developer

1016 La Posada Dr. | Suite 240 | Austin TX 78752
T 512.380.0347 | jheintz at isogen.com

http://www.isogen.com