[TransWarp] PEAK persistence based on ZODB4, continued...

Fri Jun 28 22:41:43 EDT 2002

It's interesting to think of a PEAK persistence service as a single 
object...  let's call it a "Jar", since objects' _p_jar pointer will 
usually reference it.

A jar needs to be able to handle various types of keys, and perform various 
sorts of mappings between keys and the referenced objects, not to mention 
vice versa.  Specifically, a jar must:

* Given a key, return an object

* Given an object, return a key of a requested type (for saving in a 
foreign key reference).

* Given an object, retrieve a collection of objects referencing it along a 
given association.  (Probably implemented as a kind of key that retrieves a 
persistent collection object, whose state is then loaded by a query.)

* Track "dirty" objects per underlying storage mechanism and flush them 
when a query is issued against that storage (or at least pass on the event 
information to the storages to manage themselves), and register them with 
the current transaction, if appropriate.

* Implement the ZODB4 Persistence.IPersistentDataManager protocol for any 
of its objects that call upon it (which mostly means the dirtiness 
management described in the preceding item).

* Implement the ZODB4 Transaction.IDataManager protocol on behalf of any of 
its objects that are registered with a transaction (presumably by 
delegating the responsibility right back to them).

In order to make the jar mechanism generic, it's necessary to delegate all 
of these responsibilities back to other objects.  The IDataManager "commit" 
responsibility is quite straightforward; it can be delegated to a _p_save() 
mechanism on the objects themselves, and "abort" can be implemented in the 
standard way for ZODB, i.e. deactivating the objects.  The yucky part is 
dealing with aborting committed objects upon tpc_abort, but that can be managed.
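
A minimal sketch of that delegation, not using the real ZODB4 interfaces: 
the Record and Jar classes, the dict standing in for a storage, and the 
_p_deactivate() hook as written here are all assumptions for illustration.  
Only the _p_save() name comes from the text above, and tpc_abort handling 
is left out entirely.

```python
class Record:
    """Hypothetical persistent object that knows how to save itself."""

    def __init__(self, store, key, value):
        self._store = store        # assumed: a dict standing in for a storage
        self._p_changed = False
        self.key = key
        self.value = value

    def _p_save(self):
        # The object writes its own state; the jar never looks inside it.
        self._store[self.key] = self.value
        self._p_changed = False

    def _p_deactivate(self):
        # Drop in-memory state so the object is a ghost again.
        self.value = None
        self._p_changed = False


class Jar:
    """Hypothetical jar that only tracks dirty objects and delegates."""

    def __init__(self):
        self._dirty = []

    def register(self, obj):
        # Called when an object is modified.
        if obj not in self._dirty:
            obj._p_changed = True
            self._dirty.append(obj)

    def commit(self):
        # "commit" is delegated to each dirty object's _p_save().
        for obj in self._dirty:
            obj._p_save()
        self._dirty = []

    def abort(self):
        # "abort" just deactivates the dirty objects.
        for obj in self._dirty:
            obj._p_deactivate()
        self._dirty = []
```

The point of the sketch is only that the jar needs no knowledge of how 
state is written; everything storage-specific lives on the objects.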

The entire IPersistentDataManager interface can also be delegated back to 
methods on the persistent objects themselves.  If it turns out later that 
we get more flexibility by having some kind of marshalling object do these 
things instead, well, the design will change at that point.  Presumably 
we'll know this long before we get to writing actual code.

This leaves the jar's responsibility as mostly maintaining an oid->object 
cache, and doing a lot of type-based dispatching between objects providing 
storage services and objects providing mapping services.

The two primary services we're after are key-to-class (and perhaps 
marshaller) and object-to-key mappers.  It seems logical that the latter 
could be a responsibility of the object itself, since that's where the most 
contextual information is.

So that leaves key-to-class/key-to-marshaller.  There are two kinds of 
keys, identity (primary) keys and alternate keys.  Alternate keys are 
searched in an underlying DB in order to identify the primary key, which is 
then looked up to retrieve the desired object.  Primary keys are looked up 
in cache, or in the DB, or not at all, depending on the situation.  (DB 
lookup is necessary to determine the class for the "ghost" persistent 
object in case of keys which can refer to more than one class.  No DB 
lookup is needed for collection/query keys or for keys which can refer to 
only one class.  Note that whenever the DB lookup actually does take place, 
we'll need to have some kind of ReferenceError that can be raised when 
accessing a non-existent object.)
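
As a rough sketch of the primary key lookup rules just described (cache 
first, DB only when the class is ambiguous, no DB at all for single-class 
keys): every name below is hypothetical, the "DB" is a plain dict, and 
MissingObjectError merely stands in for the "some kind of ReferenceError" 
mentioned above.

```python
class MissingObjectError(LookupError):
    """Hypothetical stand-in for the 'ReferenceError' mentioned above."""


class Ghost:
    """Placeholder object whose state has not been loaded yet."""

    def __init__(self, class_name, key):
        self.class_name = class_name
        self.key = key
        self.loaded = False


class PrimaryKeyJar:
    def __init__(self, db, fixed_class=None):
        self.db = db                    # assumed: dict of key -> row dict
        self.fixed_class = fixed_class  # set when keys map to one class only
        self.cache = {}
        self.db_lookups = 0             # counter kept only for illustration

    def __getitem__(self, key):
        if key in self.cache:
            return self.cache[key]      # cache hit, no DB access
        if self.fixed_class is not None:
            class_name = self.fixed_class   # class is known up front
        else:
            # The key may refer to several classes, so ask the DB.
            self.db_lookups += 1
            row = self.db.get(key)
            if row is None:
                raise MissingObjectError(key)
            class_name = row["class"]
        ghost = self.cache[key] = Ghost(class_name, key)
        return ghost
```

With a fixed class, a bad key would only be detected later, when the 
ghost's state is actually loaded.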

Obviously, the lookup mechanism is heavily dependent upon the type of key - 
indeed it's so tightly tied that one wonders if we should say that the key 
type itself is responsible for the lookup!  This could be seen as akin to 
the Index objects in Andy Dustman's SqlDict package.

So we could view the master jar as containing a set of "indexes", each 
representing a type of key, each accessible as a mapping from key data to 
persistent objects.  Hm.  Actually, it's beginning to seem as though 
there's no point in having the master jar, except as a container/context 
for these "indexes".

Okay, so maybe the "indexes" should actually be "jars", and the "jar" 
should be a "shelf".  :)  Each jar is _p_jar for its contents, has its own 
cache, and implements both the transactional and persistence management 
protocols.  There will be a few stereotypical kinds of jars:

* Query/collection jars - the setstate() method will first ask any jars 
containing objects the query is based on, to flush any dirty objects.  Then 
it'll load the data by creating an appropriate iterator or lazy list that 
maps the query results over to objects from the correct jar based on loaded 
keys.  The register() method will raise an exception, since queries aren't 
directly modifiable.  These jars also won't need to implement transaction 
behavior, except in order to possibly invalidate loaded query objects at 
transaction end.

* Primary key jars - the straightforward case in general, but lots of 
options/details in particular!  Cache may be of whatever duration is 
appropriate.  Transactional behavior is required.

* Alternate key jars - lookup in the DB, then refer to another jar for the 
dirty work.  No transactional or persistent behavior as such, this is just 
a redirection mechanism.  Can keep a cache, but it should map to the 
primary key, not the objects themselves, and it should probably be short-lived.

Jars will all need to implement some kind of "getKeyFor(object)" protocol 
so that key references can be saved.  For primary key jars, it should be 
able to notice when an object is a new instance of the correct type that 
has not been saved in the jar yet, and do the necessary work to create and 
save it immediately (thus ensuring referential integrity if it's an RDBMS).
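
One way the "getKeyFor(object)" behavior for a primary key jar might 
look, as a sketch only: the Item class, the dict used as a table, the 
itertools counter used to generate keys, and the _p_oid attribute as used 
here are assumptions, not a settled design.

```python
import itertools


class Item:
    """Hypothetical persistent class."""

    def __init__(self, name):
        self.name = name
        self._p_jar = None
        self._p_oid = None


class PrimaryKeyJar:
    def __init__(self, klass):
        self.klass = klass
        self.rows = {}                        # stand-in for a DB table
        self.cache = {}
        self._next_id = itertools.count(1)    # stand-in for key generation

    def getKeyFor(self, obj):
        if obj._p_jar is self:
            return obj._p_oid                 # already stored here
        if obj._p_jar is None and isinstance(obj, self.klass):
            # New, unsaved instance of the right type: save it right away
            # so that the key being handed out refers to a real row.
            key = next(self._next_id)
            self.rows[key] = {
                k: v for k, v in vars(obj).items() if not k.startswith("_p_")
            }
            obj._p_jar, obj._p_oid = self, key
            self.cache[key] = obj
            return key
        raise TypeError("object does not belong in this jar")
```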

Another issue that needs to be addressed is partial loading of state, and 
re-use of partially loaded data.  For example, queries should not cause 
objects' state to be read from the DB twice, especially not on a 
one-record-at-a-time basis.  This might be addressable using yet another 
kind of jar: a state jar, which given a full or partial state, can identify 
and load the object.

For an example of a state jar, let's take the WarpCORE "objects" table.  It 
contains some basic info like the primary key and object class.  A state 
jar could extract the primary key, check its cache, and if the object isn't 
there, construct a ghost using the class info.  If enough information is in 
the state to fully load the object, it could do so.

The "enough information" is a bit tricky, especially as to how it's 
specified what constitutes "enough", and even what kind of state is being used.

For objects based on WarpCORE, data interfaces can be treated as 
attributes, and lazily loaded based on primary key, with only the "object" 
data interface loaded in the main object.  To do this, the Features would 
have to each delegate to the correct data interface object, which means a 
lot of metadata specification, or else some kind of mapping in the 
Element's class, derived from jar-provided information about available 
interfaces.

(Just to clarify, a WarpCORE data interface is effectively a table or view, 
joined by primary key to "parent class" data interface(s).  Each row can be 
thought of as "adapting" a WarpCORE object to a particular interface, by 
providing the fields of that interface.  So, each of these "adapters" could 
be attributes of an Element, and the Element's Features could store or 
retrieve state (in the form of individual field values) from them.)

Using data interface objects is rather handy because a query can just hand 
off anything it loads to the jars for the interfaces being queried, and the 
"real" objects will automatically sync up.  (This suggests, however, that 
query objects need a way to work as pure records, without doing any object 
or state loading, because otherwise a lot of cache memory could get chewed 
up unnecessarily during processing.  There's also a certain question in my 
mind regarding the ability of different DBAPI implementations to handle 
multiple open cursors without choking...)

So, it sounds as though primary key jars should have an 
"objectFromState(state)" method that bypasses the key machinery, for use by 
queries and alternate key jars (since looking up the alternate key will 
presumably look up and return a state).  No need for state jars as a 
separate mechanism.
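
A sketch of what "objectFromState(state)" could do, assuming states are 
plain dicts and that already-loaded state is never overwritten by later 
query results (that second rule is my assumption here, not something 
decided above).  The Row class and key_field parameter are hypothetical.

```python
class Row:
    """Hypothetical persistent object; state of None means 'ghost'."""

    def __init__(self, key):
        self.key = key
        self.state = None


class PrimaryKeyJar:
    def __init__(self, key_field):
        self.key_field = key_field   # name of the primary key in a state
        self.cache = {}

    def objectFromState(self, state):
        # Extract the primary key straight from the state, bypassing the
        # normal key lookup machinery.
        key = state[self.key_field]
        obj = self.cache.get(key)
        if obj is None:
            obj = self.cache[key] = Row(key)
        if obj.state is None:
            # Only fill in ghosts; loaded (possibly modified) objects
            # keep the state they already have.
            obj.state = dict(state)
        return obj
```

A query that has already fetched rows can hand each one to the jar this 
way, so no object's state is read from the DB a second time.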

Hm.  So what would a WarpCORE "shelf" look like?  Presumably a primary key 
jar for each WarpCORE interface.  Alternate key jars for all alternate keys 
in the database.  Most of the primary key jars would load simple primitive 
"state" objects, except for base interfaces like "wc_object" and 
"wc_event", which would know how to look up classes from the base interface 
data in order to create their ghosts.  The classes would need to know what 
sub-interfaces they supported, in order to request ghosts for them.  These 
ghosts would act as though they had default field values, and would only 
get saved if explicitly modified.  The Elements wouldn't have any state of 
their own; it'd be the interface state objects that got saved.

Probably there'd be two distinct primary key jars for base types: one to 
retrieve the Element, and one for the base interface's state.  The Element 
jar would just ask the base interface jar for its object, pull out the 
necessary metadata to determine the element class, and then create an 
Element populated with state objects from all the right 
interfaces.  Interestingly, these Elements could be cached pretty well 
indefinitely, since they don't contain any state that isn't managed by 
something else.

Could we now do a cross-DB Element?  Let's see.  Load a base state from 
LDAP, and try to look up a corresponding base state from WarpCORE.  If the 
WarpCORE part isn't found, create a new instance of the appropriate state 
class for the WarpCORE part, with suitable default data, its _p_jar 
pointing to the right WarpCORE jar, and put it in the Element's slot for 
WarpCORE state.  If anything gets stored in the WarpCORE state, it'll get 
saved back to the DB (generating an OID if needed).  If anything is changed 
in the LDAP state, it'll likewise get saved.

Okay, now do it the other way...  load the base state from WarpCORE, then 
extra state from LDAP afterwards...  Yep, that works too.  So when a 
WarpCORE object references a stub for an external DB, *pow*, we have 
transparent locate-and-load.

Hm.  Definitely sounds like we're on to something here.  The only messy 
part is the delegation of features to attributes of state attributes of the 
element.  This probably will require subclassing at the dispatch layer to 
specify the delegation rules, but it might be doable with some kind of 
mapping mechanism.  The trick is how to do it without adding more layers of 
calls to the feature operations...

Aha!  I think I have it.  The Smalltalk ValueModel pattern.  Load a state 
as a dictionary of objects, each with get/set methods.  They can be "cells" 
that contain a value, or "delegated cells" that know an attribute name on a 
state object they reference.  The features look for objects in __dict__ and 
call the get/set methods on them instead of storing directly in 
__dict__.  Alternatively, they could check if an item in __dict__ was a 
getter-setter instance, and either work directly on the __dict__ or on the 
getter-setter, as appropriate.  One could then interchangeably use the same 
Model framework classes with any persistence mechanism, as it would be the 
responsibility of the jar to populate their state dictionary.
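
Here is a small sketch of the cell idea, using the first variant (features 
always call get/set on whatever is in __dict__).  It assumes features are 
Python data descriptors; the Cell, DelegatedCell, Feature and Person names, 
and the SimpleNamespace standing in for an LDAP state object, are all made 
up for illustration.

```python
from types import SimpleNamespace


class Cell:
    """Holds a value directly."""

    def __init__(self, value=None):
        self._value = value

    def get(self):
        return self._value

    def set(self, value):
        self._value = value


class DelegatedCell:
    """Reads and writes a named attribute on a referenced state object."""

    def __init__(self, state, attr):
        self._state = state
        self._attr = attr

    def get(self):
        return getattr(self._state, self._attr)

    def set(self, value):
        setattr(self._state, self._attr, value)


class Feature:
    """Data descriptor that goes through the cell stored in __dict__."""

    def __init__(self, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.__dict__[self.name].get()

    def __set__(self, obj, value):
        obj.__dict__[self.name].set(value)


class Person:
    name = Feature("name")
    email = Feature("email")


# What a jar might do when populating an Element's state dictionary:
ldap_state = SimpleNamespace(mail="alice@example.com")
person = Person()
person.__dict__["name"] = Cell("Alice")
person.__dict__["email"] = DelegatedCell(ldap_state, "mail")

person.email = "alice@example.org"   # lands on ldap_state.mail
```

The Person class itself contains nothing about LDAP; swapping the cells 
is enough to point the same feature at a different state object.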

Now the jars can manage the mapping rules from feature to states and 
substates, which means it might be possible to assemble those rules from 
metadata provided by individual jars, and register them in the "shelf" for 
each Element class.

Whew.  This is a lot of stuff to implement, but it mostly sounds like the 
tasks can be separated from one another to some extent.  And after enough 
practice, it should be possible to define metaclasses that make table (or 
WarpCORE interface) definitions straightforward, and arrange them such that 
one can easily subclass a "shelf" and add more "jars" to it, or 
extend/replace the class factories used by the Element jars.

I can hardly believe it...  I think I've just defined a world-class 
business object persistence framework, supporting polymorphism, 
multi-database objects, and mappings to arbitrary legacy schemas, complete 
with tunable and replaceable caching policies.

So I guess I'll stop here for tonight, before I find something wrong with 
any of the ideas.  :)

Perhaps tomorrow I'll try to more rigorously define the "aspects" of a 
jar's behavior (e.g. state mapping, caching, transaction support, etc.) and 
work out how to parameterize them with strategy objects or something like that.