[TransWarp] Basic "storage jar" design

Sun Jun 30 17:05:12 EDT 2002

On Sun, 2002-06-30 at 16:43, Phillip J. Eby wrote:
> At 04:11 PM 6/30/02 +0200, Roché Compaan wrote:
> >Hi Phillip
> >
> >I didn't understand enough of your previous post "PEAK persistence based
> >on ZODB4, continued" because my brain exploded every second paragraph. I
> >wasn't too concerned because it seemed that you and yourself first
> >needed to talk through it :)
> 
> Yes, I've started using letters to the mailing list as a substitute for 
> talking to Ty to work out my ideas, when he's not readily available.  :)
> 
> Not too long ago, I found out there's actually a name for the way I do my 
> thinking; it's called "Image Streaming".  The idea is that you dump out the 
> contents of your brain to another human being with the intent of having 
> them understand the ideas you're putting forth, and it frees you from 
> having to hold on tightly to any one idea as you go.  It also creates a 
> kind of feedback loop that helps you refine and clarify the initially vague 
> intuitive concepts that come to mind.  Anyway, I've been doing it for many 
> many years without having a name for it.  It's only been in the last month 
> or so, however, that I've realized I can do a form of it by writing down 
> the ideas in the form of a letter or proposal or whatever to someone else.  :)

I won't try that technique publicly just yet because I will probably
"stream" a lot of white space.  For now I'll use my limited mental
bandwidth for incoming streams, but don't be surprised if I stream a
whole lotta question marks back :)

Seriously though, I use a similar technique when trying to understand
other people's ideas.  I just dump any questions that come to mind and
quite often the answer lies in the question itself.  Or I just try to
formulate the original idea in my own words.

> >What will a query jar do?  I assume they will remember query results to
> >prevent re-querying the underlying database?
> 
> They do several things, none of which I really ever explained thoroughly.  :)
> 
> Think about a two-way association between objects - say your 
> person/department example.  If the person table has a foreign key 
> reference [1->1] to department, then department has an implicit [1->n] 
> relationship to person.  A query jar could be used to represent this 
> inverse relationship, so that when a department object's state is loaded, a 
> "ghost" from the query jar (with the department ID as its oid) is placed as 
> the "people" attribute of the loaded department object.  Any attempt to 
> *use* this people attribute will cause its state to be loaded from the 
> query jar - a list of ghosts of person objects, retrieved by a query 
> against the persons table.  Of course, since you're querying the persons 
> table, you may as well pass that state through to 'preloadState()' on the 
> person jar, so the person jar won't reload that data when you access one of 
> the ghosts.  (Of course, if the state is loaded they won't be ghosts, but 
> anyway...)

Awesome! Awesome! Awesome!
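
A rough sketch of the query jar idea described above, in plain Python
rather than real ZODB or PEAK code (there is no code as yet).  Apart
from preloadState(), __getitem__, _p_jar and _p_oid, every name here
(PERSON_ROWS, PersonJar, PeopleQueryJar, People, load, the counters) is
made up for illustration, and a list of tuples stands in for the
persons table:

```python
# Toy stand-in for the "persons" table: (person_id, name, department_id).
PERSON_ROWS = [(1, "Ann", 10), (2, "Bob", 10), (3, "Cy", 20)]


class Person:
    def __init__(self, oid):
        self._p_oid = oid
        self.state = None  # None marks a ghost: state not loaded yet


class PersonJar:
    def __init__(self):
        self.cache = {}
        self.loads = 0  # number of single-row loads performed

    def __getitem__(self, oid):
        # Return the cached object, or a fresh ghost for this oid.
        if oid not in self.cache:
            self.cache[oid] = Person(oid)
        return self.cache[oid]

    def preloadState(self, oid, state):
        # Hand over state that some other query already fetched.
        ob = self[oid]
        ob.state = state
        return ob

    def load(self, ob):
        # Fetch state only if the object is still a ghost.
        if ob.state is None:
            self.loads += 1
            _, name, dept = next(r for r in PERSON_ROWS if r[0] == ob._p_oid)
            ob.state = {"name": name, "department_id": dept}
        return ob.state


class PeopleQueryJar:
    """Query jar for the inverse [1->n] side; the oid is a department ID."""

    def __init__(self, person_jar):
        self.person_jar = person_jar
        self.queries = 0  # number of queries run against PERSON_ROWS

    def __getitem__(self, dept_id):
        return People(self, dept_id)  # a ghost: nothing is queried yet

    def load(self, dept_id):
        # One query; every row found is passed through to the person jar.
        self.queries += 1
        return [
            self.person_jar.preloadState(pid, {"name": name, "department_id": dept})
            for pid, name, dept in PERSON_ROWS
            if dept == dept_id
        ]


class People:
    """What a loaded department would carry as its 'people' attribute."""

    def __init__(self, jar, dept_id):
        self._p_jar, self._p_oid, self._members = jar, dept_id, None

    def __iter__(self):
        if self._members is None:  # first real use triggers the query
            self._members = self._p_jar.load(self._p_oid)
        return iter(self._members)


person_jar = PersonJar()
query_jar = PeopleQueryJar(person_jar)
people = query_jar[10]  # would be assigned to department.people
names = [person_jar.load(p)["name"] for p in people]
```

Iterating over 'people' runs a single query, and because the rows were
passed through preloadState(), person_jar.loads stays at zero when the
names are read afterwards.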

> > > * oidFor(ob) -- Called by save() operations of other jars to get foreign
> > > key values for objects referenced in their states.  Implementation: if
> > > ob._p_jar is self, return ob._p_oid, unless _p_oid is None, in which case
> > > save the object using oid = ob._p_oid = self.new(ob), and return the
> > > oid.  If the _p_jar is NOT self, return self.thunk(ob) to try to translate
> > > the reference or create a stub.
> >
> >So if I need to save an instance of "Person" which references an
> >instance of "Department" I can call "oidFor(ADepartment)" on the
> >DepartmentJar to get the department's id.  When will _p_jar not be self?
> >Won't all objects returned by the DepartmentJar have their _p_jar set to
> >the DepartmentJar?
> 
> Yes, *but* it is not necessarily the case that you'll be putting a 
> department object from *that* department jar there.  Suppose you were 
> working in an RDBMS, but the source of department existence was an LDAP 
> directory.  You might set aPerson.department = 
> aDepartmentFromAnLDAPJar.  When saving aPerson, you ask the 
> SQLDepartmentJar for an oid, and it has to create a thunk or stub reference 
> in the SQL database that is referenceable as a department key, but has some 
> kind of linkage to the LDAP-based department info.  That's what the thunk() 
> method is for.  As I noted, it's not something you'll support often, but Ty 
> and I have multiple apps which do this sort of cross-DB referencing for one 
> or two object types.

I thought it had something to do with cross-DB referencing.  But is the
SQLDepartmentJar really necessary?  Can't PersonJar (SQL-based) just ask
DepartmentJar (LDAP-based) for an oid? PersonJar doesn't really have to
know that DepartmentJar is LDAP-based, it is only asking DepartmentJar
for an oid.
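
To make the oidFor() description quoted above concrete, here is a
minimal sketch in plain Python.  It is only one reading of that
description, not PEAK code.  The dict-based "table", the stub row
format, the 'stubs' lookup and the class names are all assumptions;
only oidFor(), new(), thunk(), _p_jar and _p_oid come from the thread:

```python
import itertools


class Department:
    def __init__(self, name, jar=None, oid=None):
        self.name, self._p_jar, self._p_oid = name, jar, oid


class LDAPDepartmentJar:
    """Placeholder for a jar whose oids are LDAP distinguished names."""


class SQLDepartmentJar:
    def __init__(self):
        self.rows = {}   # oid -> row; stands in for the SQL table
        self.stubs = {}  # (foreign jar id, foreign oid) -> local oid
        self._next_id = itertools.count(1)

    def new(self, ob):
        # Save a new object and return the key generated for it.
        oid = next(self._next_id)
        self.rows[oid] = {"name": ob.name}
        return oid

    def thunk(self, ob):
        # Create (once) a local stub row that points at the foreign object.
        key = (id(ob._p_jar), ob._p_oid)
        if key not in self.stubs:
            oid = next(self._next_id)
            self.rows[oid] = {"stub_for": ob._p_oid}
            self.stubs[key] = oid
        return self.stubs[key]

    def oidFor(self, ob):
        if ob._p_jar is not self:
            return self.thunk(ob)  # object lives in some other jar
        if ob._p_oid is None:
            ob._p_oid = self.new(ob)  # not saved yet: save it first
        return ob._p_oid


sql_jar = SQLDepartmentJar()
ldap_jar = LDAPDepartmentJar()
sales = Department("Sales", jar=sql_jar)
research = Department("Research", jar=ldap_jar, oid="ou=research,o=example")
sales_key = sql_jar.oidFor(sales)        # saved on demand
research_key = sql_jar.oidFor(research)  # stub row created via thunk()
```

In this sketch the person jar only ever sees keys from the SQL side, so
its foreign key column keeps one type even when the department itself
lives in LDAP.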

> >So if an object's state is set to "loaded" by __setstate__ you still
> >have an empty instance.  The only difference being that its state is
> >set.  When does data retrieval happen for this instance, especially
> >since its "loaded" state will prevent it?  What am I missing?
> 
> If the state is loaded, it's not a ghost, and it has everything it needs.

> >"__getitem__" returns an object from the cache or a ghost if it's not in
> >the cache.
> 
> Yes.  preloadState() is similar, except that it *may* return a non-ghost, 
> fully loaded object.

Then I think one can actually drop the method "ghost" and just call 
preloadState(oid, state="ghost").

> > > * new(ob) -- save new object 'ob' and return its oid (by generating it or
> > > extracting it from state)
> >
> >What about foreign key constraints in the underlying db?  Not that I
> >really use them - I think it is the application's responsibility to
> >govern relationships between objects.
> 
> I presume you're talking about ensuring that the referenced object exists 
> before it's referred to?  That's actually handled by way of 
> 'oidFor()'.  Think about it.  When you save the state for 'aPerson', it has 
> to get the 'oidFor()' of all its foreign key references before it can do an 
> SQL "UPDATE" to save them.  If any of them need new ID's, oidFor() will 
> cause them to be created and saved *before* the update can point the 
> foreign key to them.  Thus, relational integrity is guaranteed by the 
> normal operation of the framework, which is just beautiful, IMHO.  :)

It is.  I've seen very few persistence frameworks that don't get bitched
around by foreign keys - that's why I try to avoid them.
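
The ordering argument above can be tried out against a real foreign
key constraint.  The following is a hedged sketch using Python's
sqlite3 module with an in-memory database.  The schema, the Record
class and both jar classes are invented for the example, and oidFor()
is simplified to skip the _p_jar check:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs on request
db.execute("CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT)")
db.execute(
    "CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, "
    "dept_id INTEGER NOT NULL REFERENCES department(id))"
)


class Record:
    def __init__(self, **fields):
        self._p_oid = None  # None means "not saved yet"
        self.__dict__.update(fields)


class DepartmentJar:
    def new(self, ob):
        cur = db.execute("INSERT INTO department (name) VALUES (?)", (ob.name,))
        return cur.lastrowid

    def oidFor(self, ob):
        # Save the department on demand, so the key exists before it is used.
        if ob._p_oid is None:
            ob._p_oid = self.new(ob)
        return ob._p_oid


class PersonJar:
    def __init__(self, departments):
        self.departments = departments

    def save(self, ob):
        # Resolving the foreign key first may INSERT the department row.
        dept_id = self.departments.oidFor(ob.department)
        if ob._p_oid is None:
            cur = db.execute(
                "INSERT INTO person (name, dept_id) VALUES (?, ?)",
                (ob.name, dept_id),
            )
            ob._p_oid = cur.lastrowid
        else:
            db.execute(
                "UPDATE person SET name = ?, dept_id = ? WHERE id = ?",
                (ob.name, dept_id, ob._p_oid),
            )


sales = Record(name="Sales")
ann = Record(name="Ann", department=sales)
PersonJar(DepartmentJar()).save(ann)  # department row is written first
```

Neither object had a key before save() was called, yet the constraint
is never violated, because the person row cannot be written until
oidFor() has returned a department key that already exists.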

> >For those who don't know, "Jar" comes straight from your fridge.  When
> >you want to preserve food, you pickle it and put it in a Jar.  The same
> >goes for objects that you want to persist: you pickle it and put it in a
> >Jar.  Sometimes it helps to explain what was obvious once and has since
> >been forgotten.
> 
> Actually, "storage jars" for me is a reference to a Monty Python 
> sketch!  But I did start with the term "jar" since the ZODB persistence 
> framework has the _p_jar concept, which does come from "pickle jars" as 
> used by Jim Fulton, which came from Python pickle, which I think came from 
> some other language's notion of pickling.  The politically correct term for 
> a jar is now a "persistent data manager", as expressed by the 
> IPersistentDataManager interface and lots of references to "dm's" and "data 
> manager" in the C and Python code of ZODB 4.

I'm sure between Star Trek and Monty Python we will find names galore :)

Btw, where should I look if I want to read the ZODB 4 code?  I
have a Zope3 checkout and I noticed the interfaces you mentioned are in
there.  Is there any other place I should be looking as well?

> But I like "storage jars" better, at least as a working term.  I'm not sure 
> it really belongs in the businesslike terminology of PEAK, and we might 
> actually be better off calling them "Racks", as they are very close in 
> concept and function to the Racks in ZPatterns.  The main difference is 
> that there were no "alternate key" racks or "query" racks in ZPatterns, at 
> least as a promoted concept.  Which isn't to say that nobody ever 
> implemented query or alternate key racks; I'm sure they did.  There just 
> weren't names for the concepts.

Well, maybe "DataManager" is not loaded metaphorically, but it is
intuitively understandable (the lack of which was a big barrier for
ZPatterns).

> Anyway, final terminology can wait a bit, since there's no code as yet.

But let's not neglect it :)

-- 
Roché Compaan
Upfront Systems                 http://www.upfrontsystems.co.za