[PEAK] DM refactoring and asynchrony (was Re: PEAK and Twisted, revisited)

Sun Apr 18 00:23:22 EDT 2004

Hi all,

I've been meaning to send these links to the list for a while but have 
been trying to understand the content well enough to be confident in 
recommending anyone read it ;-)

This email, though, has nudged me into action.

I think that there are some underlying similarities between peak.query, 
FO and Set oriented data.

I've been doing a lot of reading about ReiserFS and the "Future 
Vision":http://www.namesys.com/whitepaper.html which led me to a 
Russian-authored book:
The Set Model for Database and Information Systems by Mikhail M. Gilula.

This book describes the mathematical model and a programming language, 
StarSet. Not much is available besides the book, and it's no longer in print.

I stumbled onto a Java extension language along the same lines:
"JavaSet - extending Java by persistent 
sets.":http://www.complang.tuwien.ac.at/markus/publications/IAITS99.ps

Also, a Microsoft language, X#, tries to accomplish much the same 
thing.  Arrg! I can't find the URL. Dang, I'll have to send it tomorrow 
after I find the white paper I know I printed off. From memory, 
this language tries to smooth over the difference between row and 
hierarchical data by using streams and paths as the underlying structures.

Anyway, I thought this stuff might be of interest to the group.

John Heintz

Phillip J. Eby wrote:

> At 09:46 AM 4/16/04 +0200, Ulrich Eck wrote:
> 
>> hmm, afaik there is no simple peak<->twisted demo around. i think we
>> should work one out, comparable to the bulletins example. junction
>> from peakplace also uses twisted.pb, but not yet peak.events afaik.
> 
> 
> Last I heard, John was working on it.  Certainly, he submitted a number 
> of bug reports that would've required him to be using peak.events with 
> Twisted in Junction in order to find the problems he did.  :)
> 
> 
>> the reason why i didn't spend too much time on it is,
>> that pje plans to completely refactor the DataManager API to integrate
>> with peak.events and peak.query. a concrete timetable and design for
>> this refactoring is not yet available i think.
> 
> 
> The design is getting a bit more concrete, but not the timetable.  :(  
> It may not so much be a refactoring of the current API, as addition of 
> another completely different API.  But there is some theoretical work to 
> sort out first.
> 
> The future peak.query API will get rid of individual DM's in favor of an 
> EditingContext, similar to those in various other systems, such as ZODB 
> connection objects.  The container metaphor of DM's will fall by the 
> wayside here, in favor of a more integrated approach.
> 
> The part that is not yet clear to me is how OO, and how async the EC 
> interface will be.  I want EC's to support *conceptual* queries, and be 
> able to yield asynchronous events representing changes to objects.
> 
> Mostly what I'm doing now on the design is trying to get a metamodel in 
> my head that unites "Fact Orientation" with "Object Orientation".  Fact 
> orientation has no impedance mismatch with relational databases, 
> business or AI rules, or much of anything else.  But it's quite a 
> mismatch with the current DM/domain object paradigm.  Indeed, it's as 
> inside-out as Twisted versus peak.events.
> 
> In a fact orientation approach, one deals with "elementary facts" that 
> relate 1 or more objects in some way.  For example, "Order #5573 totals 
> $127" is a "binary fact" relating the order object referenced "#5573" 
> and the dollar amount object referenced by "$127".  Most facts are 
> binary (i.e. involve two objects), but unaries and ternaries are also 
> possible.  Quaternaries and higher-order facts are uncommon, when 
> dealing with elementary (non-decomposable) facts.
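As a rough illustration of the terminology above, an elementary fact can be modelled as two things: a fact type name and a tuple of role values. Its arity is then just the number of roles. `Fact`, `fact_type`, `roles` and the fact type names below are hypothetical and are not part of peak.query.

```python
from typing import Any, NamedTuple, Tuple


class Fact(NamedTuple):
    """Hypothetical elementary fact: a fact type name plus the objects it relates."""
    fact_type: str
    roles: Tuple[Any, ...]

    @property
    def arity(self) -> int:
        # Unary, binary, ternary, ... is simply the number of roles filled.
        return len(self.roles)


# "Order #5573 totals $127": a binary fact relating an order and an amount.
total_fact = Fact("order_totals", (5573, 127))
# A unary fact: "Order #5573 has shipped".
shipped_fact = Fact("order_has_shipped", (5573,))
# A ternary fact: "Order #5573 includes 2 units of product 'widget'".
line_fact = Fact("order_includes_qty_of", (5573, 2, "widget"))

print([f.arity for f in (shipped_fact, total_fact, line_fact)])  # [1, 2, 3]
```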
> 
> (By the way, 'x[y]==z' can be considered a ternary fact, which may later 
> play into the implementation of mapping types in peak.model.)
> 
> Superficially, this fact orientation looks a lot like normal OO.  After 
> all, one might say 'orders[5573].total==127'.  Except there's no way to 
> go from the '127' back to '.total' let alone 'orders[5573]'.  In OO, one 
> can meaningfully look at the objects by themselves.  In FO, objects 
> don't really exist: only facts regarding objects are real.
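The sketch below makes this contrast concrete. It uses a hypothetical `FactStore` (not a PEAK class) in which a stored fact can be matched from either role. Going from the amount back to the order is then just another query. `None` stands in for an unknown role, so this toy cannot store `None` as a value.

```python
class FactStore:
    """Hypothetical in-memory fact store; facts are (fact_type, role1, role2, ...) tuples."""

    def __init__(self):
        self.facts = set()

    def add(self, fact_type, *roles):
        self.facts.add((fact_type,) + roles)

    def match(self, fact_type, *pattern):
        """Return matching facts in sorted order; None in the pattern matches any value."""
        return sorted(
            fact for fact in self.facts
            if fact[0] == fact_type
            and len(fact) - 1 == len(pattern)
            and all(p is None or p == v for p, v in zip(pattern, fact[1:]))
        )


store = FactStore()
store.add("order_totals", 5573, 127)
store.add("order_totals", 6000, 127)
store.add("order_totals", 6001, 40)

# The "OO direction": what does order 5573 total?
print(store.match("order_totals", 5573, None))  # [('order_totals', 5573, 127)]
# The reverse direction, which 'orders[5573].total' cannot offer: which orders total $127?
print(store.match("order_totals", None, 127))   # [('order_totals', 5573, 127), ('order_totals', 6000, 127)]
```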
> 
> In a sense, you could say that FO doesn't believe in objects, only 
> labels.  It assumes that the "objects" we are dealing with are in the 
> "real world", so what's in the computer is just a collection of facts 
> *about* the objects.  So, from the FO point of view, it's silly to talk 
> about having an instance of "order", because the "order" isn't in the 
> computer, and in fact may not even be a real thing at all!
> 
> FO *does* believe in object types, however, including inheritance and 
> polymorphism.  But one uses a "reference scheme" (such as order numbers) 
> to refer to an instance of an object.  For all practical purposes, 
> reference schemes are either strings, numbers, or tuples thereof.  Very 
> elementary data, in other words.
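A reference scheme can be sketched as a tuple of elementary value types attached to an object type, together with a supertype link for inheritance. `ObjectType`, `is_valid_ref` and `is_subtype_of` are illustrative names chosen here. Single inheritance is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ObjectType:
    """Hypothetical FO object type: instances are identified by a reference scheme."""
    name: str
    ref_scheme: Tuple[type, ...]            # types of the elementary values identifying an instance
    supertype: Optional["ObjectType"] = None

    def is_valid_ref(self, ref) -> bool:
        # A reference is a tuple of elementary values matching the scheme.
        return (isinstance(ref, tuple) and len(ref) == len(self.ref_scheme)
                and all(isinstance(v, t) for v, t in zip(ref, self.ref_scheme)))

    def is_subtype_of(self, other: "ObjectType") -> bool:
        # Walk up the (single-inheritance) supertype chain.
        current = self
        while current is not None:
            if current == other:
                return True
            current = current.supertype
        return False


Party = ObjectType("Party", (str,))
Customer = ObjectType("Customer", (str,), supertype=Party)
Order = ObjectType("Order", (int,))
OrderLine = ObjectType("OrderLine", (int, int))   # (order number, line number)
```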
> 
> The tremendous advantage of this approach is that it has no impedance 
> mismatch with databases, rule systems, or form-based UIs, and only a low 
> mismatch with direct-manipulation UIs.  By contrast, OO-paradigm storage 
> systems like peak.storage and ZODB have a high impedance mismatch with 
> rule systems and form-based UIs, a medium mismatch with relational DBs, 
> and a low mismatch with direct-manipulation UIs.
> 
> Designing good constraint validation and business rules frameworks for 
> an OO model is *hard*.  Until I discovered FO (by way of ORM -- Object 
> Role Modelling), I assumed that it was because the problems themselves 
> were hard.  But from an FO perspective, it's actually pretty easy.  From 
> the OO viewpoint, plain attributes and sequences and mappings are all 
> very different ways of storing/using/managing an object's features.  
> From the FO viewpoint, these are all just facts of different arity and 
> types.
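The unification described above can be sketched by flattening an object's features into tuples. The convention used here is an assumption for illustration, as is the helper name `features_to_facts`:

- a plain attribute becomes a binary fact;
- a sequence item becomes a ternary fact that includes its position;
- a mapping entry becomes a ternary fact that includes its key.

```python
def features_to_facts(name, ref, value):
    """Flatten one feature of the object identified by `ref` into fact tuples.

    Hypothetical convention: a mapping yields (name, ref, key, value) facts, a
    list or tuple yields (name, ref, position, item) facts, and anything else
    is treated as a plain attribute yielding a single (name, ref, value) fact.
    """
    if isinstance(value, dict):
        return {(name, ref, key, item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return {(name, ref, position, item) for position, item in enumerate(value)}
    return {(name, ref, value)}


facts = set()
facts |= features_to_facts("total", 5573, 127)                        # attribute -> binary
facts |= features_to_facts("line_items", 5573, ["widget", "gadget"])  # sequence -> ternary
facts |= features_to_facts("qty_of", 5573, {"widget": 2})             # mapping -> ternary
```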
> 
> This unifying characteristic makes the FO viewpoint a perfect 
> "intermediate representation" or "lingua franca" between different 
> levels of a system.  In fact, because of the ease of expressing complex 
> constraints in FO systems, and because FO models and constraints can be 
> specified or rendered as user-friendly English (or other language) 
> expressions, I believe it is preferable to such modelling 
> languages as UML for gathering end-user requirements and defining the 
> domain model for an application.
> 
> The hard part of the design, that I'm chewing on in my spare time, is 
> how to create a Pythonic FO notation, that won't end up looking like 
> Prolog or Lisp!  It would seem that you could just generate current 
> peak.model-style classes from an FO model.  However, there's a potential 
> pitfall there.  Minor changes to a conceptual model (such as changing a 
> link from "one" to "many") result in more significant changes to a 
> peak.model-type system.  In other words, one would go from having 
> existing objects with 'x.y=z' to 'x.y=[z]'.  Code that was written based 
> on one access method would have to change to use the other.
> 
> If instead one creates form tools that are based on an FO perspective, 
> however, this issue vanishes.  One isn't doing 'x.y' in the first place, 
> but rather searching for 'x has y of ?', and expecting a collection of 
> matching facts.  A little further thought on this for such things as 
> business rules, DB access, and so on, quickly reveals that the actual 
> use cases for an "object oriented" model for business objects are 
> becoming rather scarce.
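Here is a small sketch of the point about 'x has y of ?'. If callers always ask for a collection of matching facts, switching a role from "one" to "many" changes the data but not the calling code. `values_of` is a hypothetical helper over plain tuples.

```python
def values_of(facts, fact_type, subject):
    """Answer 'subject has <fact_type> of ?' with a sorted list, whatever the cardinality."""
    return sorted(f[2] for f in facts if f[0] == fact_type and f[1] == subject)


# While the model says a customer has exactly one email address...
facts = {("email_of", "cust42", "a@example.com")}
print(values_of(facts, "email_of", "cust42"))  # ['a@example.com']

# ...and after the link is changed to "many": the calling code stays the same.
facts.add(("email_of", "cust42", "b@example.com"))
print(values_of(facts, "email_of", "cust42"))  # ['a@example.com', 'b@example.com']
```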
> 
> Indeed, the more I think about it the more I realize that trying to 
> implement a lot of typical business functions by direct mapping to the 
> OO paradigm actually *adds* complexity and effort.  Business in the real 
> world actually does consist mostly of data and rules.  Objects in the 
> real world are not at all encapsulated and they certainly do not enforce 
> their own business rules!
> 
> So, the fundamental paradigm for peak.query applications will be to have 
> an EditingContext against which queries are made, and to which facts are 
> added or from which they are removed.  EditingContexts will also publish 
> validation facts.  That is, one source of facts from an editing context 
> will be facts about which of the fact changes you've made violate 
> business rules or other validation.  But it's likely that these "constraint metafacts" 
> (to coin a phrase) will not be anything "special" in the system; it's 
> just that business rules will be able to be sources of facts.  Indeed, 
> anything that you would compute in an object's methods could be moved to 
> a business rule implementing a fact type.  For that matter, it becomes 
> silly to even talk about business rules: there are really only fact 
> types, and a fact type can be implemented by any computation you wish.
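The idea that a business rule is simply a computed fact type can be sketched with a toy editing context. It answers some fact types from stored facts and others from functions. `ToyEditingContext`, its `derived` table and the `violates_nonnegative_total` rule are all hypothetical names, not the planned peak.query API.

```python
class ToyEditingContext:
    """Hypothetical editing context: stored facts plus derived (computed) fact types."""

    def __init__(self):
        self.facts = set()
        self.derived = {}  # fact type name -> function(ec) returning a set of facts

    def add(self, fact):
        self.facts.add(fact)

    def remove(self, fact):
        self.facts.discard(fact)

    def find(self, fact_type):
        # Derived fact types are computed on demand; the rest come from storage.
        if fact_type in self.derived:
            return self.derived[fact_type](self)
        return {f for f in self.facts if f[0] == fact_type}


def nonnegative_total_rule(ec):
    """A business rule as a fact type: 'order N violates the non-negative total rule'."""
    return {("violates_nonnegative_total", f[1])
            for f in ec.find("order_total") if f[2] < 0}


ec = ToyEditingContext()
ec.derived["violates_nonnegative_total"] = nonnegative_total_rule
ec.add(("order_total", 5573, 127))
ec.add(("order_total", 5574, -10))
print(ec.find("violates_nonnegative_total"))  # {('violates_nonnegative_total', 5574)}
```

In this toy, a commit step could simply refuse to proceed while any violation facts are present.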
> 
> This is both more and less granular than a DM.  When you have related 
> objects in two DMs, right now you must write code in both to save and 
> load the linking attribute(s).  The same thing in an FO approach would 
> correspond to a single fact type, pairing the two objects.  So, there 
> would be only *one* implementation of that fact type, handling both 
> sides of the relationship.
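The following is a minimal sketch of one fact type implementation serving both sides of a relationship. Both navigations read the same set of pairs, so a single retraction is visible from either side. `BinaryFactType` and its method names are assumptions for illustration.

```python
class BinaryFactType:
    """Hypothetical single implementation of a binary fact type such as
    'Person works for Company'; both directions read the same set of pairs."""

    def __init__(self):
        self._pairs = set()

    def add(self, left, right):
        self._pairs.add((left, right))

    def remove(self, left, right):
        self._pairs.discard((left, right))

    def rights_of(self, left):
        return {r for l, r in self._pairs if l == left}

    def lefts_of(self, right):
        return {l for l, r in self._pairs if r == right}


works_for = BinaryFactType()
works_for.add("alice", "acme")
works_for.add("bob", "acme")
works_for.remove("bob", "acme")      # one retraction updates both directions
print(works_for.rights_of("alice"))  # {'acme'}
print(works_for.lefts_of("acme"))    # {'alice'}
```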
> 
> As a practical matter, however, you won't write code to create fact 
> types for every possible attribute of every object type!  Instead, for 
> most applications you'll define relatively trivial mappings from fact 
> types to 2 or more columns from a relational DB, using relational 
> algebra operators similar to the prototypes that are already in 
> peak.query today.
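A fact-type-to-columns mapping can be sketched with the standard library's `sqlite3` module. This is far simpler than the relational algebra in peak.query:

- `ColumnMapping` is a hypothetical class, and the schema is invented for the example.
- Table and column names are interpolated into the SQL, so they must come from trusted schema definitions.
- Role values are passed as bound parameters.

```python
import sqlite3


class ColumnMapping:
    """Hypothetical mapping of a fact type's roles onto columns of one table."""

    def __init__(self, table, columns):
        self.table = table
        self.columns = columns  # role i is stored in columns[i]

    def find(self, conn, *pattern):
        """Query the fact type; None in the pattern marks an unknown role."""
        if len(pattern) != len(self.columns):
            raise ValueError("pattern must supply one value (or None) per role")
        where = [f"{col} = ?" for col, p in zip(self.columns, pattern) if p is not None]
        params = [p for p in pattern if p is not None]
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        if where:
            sql += " WHERE " + " AND ".join(where)
        return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_no INTEGER PRIMARY KEY, customer TEXT, total INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(5573, "acme", 127), (5574, "globex", 40)])

# Two fact types backed by different column pairs of the same table.
order_total = ColumnMapping("orders", ["order_no", "total"])
order_customer = ColumnMapping("orders", ["order_no", "customer"])

print(order_total.find(conn, 5573, None))         # [(5573, 127)]
print(order_customer.find(conn, None, "globex"))  # [(5574, 'globex')]
```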
> 
> But, unlike the peak.query of today, fact retrieval will be able to run 
> asynchronously.  That is, you'll be able to "subscribe" to a query, or 
> yield a task's execution until a new fact is asserted.  Even if your 
> application isn't doing event-driven I/O or using a reactor loop, you 
> could use these subscriptions to e.g. automatically raise an error when 
> a constraint is violated.  (In practice, the EditingContext will do this 
> when you ask that it commit your changes to its parent EditingContext, 
> if any.)  If you're writing a non-web GUI, you'd more likely subscribe 
> to such events in order to display status bar text or highlight an input 
> error.
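The subscription idea can be approximated with `asyncio` standing in for peak.events, which this sketch does not use. `ObservableFacts`, `subscribe` and `next_fact` are hypothetical names. One subscriber reacts the way a GUI status bar might. One task suspends until a new `order_total` fact is asserted.

```python
import asyncio


class ObservableFacts:
    """Hypothetical fact set that notifies subscribers when matching facts are asserted."""

    def __init__(self):
        self.facts = set()
        self._subscribers = []  # list of (fact_type, callback) pairs

    def subscribe(self, fact_type, callback):
        entry = (fact_type, callback)
        self._subscribers.append(entry)
        return entry

    def unsubscribe(self, entry):
        self._subscribers.remove(entry)

    def add(self, fact):
        self.facts.add(fact)
        for fact_type, callback in list(self._subscribers):
            if fact[0] == fact_type:
                callback(fact)

    async def next_fact(self, fact_type):
        """Suspend the calling task until a fact of fact_type is asserted."""
        future = asyncio.get_running_loop().create_future()

        def on_fact(fact):
            if not future.done():
                future.set_result(fact)

        entry = self.subscribe(fact_type, on_fact)
        try:
            return await future
        finally:
            self.unsubscribe(entry)


async def demo():
    facts = ObservableFacts()
    status_bar = []
    # GUI-style use: react to constraint-violation facts as they appear.
    facts.subscribe("violates_nonnegative_total",
                    lambda f: status_bar.append(f"Order {f[1]}: total must not be negative"))
    # Task-style use: wait until a new order_total fact is asserted.
    waiter = asyncio.create_task(facts.next_fact("order_total"))
    await asyncio.sleep(0)  # let the waiting task run far enough to subscribe
    facts.add(("order_total", 5573, 127))
    facts.add(("violates_nonnegative_total", 5574))
    return await waiter, status_bar


first_total, messages = asyncio.run(demo())
```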
> 
> Writing this email has actually helped me sort out quite a few of the 
> implementation details I had been wondering about, such as how business 
> rules and constraints should work.  Now I see that they do not require 
> anything "special", as they are just the same as any other kind of 
> fact.  They only appear to be "meta", but in truth they are constructed 
> by the same fundamental joining of facts as any other compound fact!  
> Very nice.
> 
> For a given fact type, one needs to know only how to assert it, retract 
> it, and query it.  And the EditingContext needs to have a source for 
> facts that would prevent a transaction from committing.  A possible API 
> for EditingContext might simply be to provide three methods, each 
> accepting a single fact object as its parameter.  For queries, one would 
> fill in unknown roles with "variables", a la Prolog.  So, something like:
> 
>     amt = Variable()
>     results = ec.find(Order.total(5573, amt))
> 
> would make 'results' an event source that yields the fact 
> 'Order.total(5573,127)'.
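Here is one way the `Variable` placeholder could behave. This is a synchronous sketch that returns a list rather than an event source. Facts are plain tuples instead of `Order.total(...)` objects. Reusing the same `Variable` in two roles forces those roles to be equal, as in Prolog unification. The function and fact type names are assumptions.

```python
class Variable:
    """Hypothetical placeholder for an unknown role in a query pattern, a la Prolog."""

    def __repr__(self):
        return "Variable()"


def find(facts, fact_type, *pattern):
    """Return the stored facts of fact_type that match pattern, sorted.

    Concrete values must be equal; a Variable matches anything, but the same
    Variable used in two roles must bind to the same value in both.
    """
    results = []
    for fact in facts:
        if fact[0] != fact_type or len(fact) - 1 != len(pattern):
            continue
        bindings = {}
        for p, value in zip(pattern, fact[1:]):
            if isinstance(p, Variable):
                if bindings.setdefault(p, value) != value:
                    break  # inconsistent binding for a repeated Variable
            elif p != value:
                break  # a concrete role does not match
        else:
            results.append(fact)
    return sorted(results)


facts = {("order_total", 5573, 127), ("order_total", 5574, 40),
         ("pair", 1, 1), ("pair", 1, 2)}

amt = Variable()
print(find(facts, "order_total", 5573, amt))  # [('order_total', 5573, 127)]
x = Variable()
print(find(facts, "pair", x, x))              # [('pair', 1, 1)]
```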
> 
> In my imagination I can see Ty going "Eeew!" in disgust at this 
> hideous-seeming API.  Making queries for every attribute of an object?  
> Am I out of my mind?
> 
> No.  Here I'm merely illustrating an elementary fact type.  A compound 
> fact type might encompass quite a few items of data.  Indeed, it can 
> encompass as many facts as you want.  This solves the OO "reporting 
> problem", because one can create queries of arbitrary 
> complexity without introducing duplicated implementation for the more 
> elementary components of those queries.  So, an application can define 
> its own compound fact types, and an EC will implement the operations by 
> delegation to the elementary fact types if there isn't a custom 
> implementation defined.  Such custom implementations can be useful for 
> providing specially hand-tuned SQL or other algorithms, if needed for 
> some critical aspect of an application.
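Delegation from a compound fact type to its elementary parts, with an optional hand-tuned override, might look roughly like this. The dictionaries `elementary`, `compound` and `custom` and the join-on-first-role rule are illustrative assumptions, not peak.query internals.

```python
def join_on_first_role(left, right):
    """Generic compound implementation: join two binary fact lists on their first role,
    e.g. (order, customer) + (order, total) -> (order, customer, total)."""
    index = {}
    for key, value in right:
        index.setdefault(key, []).append(value)
    return sorted((key, lv, rv) for key, lv in left for rv in index.get(key, []))


# Hypothetical registries for a single schema.
elementary = {
    "order_customer": {(5573, "acme"), (5574, "globex")},
    "order_total": {(5573, 127), (5574, 40)},
}
compound = {"order_summary": ("order_customer", "order_total")}
custom = {}  # fact type name -> zero-argument hand-tuned implementation


def find_all(fact_type):
    if fact_type in custom:      # hand-tuned SQL or similar wins when present
        return custom[fact_type]()
    if fact_type in compound:    # otherwise delegate to the component fact types
        left, right = compound[fact_type]
        return join_on_first_role(find_all(left), find_all(right))
    return sorted(elementary[fact_type])


print(find_all("order_summary"))  # [(5573, 'acme', 127), (5574, 'globex', 40)]
```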
> 
> Hm.  It's almost possible to go full circle to an OO approach, at least 
> for "flat" attributes like strings and numbers.  We could simply make 
> facts returned by queries *mutable*.  Setting an attribute on the fact 
> could automatically retract the previous version of the fact and assert 
> the new one.  We could possibly also provide some kind of convenience 
> feature to navigate to related objects, instead of just seeing their 
> keys, but this would have to be well thought-out in order to avoid the 
> traps of synchrony and container mismatch.  That is, it would have to be 
> capable of being asynchronous, and it would have to always provide a 
> sequence, rather than a single object, even if the role in principle is 
> required to always be exactly one object.  (Because between transaction 
> boundaries, invalid conditions are allowed to exist, to avoid 
> chicken-and-egg update problems.)
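The mutable-fact idea can be sketched as a small wrapper whose setter retracts the old fact and asserts the new one in the same store. `MutableFact` is a hypothetical name, and the store here is just a Python set of tuples.

```python
class MutableFact:
    """Hypothetical mutable view of a stored (fact_type, key, value) fact.

    Assigning to .value retracts the old fact and asserts the new one.
    """

    def __init__(self, store, fact_type, key, value):
        self._store = store
        self._fact_type = fact_type
        self._key = key
        self._value = value

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new_value):
        self._store.discard((self._fact_type, self._key, self._value))  # retract
        self._store.add((self._fact_type, self._key, new_value))        # assert
        self._value = new_value


store = {("order_total", 5573, 127)}
total = MutableFact(store, "order_total", 5573, 127)
total.value = 150
print(store)  # {('order_total', 5573, 150)}
```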
> 
> Speaking of the chicken-and-egg problem, mapping to an RDBMS could be 
> rather interesting.  The model as I've described it so far has no 
> built-in way to ensure that inserts and updates get done in a way that 
> honors RDBMS-level constraints.  Hm.  Actually, the mapping from fact 
> types to tables has to incorporate information about joins, so 
> the information to do it is there.  The details might get a bit 
> "interesting", but it should be possible to implement it one time, 
> rather than sweating the details in every DM with foreign keys, as one 
> must do now.  (Not that it's hard to do in today's DMs, but it's still 
> boilerplate or dead-chicken-waving or whatever you want to call it.)
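Ordering inserts so that referenced rows exist first is a topological sort over foreign-key dependencies. Deletes would use the reverse order. This sketch uses `graphlib.TopologicalSorter` from the standard library (Python 3.9+). The table names and the `foreign_keys` dictionary are invented for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical schema: each table maps to the tables its foreign keys reference.
foreign_keys = {
    "order_lines": {"orders", "products"},
    "orders": {"customers"},
    "customers": set(),
    "products": set(),
}


def insert_order(pending_tables):
    """Order pending INSERTs so every referenced row is written before its referrers."""
    ordered = TopologicalSorter(foreign_keys).static_order()
    return [table for table in ordered if table in pending_tables]


def delete_order(pending_tables):
    """Deletes run the other way round: referrers first."""
    return list(reversed(insert_order(pending_tables)))


print(insert_order({"order_lines", "orders", "customers"}))  # ['customers', 'orders', 'order_lines']
```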
> 
> Hm.  This is sounding simpler than I thought it would be, at least in 
> principle.  But I still need to:
> 
> * Work out the metamodel for fact types, how to map from constraints to 
> derived fact types that signal the violation of those constraints, and a 
> Pythonic notation for defining a fact-oriented domain model.  (For 
> example, to define a constraint on a fact type or set of fact types, one 
> must have a way to reference those types in code.)
> 
> * Define a mechanism for implementations of fact types, that allows for 
> the possibility of segmenting compound fact implementations (e.g. 
> database tables) as well as compounding elementary fact implementations, 
> in a way that doesn't lose the connectedness.  That is, if I issue a 
> compound query that has 3 facts answerable by a DB table, I don't want 
> to generate 3 queries against that table, just one.  And if the other 
> needed facts are in another table in the same DB, I want to generate a 
> query that joins them!  As it happens, a lot of the ground work for this 
> has already been done in peak.query.algebra.
> 
> * Define a framework for defining concrete schema mappings, in a way 
> that allows reuse.  That is, I might want to define mappings from a fact 
> schema to two different database schemas, but share an implementation of 
> various business rules between them.
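For the second item in the list above, the core of "one query per table, not per fact" is grouping the requested fact types by their backing table before generating SQL. The `FACT_COLUMNS` mapping and `plan_queries` are hypothetical. A real planner would also join tables from the same database, which this sketch leaves out.

```python
from collections import defaultdict

# Hypothetical mapping: fact type -> (table, key column, value column).
FACT_COLUMNS = {
    "order_total": ("orders", "order_no", "total"),
    "order_date": ("orders", "order_no", "placed_on"),
    "order_customer": ("orders", "order_no", "customer"),
    "customer_name": ("customers", "cust_id", "name"),
}


def plan_queries(fact_types):
    """Group requested fact types by backing table: one SELECT per table, not per fact."""
    by_table = defaultdict(list)
    for fact_type in fact_types:
        table, key, column = FACT_COLUMNS[fact_type]
        by_table[(table, key)].append(column)
    return [f"SELECT {key}, {', '.join(columns)} FROM {table}"
            for (table, key), columns in by_table.items()]


print(plan_queries(["order_total", "order_date", "order_customer"]))
# ['SELECT order_no, total, placed_on, customer FROM orders']
```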
> 
> Oh yeah, and then I need to actually implement all that.  Does that 
> maybe answer any questions you might have about whether this will be 
> done soon?  :)
> 
> One final thought...  that earlier example of the API could probably use 
> a little rework.  Maybe:
> 
>     # Query a fact type, given a context
>     results = order_total(ec).find(5573, Variable())
> 
>     # Assert a fact
>     order_total(ec).add(5573, 127)
> 
>     # Retract a fact
>     order_total(ec).remove(5573, 127)
> 
>     # shortcut for repeated use w/same editing context
>     order_total = order_total(ec)
> 
> Here, the idea is that a fact type such as 'order_total' is a protocol 
> that you can adapt the editing context to.  Thus, one would create 
> different EC subclasses to define a specific schema mapping.  The EC 
> itself doesn't actually implement anything, however.  Instead, you 
> simply declare adapters (probably sticky ones) for the fact types that 
> can be implemented in the schema.  And if a compound fact type doesn't 
> have a specific adaptation, it just creates a generic compound 
> implementation over the adaptation of the fact types it comprises.
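The adaptation scheme can be sketched without PEAK's protocol machinery. Each fact type keeps a registry of implementations per EC class. Lookup walks the EC's MRO, so subclasses inherit mappings, and falls back to a generic join for compound types. `FactType`, `register`, `InMemoryBinary`, `GenericCompound` and `OrderSchemaEC` are hypothetical names. Unlike sticky adapters, nothing is cached here; state lives in the EC's `storage` dict instead.

```python
from itertools import product


class FactType:
    """Hypothetical fact type: calling it on an editing context 'adapts' the EC."""

    def __init__(self, name, parts=()):
        self.name = name
        self.parts = parts   # component fact types; non-empty means compound
        self.adapters = {}   # EC class -> factory(ec) returning an implementation

    def register(self, ec_class, factory):
        self.adapters[ec_class] = factory

    def __call__(self, ec):
        for cls in type(ec).__mro__:  # EC subclasses inherit their bases' mappings
            if cls in self.adapters:
                return self.adapters[cls](ec)
        if self.parts:  # no specific adapter: build a generic compound implementation
            return GenericCompound([part(ec) for part in self.parts])
        raise TypeError(f"{type(ec).__name__} cannot implement {self.name}")


class InMemoryBinary:
    """Elementary implementation keeping (key, value) pairs in the EC's storage."""

    def __init__(self, ec, name):
        self.pairs = ec.storage.setdefault(name, set())

    def add(self, key, value):
        self.pairs.add((key, value))

    def remove(self, key, value):
        self.pairs.discard((key, value))

    def find(self, key=None, value=None):
        return sorted(p for p in self.pairs
                      if (key is None or p[0] == key) and (value is None or p[1] == value))


class GenericCompound:
    """Join the part implementations on their shared first role."""

    def __init__(self, parts):
        self.parts = parts

    def find(self, key=None):
        per_part = []
        for part in self.parts:
            values = {}
            for k, v in part.find(key):
                values.setdefault(k, []).append(v)
            per_part.append(values)
        shared = set(per_part[0]).intersection(*per_part[1:])
        return [(k,) + combo for k in sorted(shared)
                for combo in product(*(values[k] for values in per_part))]


class OrderSchemaEC:
    """An EC class standing for one concrete schema mapping."""

    def __init__(self):
        self.storage = {}


order_total = FactType("order_total")
order_customer = FactType("order_customer")
order_summary = FactType("order_summary", parts=(order_customer, order_total))
order_total.register(OrderSchemaEC, lambda ec: InMemoryBinary(ec, "order_total"))
order_customer.register(OrderSchemaEC, lambda ec: InMemoryBinary(ec, "order_customer"))

ec = OrderSchemaEC()
order_total(ec).add(5573, 127)
order_customer(ec).add(5573, "acme")
print(order_summary(ec).find())  # [(5573, 'acme', 127)]
```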
> 
> This still isn't perfect, in that this API doesn't yet explicitly deal 
> with asynchronous results, or how to "listen" to assertions or 
> retractions, or how to deal with queries when there are assertions or 
> retractions that haven't been "flushed" to the physical storage scheme.  
> But it's getting there.  In fact, it's probably close enough to allow 
> prototyping some toy schema implementations (sans metamodels) to explore 
> how editing contexts and fact type implementations could/should work.
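One possible answer to the "unflushed changes" question is layering. Each editing context keeps its pending assertions and retractions over its parent's view, and pushes them to the parent on commit, as the earlier paragraph about committing to a parent EditingContext suggests. `LayeredEC` is a hypothetical toy; it performs no constraint checking and no real storage I/O.

```python
class LayeredEC:
    """Hypothetical editing context holding unflushed changes over a parent."""

    def __init__(self, parent=None):
        self.parent = parent
        self.stored = set()  # only meaningful for a root context (stands in for storage)
        self.added, self.removed = set(), set()

    def add(self, fact):
        self.removed.discard(fact)
        self.added.add(fact)

    def remove(self, fact):
        self.added.discard(fact)
        self.removed.add(fact)

    def visible(self):
        # Pending changes are applied on top of the parent's (or storage's) view.
        base = self.parent.visible() if self.parent else self.stored
        return (base - self.removed) | self.added

    def find(self, fact_type):
        return sorted(f for f in self.visible() if f[0] == fact_type)

    def commit(self):
        """Push pending changes to the parent, or into storage for a root context."""
        if self.parent is None:
            self.stored = self.visible()
        else:
            for fact in self.removed:
                self.parent.remove(fact)
            for fact in self.added:
                self.parent.add(fact)
        self.added, self.removed = set(), set()


root = LayeredEC()
root.add(("order_total", 5573, 127))
root.commit()

child = LayeredEC(parent=root)
child.remove(("order_total", 5573, 127))
child.add(("order_total", 5573, 150))
print(child.find("order_total"))  # [('order_total', 5573, 150)]
print(root.find("order_total"))   # [('order_total', 5573, 127)]
```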
> 
> _______________________________________________
> PEAK mailing list
> PEAK at eby-sarna.com
> http://www.eby-sarna.com/mailman/listinfo/peak
>