[PEAK] DM refactoring and asynchrony (was Re: PEAK and Twisted, revisited)

Sun Apr 18 14:58:46 EDT 2004

Found the Microsoft research paper finally!

Via Lambda the Ultimate:
http://lambda.weblogs.com/discuss/msgReader$6427

John

John D. Heintz wrote:

> Hi all,
> 
> I've been meaning to send these links to the list for a while but have 
> been trying to understand the content well enough to be confident in 
> recommending anyone read it ;-)
> 
> This email though has nudged me into action.
> 
> I think that there are some underlying similarities between peak.query, 
> FO and Set oriented data.
> 
> I've been doing a lot of reading about ReiserFS and the "Future 
> Vision":http://www.namesys.com/whitepaper.html which led me to a Russian 
> authored book:
> The Set Model for Database and Information Systems by Mikhail M. Gilula.
> 
> This book describes the mathematical model and a programming language 
> StarSet. Not much available besides the book and it's no longer published.
> 
> I stumbled onto a Java extension language along the same lines:
> "JavaSet - extending Java by persistent 
> sets.":http://www.complang.tuwien.ac.at/markus/publications/IAITS99.ps
> 
> Also, a Microsoft language X# tries to accomplish much of the same 
> thing.  Arrg! I can't find the url. Dang, I'll have to send it tomorrow 
> after I find the white paper I know I printed off. Well, from memory 
> this language tries to smooth over the difference between row and 
> hierarchical data by using streams and paths as the underlying structures.
> 
> Anyway, I thought this stuff might be of interest to the group.
> 
> John Heintz
> 
> Phillip J. Eby wrote:
> 
>> At 09:46 AM 4/16/04 +0200, Ulrich Eck wrote:
>>
>>> hmm, afaik there is no simple peak<->twisted-demo around. we should
>>> work one out i think comparable to the bulletins example. junction
>>> from peakplace also uses twisted.pb, but not yet peak.events afaik.
>>
>>
>>
>> Last I heard, John was working on it.  Certainly, he submitted a 
>> number of bug reports that would've required him to be using 
>> peak.events with Twisted in Junction in order to find the problems he 
>> did.  :)
>>
>>
>>> the reason why i didn't spend too much time on it is,
>>> that pje plans to completely refactor the DataManager API to integrate
>>> with peak.events and peak.query. a concrete timetable and design for
>>> this refactoring is not yet available i think.
>>
>>
>>
>> The design is getting a bit more concrete, but not the timetable.  :(  
>> It may not so much be a refactoring of the current API, as addition of 
>> another completely different API.  But there is some theoretical work 
>> to sort out first.
>>
>> The future peak.query API will get rid of individual DM's in favor of 
>> an EditingContext, similar to those in various other systems, such as 
>> ZODB connection objects.  The container metaphor of DM's will fall by 
>> the wayside here, in favor of a more integrated approach.
>>
>> The part that is not yet clear to me is how OO, and how async the EC 
>> interface will be.  I want EC's to support *conceptual* queries, and 
>> be able to yield asynchronous events representing changes to objects.
>>
>> Mostly what I'm doing now on the design is trying to get a metamodel 
>> in my head that unites "Fact Orientation" with "Object Orientation".  
>> Fact orientation has no impedance mismatch between relational 
>> databases, business or AI rules, or much of anything else.  But, it's 
>> quite a mismatch with the current DM/domain object paradigm.  Indeed, 
>> it's as inside-out as Twisted versus peak.events.
>>
>> In a fact orientation approach, one deals with "elementary facts" that 
>> relate 1 or more objects in some way.  For example, "Order #5573 
>> totals $127" is a "binary fact" relating the order object referenced 
>> "#5573" and the dollar amount object referenced by "$127".  Most facts 
>> are binary (i.e. involve two objects), but unaries and ternaries are 
>> also possible.  Quaternaries and higher-order facts are uncommon, when 
>> dealing with elementary (non-decomposable) facts.
>>
>> (By the way, 'x[y]==z' can be considered a ternary fact, which may 
>> later play into the implementation of mapping types in peak.model.)
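>>
A minimal sketch of the unary, binary and ternary facts described above, assuming a fact can be modelled as a fact-type name plus a tuple of reference-scheme values. The names `Fact`, `fact`, `arity` and `Order.shipped` are hypothetical and are not part of peak.model or peak.query.

```python
from collections import namedtuple

# Hypothetical representation: a fact is a fact-type name plus the
# reference-scheme values (plain strings/numbers) of the objects it relates.
Fact = namedtuple("Fact", ["fact_type", "roles"])


def fact(fact_type, *roles):
    """Build a fact from a fact-type name and one or more role values."""
    return Fact(fact_type, tuple(roles))


def arity(f):
    """Number of objects the fact relates (1 = unary, 2 = binary, ...)."""
    return len(f.roles)


order_shipped = fact("Order.shipped", 5573)      # unary (invented example)
order_total = fact("Order.total", 5573, 127)     # binary: "Order #5573 totals $127"
item_lookup = fact("getitem", "x", "y", "z")     # ternary, in the spirit of x[y] == z
```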
>>
>> Superficially, this fact orientation looks a lot like normal OO.  
>> After all, one might say 'orders[5573].total==127'.  Except there's no 
>> way to go from the '127' back to '.total' let alone 'orders[5573]'.  
>> In OO, one can meaningfully look at the objects by themselves.  In FO, 
>> objects don't really exist: only facts regarding objects are real.
>>
>> In a sense, you could say that FO doesn't believe in objects, only 
>> labels.  It assumes that the "objects" we are dealing with are in the 
>> "real world", so what's in the computer is just a collection of facts 
>> *about* the objects.  So, from the FO point of view, it's silly to 
>> talk about having an instance of "order", because the "order" isn't in 
>> the computer, and in fact may not even be a real thing at all!
>>
>> FO *does* believe in object types, however, including inheritance and 
>> polymorphism.  But one uses a "reference scheme" (such as order 
>> numbers) to refer to an instance of an object.  For all practical 
>> purposes, reference schemes are either strings, numbers, or tuples 
>> thereof.  Very elementary data, in other words.
>>
>> The tremendous advantage of this approach is that it has no impedance 
>> mismatch with databases, rule systems, or form-based UIs, and only a 
>> low mismatch with direct-manipulation UIs.  By contrast, OO-paradigm 
>> storage systems like peak.storage and ZODB have a high impedance 
>> mismatch with rule systems and form-based UIs, a medium mismatch with 
>> relational DBs, and a low mismatch with direct-manipulation UIs.
>>
>> Designing good constraint validation and business rules frameworks for 
>> an OO model is *hard*.  Until I discovered FO (by way of ORM -- Object 
>> Role Modelling), I assumed that it was because the problems themselves 
>> were hard.  But from an FO perspective, it's actually pretty easy.  
>> From the OO viewpoint, plain attributes and sequences and mappings are 
>> all very different ways of storing/using/managing an object's 
>> features.   From the FO viewpoint, these are all just facts of 
>> different arity and types.
>>
>> This unifying characteristic makes the FO viewpoint a perfect 
>> "intermediate representation" or "lingua franca" between different 
>> levels of a system.  In fact, because of the ease of expressing 
>> complex constraints in FO systems, and because FO models and 
>> constraints can be specified or rendered as user-friendly English (or 
>> other language) expressions, I believe it is in fact preferable to 
>> such modelling languages as UML for gathering end-user requirements 
>> and defining the domain model for an application.
>>
>> The hard part of the design, that I'm chewing on in my spare time, is 
>> how to create a Pythonic FO notation, that won't end up looking like 
>> Prolog or Lisp!  It would seem that you could just generate current 
>> peak.model-style classes from an FO model.  However, there's a 
>> potential pitfall there.  Minor changes to a conceptual model (such as 
>> changing a link from "one" to "many") result in more significant 
>> changes to a peak.model-type system.  In other words, one would go 
>> from having existing objects with 'x.y=z' to 'x.y=[z]'.  Code that was 
>> written based on one access method would have to change to use the other.
>>
>> If instead one creates form tools that are based on an FO perspective, 
>> however, this issue vanishes.  One isn't doing 'x.y' in the first 
>> place, but rather searching for 'x has y of ?', and expecting a 
>> collection of matching facts.  A little further thought on this for 
>> such things as business rules, DB access, and so on, quickly reveals 
>> that the actual use cases for an "object oriented" model for business 
>> objects are becoming rather scarce.
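>>
A minimal sketch of the "x has y of ?" style of lookup, assuming facts are kept as `(fact_type, roles)` pairs in a plain list. `Variable`, `find` and the `Order.item` fact type are hypothetical. The point it illustrates is that the caller always receives a collection, so changing a role from "one" to "many" does not change the calling code.

```python
class Variable:
    """Placeholder for an unknown role in a query pattern."""


def find(facts, fact_type, *pattern):
    """Return every stored fact of fact_type whose roles match the pattern.

    A Variable in the pattern matches any value in that position.
    """
    matches = []
    for ftype, roles in facts:
        if ftype != fact_type or len(roles) != len(pattern):
            continue
        if all(isinstance(p, Variable) or p == r for p, r in zip(pattern, roles)):
            matches.append((ftype, roles))
    return matches


facts = [
    ("Order.item", (5573, "widget")),
    ("Order.item", (5573, "gadget")),
    ("Order.total", (5573, 127)),
]

# Both lookups return lists, whether the role holds one value or several.
items = find(facts, "Order.item", 5573, Variable())
totals = find(facts, "Order.total", 5573, Variable())
```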
>>
>> Indeed, the more I think about it the more I realize that trying to 
>> implement a lot of typical business functions by direct mapping to the 
>> OO paradigm actually *adds* complexity and effort.  Business in the 
>> real world actually does consist mostly of data and rules.  Objects in 
>> the real world are not at all encapsulated and they certainly do not 
>> enforce their own business rules!
>>
>> So, the fundamental paradigm for peak.query applications will be to 
>> have an EditingContext against which queries are made, and to which 
>> facts are added or from which they are removed.  EditingContexts will 
>> also publish validation facts.  That is, one source of facts from an 
>> editing context will be facts about which of the fact changes you've 
>> made violate business rules or other validation.  But it's likely that 
>> these "constraint metafacts" (to coin a phrase) will not be anything 
>> "special" in the system; it's just that business rules will be able to 
>> be sources of facts.  Indeed, anything that you would compute in an 
>> object's methods could be moved to a business rule implementing a fact 
>> type.  For that matter, it becomes silly to even talk about business 
>> rules: there are really only fact types, and a fact type can be 
>> implemented by any computation you wish.
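>>
A minimal sketch of a "constraint metafact" as an ordinary derived fact type, assuming facts are `(fact_type, roles)` pairs and a rule is simply a function from facts to facts. The rule, the `Violation.negative_total` fact type and `can_commit` are hypothetical illustrations, not a proposed API.

```python
def negative_total_violations(facts):
    """Derived fact type: one violation fact per order with a negative total."""
    return [
        ("Violation.negative_total", roles)
        for ftype, roles in facts
        if ftype == "Order.total" and roles[1] < 0
    ]


def can_commit(facts, rules):
    """A commit is allowed only if no rule yields any violation fact."""
    return not any(rule(facts) for rule in rules)


facts = [
    ("Order.total", (5573, 127)),
    ("Order.total", (5574, -5)),
]
rules = [negative_total_violations]
violations = negative_total_violations(facts)
```

The violation facts can be queried like any other facts; nothing about them is special apart from being computed rather than stored.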
>>
>> This is both more and less granular than a DM.  When you have related 
>> objects in two DMs, right now you must write code in both to save and 
>> load the linking attribute(s).  The same thing in an FO approach would 
>> correspond to a single fact type, pairing the two objects.  So, there 
>> would be only *one* implementation of that fact type, handling both 
>> sides of the relationship.
>>
>> As a practical matter, however, you won't write code to create fact 
>> types for every possible attribute of every object type!  Instead, for 
>> most applications you'll define relatively trivial mappings from fact 
>> types to 2 or more columns from a relational DB, using relational 
>> algebra operators similar to the prototypes that are already in 
>> peak.query today.
>>
>> But, unlike the peak.query of today, fact retrieval will be able to be 
>> asynchronous.  That is, you'll be able to "subscribe" to a query, or 
>> yield a task's execution until a new fact is asserted.  Even if your 
>> application isn't doing event-driven I/O or using a reactor loop, you 
>> could use these subscriptions to e.g. automatically raise an error 
>> when a constraint is violated.  (In practice, the EditingContext will 
>> do this when you ask that it commit your changes to its parent 
>> EditingContext, if any.)  If you're writing a non-web GUI, you'd more 
>> likely subscribe to such events in order to display status bar text or 
>> highlight an input error.
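>>
A minimal sketch of subscribing to a fact type, assuming plain synchronous callbacks rather than peak.events or a reactor. `ToyContext`, `subscribe` and `ConstraintViolation` are hypothetical names. Unlike the commit-time check described above, this toy version notifies subscribers immediately on every assertion.

```python
class ConstraintViolation(Exception):
    """Raised by a subscriber that rejects a newly asserted fact."""


class ToyContext:
    """Hypothetical stand-in for an EditingContext: stores facts, notifies subscribers."""

    def __init__(self):
        self.facts = set()
        self.subscribers = []  # (fact_type, callback) pairs

    def subscribe(self, fact_type, callback):
        self.subscribers.append((fact_type, callback))

    def add(self, fact_type, *roles):
        # The fact is stored first, then every matching subscriber is told.
        self.facts.add((fact_type, roles))
        for wanted, callback in list(self.subscribers):
            if wanted == fact_type:
                callback(fact_type, roles)


def reject_negative_total(fact_type, roles):
    if roles[1] < 0:
        raise ConstraintViolation(
            f"order {roles[0]} has negative total {roles[1]}"
        )


ec = ToyContext()
seen = []
ec.subscribe("Order.total", lambda fact_type, roles: seen.append(roles))
ec.subscribe("Order.total", reject_negative_total)
ec.add("Order.total", 5573, 127)
```

A GUI would register a callback that updates a status bar instead of one that raises.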
>>
>> Writing this email has actually helped me sort out quite a few of the 
>> implementation details I had been wondering about, such as how 
>> business rules and constraints should work.  Now I see that they do 
>> not require anything "special", as they are just the same as any other 
>> kind of fact.  They only appear to be "meta", but in truth they are 
>> constructed by the same fundamental joining of facts as any other 
>> compound fact!  Very nice.
>>
>> For a given fact type, one needs to know only how to assert it, 
>> retract it, and query it.  And the EditingContext needs to have a 
>> source for facts that would prevent a transaction from committing.  A 
>> possible API for EditingContext might simply be to provide three 
>> methods, each accepting a single fact object as its parameter.  For 
>> queries, one would fill in unknown roles with "variables", a la 
>> Prolog.  So, something like:
>>
>> amt = Variable()
>> results = ec.find( Order.total(5573, amt) )
>>
>> would make 'results' an event source that yields the fact 
>> 'Order.total(5573,127)'.
>>
>> In my imagination I can see Ty going "Eeew!" in disgust at this 
>> hideous-seeming API.  Making queries for every attribute of an 
>> object?  Am I out of my mind?
>>
>> No.  Here I'm merely illustrating an elementary fact type.  A compound 
>> fact type might encompass quite a few items of data.  In fact, it can 
>> encompass as many facts as you want.  It solves the OO "reporting 
>> problem", in fact, because one can create queries of arbitrary 
>> complexity without introducing duplicated implementation for the more 
>> elementary components of those queries.  So, an application can define 
>> its own compound fact types, and an EC will implement the operations 
>> by delegation to the elementary fact types if there isn't a custom 
>> implementation defined.  Such custom implementations can be useful for 
>> providing specially hand-tuned SQL or other algorithms, if needed for 
>> some critical aspect of an application.
>>
>> Hm.  It's almost possible to go full circle to an OO approach, at 
>> least for "flat" attributes like strings and numbers.  We could simply 
>> make facts returned by queries *mutable*.  Setting an attribute on the 
>> fact could automatically retract the previous version of the fact and 
>> assert the new one.  We could possibly also provide some kind of 
>> convenience feature to navigate to related objects, instead of just 
>> seeing their keys, but this would have to be well thought-out in order 
>> to avoid the traps of synchrony and container mismatch.  That is, it 
>> would have to be capable of being asynchronous, and it would have to 
>> always provide a sequence, rather than a single object, even if the 
>> role in principle is required to always be exactly one object.  
>> (Because between transaction boundaries, invalid conditions are 
>> allowed to exist, to avoid chicken-and-egg update problems.)
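>>
A minimal sketch of the "mutable fact" idea for a flat attribute, assuming binary facts are stored as `(fact_type, key, value)` tuples in a set. `MutableFact` is a hypothetical name; navigation to related objects and asynchrony are deliberately left out.

```python
class MutableFact:
    """Hypothetical mutable view of one binary fact held in a set-based store."""

    def __init__(self, store, fact_type, key, value):
        self._store = store
        self._fact_type = fact_type
        self._key = key
        self._value = value

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new_value):
        # Retract the previous version of the fact, then assert the new one.
        self._store.discard((self._fact_type, self._key, self._value))
        self._store.add((self._fact_type, self._key, new_value))
        self._value = new_value


store = {("Order.total", 5573, 127)}
total = MutableFact(store, "Order.total", 5573, 127)
total.value = 130
```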
>>
>> Speaking of the chicken-and-egg problem, mapping to an RDBMS could be 
>> rather interesting.  The model as I've described it so far has no 
>> built-in way to ensure that inserts and updates get done in a way that 
>> honors RDBMS-level constraints.  Hm.  Actually, the mapping from fact 
>> types to tables has to incorporate information about joins, 
>> so the information to do it is there.  The details might get a bit 
>> "interesting", but it should be possible to implement it one time, 
>> rather than sweating the details in every DM with foreign keys, as one 
>> must do now.  (Not that it's hard to do in today's DMs, but it's still 
>> boilerplate or dead-chicken-waving or whatever you want to call it.)
>>
>> Hm.  This is sounding simpler than I thought it would be, at least in 
>> principle.  But I still need to:
>>
>> * Work out the metamodel for fact types, how to map from constraints 
>> to derived fact types that signal the violation of those constraints, 
>> and a Pythonic notation for defining a fact-oriented domain model.  
>> (For example, to define a constraint on a fact type or set of fact 
>> types, one must have a way to reference those types in code.)
>>
>> * Define a mechanism for implementations of fact types, that allows 
>> for the possibility of segmenting compound fact implementations (e.g. 
>> database tables) as well as compounding elementary fact 
>> implementations, in a way that doesn't lose the connectedness.  That 
>> is, if I issue a compound query that has 3 facts answerable by a DB 
>> table, I don't want to generate 3 queries against that table, just 
>> one.  And if the other needed facts are in another table in the same 
>> DB, I want to generate a query that joins them!  As it happens, a lot 
>> of the ground work for this has already been done in peak.query.algebra.
>>
>> * Define a framework for defining concrete schema mappings, in a way 
>> that allows reuse.  That is, I might want to define mappings from a 
>> fact schema to two different database schemas, but share an 
>> implementation of various business rules between them.
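>>
A minimal sketch of the second item in the list above: three facts answerable by one table should produce one query, not three. It assumes a hypothetical `MAPPING` from fact types to `(table, key column, value column)` and invented table and column names. It only groups columns per table and does not attempt the cross-table join case or use peak.query.algebra.

```python
from collections import defaultdict

# Hypothetical mapping: fact type -> (table, key column, value column)
MAPPING = {
    "Order.total": ("orders", "order_id", "total"),
    "Order.customer": ("orders", "order_id", "customer_id"),
    "Order.status": ("orders", "order_id", "status"),
}


def build_queries(fact_types, key):
    """Group the requested fact types by table and emit one SELECT per table."""
    columns_by_table = defaultdict(list)
    for fact_type in fact_types:
        table, key_col, value_col = MAPPING[fact_type]
        columns_by_table[(table, key_col)].append(value_col)
    return [
        (f"SELECT {', '.join(cols)} FROM {table} WHERE {key_col} = ?", (key,))
        for (table, key_col), cols in columns_by_table.items()
    ]


queries = build_queries(["Order.total", "Order.customer", "Order.status"], 5573)
```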
>>
>> Oh yeah, and then I need to actually implement all that.  Does that 
>> maybe answer any questions you might have about whether this will be 
>> done soon?  :)
>>
>> One final thought...  that earlier example of the API could probably 
>> use a little rework.  Maybe:
>>
>>     # Query a fact type, given a context
>>     results = order_total(ec).find(5573, Variable())
>>
>>     # Assert a fact
>>     order_total(ec).add(5573, 127)
>>
>>     # Retract a fact
>>     order_total(ec).remove(5573, 127)
>>
>>     # shortcut for repeated use w/same editing context
>>     order_total = order_total(ec)
>>
>> Here, the idea is that a fact type such as 'order_total' is a protocol 
>> that you can adapt the editing context to.  Thus, one would create 
>> different EC subclasses to define a specific schema mapping.  The EC 
>> itself doesn't actually implement anything, however.  Instead, you 
>> simply declare adapters (probably sticky ones) for the fact types that 
>> can be implemented in the schema.  And if a compound fact type doesn't 
>> have a specific adaptation, it just creates a generic compound 
>> implementation over the adaptation of the fact types it comprises.
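>>
A minimal sketch of how the calls above could behave, assuming a fact type is a callable that returns an implementation bound to a given editing context. It does not use PEAK's real adaptation machinery: the "sticky" behaviour is imitated with a plain dict cache on the context, and `FactType`, `BoundFactType` and `ToyEditingContext` are hypothetical names.

```python
class Variable:
    """Placeholder for an unknown role in a query."""


class BoundFactType:
    """Implementation of one fact type against one editing context."""

    def __init__(self, name, ec):
        self.name = name
        self.ec = ec

    def add(self, *roles):
        self.ec.facts.add((self.name, roles))

    def remove(self, *roles):
        self.ec.facts.discard((self.name, roles))

    def find(self, *pattern):
        return [
            roles
            for name, roles in self.ec.facts
            if name == self.name
            and len(roles) == len(pattern)
            and all(isinstance(p, Variable) or p == r for p, r in zip(pattern, roles))
        ]


class FactType:
    """Hypothetical fact type; calling it with a context 'adapts' the context."""

    def __init__(self, name):
        self.name = name

    def __call__(self, ec):
        # "Sticky": the same bound implementation is reused per context.
        if self.name not in ec.bindings:
            ec.bindings[self.name] = BoundFactType(self.name, ec)
        return ec.bindings[self.name]


class ToyEditingContext:
    def __init__(self):
        self.facts = set()
        self.bindings = {}


order_total = FactType("order_total")
ec = ToyEditingContext()
order_total(ec).add(5573, 127)
results = order_total(ec).find(5573, Variable())
```

Here `find` returns a plain list; the asynchronous event-source behaviour discussed in the message is not modelled.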
>>
>> This still isn't perfect, in that this API doesn't yet explicitly deal 
>> with asynchronous results, or how to "listen" to assertions or 
>> retractions, or how to deal with queries when there are assertions or 
>> retractions that haven't been "flushed" to the physical storage 
>> scheme.  But it's getting there.  In fact, it's probably close enough 
>> to allow prototyping some toy schema implementations (sans metamodels) 
>> to explore how editing contexts and fact type implementations 
>> could/should work.
>>
>> _______________________________________________
>> PEAK mailing list
>> PEAK at eby-sarna.com
>> http://www.eby-sarna.com/mailman/listinfo/peak
>>
> 
> 
>