[PEAK] DM refactoring and asynchrony (was Re: PEAK and Twisted, revisited)
John D. Heintz
jheintz at pobox.com
Sun Apr 18 00:23:22 EDT 2004
Hi all,
I've been meaning to send these links to the list for a while but have
been trying to understand the content well enough to be confident in
recommending anyone read it ;-)
This email, though, has nudged me into action.
I think there are some underlying similarities between peak.query,
FO (fact orientation) and set-oriented data.
I've been doing a lot of reading about ReiserFS and the "Future
Vision":http://www.namesys.com/whitepaper.html which led me to a Russian
authored book:
The Set Model for Database and Information Systems by Mikhail M. Gilula.
This book describes the mathematical model and a programming language,
StarSet. Not much is available besides the book, and it's no longer in print.
I stumbled onto a Java extension language along the same lines:
"JavaSet - extending Java by persistent
sets.":http://www.complang.tuwien.ac.at/markus/publications/IAITS99.ps
Also, a Microsoft language, X#, tries to accomplish much the same
thing. Arrg! I can't find the URL. Dang, I'll have to send it tomorrow
after I find the white paper I know I printed off. Well, from memory,
this language tries to smooth over the difference between row and
hierarchical data by using streams and paths as the underlying structures.
Anyway, I thought this stuff might be of interest to the group.
John Heintz
Phillip J. Eby wrote:
> At 09:46 AM 4/16/04 +0200, Ulrich Eck wrote:
>
>> hmm, afaik there is no simple peak<->twisted-demo around. we should
>> work one out i think comparable to the bulletins example. junction
>> from peakplace also uses twisted.pb, but not yet peak.events afaik.
>
> Last I heard, John was working on it. Certainly, he submitted a number
> of bug reports that would've required him to be using peak.events with
> Twisted in Junction in order to find the problems he did. :)
>
>> the reason why i didn't spend too much time on it is
>> that pje plans to completely refactor the DataManager API to integrate
>> with peak.events and peak.query. a concrete timetable and design for
>> this refactoring is not yet available, i think.
>
> The design is getting a bit more concrete, but not the timetable. :(
> It may not so much be a refactoring of the current API, as addition of
> another completely different API. But there is some theoretical work to
> sort out first.
>
> The future peak.query API will get rid of individual DMs in favor of an
> EditingContext, similar to those in various other systems, such as ZODB
> connection objects. The container metaphor of DMs will fall by the
> wayside here, in favor of a more integrated approach.
>
> The part that is not yet clear to me is how OO and how async the EC
> interface will be. I want ECs to support *conceptual* queries, and to be
> able to yield asynchronous events representing changes to objects.
>
> Mostly what I'm doing now on the design is trying to get a metamodel in
> my head that unites "Fact Orientation" with "Object Orientation". Fact
> orientation has no impedance mismatch between relational databases,
> business or AI rules, or much of anything else. But, it's quite a
> mismatch with the current DM/domain object paradigm. Indeed, it's as
> inside-out as Twisted versus peak.events.
>
> In a fact orientation approach, one deals with "elementary facts" that
> relate 1 or more objects in some way. For example, "Order #5573 totals
> $127" is a "binary fact" relating the order object referenced "#5573"
> and the dollar amount object referenced by "$127". Most facts are
> binary (i.e. involve two objects), but unaries and ternaries are also
> possible. Quaternaries and higher-order facts are uncommon, when
> dealing with elementary (non-decomposable) facts.
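To make the arity terminology concrete, here is a minimal Python sketch. It assumes facts are modelled as named tuples whose fields are the roles; the fact types OrderShipped, OrderTotal and LineItem are hypothetical and not part of peak.query.

```python
from typing import NamedTuple


class OrderShipped(NamedTuple):
    """Unary fact: 'order <order> has shipped'."""
    order: int


class OrderTotal(NamedTuple):
    """Binary fact: 'order <order> totals <amount> dollars'."""
    order: int
    amount: int


class LineItem(NamedTuple):
    """Ternary fact: 'order <order> includes <quantity> of <product>'."""
    order: int
    product: str
    quantity: int


def arity(fact):
    """Number of objects (roles) that a fact relates."""
    return len(fact)


# "Order #5573 totals $127" as a binary fact.
total = OrderTotal(5573, 127)
```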
>
> (By the way, 'x[y]==z' can be considered a ternary fact, which may later
> play into the implementation of mapping types in peak.model.)
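A sketch of the 'x[y]==z' remark, assuming a mapping is stored as a set of (mapping, key, value) tuples. The helpers mapping_to_facts and lookup are illustrative only, not peak.model API.

```python
def mapping_to_facts(mapping_id, mapping):
    """Flatten a dict into ternary facts (mapping, key, value), so that
    mapping_id[key] == value holds exactly when the fact is present."""
    return {(mapping_id, key, value) for key, value in mapping.items()}


def lookup(facts, mapping_id, key):
    """Recover x[y] by searching for facts matching (x, y, ?)."""
    return [v for (m, k, v) in facts if m == mapping_id and k == key]


prices = {"widget": 4, "gadget": 9}
facts = mapping_to_facts("prices", prices)
```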
>
> Superficially, this fact orientation looks a lot like normal OO. After
> all, one might say 'orders[5573].total==127'. Except there's no way to
> go from the '127' back to '.total' let alone 'orders[5573]'. In OO, one
> can meaningfully look at the objects by themselves. In FO, objects
> don't really exist: only facts regarding objects are real.
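The asymmetry can be shown in a few lines of Python. This is a toy illustration, assuming a fact type is simply a set of (order, amount) tuples; Order and orders_totalling are hypothetical names.

```python
class Order:
    """Plain OO object: the total is only reachable from the order."""

    def __init__(self, number, total):
        self.number = number
        self.total = total


orders = {5573: Order(5573, 127), 5574: Order(5574, 80)}

# The same information as a set of (order, amount) facts.
order_total = {(o.number, o.total) for o in orders.values()}


def orders_totalling(facts, amount):
    """Query the fact type from the 'amount' role instead of the 'order' role."""
    return sorted(order for (order, amt) in facts if amt == amount)
```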
>
> In a sense, you could say that FO doesn't believe in objects, only
> labels. It assumes that the "objects" we are dealing with are in the
> "real world", so what's in the computer is just a collection of facts
> *about* the objects. So, from the FO point of view, it's silly to talk
> about having an instance of "order", because the "order" isn't in the
> computer, and in fact may not even be a real thing at all!
>
> FO *does* believe in object types, however, including inheritance and
> polymorphism. But one uses a "reference scheme" (such as order numbers)
> to refer to an instance of an object. For all practical purposes,
> reference schemes are either strings, numbers, or tuples thereof. Very
> elementary data, in other words.
>
> The tremendous advantage of this approach is that it has no impedance
> mismatch with databases, rule systems, or form-based UIs, and only a low
> mismatch with direct-manipulation UIs. By contrast, OO-paradigm storage
> systems like peak.storage and ZODB have a high impedance mismatch with
> rule systems and form-based UIs, a medium mismatch with relational DBs,
> and a low mismatch with direct-manipulation UIs.
>
> Designing good constraint validation and business rules frameworks for
> an OO model is *hard*. Until I discovered FO (by way of ORM -- Object
> Role Modelling), I assumed that it was because the problems themselves
> were hard. But from an FO perspective, it's actually pretty easy. From
> the OO viewpoint, plain attributes and sequences and mappings are all
> very different ways of storing/using/managing an object's features.
> From the FO viewpoint, these are all just facts of different arity and
> types.
>
> This unifying characteristic makes the FO viewpoint a perfect
> "intermediate representation" or "lingua franca" between different
> levels of a system. Moreover, because complex constraints are easy to
> express in FO systems, and because FO models and constraints can be
> specified or rendered as user-friendly English (or other language)
> expressions, I believe FO is preferable to modelling languages such
> as UML for gathering end-user requirements and defining the
> domain model for an application.
>
> The hard part of the design, that I'm chewing on in my spare time, is
> how to create a Pythonic FO notation, that won't end up looking like
> Prolog or Lisp! It would seem that you could just generate current
> peak.model-style classes from an FO model. However, there's a potential
> pitfall there. Minor changes to a conceptual model (such as changing a
> link from "one" to "many") result in more significant changes to a
> peak.model-type system. In other words, one would go from having
> existing objects with 'x.y=z' to 'x.y=[z]'. Code that was written based
> on one access method would have to change to use the other.
>
> If instead one creates form tools that are based on an FO perspective,
> however, this issue vanishes. One isn't doing 'x.y' in the first place,
> but rather searching for 'x has y of ?', and expecting a collection of
> matching facts. A little further thought on this for such things as
> business rules, DB access, and so on, quickly reveals that the actual
> use cases for an "object oriented" model for business objects are
> becoming rather scarce.
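A minimal sketch of the 'x has y of ?' style, assuming a hypothetical "order has contact" fact type stored as (order, contact) tuples. Because the query always returns a list, the calling code is unchanged when the model moves from "one" to "many".

```python
def contacts_of(facts, order):
    """Answer 'order <order> has contact ?' as a (possibly empty) list."""
    return sorted(contact for (o, contact) in facts if o == order)


# Hypothetical fact type 'order has contact', stored as (order, contact).
order_contact = {(5573, "alice")}
before = contacts_of(order_contact, 5573)

# The model changes from "exactly one contact" to "one or more";
# the querying code above does not need to change.
order_contact.add((5573, "bob"))
after = contacts_of(order_contact, 5573)
```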
>
> Indeed, the more I think about it the more I realize that trying to
> implement a lot of typical business functions by direct mapping to the
> OO paradigm actually *adds* complexity and effort. Business in the real
> world actually does consist mostly of data and rules. Objects in the
> real world are not at all encapsulated and they certainly do not enforce
> their own business rules!
>
> So, the fundamental paradigm for peak.query applications will be to have
> an EditingContext against which queries are made, and to which facts are
> added or from which they are removed. EditingContexts will also publish validation
> facts. That is, one source of facts from an editing context will be
> facts about what fact changes you've made are violating business rules
> or other validation. But it's likely that these "constraint metafacts"
> (to coin a phrase) will not be anything "special" in the system; it's
> just that business rules will be able to be sources of facts. Indeed,
> anything that you would compute in an object's methods could be moved to
> a business rule implementing a fact type. For that matter, it becomes
> silly to even talk about business rules: there are really only fact
> types, and a fact type can be implemented by any computation you wish.
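Here is one way constraint metafacts might look, as a sketch only: each constraint is written as a derived fact type (a function from base facts to violation facts), and a commit is refused while any violation facts exist. All names are hypothetical.

```python
from collections import Counter

# Base facts for a hypothetical order_total fact type: (order, amount).
order_total = {(5573, 127), (5574, -5)}


def negative_total_violations(totals):
    """Derived fact type: 'order <o> has an invalid total <a>'."""
    return {(order, amt) for (order, amt) in totals if amt < 0}


def unique_total_violations(totals):
    """Derived fact type: 'order <o> has more than one total'."""
    counts = Counter(order for order, _ in totals)
    return {order for order, n in counts.items() if n > 1}


def can_commit(totals):
    """A commit is blocked while any violation fact exists."""
    return not (negative_total_violations(totals)
                or unique_total_violations(totals))
```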
>
> This is both more and less granular than a DM. When you have related
> objects in two DMs, right now you must write code in both to save and
> load the linking attribute(s). The same thing in an FO approach would
> correspond to a single fact type, pairing the two objects. So, there
> would be only *one* implementation of that fact type, handling both
> sides of the relationship.
>
> As a practical matter, however, you won't write code to create fact
> types for every possible attribute of every object type! Instead, for
> most applications you'll define relatively trivial mappings from fact
> types to 2 or more columns from a relational DB, using relational
> algebra operators similar to the prototypes that are already in
> peak.query today.
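A rough sketch of mapping a fact type onto two columns of a relational table, using the standard sqlite3 module. The FACT_TABLES mapping and find() are assumptions for illustration; table and column names come only from that trusted mapping, while values go through `?` placeholders. None stands in for an unknown role.

```python
import sqlite3

# Hypothetical mapping: fact type -> (table, columns holding each role).
FACT_TABLES = {"order_total": ("orders", ("order_id", "total"))}


def find(conn, fact_type, *roles):
    """Query a fact type; None in a role position means 'unknown'."""
    table, columns = FACT_TABLES[fact_type]
    known = [(col, val) for col, val in zip(columns, roles) if val is not None]
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if known:
        sql += " WHERE " + " AND ".join(f"{col} = ?" for col, _ in known)
    return conn.execute(sql, [val for _, val in known]).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, total INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(5573, 127), (5574, 80)])
```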
>
> But, unlike the peak.query of today, fact retrieval will be able to be
> asynchronous. That is, you'll be able to "subscribe" to a query, or
> yield a task's execution until a new fact is asserted. Even if your
> application isn't doing event-driven I/O or using a reactor loop, you
> could use these subscriptions to e.g. automatically raise an error when
> a constraint is violated. (In practice, the EditingContext will do this
> when you ask that it commit your changes to its parent EditingContext,
> if any.) If you're writing a non-web GUI, you'd more likely subscribe
> to such events in order to display status bar text or highlight an input
> error.
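The subscription idea can be approximated with a plain observer pattern. This sketch is synchronous and assumes a hypothetical FactStore whose patterns use None as a wildcard; it does not use peak.events, which a real implementation would presumably deliver events through instead.

```python
class FactStore:
    """Toy store that notifies subscribers when a matching fact is asserted."""

    def __init__(self):
        self.facts = set()
        self.subscribers = []  # (pattern, callback) pairs

    def subscribe(self, pattern, callback):
        # pattern is a tuple; None matches any value in that role
        self.subscribers.append((pattern, callback))

    def add(self, fact):
        self.facts.add(fact)
        for pattern, callback in self.subscribers:
            if len(pattern) == len(fact) and all(
                p is None or p == f for p, f in zip(pattern, fact)
            ):
                callback(fact)


store = FactStore()
errors = []
# e.g. a GUI could highlight an input field when a violation fact appears
store.subscribe(("violation", None), errors.append)
store.add(("order_total", 5573))
store.add(("violation", "order 5574 has a negative total"))
```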
>
> Writing this email has actually helped me sort out quite a few of the
> implementation details I had been wondering about, such as how business
> rules and constraints should work. Now I see that they do not require
> anything "special", as they are just the same as any other kind of
> fact. They only appear to be "meta", but in truth they are constructed
> by the same fundamental joining of facts as any other compound fact!
> Very nice.
>
> For a given fact type, one needs to know only how to assert it, retract
> it, and query it. And the EditingContext needs to have a source for
> facts that would prevent a transaction from committing. A possible API
> for EditingContext might simply be to provide three methods, each
> accepting a single fact object as its parameter. For queries, one would
> fill in unknown roles with "variables", à la Prolog. So, something like:
>
> amt = Variable()
> results = ec.find( Order.total(5573, amt) )
>
> would make 'results' an event source that yields the fact
> 'Order.total(5573,127)'.
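A minimal sketch of Prolog-style matching, assuming facts are plain tuples and Variable is a marker class. Unlike the proposal, find() here is an ordinary generator rather than an event source, and it does not bind the variable to the matched value.

```python
class Variable:
    """Placeholder for an unknown role, in the spirit of a Prolog variable."""


def find(facts, pattern):
    """Yield each stored fact that agrees with pattern on every known role."""
    for fact in facts:
        if len(fact) == len(pattern) and all(
            isinstance(p, Variable) or p == f for p, f in zip(pattern, fact)
        ):
            yield fact


order_total = [(5573, 127), (5574, 80)]
amt = Variable()
results = list(find(order_total, (5573, amt)))
```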
>
> In my imagination I can see Ty going "Eeew!" in disgust at this
> hideous-seeming API. Making queries for every attribute of an object?
> Am I out of my mind?
>
> No. Here I'm merely illustrating an elementary fact type. A compound
> fact type might encompass quite a few items of data. In fact, it can
> encompass as many facts as you want. It solves the OO "reporting
> problem", in fact, because one can create queries of arbitrary
> complexity without introducing duplicated implementation for the more
> elementary components of those queries. So, an application can define
> its own compound fact types, and an EC will implement the operations by
> delegation to the elementary fact types if there isn't a custom
> implementation defined. Such custom implementations can be useful for
> providing specially hand-tuned SQL or other algorithms, if needed for
> some critical aspect of an application.
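A sketch of the generic fallback, assuming two hypothetical elementary fact types stored as tuple sets. The compound fact type is answered by joining them on the shared order role; a hand-tuned SQL implementation could be substituted for the same function.

```python
# Two hypothetical elementary fact types.
order_customer = {(5573, "acme"), (5574, "globex")}  # order placed by customer
order_total = {(5573, 127), (5574, 80)}              # order totals amount


def customer_order_total(customer):
    """Compound fact type 'customer placed order totalling amount',
    answered generically by joining the elementary fact types on order."""
    return sorted(
        (cust, order, amt)
        for (order, cust) in order_customer if cust == customer
        for (o2, amt) in order_total if o2 == order
    )
```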
>
> Hm. It's almost possible to go full circle to an OO approach, at least
> for "flat" attributes like strings and numbers. We could simply make
> facts returned by queries *mutable*. Setting an attribute on the fact
> could automatically retract the previous version of the fact and assert
> the new one. We could possibly also provide some kind of convenience
> feature to navigate to related objects, instead of just seeing their
> keys, but this would have to be well thought-out in order to avoid the
> traps of synchrony and container mismatch. That is, it would have to be
> capable of being asynchronous, and it would have to always provide a
> sequence, rather than a single object, even if the role in principle is
> required to always be exactly one object. (Because between transaction
> boundaries, invalid conditions are allowed to exist, to avoid
> chicken-and-egg update problems.)
>
> Speaking of the chicken-and-egg problem, mapping to an RDBMS could be
> rather interesting. The model as I've described it so far has no
> built-in way to ensure that inserts and updates get done in a way that
> honors RDBMS-level constraints. Hm. Actually, the mapping from fact
> types to tables has to incorporate information about joins, so
> the information to do it is there. The details might get a bit
> "interesting", but it should be possible to implement it one time,
> rather than sweating the details in every DM with foreign keys, as one
> must do now. (Not that it's hard to do in today's DMs, but it's still
> boilerplate or dead-chicken-waving or whatever you want to call it.)
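Honoring foreign keys amounts to ordering writes so that referenced rows come first, which is a topological sort. A sketch using graphlib from the Python 3.9+ standard library; the FOREIGN_KEYS schema is hypothetical. Deletes would use the reverse order.

```python
from graphlib import TopologicalSorter

# Hypothetical schema: table -> tables it references via foreign keys.
FOREIGN_KEYS = {
    "customers": set(),
    "products": set(),
    "orders": {"customers"},
    "line_items": {"orders", "products"},
}


def insert_order(pending_tables):
    """Order inserts so every referenced table is written before its referrers."""
    graph = {t: FOREIGN_KEYS[t] & pending_tables for t in pending_tables}
    return list(TopologicalSorter(graph).static_order())


order = insert_order({"customers", "orders", "line_items", "products"})
```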
>
> Hm. This is sounding simpler than I thought it would be, at least in
> principle. But I still need to:
>
> * Work out the metamodel for fact types, how to map from constraints to
> derived fact types that signal the violation of those constraints, and a
> Pythonic notation for defining a fact-oriented domain model. (For
> example, to define a constraint on a fact type or set of fact types, one
> must have a way to reference those types in code.)
>
> * Define a mechanism for implementations of fact types, that allows for
> the possibility of segmenting compound fact implementations (e.g.
> database tables) as well as compounding elementary fact implementations,
> in a way that doesn't lose the connectedness. That is, if I issue a
> compound query that has 3 facts answerable by a DB table, I don't want
> to generate 3 queries against that table, just one. And if the other
> needed facts are in another table in the same DB, I want to generate a
> query that joins them! As it happens, a lot of the ground work for this
> has already been done in peak.query.algebra.
>
> * Define a framework for defining concrete schema mappings, in a way
> that allows reuse. That is, I might want to define mappings from a fact
> schema to two different database schemas, but share an implementation of
> various business rules between them.
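For the "one query per table" goal in the second point, a planner can group requested fact types by the table that answers them. This sketch assumes a hypothetical FACT_COLUMNS mapping of fact type to (table, key column, value column), and it does not attempt the cross-table join case.

```python
from collections import defaultdict

# Hypothetical mapping: fact type -> (table, key column, value column).
FACT_COLUMNS = {
    "order_total": ("orders", "order_id", "total"),
    "order_date": ("orders", "order_id", "placed_on"),
    "order_status": ("orders", "order_id", "status"),
    "customer_name": ("customers", "customer_id", "name"),
}


def plan(fact_types):
    """Group the requested fact types so each table is queried only once."""
    columns = defaultdict(list)
    keys = {}
    for ft in fact_types:
        table, key, column = FACT_COLUMNS[ft]
        keys[table] = key
        columns[table].append(column)
    return {
        table: f"SELECT {keys[table]}, {', '.join(cols)} FROM {table}"
        for table, cols in columns.items()
    }
```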
>
> Oh yeah, and then I need to actually implement all that. Does that
> maybe answer any questions you might have about whether this will be
> done soon? :)
>
> One final thought... that earlier example of the API could probably use
> a little rework. Maybe:
>
> # Query a fact type, given a context
> results = order_total(ec).find(5573, Variable())
>
> # Assert a fact
> order_total(ec).add(5573, 127)
>
> # Retract a fact
> order_total(ec).remove(5573, 127)
>
> # shortcut for repeated use w/same editing context
> order_total = order_total(ec)
>
> Here, the idea is that a fact type such as 'order_total' is a protocol
> that you can adapt the editing context to. Thus, one would create
> different EC subclasses to define a specific schema mapping. The EC
> itself doesn't actually implement anything, however. Instead, you
> simply declare adapters (probably sticky ones) for the fact types that
> can be implemented in the schema. And if a compound fact type doesn't
> have a specific adaptation, it just creates a generic compound
> implementation over the adaptation of the fact types it comprises.
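To illustrate fact types as protocols that an EC is adapted to, here is a sketch with a hand-rolled adapter registry standing in for PEAK's adaptation machinery. FactType, InMemoryEC and InMemoryOrderTotal are hypothetical; the lookup walks the EC class's MRO so subclasses inherit mappings, but there is no caching ("sticky" adapters) and no generic compound fallback.

```python
class EditingContext:
    """Base EC: implements nothing itself."""


class FactType:
    """A fact type acting as a protocol that ECs can be adapted to."""

    def __init__(self, name):
        self.name = name
        self.adapters = {}  # EC class -> factory taking the EC

    def register(self, ec_class):
        def decorator(factory):
            self.adapters[ec_class] = factory
            return factory
        return decorator

    def __call__(self, ec):
        # Walk the EC's MRO so subclasses inherit their parents' mappings.
        for cls in type(ec).__mro__:
            if cls in self.adapters:
                return self.adapters[cls](ec)
        raise TypeError(f"{type(ec).__name__} cannot implement {self.name}")


order_total = FactType("order_total")


class InMemoryEC(EditingContext):
    def __init__(self):
        self.rows = set()


@order_total.register(InMemoryEC)
class InMemoryOrderTotal:
    def __init__(self, ec):
        self.ec = ec

    def add(self, order, amount):
        self.ec.rows.add((order, amount))

    def remove(self, order, amount):
        self.ec.rows.discard((order, amount))

    def find(self, order=None, amount=None):
        return sorted(r for r in self.ec.rows
                      if (order is None or r[0] == order)
                      and (amount is None or r[1] == amount))


ec = InMemoryEC()
totals = order_total(ec)
totals.add(5573, 127)
```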
>
> This still isn't perfect, in that this API doesn't yet explicitly deal
> with asynchronous results, or how to "listen" to assertions or
> retractions, or how to deal with queries when there are assertions or
> retractions that haven't been "flushed" to the physical storage scheme.
> But it's getting there. In fact, it's probably close enough to allow
> prototyping some toy schema implementations (sans metamodels) to explore
> how editing contexts and fact type implementations could/should work.
>
> _______________________________________________
> PEAK mailing list
> PEAK at eby-sarna.com
> http://www.eby-sarna.com/mailman/listinfo/peak
>