[PEAK] DM refactoring and asynchrony (was Re: PEAK and Twisted,
John D. Heintz
jheintz at pobox.com
Sun Apr 18 14:58:46 EDT 2004
Found the Microsoft research paper finally!
Via Lambda the Ultimate:
John D. Heintz wrote:
> Hi all,
> I've been meaning to send these links to the list for a while but have
> been trying to understand the content well enough to be confident in
> recommending anyone read it ;-)
> This email though has nudged me into action.
> I think that there are some underlying similarities between peak.query,
> FO and Set oriented data.
> I've been doing a lot of reading about ReiserFS and the "Future
> Vision":http://www.namesys.com/whitepaper.html which led me to a Russian
> authored book:
> The Set Model for Database and Information Systems by Mikhail M. Gilula.
> This book describes the mathematical model and a programming language
> StarSet. Not much available besides the book and it's no longer published.
> I stumbled onto a Java extension language along the same lines:
> "JavaSet - extending Java by persistent
> Also, a Microsoft language X# tries to accomplish much of the same
> thing. Arrg! I can't find the url. Dang, I'll have to send it tomorrow
> after I find the white paper I know I printed off. Well, from memory
> this language tries to smooth over the difference between row and
> hierarchical data by using streams and paths as the underlying structures.
> Anyway, I thought this stuff might be of interest to the group.
> John Heintz
> Phillip J. Eby wrote:
>> At 09:46 AM 4/16/04 +0200, Ulrich Eck wrote:
>>> hmm, afaik there is no simple peak<->twisted-demo around. we should
>>> work one out i think comparable to the bulletins example. junction
>>> from peakplace also uses twisted.pb, but not yet peak.events afaik.
>> Last I heard, John was working on it. Certainly, he submitted a
>> number of bug reports that would've required him to be using
>> peak.events with Twisted in Junction in order to find the problems he
>> did. :)
>>> the reason why i didn't spend too much time on it is,
>>> that pje plans to completely refactor the DataManager API to integrate
>>> with peak.events and peak.query. a concrete timetable and design for
>>> this refactoring is not yet available i think.
>> The design is getting a bit more concrete, but not the timetable. :(
>> It may not so much be a refactoring of the current API, as addition of
>> another completely different API. But there is some theoretical work
>> to sort out first.
>> The future peak.query API will get rid of individual DM's in favor of
>> an EditingContext, similar to those in various other systems, such as
>> ZODB connection objects. The container metaphor of DM's will fall by
>> the wayside here, in favor of a more integrated approach.
>> The part that is not yet clear to me is how OO, and how async the EC
>> interface will be. I want EC's to support *conceptual* queries, and
>> be able to yield asynchronous events representing changes to objects.
>> Mostly what I'm doing now on the design is trying to get a metamodel
>> in my head that unites "Fact Orientation" with "Object Orientation".
>> Fact orientation has no impedance mismatch between relational
>> databases, business or AI rules, or much of anything else. But, it's
>> quite a mismatch with the current DM/domain object paradigm. Indeed,
>> it's as inside-out as Twisted versus peak.events.
>> In a fact orientation approach, one deals with "elementary facts" that
>> relate 1 or more objects in some way. For example, "Order #5573
>> totals $127" is a "binary fact" relating the order object referenced
>> "#5573" and the dollar amount object referenced by "$127". Most facts
>> are binary (i.e. involve two objects), but unaries and ternaries are
>> also possible. Quaternaries and higher-order facts are uncommon, when
>> dealing with elementary (non-decomposable) facts.
>> (By the way, 'x[y]==z' can be considered a ternary fact, which may
>> later play into the implementation of mapping types in peak.model.)
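A minimal sketch of how elementary facts of different arity might be represented, assuming a fact is simply a fact type name plus a tuple of references. The `Fact` class and the fact type names below are hypothetical illustrations, not part of PEAK.

```python
from typing import NamedTuple, Tuple


class Fact(NamedTuple):
    """Hypothetical elementary fact: a type name plus the references it relates."""
    fact_type: str
    roles: Tuple[object, ...]

    @property
    def arity(self) -> int:
        # Number of objects the fact relates (1 = unary, 2 = binary, ...).
        return len(self.roles)


# Binary fact: "Order #5573 totals $127"
order_total = Fact("order_total", (5573, 127))
# Unary fact: "Order #5573 is shipped" (invented for illustration)
order_shipped = Fact("order_shipped", (5573,))
# Ternary fact, in the spirit of x[y] == z
item_at = Fact("item_at", ("x", "y", "z"))
```

Note that only references (numbers, strings, tuples) appear in the roles; there is no "order" instance anywhere.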
>> Superficially, this fact orientation looks a lot like normal OO.
>> After all, one might say 'orders.total==127'. Except there's no
>> way to go from the '127' back to '.total' let alone 'orders'.
>> In OO, one can meaningfully look at the objects by themselves. In FO,
>> objects don't really exist: only facts regarding objects are real.
>> In a sense, you could say that FO doesn't believe in objects, only
>> labels. It assumes that the "objects" we are dealing with are in the
>> "real world", so what's in the computer is just a collection of facts
>> *about* the objects. So, from the FO point of view, it's silly to
>> talk about having an instance of "order", because the "order" isn't in
>> the computer, and in fact may not even be a real thing at all!
>> FO *does* believe in object types, however, including inheritance and
>> polymorphism. But one uses a "reference scheme" (such as order
>> numbers) to refer to an instance of an object. For all practical
>> purposes, reference schemes are either strings, numbers, or tuples
>> thereof. Very elementary data, in other words.
>> The tremendous advantage of this approach is that it has no impedance
>> mismatch with databases, rule systems, or form-based UIs, and only a
>> low mismatch with direct-manipulation UIs. By contrast, OO-paradigm
>> storage systems like peak.storage and ZODB have a high impedance
>> mismatch with rule systems and form-based UIs, a medium mismatch with
>> relational DBs, and a low mismatch with direct-manipulation UIs.
>> Designing good constraint validation and business rules frameworks for
>> an OO model is *hard*. Until I discovered FO (by way of ORM -- Object
>> Role Modelling), I assumed that it was because the problems themselves
>> were hard. But from an FO perspective, it's actually pretty easy.
>> From the OO viewpoint, plain attributes and sequences and mappings are
>> all very different ways of storing/using/managing an object's
>> features. From the FO viewpoint, these are all just facts of
>> different arity and types.
>> This unifying characteristic makes the FO viewpoint a perfect
>> "intermediate representation" or "lingua franca" between different
>> levels of a system. In fact, because of the ease of expressing
>> complex constraints in FO systems, and because FO models and
>> constraints can be specified or rendered as user-friendly English (or
>> other language) expressions, I believe it is in fact preferable to
>> such modelling languages as UML for gathering end-user requirements
>> and defining the domain model for an application.
>> The hard part of the design, that I'm chewing on in my spare time, is
>> how to create a Pythonic FO notation, that won't end up looking like
>> Prolog or Lisp! It would seem that you could just generate current
>> peak.model-style classes from an FO model. However, there's a
>> potential pitfall there. Minor changes to a conceptual model (such as
>> changing a link from "one" to "many") result in more significant
>> changes to a peak.model-type system. In other words, one would go
>> from having existing objects with 'x.y=z' to 'x.y=[z]'. Code that was
>> written based on one access method would have to change to use the other.
>> If instead one creates form tools that are based on an FO perspective,
>> however, this issue vanishes. One isn't doing 'x.y' in the first
>> place, but rather searching for 'x has y of ?', and expecting a
>> collection of matching facts. A little further thought on this for
>> such things as business rules, DB access, and so on, quickly reveals
>> that the actual use cases for an "object oriented" model for business
>> objects are becoming rather scarce.
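The "x has y of ?" style of lookup can be sketched as pattern matching over stored facts. This is only an illustration under the assumption that facts are kept as `(fact_type, roles)` tuples in a list; `find`, `Variable`, and the `order_item` fact type are hypothetical names, not an existing peak.query API.

```python
class Variable:
    """Placeholder for an unknown role in a query pattern."""


def find(facts, fact_type, *pattern):
    """Return every stored fact of `fact_type` whose roles match `pattern`."""
    matches = []
    for ftype, roles in facts:
        if ftype != fact_type or len(roles) != len(pattern):
            continue
        # A Variable matches anything; any other value must be equal.
        if all(isinstance(p, Variable) or p == r for p, r in zip(pattern, roles)):
            matches.append((ftype, roles))
    return matches


facts = [
    ("order_item", (5573, "widget")),
    ("order_item", (5573, "gadget")),
    ("order_total", (5573, 127)),
]

# The caller always gets a collection, whether the role is "one" or "many".
items = find(facts, "order_item", 5573, Variable())
totals = find(facts, "order_total", 5573, Variable())
```

Because both queries return a list, changing a link from "one" to "many" in the conceptual model would not change how the calling code is written.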
>> Indeed, the more I think about it the more I realize that trying to
>> implement a lot of typical business functions by direct mapping to the
>> OO paradigm actually *adds* complexity and effort. Business in the
>> real world actually does consist mostly of data and rules. Objects in
>> the real world are not at all encapsulated and they certainly do not
>> enforce their own business rules!
>> So, the fundamental paradigm for peak.query applications will be to
>> have an EditingContext against which queries are made, and that facts
>> are added to or removed from. EditingContexts will also publish
>> validation facts. That is, one source of facts from an editing
>> context will be facts about what fact changes you've made are
>> violating business rules or other validation. But it's likely that
>> these "constraint metafacts" (to coin a phrase) will not be anything
>> "special" in the system; it's just that business rules will be able to
>> be sources of facts. Indeed, anything that you would compute in an
>> object's methods could be moved to a business rule implementing a fact
>> type. For that matter, it becomes silly to even talk about business
>> rules: there are really only fact types, and a fact type can be
>> implemented by any computation you wish.
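As a rough sketch of "a fact type can be implemented by any computation", a validation rule can be written as an ordinary function that derives violation facts from stored facts. The rule itself (totals must not be negative) and all names here are invented for illustration.

```python
def order_totals(facts):
    """Stored fact type: (order, amount) pairs."""
    return [roles for ftype, roles in facts if ftype == "order_total"]


def negative_total_violations(facts):
    """Derived fact type: one 'violation' fact per order with a negative total."""
    return [
        ("violation", (order, "total must not be negative"))
        for order, amount in order_totals(facts)
        if amount < 0
    ]


facts = [
    ("order_total", (5573, 127)),
    ("order_total", (5574, -5)),
]
violations = negative_total_violations(facts)
```

The violation facts have the same shape as any other fact, so nothing "special" is needed to query them.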
>> This is both more and less granular than a DM. When you have related
>> objects in two DMs, right now you must write code in both to save and
>> load the linking attribute(s). The same thing in an FO approach would
>> correspond to a single fact type, pairing the two objects. So, there
>> would be only *one* implementation of that fact type, handling both
>> sides of the relationship.
>> As a practical matter, however, you won't write code to create fact
>> types for every possible attribute of every object type! Instead, for
>> most applications you'll define relatively trivial mappings from fact
>> types to 2 or more columns from a relational DB, using relational
>> algebra operators similar to the prototypes that are already in
>> peak.query today.
>> But, unlike the peak.query of today, fact retrieval will be able to be
>> asynchronous. That is, you'll be able to "subscribe" to a query, or
>> yield a task's execution until a new fact is asserted. Even if your
>> application isn't doing event-driven I/O or using a reactor loop, you
>> could use these subscriptions to e.g. automatically raise an error
>> when a constraint is violated. (In practice, the EditingContext will
>> do this when you ask that it commit your changes to its parent
>> EditingContext, if any.) If you're writing a non-web GUI, you'd more
>> likely subscribe to such events in order to display status bar text or
>> highlight an input error.
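A minimal sketch of subscribing to newly asserted facts. It uses plain synchronous callbacks rather than peak.events or Twisted, purely to show the shape of the idea; `FactStore`, `subscribe`, and `add` are hypothetical names.

```python
class FactStore:
    """Hypothetical store that notifies subscribers when a fact is asserted."""

    def __init__(self):
        self._facts = set()
        self._subscribers = []  # list of (fact_type, callback) pairs

    def subscribe(self, fact_type, callback):
        """Call `callback(fact)` whenever a new fact of `fact_type` is asserted."""
        self._subscribers.append((fact_type, callback))

    def add(self, fact_type, *roles):
        fact = (fact_type, roles)
        if fact in self._facts:
            return  # already asserted; nothing new to announce
        self._facts.add(fact)
        for wanted, callback in list(self._subscribers):
            if wanted == fact_type:
                callback(fact)


store = FactStore()
seen = []
# A GUI might update a status bar here; this sketch just records the fact.
store.subscribe("violation", seen.append)
store.add("order_total", 5573, 127)
store.add("violation", 5573, "negative total")
```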
>> Writing this email has actually helped me sort out quite a few of the
>> implementation details I had been wondering about, such as how
>> business rules and constraints should work. Now I see that they do
>> not require anything "special", as they are just the same as any other
>> kind of fact. They only appear to be "meta", but in truth they are
>> constructed by the same fundamental joining of facts as any other
>> compound fact! Very nice.
>> For a given fact type, one needs to know only how to assert it,
>> retract it, and query it. And the EditingContext needs to have a
>> source for facts that would prevent a transaction from committing. A
>> possible API for EditingContext might simply be to provide three
>> methods, each accepting a single fact object as its parameter. For
>> queries, one would fill in unknown roles with "variables", a la
>> Prolog. So, something like:
>> amt = Variable()
>> results = ec.find( Order.total(5573, amt) )
>> would make 'results' an event source that yields the fact
>> In my imagination I can see Ty going "Eeew!" in disgust at this
>> hideous-seeming API. Making queries for every attribute of an
>> object? Am I out of my mind?
>> No. Here I'm merely illustrating an elementary fact type. A compound
>> fact type might encompass quite a few items of data. In fact, it can
>> encompass as many facts as you want. It solves the OO "reporting
>> problem", in fact, because one can create queries of arbitrary
>> complexity without introducing duplicated implementation for the more
>> elementary components of those queries. So, an application can define
>> its own compound fact types, and an EC will implement the operations
>> by delegation to the elementary fact types if there isn't a custom
>> implementation defined. Such custom implementations can be useful for
>> providing specially hand-tuned SQL or other algorithms, if needed for
>> some critical aspect of an application.
>> Hm. It's almost possible to go full circle to an OO approach, at
>> least for "flat" attributes like strings and numbers. We could simply
>> make facts returned by queries *mutable*. Setting an attribute on the
>> fact could automatically retract the previous version of the fact and
>> assert the new one. We could possibly also provide some kind of
>> convenience feature to navigate to related objects, instead of just
>> seeing their keys, but this would have to be well thought-out in order
>> to avoid the traps of synchrony and container mismatch. That is, it
>> would have to be capable of being asynchronous, and it would have to
>> always provide a sequence, rather than a single object, even if the
>> role in principle is required to always be exactly one object.
>> (Because between transaction boundaries, invalid conditions are
>> allowed to exist, to avoid chicken-and-egg update problems.)
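The "mutable fact" idea could look roughly like this, assuming facts live in a plain set of `(fact_type, roles)` tuples. `MutableOrderTotal` is a hypothetical wrapper; setting its attribute retracts the old fact and asserts the new one.

```python
class MutableOrderTotal:
    """Hypothetical mutable view of one 'order_total' fact in a set-based store."""

    def __init__(self, store, order, amount):
        self._store = store
        self.order = order
        self._amount = amount

    @property
    def amount(self):
        return self._amount

    @amount.setter
    def amount(self, new_amount):
        # Retract the previous version of the fact, then assert the new one.
        self._store.discard(("order_total", (self.order, self._amount)))
        self._store.add(("order_total", (self.order, new_amount)))
        self._amount = new_amount


store = {("order_total", (5573, 127))}
fact = MutableOrderTotal(store, 5573, 127)
fact.amount = 130
```

This sketch covers only a "flat" attribute; navigating to related objects is left out, for the reasons given above.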
>> Speaking of the chicken-and-egg problem, mapping to an RDBMS could be
>> rather interesting. The model as I've described it so far has no
>> built-in way to ensure that inserts and updates get done in a way that
>> honors RDBMS-level constraints. Hm. Actually, the mapping from fact
>> types to tables has to incorporate information about joins,
>> so the information to do it is there. The details might get a bit
>> "interesting", but it should be possible to implement it one time,
>> rather than sweating the details in every DM with foreign keys, as one
>> must do now. (Not that it's hard to do in today's DMs, but it's still
>> boilerplate or dead-chicken-waving or whatever you want to call it.)
>> Hm. This is sounding simpler than I thought it would be, at least in
>> principle. But I still need to:
>> * Work out the metamodel for fact types, how to map from constraints
>> to derived fact types that signal the violation of those constraints,
>> and a Pythonic notation for defining a fact-oriented domain model.
>> (For example, to define a constraint on a fact type or set of fact
>> types, one must have a way to reference those types in code.)
>> * Define a mechanism for implementations of fact types, that allows
>> for the possibility of segmenting compound fact implementations (e.g.
>> database tables) as well as compounding elementary fact
>> implementations, in a way that doesn't lose the connectedness. That
>> is, if I issue a compound query that has 3 facts answerable by a DB
>> table, I don't want to generate 3 queries against that table, just
>> one. And if the other needed facts are in another table in the same
>> DB, I want to generate a query that joins them! As it happens, a lot
>> of the ground work for this has already been done in peak.query.algebra.
>> * Define a framework for defining concrete schema mappings, in a way
>> that allows reuse. That is, I might want to define mappings from a
>> fact schema to two different database schemas, but share an
>> implementation of various business rules between them.
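To illustrate the second item (one query per table rather than one per fact), here is a small sketch that groups requested fact types by the table they map to. The mapping, table names, and column names are all invented, and joins between tables are deliberately left out.

```python
from collections import defaultdict

# Hypothetical schema mapping: fact type -> (table, key column, value column).
MAPPING = {
    "order_total": ("orders", "order_id", "total"),
    "order_customer": ("orders", "order_id", "customer_id"),
    "customer_name": ("customers", "customer_id", "name"),
}


def plan_queries(fact_types):
    """Group fact types by table and emit a single SELECT per table."""
    columns = defaultdict(list)
    for fact_type in fact_types:
        table, key, value = MAPPING[fact_type]
        for col in (key, value):
            if col not in columns[table]:
                columns[table].append(col)
    return [
        "SELECT %s FROM %s" % (", ".join(cols), table)
        for table, cols in columns.items()
    ]


queries = plan_queries(["order_total", "order_customer", "customer_name"])
```

Two fact types answered by the same table collapse into one SELECT; a fuller version would also merge tables in the same database into a joined query.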
>> Oh yeah, and then I need to actually implement all that. Does that
>> maybe answer any questions you might have about whether this will be
>> done soon? :)
>> One final thought... that earlier example of the API could probably
>> use a little rework. Maybe:
>> # Query a fact type, given a context
>> results = order_total(ec).find(5573, Variable())
>> # Assert a fact
>> order_total(ec).add(5573, 127)
>> # Retract a fact
>> order_total(ec).remove(5573, 127)
>> # shortcut for repeated use w/same editing context
>> order_total = order_total(ec)
>> Here, the idea is that a fact type such as 'order_total' is a protocol
>> that you can adapt the editing context to. Thus, one would create
>> different EC subclasses to define a specific schema mapping. The EC
>> itself doesn't actually implement anything, however. Instead, you
>> simply declare adapters (probably sticky ones) for the fact types that
>> can be implemented in the schema. And if a compound fact type doesn't
>> have a specific adaptation, it just creates a generic compound
>> implementation over the adaptation of the fact types it comprises.
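A toy sketch of the reworked API, assuming that "adapting" an editing context to a fact type just means looking up (and caching) an implementation object on the context. It does not use PyProtocols; `FactType`, `InMemoryFacts`, and this `EditingContext` are hypothetical stand-ins.

```python
class Variable:
    """Placeholder for an unknown role in a query."""


class InMemoryFacts:
    """Hypothetical fact type implementation backed by a set of role tuples."""

    def __init__(self):
        self._rows = set()

    def add(self, *roles):
        self._rows.add(roles)

    def remove(self, *roles):
        self._rows.discard(roles)

    def find(self, *pattern):
        return sorted(
            row for row in self._rows
            if len(row) == len(pattern)
            and all(isinstance(p, Variable) or p == r for p, r in zip(pattern, row))
        )


class FactType:
    """Calling a fact type with a context returns that context's implementation."""

    def __init__(self, name):
        self.name = name

    def __call__(self, ec):
        # "Sticky": the implementation is created once and cached on the context.
        impls = ec.__dict__.setdefault("_fact_impls", {})
        if self.name not in impls:
            impls[self.name] = InMemoryFacts()
        return impls[self.name]


class EditingContext:
    """Implements nothing itself; fact types attach their implementations to it."""


order_total = FactType("order_total")
ec = EditingContext()
order_total(ec).add(5573, 127)
found = order_total(ec).find(5573, Variable())
order_total(ec).remove(5573, 127)
after = order_total(ec).find(5573, Variable())
```

A real schema mapping would swap `InMemoryFacts` for an implementation that talks to a database, without changing the calling code.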
>> This still isn't perfect, in that this API doesn't yet explicitly deal
>> with asynchronous results, or how to "listen" to assertions or
>> retractions, or how to deal with queries when there are assertions or
>> retractions that haven't been "flushed" to the physical storage
>> scheme. But it's getting there. In fact, it's probably close enough
>> to allow prototyping some toy schema implementations (sans metamodels)
>> to explore how editing contexts and fact type implementations
>> could/should work.