[PEAK] DM refactoring and asynchrony (was Re: PEAK and Twisted, revisited)

Phillip J. Eby pje at telecommunity.com
Fri Apr 16 21:01:44 EDT 2004


At 09:46 AM 4/16/04 +0200, Ulrich Eck wrote:

>hmm, afaik there is no simple peak<->twisted-demo around. we should
>work one out i think comparable to the bulletins example. junction
>from peakplace also uses twisted.pb, but not yet peak.events afaik.

Last I heard, John was working on it.  Certainly, he submitted a number of 
bug reports that would've required him to be using peak.events with Twisted 
in Junction in order to find the problems he did.  :)


>the reason why i didn't spend too much time on it is,
>that pje plans to completely refactor the DataManager API to integrate
>with peak.events and peak.query. a concrete timetable and design for
>this refactoring is not yet available i think.

The design is getting a bit more concrete, but not the timetable.  :(  It
may not be so much a refactoring of the current API as the addition of a
completely different one.  But there is some theoretical work to sort out
first.

The future peak.query API will get rid of individual DMs in favor of an
EditingContext, similar to constructs in various other systems, such as
ZODB connection objects.  The container metaphor of DMs will fall by the
wayside here, in favor of a more integrated approach.

The part that is not yet clear to me is how OO, and how async, the EC
interface will be.  I want ECs to support *conceptual* queries, and be
able to yield asynchronous events representing changes to objects.

Mostly what I'm doing now on the design is trying to get a metamodel in my 
head that unites "Fact Orientation" with "Object Orientation".  Fact 
orientation has no impedance mismatch with relational databases,
business or AI rules, or much of anything else.  But, it's quite a mismatch 
with the current DM/domain object paradigm.  Indeed, it's as inside-out as 
Twisted versus peak.events.

In a fact-oriented approach, one deals with "elementary facts" that
relate one or more objects in some way.  For example, "Order #5573 totals
$127" is a "binary fact" relating the order object referenced by "#5573"
and the dollar amount object referenced by "$127".  Most facts are binary
(i.e. involve two objects), but unaries and ternaries are also possible.
Quaternaries and higher-order facts are uncommon when dealing with
elementary (non-decomposable) facts.

(By the way, 'x[y]==z' can be considered a ternary fact, which may later 
play into the implementation of mapping types in peak.model.)
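
To make the arities concrete, here is one way elementary facts might be
written down in Python.  (Just an illustrative sketch; the tuple
representation and all the names are mine, not a PEAK API.)

     # Facts as plain tuples: (fact type name, role player(s)...)

     shipped = ('Order.isShipped', 5573)        # unary: one role
     total   = ('Order.total', 5573, 127)       # binary: two roles

     # The 'x[y]==z' case as a ternary fact relating the mapping,
     # the key, and the value:
     item    = ('Mapping.item', 'prices', 'widget', 5)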

Superficially, this fact orientation looks a lot like normal OO.  After 
all, one might say 'orders[5573].total==127'.  Except there's no way to
go from the '127' back to '.total', let alone 'orders[5573]'.  In OO, one
can
meaningfully look at the objects by themselves.  In FO, objects don't 
really exist: only facts regarding objects are real.

In a sense, you could say that FO doesn't believe in objects, only 
labels.  It assumes that the "objects" we are dealing with are in the "real 
world", so what's in the computer is just a collection of facts *about* the 
objects.  So, from the FO point of view, it's silly to talk about having an 
instance of "order", because the "order" isn't in the computer, and in fact 
may not even be a real thing at all!

FO *does* believe in object types, however, including inheritance and 
polymorphism.  But one uses a "reference scheme" (such as order numbers) to 
refer to an instance of an object.  For all practical purposes, reference 
schemes are either strings, numbers, or tuples thereof.  Very elementary 
data, in other words.

The tremendous advantage of this approach is that it has no impedance 
mismatch with databases, rule systems, or form-based UIs, and only a low 
mismatch with direct-manipulation UIs.  By contrast, OO-paradigm storage 
systems like peak.storage and ZODB have a high impedance mismatch with rule 
systems and form-based UIs, a medium mismatch with relational DBs, and a 
low mismatch with direct-manipulation UIs.

Designing good constraint validation and business rules frameworks for an 
OO model is *hard*.  Until I discovered FO (by way of ORM -- Object Role 
Modelling), I assumed that it was because the problems themselves were 
hard.  But from an FO perspective, it's actually pretty easy.  From the OO 
viewpoint, plain attributes and sequences and mappings are all very 
different ways of storing/using/managing an object's features.  From the FO 
viewpoint, these are all just facts of different arity and types.

This unifying characteristic makes the FO viewpoint a perfect "intermediate 
representation" or "lingua franca" between different levels of a 
system.  In fact, because of the ease of expressing complex constraints in 
FO systems, and because FO models and constraints can be specified or 
rendered as user-friendly English (or other language) expressions, I 
believe it is in fact preferable to such modelling languages as UML for 
gathering end-user requirements and defining the domain model for an 
application.

The hard part of the design, which I'm chewing on in my spare time, is
how to create a Pythonic FO notation that won't end up looking like
Prolog or Lisp!  It would seem that you could just generate current
peak.model-style
classes from an FO model.  However, there's a potential pitfall 
there.  Minor changes to a conceptual model (such as changing a link from 
"one" to "many") result in more significant changes to a peak.model-type 
system.  In other words, one would go from having existing objects with 
'x.y=z' to 'x.y=[z]'.  Code that was written based on one access method 
would have to change to use the other.

If instead one creates form tools that are based on an FO perspective, 
however, this issue vanishes.  One isn't doing 'x.y' in the first place, 
but rather searching for 'x has y of ?', and expecting a collection of 
matching facts.  A little further thought on this for such things as 
business rules, DB access, and so on, quickly reveals that the actual use 
cases for an "object oriented" model for business objects are becoming 
rather scarce.
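
To illustrate (with invented names, not a real peak.query API): code that
asks for matching facts keeps working regardless of whether the model
says "one" or "many", because the answer is always a collection.

     # A toy fact store: changing the model from "one y per x" to
     # "many y's per x" just means more facts match; the query code
     # itself doesn't change.

     facts = [
         ('has_y', 'x1', 'z1'),
         ('has_y', 'x1', 'z2'),   # appears once the link becomes "many"
     ]

     def find_values(fact_type, subject):
         """All v such that (fact_type, subject, v) is asserted."""
         return [f[2] for f in facts
                 if f[0] == fact_type and f[1] == subject]

     ys = find_values('has_y', 'x1')   # always a list, any cardinality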

Indeed, the more I think about it the more I realize that trying to 
implement a lot of typical business functions by direct mapping to the OO 
paradigm actually *adds* complexity and effort.  Business in the real world 
actually does consist mostly of data and rules.  Objects in the real world 
are not at all encapsulated and they certainly do not enforce their own 
business rules!

So, the fundamental paradigm for peak.query applications will be to have
an EditingContext against which queries are made, and to which facts are
added or from which they are removed.  EditingContexts will also publish
validation facts.  That is, one source of facts from an editing context
will be facts about which of the fact changes you've made violate
business rules or other validation.  But it's likely that these
"constraint metafacts" (to coin a
phrase) will not be anything "special" in the system; it's just that 
business rules will be able to be sources of facts.  Indeed, anything that 
you would compute in an object's methods could be moved to a business rule 
implementing a fact type.  For that matter, it becomes silly to even talk 
about business rules: there are really only fact types, and a fact type can 
be implemented by any computation you wish.
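
As a sketch of what that might look like (names invented; the real
metamodel is still undecided), a "business rule" is simply a fact type
whose facts are computed from other facts instead of being stored:

     # A computed fact type: violations are derived from Order.total
     # facts, the same way any other compound fact would be derived.

     facts = [('Order.total', 5573, 127),
              ('Order.total', 5574, 5000)]

     def order_over_limit(facts, limit=1000):
         for (ftype, order_no, amount) in facts:
             if ftype == 'Order.total' and amount > limit:
                 yield ('Constraint.orderOverLimit', order_no, amount)

     list(order_over_limit(facts))
     # -> [('Constraint.orderOverLimit', 5574, 5000)]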

This is both more and less granular than a DM.  When you have related 
objects in two DMs, right now you must write code in both to save and load 
the linking attribute(s).  The same thing in an FO approach would 
correspond to a single fact type, pairing the two objects.  So, there would 
be only *one* implementation of that fact type, handling both sides of the 
relationship.

As a practical matter, however, you won't write code to create fact types 
for every possible attribute of every object type!  Instead, for most 
applications you'll define relatively trivial mappings from fact types
to two or more columns of a relational DB, using relational algebra
operators
similar to the prototypes that are already in peak.query today.
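
The shape of such a mapping might be something like the sketch below.
(This is not peak.query's actual notation, just a guess at the general
idea: each elementary fact type names a table and the columns that play
its roles.)

     # Hypothetical fact-type-to-columns mapping (invented notation):
     FACT_MAP = {
         'Order.total':    ('orders',    ('order_no', 'total')),
         'Order.customer': ('orders',    ('order_no', 'cust_id')),
         'Customer.name':  ('customers', ('cust_id', 'name')),
     }

     def select_for(fact_type):
         table, cols = FACT_MAP[fact_type]
         return "SELECT %s FROM %s" % (", ".join(cols), table)

     select_for('Order.total')   # "SELECT order_no, total FROM orders"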

But, unlike the peak.query of today, fact retrieval will be able to be 
asynchronous.  That is, you'll be able to "subscribe" to a query, or yield 
a task's execution until a new fact is asserted.  Even if your application 
isn't doing event-driven I/O or using a reactor loop, you could use these 
subscriptions to e.g. automatically raise an error when a constraint is 
violated.  (In practice, the EditingContext will do this when you ask that 
it commit your changes to its parent EditingContext, if any.)  If you're 
writing a non-web GUI, you'd more likely subscribe to such events in order 
to display status bar text or highlight an input error.
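
A minimal stand-in for what subscribing might look like, using plain
callbacks (the real thing would presumably be a peak.events event source
that a task could yield on):

     # Toy subscription machinery, not peak.events:
     class FactSource:
         def __init__(self):
             self.listeners = []
         def subscribe(self, callback):
             self.listeners.append(callback)
         def assert_fact(self, fact):
             for cb in self.listeners:
                 cb(fact)             # notify each subscriber

     violations = FactSource()

     def on_violation(fact):
         # e.g. highlight the offending field, or set status bar text
         print("constraint violated: %r" % (fact,))

     violations.subscribe(on_violation)
     violations.assert_fact(('Constraint.orderOverLimit', 5574, 5000))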

Writing this email has actually helped me sort out quite a few of the 
implementation details I had been wondering about, such as how business 
rules and constraints should work.  Now I see that they do not require 
anything "special", as they are just the same as any other kind of 
fact.  They only appear to be "meta", but in truth they are constructed by 
the same fundamental joining of facts as any other compound fact!  Very nice.

For a given fact type, one needs to know only how to assert it, retract it, 
and query it.  And the EditingContext needs to have a source for facts that 
would prevent a transaction from committing.  A possible API for 
EditingContext might simply be to provide three methods, each accepting a 
single fact object as its parameter.  For queries, one would fill in
unknown roles with "variables", à la Prolog.  So, something like:

amt = Variable()
results = ec.find( Order.total(5573, amt) )

would make 'results' an event source that yields the fact 
'Order.total(5573,127)'.
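
One toy way this could work under the hood (an invented implementation,
using tuples for facts and a plain synchronous list instead of an event
source):

     class Variable:
         """A free variable: matches any value in a query pattern."""
         def __eq__(self, other):
             return True

     class ToyEC:
         def __init__(self):
             self.facts = []
         def add(self, fact):
             self.facts.append(fact)
         def find(self, pattern):
             return [f for f in self.facts
                     if len(f) == len(pattern)
                     and all(p == v for p, v in zip(pattern, f))]

     ec = ToyEC()
     ec.add(('Order.total', 5573, 127))
     amt = Variable()
     ec.find(('Order.total', 5573, amt))
     # -> [('Order.total', 5573, 127)]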

In my imagination I can see Ty going "Eeew!" in disgust at this 
hideous-seeming API.  Making queries for every attribute of an object?  Am 
I out of my mind?

No.  Here I'm merely illustrating an elementary fact type.  A compound
fact type might encompass quite a few items of data; indeed, it can
encompass as many facts as you want.  This solves the OO "reporting
problem", because one can create queries of arbitrary complexity without
introducing duplicated implementation for the more elementary components
of those queries.  So, an application can define its own compound fact
types, and an
EC will implement the operations by delegation to the elementary fact types 
if there isn't a custom implementation defined.  Such custom 
implementations can be useful for providing specially hand-tuned SQL or 
other algorithms, if needed for some critical aspect of an application.

Hm.  It's almost possible to go full circle to an OO approach, at least for 
"flat" attributes like strings and numbers.  We could simply make facts 
returned by queries *mutable*.  Setting an attribute on the fact could 
automatically retract the previous version of the fact and assert the new 
one.  We could possibly also provide some kind of convenience feature to 
navigate to related objects, instead of just seeing their keys, but this 
would have to be well thought-out in order to avoid the traps of synchrony 
and container mismatch.  That is, it would have to be capable of being 
asynchronous, and it would have to always provide a sequence, rather than a 
single object, even if the role in principle is required to always be 
exactly one object.  (Because between transaction boundaries, invalid 
conditions are allowed to exist, to avoid chicken-and-egg update problems.)
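
Here's a rough sketch of that mutability idea (reusing the ToyEC from the
earlier sketch, extended with a remove() method; all names invented):

     class EC(ToyEC):
         def remove(self, fact):
             self.facts.remove(fact)

     class MutableFact:
         def __init__(self, ec, fact_type, key, value):
             self.__dict__.update(
                 ec=ec, fact_type=fact_type, key=key, value=value)
         def __setattr__(self, name, new):
             if name == 'value':
                 # retract the old fact, assert the new one
                 self.ec.remove((self.fact_type, self.key, self.value))
                 self.ec.add((self.fact_type, self.key, new))
             self.__dict__[name] = new

     ec = EC()
     ec.add(('Order.total', 5573, 127))
     f = MutableFact(ec, 'Order.total', 5573, 127)
     f.value = 150     # retracts (..., 127) and asserts (..., 150)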

Speaking of the chicken-and-egg problem, mapping to an RDBMS could be 
rather interesting.  The model as I've described it so far has no built-in 
way to ensure that inserts and updates get done in a way that honors 
RDBMS-level constraints.  Hm.  Actually, the mapping from fact types to
tables has to incorporate information about joins, so the information to
do it is there.  The details might get a bit "interesting",
but it should be possible to implement it one time, rather than sweating 
the details in every DM with foreign keys, as one must do now.  (Not that 
it's hard to do in today's DMs, but it's still boilerplate or 
dead-chicken-waving or whatever you want to call it.)
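
For what it's worth, the generic version of "honor the FK constraints" is
just a topological sort of the tables being written to (sketch below; no
cycle handling, and obviously not PEAK code):

     def insert_order(deps):
         """deps maps each table to the tables it has foreign keys into."""
         done, order = set(), []
         def visit(table):
             if table in done:
                 return
             done.add(table)
             for parent in deps.get(table, ()):
                 visit(parent)          # write FK targets first
             order.append(table)
         for table in deps:
             visit(table)
         return order

     insert_order({'items': {'orders'},
                   'orders': {'customers'},
                   'customers': set()})
     # -> a valid order, e.g. ['customers', 'orders', 'items']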

Hm.  This is sounding simpler than I thought it would be, at least in 
principle.  But I still need to:

* Work out the metamodel for fact types, how to map from constraints to 
derived fact types that signal the violation of those constraints, and a 
Pythonic notation for defining a fact-oriented domain model.  (For example, 
to define a constraint on a fact type or set of fact types, one must have a 
way to reference those types in code.)

* Define a mechanism for implementations of fact types, that allows for the 
possibility of segmenting compound fact implementations (e.g. database 
tables) as well as compounding elementary fact implementations, in a way 
that doesn't lose the connectedness.  That is, if I issue a compound query 
that has 3 facts answerable by a DB table, I don't want to generate 3 
queries against that table, just one.  And if the other needed facts are in 
another table in the same DB, I want to generate a query that joins 
them!  As it happens, a lot of the groundwork for this has already been
done in peak.query.algebra.

* Define a framework for defining concrete schema mappings, in a way that 
allows reuse.  That is, I might want to define mappings from a fact schema 
to two different database schemas, but share an implementation of various 
business rules between them.

Oh yeah, and then I need to actually implement all that.  Does that maybe 
answer any questions you might have about whether this will be done soon?  :)

One final thought...  that earlier example of the API could probably use a 
little rework.  Maybe:

     # Query a fact type, given a context
     results = order_total(ec).find(5573, Variable())

     # Assert a fact
     order_total(ec).add(5573, 127)

     # Retract a fact
     order_total(ec).remove(5573, 127)

     # Shortcut for repeated use with the same editing context
     order_total = order_total(ec)

Here, the idea is that a fact type such as 'order_total' is a protocol that 
you can adapt the editing context to.  Thus, one would create different EC 
subclasses to define a specific schema mapping.  The EC itself doesn't 
actually implement anything, however.  Instead, you simply declare adapters 
(probably sticky ones) for the fact types that can be implemented in the 
schema.  And if a compound fact type doesn't have a specific adaptation, it 
just creates a generic compound implementation over the adaptation of the 
fact types it comprises.
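
As a toy version of that lookup (a plain dict registry standing in for
PyProtocols-style adaptation; every name here is invented):

     ADAPTERS = {}   # (EC class, fact type name) -> implementation

     def fact_type_for(ec, name, default):
         for klass in type(ec).__mro__:
             impl = ADAPTERS.get((klass, name))
             if impl is not None:
                 return impl(ec)    # schema-specific implementation
         return default(ec)         # generic compound fallback

     class MySchemaEC(ToyEC):       # ToyEC from the earlier sketch
         pass

     class SQLOrderTotal:
         """Hand-tuned SQL implementation for one schema (stub)."""
         def __init__(self, ec):
             self.ec = ec

     ADAPTERS[(MySchemaEC, 'order_total')] = SQLOrderTotal

     class GenericFactType:
         """Fallback built over the elementary fact types (stub)."""
         def __init__(self, ec):
             self.ec = ec

     order_total = fact_type_for(MySchemaEC(), 'order_total',
                                 GenericFactType)
     # -> a SQLOrderTotal instance, since MySchemaEC declared one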

This still isn't perfect, in that this API doesn't yet explicitly deal with 
asynchronous results, or how to "listen" to assertions or retractions, or 
how to deal with queries when there are assertions or retractions that 
haven't been "flushed" to the physical storage scheme.  But it's getting 
there.  In fact, it's probably close enough to allow prototyping some toy 
schema implementations (sans metamodels) to explore how editing contexts 
and fact type implementations could/should work.



