[PEAK] ORM and "fact base" objects for the Trellis
Phillip J. Eby
pje at telecommunity.com
Wed Jul 18 01:15:47 EDT 2007
Way back in 2004, I was sketching a system for implementing
constraint satisfaction, ORM, and event-driven programming, using
generic functions and something I dubbed a "fact base":
http://dirtsimple.org/2004/12/fact-types-fact-sets-and-change-events.html
As it happens, this idea seems a lot more practical now, with the
Trellis to handle event propagation and fact sets that are defined in
terms of each other.
In fact, it's downright straightforward with the hub-and-spokes
technology I described in my last post. Each fact type (table or
query) simply delegates all mutation operations to the fact base as a
whole, and then updates itself in response to the events it receives.
Its event inputs are spokes, tied to an all-purpose update rule on
the fact base. So, any set that the master fact base object can
generate keys for will receive its events straight from the
source. However, a fact set can *also* have rules for its events, in
order to derive them from other sets. Either way works, or even both
at once. (If it receives data directly from the master fact base,
its derivation rules won't run on that update cycle -- just like any
other mutually recursive rule overridden by direct update.)
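To make the delegation pattern concrete, here's a minimal sketch in plain Python. The class and method names (FactBase, FactSet, register, receive) are purely illustrative stand-ins, not the actual Trellis or PEAK API, and the event plumbing is simplified to direct method calls rather than real cells and spokes:

```python
class FactBase:
    """Master object: all mutation operations funnel through here."""
    def __init__(self):
        self._sets = {}          # key -> FactSet (the "spokes")

    def register(self, key, factset):
        self._sets[key] = factset

    def add(self, record):
        # Derive a key from the record and route the event to the
        # matching set, if one is currently live.  (A trivial
        # first-field key stands in for real key extraction.)
        key = record[0]
        target = self._sets.get(key)
        if target is not None:
            target.receive('add', record)


class FactSet:
    """A fact type (table or query): delegates all mutations upward."""
    def __init__(self, base, key):
        self.base = base
        self.contents = set()
        base.register(key, self)

    def add(self, record):
        self.base.add(record)    # delegate to the fact base as a whole

    def receive(self, op, record):
        # Update self in response to events received from the base.
        if op == 'add':
            self.contents.add(record)


base = FactBase()
people = FactSet(base, 'person')
people.add(('person', 'Alice'))   # round-trips through the fact base
```

In the real thing, receive() would be a spoke cell tied to the fact base's update rule, so derivation rules and direct updates could coexist in the same update cycle.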
The article I linked to fills in a lot of the remaining pieces needed
to define a kind of "all-purpose database" as a giant, non-enumerable
set of records. It just needs a way to generically parse records
into keys (identifying target sets) and values (indicating the values
to be added to or removed from the target sets), and a way to create
an appropriate implementation set for a given key, if one isn't
already available in cache.
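The "generic parsing" step might look something like this; the record layout here (a plain tuple whose first field names the fact type) is just an assumption for the sketch, not the EIM record format:

```python
def parse_record(record):
    """Split a record into (key, values).

    The key identifies the target set; the values are what gets
    added to or removed from that set.
    """
    fact_type, *values = record
    return fact_type, tuple(values)


key, values = parse_record(('email', 'pje', 'pje at telecommunity.com'))
```

A real implementation would presumably dispatch on the record's type to extract multiple candidate keys, since one record can feed several sets and indexes.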
The fact base cache is just a weakref dictionary pointing to sets,
where the sets' add/delete event cells are spokes off the fact base's
update rule. Indexes, whether they are simple key lookups or sorted
lists, can also be implemented this way, even in memory. Really, any
data structure that can be maintained by noting the creation or
deletion of rows is practical. And the fact base itself doesn't
really need to know or care how those data structures work; all that
matters is that the sets be cached by key and that the events be
linked as spokes.
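The cache itself is easy to sketch with the standard library: a WeakValueDictionary lets implementation sets be garbage-collected once nothing else references them, and a factory fills cache misses. The set type here is a placeholder, not a real Trellis set class:

```python
import weakref


class CachedSet:
    """Stand-in for an implementation set (table, index, query...)."""
    def __init__(self):
        self.contents = set()


class FactBaseCache:
    def __init__(self):
        # Weak values: a set disappears from the cache when no live
        # code holds a reference to it any more.
        self._cache = weakref.WeakValueDictionary()

    def get_set(self, key, factory=CachedSet):
        # Return the cached implementation set for `key`, creating
        # one via `factory` if it isn't already available.
        s = self._cache.get(key)
        if s is None:
            s = factory()
            self._cache[key] = s
        return s


cache = FactBaseCache()
s1 = cache.get_set('person')
s2 = cache.get_set('person')   # same live set is reused
```

The fact base never needs to know what kind of set a key maps to; the factory argument is where "create an appropriate implementation set for a given key" would plug in.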
A fairly simple in-memory database could be implemented using a
handful of set types, along with a small framework for defining
simple tuple-structured record types, similar to the EIM (External
Information Model) framework I designed for Chandler (and which was
foreshadowed a few years ago in the thread of blog posts I linked above!).
With some extensions to the model, one could create tuples that
represent queries, by putting "wildcard" or "variable" objects in the
fields whose value is left open for the query to determine. Such
tuples would make fine "keys" for the fact base, such that adding and
deleting them will update any currently-live query sets based on those keys.
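The wildcard idea can be sketched with a sentinel object marking the open fields; again, the names (Wildcard, ANY, matches) are illustrative, not the actual framework:

```python
class Wildcard:
    """Sentinel placed in a field whose value the query leaves open."""
    def __repr__(self):
        return '?'


ANY = Wildcard()


def matches(query, record):
    """True if `record` fits `query`, treating ANY fields as open."""
    return len(query) == len(record) and all(
        q is ANY or q == r for q, r in zip(query, record)
    )


# A query tuple for every 'person' record, whatever the name field holds:
query = ('person', ANY)
```

Such a query tuple is itself hashable (Wildcard instances hash by identity), so it can serve directly as a cache key for a live query set, and incoming records can be tested against the live query keys to decide which sets to update.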
The two pieces that are still a little bit vague are the mechanism
for figuring out what keys to potentially extract (which varies based
on the sets currently cached by the fact base), and the mechanism for
creating new implementation sets.
In my 2004 blog posts, I outlined an idea for using generic functions
to do this, but it was based on RuleDispatch. I'd rather use
PEAK-Rules now, but I may need to make some more progress on it
first. I originally intended to have PEAK-Rules finished this month,
with the Trellis coming out later in the year, but now things seem to
have reversed, with lots of progress on the Trellis and relatively
little on PEAK-Rules so far this month.
So... current plan looks like:
* Finish first cut doc and test re-org for the Trellis API
* Implement Hub, Spoke, and nail down some set manipulation patterns
and perhaps set base classes
* Start hammering out a fact base implementation, and a prototype
record schema framework. The latter will be something of a
throwaway, intended mainly to get the kinks worked out of the fact
base framework, and less to be the basis of a production-quality O-R
mapping system.
In other words, it'll be a proof of concept for how to handle set/key
management, indexes, and queries, without the use of an actual
backing store, and probably without general-purpose joins or a host
of other relational operators.
In the long term, however, it'll probably grow all of those things,
including the generator expression query syntax. However, the
precise definition of "long term" depends heavily on whether one of
my clients likes the results of their prototyping. :)