[PEAK] More hub-and-spoke-database-ry :)
Phillip J. Eby
pje at telecommunity.com
Wed Jul 25 13:18:45 EDT 2007
I've been thinking a bit more about how a factbase system would work
with the trellis, and I think I've got it narrowed down to something
interesting.
So, your fact base itself is going to look something like this:
from weakref import WeakValueDictionary

class FactBase(trellis.Component):
    cache = trellis.rule(lambda self: WeakValueDictionary())

    def __getitem__(self, key):
        try:
            return self.cache[key]
        except KeyError:
            data = self.cache[key] = self.makeSet(key)
            return data
Where 'self.makeSet()' is either hardcoded or an open-ended generic function.
So far, this looks a lot like Calendar or Month from my previous
example. But let's go all the way and say you can add or delete
arbitrary objects from this sucker:
    def add(self, item):
        self.log = ('add', item)

    def remove(self, item):
        self.log = ('del', item)
I'll assume here that 'log' is a Hub attribute that keeps track of
actions, such that 'self.added' and 'self.deleted' are event cells
containing sets of records that have been added or deleted as of any
given point in time.
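To make the intent concrete, here's a rough sketch of a rule that
*consumes* those event cells; the 'ChangePrinter' name and its wiring are
purely hypothetical:

class ChangePrinter(trellis.Component):
    trellis.values(
        factbase = None,
    )
    @trellis.rule
    def watch(self):
        # re-runs whenever the factbase's added/deleted event cells fire
        if self.factbase is not None:
            for record in self.factbase.added or ():
                print "added:", record
            for record in self.factbase.deleted or ():
                print "deleted:", record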
The code to do that isn't trivial to sketch right now, since Hubs
don't exist yet. However, once that code exists, it'll probably be
part of some sort of "trellis.Set" base class that FactBase can
simply inherit from. That is, FactBase will be a mutable set that
also acts like a dictionary of sets.
Okay, so suppose we now want to know about all the records of a
particular type that have been added to our fact base, such that we
get updates on added or deleted records of that type.
Let's say that we want to just say "myFactBase[SomeRecordType]" to
get back a set of all records of that type. So we implement
'makeSet(key)' for keys that are record types, such that it returns a
set. To do this, we need an object I'll call a "router":
class RecordTypeRouter(trellis.Component):
    trellis.values(
        factbase = None,
    )
    cache = trellis.rule(lambda self: WeakValueDictionary())

    @trellis.rule
    def update(self):
        cache = self.cache
        added = {}
        deleted = {}
        for record in self.factbase.deleted:
            deleted.setdefault(type(record), set()).add(record)
        for record in self.factbase.added:
            added.setdefault(type(record), set()).add(record)
        for k, v in deleted.iteritems():
            if k in cache:
                cache[k].deleted = v
        for k, v in added.iteritems():
            if k in cache:
                cache[k].added = v

    def __getitem__(self, key):
        try:
            return self.cache[key]
        except KeyError:
            update = self.__cells__['update']
            data = self.cache[key] = VirtualSet(
                added = update.spoke((), event=True),
                deleted = update.spoke((), event=True),
            )
            return data
What this does is distribute change events, keyed on record type.
Adding records of a type nobody cares about doesn't update any sets;
only the sets you're "looking at" (and that are therefore cached)
get add/delete events. You could probably generalize this a bit
more, and make a FilteringRouter that uses a key function, plus a
FilteredSet that uses the function and a value, e.g.:
class FilteringRouter(trellis.Component):
    trellis.values(
        parent = None,
        keyfunc = type,
    )
    trellis.cell_factories(
        added = lambda self: self.parent.__cells__['added'],
        deleted = lambda self: self.parent.__cells__['deleted'],
    )
    cache = trellis.rule(lambda self: WeakValueDictionary())

    @trellis.rule
    def update(self):
        cache = self.cache
        key_of = self.keyfunc
        added = {}
        deleted = {}
        for record in self.deleted:
            deleted.setdefault(key_of(record), set()).add(record)
        for record in self.added:
            added.setdefault(key_of(record), set()).add(record)
        for k, v in deleted.iteritems():
            if k in cache:
                cache[k].deleted = v
        for k, v in added.iteritems():
            if k in cache:
                cache[k].added = v

    def __getitem__(self, key):
        try:
            return self.cache[key]
        except KeyError:
            update = self.__cells__['update']
            data = self.cache[key] = FilteredSet(
                added = update.spoke((), event=True),
                deleted = update.spoke((), event=True),
                keyfunc = self.keyfunc, key = key,
                parent = self.parent
            )
            return data
Now, FilteredSet can in principle implement __iter__ and __contains__
in terms of its parent set (plus keyfunc(record)==key tests), and you
can implement arbitrary hierarchies of "addressable sets" in a
factbase, e.g. factbase[RecordType, keyfield, keyval] to retrieve a
set of records by the contents of a particular field.
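A minimal sketch of such a FilteredSet might look like the following; the
attribute layout just mirrors the keyword arguments the router passes in
above, and is of course hypothetical:

class FilteredSet(trellis.Component):
    trellis.values(
        parent = None, keyfunc = type, key = None,
        added = None, deleted = None,
    )
    def __contains__(self, record):
        # a record belongs here if it matches our key and is in the parent set
        return self.keyfunc(record) == self.key and record in self.parent

    def __iter__(self):
        key_of, key = self.keyfunc, self.key
        return (record for record in self.parent if key_of(record) == key)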
I don't really like __getitem__ having to know what type of set you
want, though. Plus, this mechanism of using __getitem__ and caches
in the first place is ugly, because it forces the thing you're
accessing to know how to create the thing you want. That seems like
a dependency inversion.
It seems like what we really want is to be able to use the
ObjectRoles package to define various kinds of routers and sets as
*aspects* of a factbase. Except that we only want them to stay
around if they're in use, so we need a kind of "weak role". This
would be straightforward to add to the ObjectRoles package, so that
we could then do something like:
class FilteringRouter(roles.WeakRole, trellis.Component):
    def __init__(self, parent, keyfunc, **kw):
        trellis.Component.__init__(self, parent=parent, keyfunc=keyfunc, **kw)
    # ...all the previous router code
Ugh. It still needs a cache, because the cell needs to know which
spokes to send the data to. At least now we can attach
FilteringRouter instances to the FilteredSet instances, e.g.:
FilteringRouter(FilteringRouter(factbase, type)[someType], get_key_attr)
Except maybe we should just call these things "splitters", i.e.:
records = Splitter(Splitter(factbase, type)[someType], get_key_attr)[key]
This makes it a little clearer what's going on. It also shows why I
wanted to use a factbase cache, so you could use a key to go straight
to the desired set, without needing to use hairy access paths like
this one every time you need the set. It certainly seems reasonable
to make splitters be WeakRoles, though, since that lets you attach as
many of them as you like to a set.
At least the only thing that needs any special handling for
__getitem__ is the factbase's "makeSet" method. That way, a key like
'someRecordType.someAttr' can be mapped to
"Splitter(Splitter(factbase, type)[someRecordType],
someAttr.__get__)", which means that
'factbase[someRecordType.someAttr][attrVal]' would return a set of
the records with that attribute value, with automatic updating.
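Under the naming used above, 'makeSet' might then boil down to a small
dispatcher like this sketch, which hardcodes just the two kinds of keys
discussed so far:

    def makeSet(self, key):
        if isinstance(key, type):
            # a record type: split the whole factbase by type
            return Splitter(self, type)[key]
        # an attribute key like someRecordType.someAttr would instead map to
        #     Splitter(self[someRecordType], someAttr.__get__)
        # how the owning record type is recovered from such a key is left open
        raise KeyError(key)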
Of course, it's possible that the set for a particular record type
isn't actually going to be produced via a type-based splitter; it may
instead be a database-backed set of some type. So, there will
probably need to be splitters beyond just the simple filter-based
type I've outlined here.
It's also possible that making the entire factbase have
add()/remove() methods and added/removed events is a bad idea, versus
separating things by record type from the very beginning. If a
particular set of record types are all mapped to a single DB backend,
that backend can do something like that, in order to log changes that
need to be flushed to SQL or disk or whatever. But it's not
necessary to do that across the entire factbase.
Hm. Actually, I suppose there *is* a benefit, which is that you can
also implement global undo by logging all changes. Okay, I guess I'm sold.
Iteration and containment tests for sets are a bit vague at this
point, but if splitters didn't directly create a FilteredSet, and
instead asked the parent set to create a filtered set, then this
would allow DB-backed sets to provide sets that do queries (or apply
filters) when you try to iterate or membership-test.
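Concretely, the splitter's __getitem__ could delegate instead of
constructing a FilteredSet itself; 'filtered' here is a made-up protocol
method that each kind of parent set would implement in its own way
(in-memory filtering, SQL query, etc.):

    def __getitem__(self, key):
        try:
            return self.cache[key]
        except KeyError:
            # let the parent decide what kind of set to hand back; hooking
            # the splitter's spokes up to the returned set is elided here
            data = self.cache[key] = self.parent.filtered(self.keyfunc, key)
            return data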
Hm. This is starting to sound almost workable. :) Queries with
live updates, roll-forward and undo logging, multiple backends (or no
backends!), with support for arbitrary Python transform and filter
operations. Nice.
Still a lot of bits to sort out, since this is all purely the "flat
records" level, rather than the ORM level. Same general principles
apply, except that it's virtually impossible to do any of this fancy
stuff on the object level. You basically want to convert to objects
only *after* all of the actual query work is done, because then you
don't have to deal with the two-levels-of-updating problem that
object-based (as opposed to tuple-based) sets have.
So, in the long run, this will need a way to specify tuple-level
queries using object-level abstractions. Oh, and joins. We'll
definitely need joins. They shouldn't be difficult to specify,
however, using appropriate Splitters for their bases.
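As a very rough illustration, two splitters keyed on the same join value
already give you the per-key sets you'd need to pair up; the record types
and fields here are invented for the example:

    # hypothetical record types (Order, Customer) sharing a join key
    orders_by_cust  = Splitter(factbase[Order],    lambda r: r.customer_id)
    customers_by_id = Splitter(factbase[Customer], lambda r: r.id)

    def join_for(customer_id):
        # pair every matching order with every matching customer for one key
        return [(o, c) for o in orders_by_cust[customer_id]
                       for c in customers_by_id[customer_id]]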
At that point, we will have what is effectively a Rete network
implementation -- i.e., a forward-chaining rules engine, with
database backing. Not bad at all.