[PEAK] A pattern for subscribable queries in the Trellis
Phillip J. Eby
pje at telecommunity.com
Mon May 12 19:18:01 EDT 2008
So, this week I've been working a bit on the persistence/ORM stuff
for the Trellis, specifically on bidirectional links in collections.
And, in order to have set/list/dict attributes on either side of a
two-way link update correctly, it's necessary to have a "relationship
manager" or switchboard component that handles updating both sides of
the connection.
While working on the design for that component, I've noticed that
there are some similarities between what it needs to do and what the
Time service does. Specifically, both have a method that needs to be
called to inquire about a condition... and then have the calling
rule be recalculated if the condition changes. Second, that
subscription process needs to happen in such a way that it doesn't
create a dependency cycle.
Currently, if you use a time-based rule to trigger an action that
causes a new time-based rule to be set up, this can create what
appears to be a cyclical dependency. (This is because there's a rule
that depends on the "schedule" that fires callback conditions, but if
one of those called-back rules inquires about a new future time, it
will cause the schedule to be modified.)
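To illustrate, here's a rough sketch of the kind of rule that triggers
this, assuming the activity.Time service's Time[seconds] conditions
(the details here are approximate and only meant to show the shape of
the problem):

from peak.events import trellis
from peak.events.activity import Time

class Poller(trellis.Component):

    @trellis.perform
    def poll(self):
        if Time[10]:
            # ... react to the timeout ...
            #
            # Merely asking about Time[10] again on the next recalc requests
            # a new future time from the schedule -- so a rule that was fired
            # *by* the schedule ends up modifying the schedule, which is what
            # looks like a cyclical dependency.
            pass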
In a similar way, a "relationship manager" needs to be able to
iterate over received events regarding a subset of collection
memberships, and then save the dependency for future use.
So, for future reference, here's what I've figured out about how to
create this sort of object in a way that avoids circular calculations.
To create a subscribable-query object, you will need a weak-value
dictionary keying query parameters to cells, and some type of index
structure, stored in a cell. You'll then have a "maintain" rule that
updates indexed cells when conditions change, and a "modifier" that
adds cells to the weak dictionary. The query function will then
access the stored cells to perform its job.
Something roughly like this, in other words:
class QueryableThing(trellis.Component):

    _index = trellis.make(SomeNonTrellisType, writable=True)
    _queries = trellis.make(weakref.WeakValueDictionary)

    def query(self, *args):
        # This could also preprocess the args or postprocess the values
        return self._query_cell(args).value

    @trellis.modifier
    def _query_cell(self, key):
        c = self._queries.get(key)
        if c is None:
            c = self._queries[key] = ...  # some cell
            trellis.on_undo(self._queries.pop, key, None)
            ...  # add cell to self._index, with undo info
            trellis.on_commit(trellis.changed, trellis.Cells(self)['_index'])
        return c
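For concreteness, here's a rough sketch with the elided bits filled in
arbitrarily -- a plain dict for the index, trellis.Value cells for the
queries, and a purely illustrative compute_result() hook standing in
for whatever input data the real component would consult:

import weakref
from peak.events import trellis

class QueryCache(trellis.Component):

    _index = trellis.make(dict, writable=True)    # non-trellis index type
    _queries = trellis.make(weakref.WeakValueDictionary)

    def query(self, *args):
        return self._query_cell(args).value

    @trellis.modifier
    def _query_cell(self, key):
        c = self._queries.get(key)
        if c is None:
            c = self._queries[key] = trellis.Value(None)
            trellis.on_undo(self._queries.pop, key, None)
            self._index[key] = c   # note: the index keeps the cell alive;
                                   # see the GC discussion further down
            trellis.on_undo(self._index.pop, key, None)
            # signal rules that read _index that it changed, once we commit
            trellis.on_commit(trellis.changed, trellis.Cells(self)['_index'])
        return c

    @trellis.maintain
    def _refresh(self):
        # push fresh values into the cells for whichever keys are indexed
        for key, cell in self._index.items():
            cell.value = self.compute_result(key)   # illustrative hook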
The 'Time' service already follows this pattern, more or less, except
that '_index' is named '_schedule', and the weak-value cache
('_queries' above) is named '_events'.
This general pattern should also be usable for the "relationship
manager". In fact, it's probably a general enough pattern to be
worth embodying as a component or wrapper of some sort. For example,
we could perhaps have a decorator that worked something like this:
@collections.cell_cache
def _queries(self, key):
    return ...  # some cell

@trellis.maintain(make=SomeTypeForTheIndex)
def _index(self):
    for key in self._queries.added:
        ...  # add key to index, with undo logging if index isn't
             # a trellis-ized type

def query(self, *args):
    # This could also preprocess the args or postprocess the values
    return self._queries[args].value

@trellis.maintain
def some_rule(self):
    ...  # code that looks at _index and some other input data
         # to update any cells for those keys that are still in
         # self._queries
In other words, the "cell_cache" decorator would turn _queries into a
make() descriptor for a weak-value cache whose __getitem__ always
returns a cell, and whose 'added' attribute is an observable sequence
of the keys that were added to the cache during the previous recalc.
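As a sanity check on the idea, here's a very rough, hypothetical
sketch of the per-instance structure such a decorator might manage --
not an existing trellis API, glossing over the descriptor plumbing,
and note that in a real implementation 'added' would have to be a
trellis-observable collection so that the _index rule actually re-runs:

import weakref
from peak.events import trellis

class CellCacheData(object):
    """Per-instance weak-value mapping of query keys to cells"""

    def __init__(self, owner, factory):
        self.owner, self.factory = owner, factory   # factory(owner, key) -> a cell
        self.cells = weakref.WeakValueDictionary()  # key -> cell
        self.added = []                             # keys created this recalc

    @trellis.modifier
    def __getitem__(self, key):
        cell = self.cells.get(key)
        if cell is None:
            cell = self.cells[key] = self.factory(self.owner, key)
            trellis.on_undo(self.cells.pop, key, None)
            self.added.append(key)
            trellis.on_commit(self.added.remove, key)  # clear after this recalc
        return cell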
Neither of these examples handles one tricky bit, however: garbage
collection of index contents. In the Time service, this isn't
important because the constant advance of the clock allows the index
to be cleaned up automatically. However, for a relationship manager,
a cleanup-on-demand approach could lead to it being a long time
before a relevant key is encountered by chance and cleaned up.
One possible way to handle cleanup would be to customize the weak
reference callbacks used by the cache. However, since the GC calls
can occur at an arbitrary time, this could be somewhat awkward. A
more controlled approach might be to use the cells'
connect/disconnect capability -- but then there's even more overhead
added, because connect/disconnect functions can't modify trellis data
structures. They have to schedule these as future operations, using
on_commit (which has still other caveats, in that there's no direct
control flow link).
There's certainly an advantage, however, to making this functionality
part of the core library, in that I expect it to be repeated for
things like low-level select() I/O, signal handling, GUI events, and
other circumstances where you need to have some sort of match
pattern. While it's certainly *possible* to implement these directly
with connect/disconnect functions, it seems to me that the common
case is more of a cache of cells.
Unfortunately, it's becoming clear that my current method of dealing
with both connections and the Time service is a bit of a hack.
Instead of running as "perform" rules and via on_commit
(respectively), it seems to me that both should be handled in the
on_commit phase.
That is to say, connect/disconnect processing should run after a
recalc is finished, rather than during the "perform" phase. In that
way, connect/disconnect operations can be free to change other
trellis data structures, but are run at the beginning of a new
recalc, rather than in the recalc during which the subscription
status change occurred.
Whew! Okay, that makes more sense. Making this a general
characteristic of connect/disconnect simplifies the implementation of
a query cache, since one can implement the indexing as
connect/disconnect operations of the cell's connector.
I like this, because it gives connections a sensible logic in the
overall scheme of things. True, it requires changes to the undo
logic and overall transaction mechanism, but these should be
straightforward to make.
Okay, so how do we make this work with query caches? Well, what if
we had a descriptor that made its attribute a weak-value collection
of cells, but was otherwise very much like a regular trellis
attribute, in terms of the options you could specify? E.g.:
@collections.cellcache  # this should also allow initially=, resetting, etc...
def _queries(self, key):
    return ...  # default value for key

@_queries.connector
def _add_to_index(self, key):
    ...  # add key to index, with undo logging if index isn't trellis-ized

@_queries.disconnector
def _remove_from_index(self, key):
    ...  # remove key from index, with undo logging if index isn't trellis-ized

def query(self, *args):
    # This could also preprocess the args or postprocess the values
    return self._queries[args].value

@trellis.maintain
def some_rule(self):
    ...  # code that looks at the index and some other input data
         # to update any cells for those keys that are in self._queries
         # and have listeners
To be a bit more specific, a cellcache's .connector, .disconnector,
and rule would be passed the key of the corresponding cell, unlike
regular connect/disconnect methods (which currently get passed a
cell, and possibly a memento).
This pattern looks positively ideal for most of the use cases where
we want dynamic subscribability. In fact, it's almost better than
the ability we have right now to set up connect/disconnect for single
cells (which is currently only used to implement laziness).
For example, to use this approach to monitor wx events, one could do
something like this in a widget component base class:
def get_event(self, event_type):
    return self._events[event_type].value

@collections.cellcache(resetting_to=None)
def _events(self, key):
    """Cache of cells holding events of a given type"""

@_events.connector
def _bind(self, key):
    receive = self._events[key].receive
    self.bind(key, receive)
    trellis.on_undo(self.unbind, key, receive)

@_events.disconnector
def _unbind(self, key):
    receive = self._events[key].receive
    self.unbind(key, receive)
    trellis.on_undo(self.bind, key, receive)
Then, you could "poll" events using self.get_event(event_type), with
the necessary bind/unbind operations happening dynamically, as
needed. The method would return an event or None, so you could call
skip() or whatever on that object if needed.
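To sketch the usage (assuming the base class above is called, say,
WidgetComponent, and that a binder like wx.EVT_LEFT_DOWN is an
acceptable key for its bind()/unbind() methods):

import wx
from peak.events import trellis

class ClickWatcher(WidgetComponent):   # hypothetical subclass of the base above

    @trellis.perform
    def watch_clicks(self):
        evt = self.get_event(wx.EVT_LEFT_DOWN)   # re-runs when a click arrives
        if evt is not None:
            evt.Skip()   # e.g. let wx carry on with its normal processing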
(In practice, something like this probably needs to be a bit smarter
about the event keys used, because you may want to monitor events by
ID, and there is more than one kind of event that might report e.g.
mouse position, control key status, etc. But that's a topic for another post.)
Anyway, this rather looks like the preferred way to use connectors
and disconnectors, so I'll add this to my implementation to-do
list. Once it's done, it should then be (relatively) straightforward
to implement a "relationship manager" and bidirectional links between
collections. In fact, it won't be so much a relationship manager per
se, as a sort of value-based dynamic dispatch or "pub-sub" tool,
somewhat like a callback-free version of PyDispatcher.
Hm, maybe I should call it a PubSubHub... or just a
Hub. :) Anyway, you would do something like:
some_hub.put(SomeRelationship, 'link', ob1, ob2)
to publicize a link of type SomeRelationship being created between
ob1 and ob2; and then you'd have a rule like this to "notice" such operations:
for rel, op, ob1, ob2 in some_hub.get(None, None, None, self):
    # ob1 will be an object that was linked or unlinked from me
    # rel will be the relationship
    # op will specify what happened
    ...
That is, you'll be able to "query" a hub to see those events that
match your supplied values (with None being a wildcard). Your rule
will then be recalculated when matching events occur, without you
needing to explicitly register or unregister callbacks as in
PyDispatcher or Spike.
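To make that concrete, here's a hedged sketch of a component consuming
such a hub, assuming the put()/get() API above and using a trellis.Set
to hold the results:

from peak.events import trellis

class LinkedThing(trellis.Component):

    hub = trellis.attr(None)              # the shared hub instance
    linked = trellis.make(trellis.Set)    # objects currently linked to me

    @trellis.maintain
    def _track_links(self):
        for rel, op, other, me in self.hub.get(None, None, None, self):
            if op == 'link':
                self.linked.add(other)
            elif op == 'unlink':
                self.linked.remove(other)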
Whew! That was a long and winding road, but I believe it gets us
where we want to go next. This pub-sub-hub thing will probably also
come very much in handy later, when we're doing record-based
stuff. Probably, you won't use just one giant hub, but have more
specialized hubs so that rules can copy and transform messages
between them. (If you had a rule that read values from one hub, and
then wrote back into the *same* hub, it would create a circular
dependency between the hub's internal rules and your external ones.)
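For instance, a bridging rule between two specialized hubs might look
roughly like this (again assuming the hypothetical put()/get() API
sketched above):

from peak.events import trellis

class HubBridge(trellis.Component):

    source = trellis.attr(None)
    target = trellis.attr(None)

    @trellis.maintain
    def relay(self):
        for event in self.source.get(None, None, None, None):
            self.target.put(*event)   # transform the message here as needed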
Anyway... onward and upward.