[PEAK] A pattern for subscribable queries in the Trellis

Phillip J. Eby pje at telecommunity.com
Mon May 12 19:18:01 EDT 2008


So, this week I've been working a bit on the persistence/ORM stuff 
for the Trellis, specifically on bidirectional links in collections.

And, in order to have set/list/dict attributes on either side of a 
two-way link update correctly, it's necessary to have a "relationship 
manager" or switchboard component that handles updating both sides of 
the connection.

While working on the design for that component, I've noticed that 
there are some similarities between what it needs to do and what the 
Time service does.  First, both have a method that needs to be 
called to inquire about a condition, and then have the calling 
rule recalculated if the condition changes.  Second, that 
subscription process needs to happen in such a way that it doesn't 
create a dependency cycle.

Currently, if you use a time-based rule to trigger an action that 
causes a new time-based rule to be set up, this can create what 
appears to be a cyclical dependency.  (This is because there's a rule 
that depends on the "schedule" that fires callback conditions, but if 
one of those called-back rules inquires about a new future time, it 
will cause the schedule to be modified.)
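
In pseudocode, the shape of that cycle is something like this (the 
names 'time_service', 'reached', and 'after' are made up for 
illustration, not the real Time API):

     @trellis.maintain
     def waiter(self):
         if time_service.reached(self.deadline):    # reads the schedule...
             self.do_something()
             self.deadline = time_service.after(5)  # ...and writes it back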

In a similar way, a "relationship manager" needs to be able to 
iterate over received events regarding a subset of collection 
memberships, and then save the dependency for future use.

So, for future reference, here's what I've figured out about how to 
create this sort of object in a way that avoids circular calculations.

To create a subscribable-query object, you will need a weak-value 
dictionary keying query parameters to cells, and some type of index 
structure, stored in a cell.  You'll then have a "maintain" rule that 
updates indexed cells when conditions change, and a "modifier" that 
adds cells to the weak dictionary.  The query function will then 
access the stored cells to perform its job.

Something roughly like this, in other words:

     import weakref
     from peak.events import trellis

     class QueryableThing(trellis.Component):
         _index   = trellis.make(SomeNonTrellisType, writable=True)
         _queries = trellis.make(weakref.WeakValueDictionary)

         def query(self, *args):
             # This could also preprocess the args or postprocess the values
             return self._query_cell(args).value

         @trellis.modifier
         def _query_cell(self, key):
             c = self._queries.get(key)
             if c is None:
                 c = self._queries[key] = ... #some cell
                 trellis.on_undo(self._queries.pop, key, None)
                 ... # add cell to self._index, with undo info
                 trellis.on_commit(
                     trellis.changed, trellis.Cells(self)['_index'])
             return c
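
For illustration, a rule using such a query might look like this 
(the query arguments here are made up):

         @trellis.maintain
         def watcher(self):
             # reading .value inside query() makes this rule depend on
             # the cached cell, so it's recalculated when that changes
             result = some_queryable.query('employees', 'dept=42')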

The 'Time' service already follows this pattern, more or less, except 
that '_index' is named '_schedule', and '_queries' is named '_events'.

This general pattern should also be usable for the "relationship 
manager".  In fact, it's probably a general enough pattern to be 
worth embodying as a component or wrapper of some sort.  For example, 
we could perhaps have a decorator that worked something like this:

         @collections.cell_cache
         def _queries(self, key):
             return ...  # some cell

         @trellis.maintain(make=SomeTypeForTheIndex)
         def _index(self):
             for key in self._queries.added:
                 # add key to index, with undo logging if index isn't
                 # a trellis-ized type

         def query(self, *args):
             # This could also preprocess the args or postprocess the values
             return self._queries[args].value

         @trellis.maintain
         def some_rule(self):
             # code that looks at _index and some other input data
             # to update any cells for those keys that are still in
             # self._queries

In other words, the "cell_cache" decorator would turn _queries into a 
make() descriptor for a weak-value cache whose __getitem__ always 
returns a cell, and whose 'added' attribute is an observable sequence 
of the keys that were added to the cache during the previous recalc.
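
As a rough illustration of the cache half of that idea, here's a 
minimal non-trellis sketch (the class and attribute names here are 
mine, not a proposed API):

     import weakref

     class CellCache(object):
         """Weak-value mapping that makes cells on demand, recording
         which keys were created (for an _index rule to pick up)."""

         def __init__(self, factory):
             self.data = weakref.WeakValueDictionary()
             self.factory = factory   # builds a cell for a given key
             self.added = []          # keys created since last recalc

         def __getitem__(self, key):
             cell = self.data.get(key)
             if cell is None:
                 # only a weak reference is kept here; the cell stays
                 # alive as long as some rule still depends on it
                 cell = self.data[key] = self.factory(key)
                 self.added.append(key)
             return cell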

None of these sketches handles one tricky bit, however: garbage 
collection of index contents.  In the Time service, this isn't 
important because the constant advance of the clock allows the index 
to be cleaned up automatically.  However, for a relationship manager, 
a cleanup-on-demand approach could lead to it being a long time 
before a relevant key is encountered by chance and cleaned up.

One possible way to handle cleanup would be to customize the weak 
reference callbacks used by the cache.  However, since the GC calls 
can occur at an arbitrary time, this could be somewhat awkward.  A 
more controlled approach might be to use the cells' 
connect/disconnect capability -- but then there's even more overhead 
added, because connect/disconnect functions can't modify trellis data 
structures.  They have to schedule these as future operations, using 
on_commit (which has still other caveats, in that there's no direct 
control flow link).
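
For reference, the weakref-callback variant would look roughly like 
this (hypothetical helper names; note that the callback only queues 
work, precisely because GC can run it at any moment):

     import weakref

     def _cache_cell(self, key, cell):
         def cleanup(ref, key=key):
             # may fire at an arbitrary time, so don't touch trellis
             # data here; just note the key for later index cleanup
             self._pending_removals.append(key)
         self._queries[key] = cell
         # the ref object must be kept alive for the callback to fire
         self._cleanup_refs[key] = weakref.ref(cell, cleanup)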

There's certainly an advantage, however, to making this functionality 
part of the core library, in that I expect the pattern to be repeated 
for things like low-level select() I/O, signal handling, GUI events, 
and other circumstances where you need some sort of pattern matching 
on incoming events.  While it's certainly *possible* to implement 
these directly with connect/disconnect functions, it seems to me that 
the common case is more of a cache of cells.

Unfortunately, it's becoming clear that my current methods of dealing 
with connections and the Time service are a bit of a hack.  Instead 
of running as "perform" rules and via on_commit (respectively), it 
seems to me that both should run in the on_commit phase.

That is to say, connect/disconnect processing should run after a 
recalc is finished, rather than during the "perform" phase.  In that 
way, connect/disconnect operations can be free to change other 
trellis data structures, but are run at the beginning of a new 
recalc, rather than in the recalc during which the subscription 
status change occurred.
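
In outline, the proposed ordering would be (a sketch, not the actual 
scheduler):

     1. recalc: propagate changes through rules
     2. "perform" phase: run side-effecting rules
     3. commit: seal the undo log and run on_commit callbacks
     4. run any queued connect/disconnect callbacks; trellis changes
        they make kick off a fresh recalc (back to step 1)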

Whew!  Okay, that makes more sense.  Making this a general 
characteristic of connect/disconnect simplifies the implementation of 
a query cache, since one can implement the indexing as 
connect/disconnect operations of the cell's connector.

I like this, because it gives connections a sensible logic in the 
overall scheme of things.  True, it requires changes to the undo 
logic and overall transaction mechanism, but these should be 
straightforward to make.

Okay, so how do we make this work with query caches?  Well, what if 
we had a descriptor that made its attribute a weak-value collection 
of cells, but was otherwise very much like a regular trellis 
attribute, in terms of the options you could specify?  E.g.:

          # (this should also allow initially=, resetting, etc...)
          @collections.cellcache
         def _queries(self, key):
             return ...  # default value for key

         @_queries.connector
         def _add_to_index(self, key):
             # add key to index, with undo logging if index isn't trellisized

         @_queries.disconnector
         def _remove_from_index(self, key):
              # remove key from index, with undo logging if index
              # isn't trellisized

         def query(self, *args):
             # This could also preprocess the args or postprocess the values
             return self._queries[args].value

         @trellis.maintain
         def some_rule(self):
             # code that looks at the index and some other input data
             # to update any cells for those keys that are in self._queries
             # and have listeners

To be a bit more specific, a cellcache's .connector, .disconnector, 
and rule would be passed the key of the corresponding cell, unlike 
regular connect/disconnect methods (which currently get passed a 
cell, and possibly a memento).

This pattern looks positively ideal for most of the use cases where 
we want dynamic subscribability.  In fact, it's almost better than 
the ability we have right now to set up connect/disconnect for single 
cells.  (Which is only used to implement laziness, currently.)

For example, to use this approach to monitor wx events, one could do 
something like this in a widget component base class:

          def get_event(self, event_type):
              return self._events[event_type].value

         @collections.cellcache(resetting_to=None)
         def _events(self, key):
             """Cache of cells holding events of a given type"""

         @_events.connector
         def _bind(self, key):
             receive = self._events[key].receive
             self.bind(key, receive)
             trellis.on_undo(self.unbind, key, receive)

         @_events.disconnector
         def _unbind(self, key):
             receive = self._events[key].receive
             self.unbind(key, receive)
             trellis.on_undo(self.bind, key, receive)

Then, you could "poll" events using self.get_event(event_type), with 
the necessary bind/unbind operations happening dynamically, as 
needed.  The method would return an event or None, so you could call 
skip() or whatever on that object if needed.
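
For instance, a rule could watch for clicks like this (wx.EVT_BUTTON 
is just an illustrative key; see the caveat below about choosing 
real keys, and on_click is a hypothetical handler):

          @trellis.maintain
          def watch_clicks(self):
              # recalculated whenever a matching event arrives, since
              # get_event() creates a dependency on that event's cell
              event = self.get_event(wx.EVT_BUTTON)
              if event is not None:
                  self.on_click(event)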

(In practice, something like this probably needs to be a bit smarter 
about the event keys used, because you may want to monitor events by 
ID, and there is more than one kind of event that might carry e.g. 
mouse position, control key status, etc.  But that's a topic for 
another post.)

Anyway, this rather looks like the preferred way to use connectors 
and disconnectors, so I'll add this to my implementation to-do 
list.  Once it's done, it should then be (relatively) straightforward 
to implement a "relationship manager" and bidirectional links between 
collections.  In fact, it won't be so much a relationship manager per 
se, as a sort of value-based dynamic dispatch or "pub-sub" tool, 
somewhat like a callback-free version of PyDispatcher.

Hm, maybe I should call it a PubSubHub...  or just a 
Hub.  :)  Anyway, you would do something like:

     some_hub.put(SomeRelationship, 'link', ob1, ob2)

to publicize a link of type SomeRelationship being created between 
ob1 and ob2.  Then you'd have a rule like this to "notice" such 
operations:

     for rel, op, ob1, ob2 in some_hub.get(None, None, None, self):
         # ob1 will be an object that was linked or unlinked from me
         # rel will be the relationship
         # op will specify what happened

That is, you'll be able to "query" a hub to see those events that 
equal your supplied values (with None being a wildcard).  Your rule 
will then be recalculated when matching events occur, without you 
needing to explicitly register or unregister callbacks as in 
PyDispatcher or Spike.
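
Setting aside the trellis plumbing for a moment, the wildcard 
matching itself is simple; a plain-Python sketch (not the real 
thing) might be:

     class Hub(object):
         def __init__(self):
             self._events = []    # events published this recalc

         def put(self, *row):
             self._events.append(row)

         def get(self, *pattern):
             # None in any position acts as a wildcard
             for row in self._events:
                 if all(p is None or p == v
                        for p, v in zip(pattern, row)):
                     yield row

The real version would of course hold these in cells, so that rules 
calling get() are recalculated when matching events arrive.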

Whew!  That was a long and winding road, but I believe it gets us 
where we want to go next.  This pub-sub-hub thing will probably also 
come very much in handy later, when we're doing record-based 
stuff.  Probably, you won't use just one giant hub, but have more 
specialized hubs so that rules can copy and transform messages 
between them.  (If you had a rule that read values from one hub, and 
then wrote back into the *same* hub, it would create a circular 
dependency between the hub's internal rules and your external ones.)

Anyway...  onward and upward.



