[PEAK] A pattern for subscribable queries in the Trellis
Sergey Schetinin
maluke at gmail.com
Tue May 13 04:00:35 EDT 2008
>From what I understood this seems like a very versatile approach,
great. I guess we need to see some initial implementation to start
having questions about it, but if I can be of any help implementing,
especially the wx binding, I'd gladly participate.
On Tue, May 13, 2008 at 2:18 AM, Phillip J. Eby <pje at telecommunity.com> wrote:
> So, this week I've been working a bit on the persistence/ORM stuff for the
> Trellis, specifically on bidirectional links in collections.
>
> And, in order to have set/list/dict attributes on either side of a two-way
> link update correctly, it's necessary to have a "relationship manager" or
> switchboard component that handles updating both sides of the connection.
>
> While working on the design for that component, I've noticed that there are
> some similarities between what it needs to do and what the Time service
> does. Specifically, both have a method that needs to be called to inquire
> about a condition... and then have the calling rule be recalculated if the
> condition changes. Second, that subscription process needs to happen in
> such a way that it doesn't create a dependency cycle.
>
> Currently, if you use a time-based rule to trigger an action that causes a
> new time-based rule to be set up, this can create what appears to be a
> cyclical dependency. (This is because there's a rule that depends on the
> "schedule" that fires callback conditions, but if one of those called-back
> rules inquires about a new future time, it will cause the schedule to be
> modified.)
>
> In a similar way, a "relationship manager" needs to be able to iterate over
> received events regarding a subset of collection memberships, and then save
> the dependency for future use.
>
> So, here's what I've figured out for future reference, of how to create
> this sort of object, in a way that avoids circular calculations.
>
> To create a subscribable-query object, you will need a weak-value
> dictionary keying query parameters to cells, and some type of index
> structure, stored in a cell. You'll then have a "maintain" rule that
> updates indexed cells when conditions change, and a "modifier" that adds
> cells to the weak dictionary. The query function will then access the
> stored cells to perform its job.
>
> Something roughly like this, in other words:
>
> class QueryableThing(trellis.Component):
> _index = trellis.make(SomeNonTrellisType, writable=True)
> _queries = trellis.make(weakref.WeakValueDictionary)
>
> def query(self, *args):
> # This could also preprocess the args or postprocess the values
> return self._query_cell(args).value
>
> @trellis.modifier
> def _query_cell(self, key):
> c = self._queries.get(key)
> if c is None:
> c = self._queries[key] = ... #some cell
> trellis.on_undo(self._queries.pop, key, None)
> ... # add cell to self._index, with undo info
> trellis.on_commit(trellis.changed,
> trellis.Cells(self)['_index'])
> return c
>
> The 'Time' service already follows this pattern, more or less, except that
> '_index' is named '_schedule', and '_cache' is named '_events'.
>
> This general pattern should also be usable for the "relationship manager".
> In fact, it's probably a general enough pattern to be worth embodying as a
> component or wrapper of some sort. For example, we could perhaps have a
> decorator that worked something like this:
>
> @collections.cell_cache
> def _queries(self, key):
> return ... # some cell
>
> @trellis.maintain(make=SomeTypeForTheIndex)
> def _index(self):
> for key in self._queries.added:
> # add key to index, with undo logging if index isn't
> # a trellis-ized type
>
> def query(self, *args):
> # This could also preprocess the args or postprocess the values
> return self._queries[args].value
>
> @trellis.maintain
> def some_rule(self):
> # code that looks at _index and some other input data
> # to update any cells for those keys that are still in
> # self._queries
>
> In other words, the "cell_cache" decorator would turn _queries into a
> make() descriptor for a weak-value cache whose __getitem__ always returns a
> cell, and whose 'new' attribute is an observable sequence of the keys that
> were added to the cache during the previous recalc.
>
> Neither of these examples handles one tricky bit, however: garbage
> collection of index contents. In the Time service, this isn't important
> because the constant advance of the clock allows the index to be cleaned up
> automatically. However, for a relationship manager, a cleanup-on-demand
> approach could lead to it being a long time before a relevant key is
> encountered by chance and cleaned up.
>
> One possible way to handle cleanup would be to customize the weak reference
> callbacks used by the cache. However, since the GC calls can occur at an
> arbitrary time, this could be somewhat awkward. A more controlled approach
> might be to use the cells' connect/disconnect capability -- but then there's
> even more overhead added, because connect/disconnect functions can't modify
> trellis data structures. They have to schedule these as future operations,
> using on_commit (which has still other caveats, in that there's no direct
> control flow link).
>
> There's certainly an advantage, however, to making this functionality part
> of the core library, in that I expect it to be repeated for things like
> low-level select() I/O, signal handling, GUI events, and other circumstances
> where you need to have some sort of match pattern. While it's certainly
> *possible* to implement these directly with connect/disconnect functions, it
> seems to me that the common case is more of a cache of cells.
>
> Unfortunately, it's becoming clear that my current method of dealing with
> both connections and the Time service are a bit of a hack. Instead of
> running as "perform" rules and on_commit (respectively), it seems to me that
> both should be done by running in the on_commit phase.
>
> That is to say, connect/disconnect processing should run after a recalc is
> finished, rather than during the "perform" phase. In that way,
> connect/disconnect operations can be free to change other trellis data
> structures, but are run at the beginning of a new recalc, rather than in the
> recalc during which the subscription status change occurred.
>
> Whew! Okay, that makes more sense. Making this a general characteristic
> of connect/disconnect simplifies the implementation of a query cache, since
> one can implement the indexing as connect/disconnect operations of the
> cell's connector.
>
> I like this, because it gives connections a sensible logic in the overall
> scheme of things. True, it requires changes to the undo logic and overall
> transaction mechanism, but these should be straightforward to make.
>
> Okay, so how do we make this work with query caches? Well, what if we had
> a descriptor that made its attribute a weak-value collection of cells, but
> was otherwise very much like a regular trellis attribute, in terms of the
> options you could specify? E.g.:
>
> @collections.cellcache # this should also allow initially=,
> resetting, etc...
> def _queries(self, key):
> return ... # default value for key
>
> @_queries.connector
> def _add_to_index(self, key):
> # add key to index, with undo logging if index isn't trellisized
>
> @_queries.disconnector
> def _remove_from_index(self, key):
> # remove key from index, with undo logging if index isn't
> trellisized
>
> def query(self, *args):
> # This could also preprocess the args or postprocess the values
> return self._queries[args].value
>
> @trellis.maintain
> def some_rule(self):
> # code that looks at the index and some other input data
> # to update any cells for those keys that are in self._queries
> # and have listeners
>
> To be a bit more specific, a cellcache's .connector, .disconnector, and
> rule would be passed in the key of the corresponding cell, unlike the case
> with regular connect/disconnect methods. (Which currently get passed a
> cell, and possibly a memento.)
>
> This pattern looks positively ideal for most of the use cases where we want
> dynamic subscribability. In fact, it's almost better than the ability we
> have right now to set up connect/disconnect for single cells. (Which is
> only used to implement laziness, currently.)
>
> For example, to use this approach to monitor wx events, one could do
> something like this in a widget component base class:
>
> def get_event(self, event_type):
> return self._events[event]. value
>
> @collections.cellcache(resetting_to=None)
> def _events(self, key):
> """Cache of cells holding events of a given type"""
>
> @_events.connector
> def _bind(self, key):
> receive = self._events[key].receive
> self.bind(key, receive)
> trellis.on_undo(self.unbind, key, receive)
>
> @_events.disconnector
> def _unbind(self, key):
> receive = self._events[key].receive
> self.unbind(key, receive)
> trellis.on_undo(self.bind, key, receive)
>
> Then, you could "poll" events using self.get_event(event_type), with the
> necessary bind/unbind operations happening dynamically, as needed. The
> method would return an event or None, so you could call skip() or whatever
> on that object if needed.
>
> (In practice, something like this probably needs to be a bit smarter about
> the event keys used, because you may want to monitor events by ID, and there
> is more than one kind of event that might monitor e.g. mouse position,
> control key status, etc. But that's a topic for another post.)
>
> Anyway, this rather looks like the preferred way to use connectors and
> disconnectors, so I'll add this to my implementation to-do list. Once it's
> done, it should then be (relatively) straightforward to implement a
> "relationship manager" and bidirectional links between collections. In
> fact, it won't be so much a relationship manager per se, as a sort of
> value-based dynamic dispatch or "pub-sub" tool, somewhat like a
> callback-free version of PyDispatcher.
>
> Hm, maybe I should call it a PubSubHub... or just a Hub. :) Anyway, you
> would do something like:
>
> some_hub.put(SomeRelationship, 'link', ob1, ob2)
>
> To publicize a link of type SomeRelationship being created between ob1 and
> ob2, and then you'd have a rule like this to "notice" such operations:
>
> for rel, op, ob1, ob2 in some_hub.get(None, None, None, self):
> # ob1 will be an object that was linked or unlinked from me
> # rel will be the relationship
> # op will specify what happened
>
> That is, you'll be able to "query" a hub to see those events that equal
> your supplied values (with None being a wildcard). Your rule will then be
> recalculated when matching events occur, without you needing to explicitly
> register or unregister callbacks as in PyDispatcher or Spike.
>
> Whew! That was a long and winding road, but I believe it gets us where we
> want to go next. This pub-sub-hub thing will probably also come very much
> in handy later, when we're doing record-based stuff. Probably, you won't
> use just one giant hub, but have more specialized hubs so that rules can
> copy and transform messages between them. (If you had a rule that read
> values from one hub, and then wrote back into the *same* hub, it would
> create a circular dependency between the hub's internal rules and your
> external ones.)
>
> Anyway... onward and upward.
>
> _______________________________________________
> PEAK mailing list
> PEAK at eby-sarna.com
> http://www.eby-sarna.com/mailman/listinfo/peak
>
--
Best Regards,
Sergey Schetinin
http://s3bk.com/ -- S3 Backup
http://word-to-html.com/ -- Word to HTML Converter
More information about the PEAK
mailing list