[PEAK] Callback-free publish/subscribe now in Trellis SVN

Thu May 15 19:09:54 EDT 2008

I've just checked a new Trellis data type, ``collections.Hub``, into 
the Trellis SVN.  It lets you broadcast and receive messages 
without explicit subscriptions or callbacks.  You can create a rule 
that reads messages from a hub using its ``get()`` method, and you 
can write messages to the hub via its ``put()`` method.  Any rule 
that depends on a ``get()`` call will be recalculated after any 
matching ``put()`` calls occur.  Below is an excerpt from the 
documentation describing the mechanism of operation.  Enjoy, and 
please let me know if you have any questions or problems!

Hub
---

A ``collections.Hub`` is used for loosely-coupled many-to-many 
communications with flexible pattern matching -- aka 
"publish/subscribe" or "pub/sub" messaging::

     >>> hub = collections.Hub()

You can send messages into a hub by calling its ``put()`` method::

     >>> hub.put(1, 2, 3)

However, this does nothing unless there are rules using the hub's 
``get()`` method to receive these messages::

     >>> @trellis.Performer
     ... def watch_3_3():
     ...     for message in hub.get(None, None, 3):
     ...         print message

     >>> hub.put(1, 2, 3)
     (1, 2, 3)

The ``put()`` and ``get()`` methods both accept an arbitrary number 
of positional arguments, but ``get()`` will only match ``put()`` 
calls with the same number of arguments::

     >>> hub.put('x', 'y')

     >>> hub.put(1, 2, 3, 4)

Even then, a message is delivered only if the non-``None`` arguments 
to ``get()`` equal the corresponding arguments given to ``put()``::

     >>> hub.put(1, 2, 4)

     >>> hub.put(5, 4, 3)
     (5, 4, 3)
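To make the matching rule concrete, here is a minimal sketch of it as a standalone function.  The ``matches()`` name is hypothetical and is not part of Trellis; it only restates the two conditions above: equal length, and equality at every non-``None`` position of the pattern.

```python
def matches(pattern, message):
    """Hypothetical helper: does a put() tuple satisfy a get() pattern?

    None in the pattern acts as a wildcard; every other position must be
    equal, and the pattern and message must have the same length.
    """
    if len(pattern) != len(message):
        return False
    return all(p is None or p == m for p, m in zip(pattern, message))


# The messages from the doctests above, tested against (None, None, 3).
pattern = (None, None, 3)
for message in [(1, 2, 3), ('x', 'y'), (1, 2, 3, 4), (1, 2, 4), (5, 4, 3)]:
    print(message, matches(pattern, message))
```

Only ``(1, 2, 3)`` and ``(5, 4, 3)`` should report ``True``, matching the doctest output.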

You can of course have multiple rules monitoring the same hub::

     >>> @trellis.Performer
     ... def watch_2_4():
     ...     for message in hub.get(2, 4, None):
     ...         print "24:", message

     >>> hub.put(2, 4, 3)
     24: (2, 4, 3)
     (2, 4, 3)

     >>> hub.put(2, 4, 4)
     24: (2, 4, 4)

You can also send more than one message in a single recalculation or 
atomic action; the relative order of the messages is preserved for 
each observer::

     >>> def send_many():
     ...     hub.put(1, 2, 3)
     ...     hub.put(2, 4, 4)
     ...     hub.put(2, 4, 3)

     >>> trellis.atomically(send_many)
     24: (2, 4, 4)
     24: (2, 4, 3)
     (1, 2, 3)
     (2, 4, 3)
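The per-observer ordering guarantee can be modelled, very loosely, as filtering one ordered batch of messages per observer.  This is a sketch under that assumption only: ``deliver()`` and ``matches()`` are hypothetical names, and the sketch says nothing about how Trellis schedules rules or in which order *different* observers run (in the doctest, the ``24:`` lines happen to print first).

```python
def matches(pattern, message):
    # Hypothetical helper: None is a wildcard; lengths must agree.
    return len(pattern) == len(message) and all(
        p is None or p == m for p, m in zip(pattern, message))


def deliver(batch, patterns):
    """Hypothetical model of one atomic action: `batch` holds every message
    put() during the action, in order; each observer's pattern receives the
    matching subset, still in put() order."""
    return {p: [m for m in batch if matches(p, m)] for p in patterns}


batch = [(1, 2, 3), (2, 4, 4), (2, 4, 3)]  # the puts made by send_many()
seen = deliver(batch, [(None, None, 3), (2, 4, None)])
print(seen[(2, 4, None)])     # [(2, 4, 4), (2, 4, 3)]
print(seen[(None, None, 3)])  # [(1, 2, 3), (2, 4, 3)]
```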

Note, however, that all arguments to ``put()`` and ``get()`` must be hashable::

     >>> hub.put(1, [])
     Traceback (most recent call last):
       ...
     TypeError: list objects are unhashable

     >>> hub.get(1, [])
     Traceback (most recent call last):
       ...
     TypeError: list objects are unhashable

This is because hubs use a dictionary-based index that avoids the 
need to test every message against every observer's match pattern. 
Each active ``get()`` pattern is stored in this index under a key made 
from the position and value of its rightmost non-``None`` argument.
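A minimal sketch of that key computation follows.  The ``index_key()`` name is hypothetical, and returning ``None`` for an all-wildcard pattern is an assumption, since the text doesn't say how ``Hub`` handles that case.  It also shows why hashability matters: the key is a dictionary key, and a message's (position, value) pairs are used as lookup keys too, so an unhashable value fails either way.  Note that modern Python words the error differently from the doctests above.

```python
def index_key(pattern):
    """Return (position, value) of the rightmost non-None entry of a get()
    pattern.  Returning None for an all-wildcard pattern is an assumption;
    the text doesn't say how Hub treats that case."""
    for position in range(len(pattern) - 1, -1, -1):
        if pattern[position] is not None:
            return (position, pattern[position])
    return None


print(index_key((None, None, 3)))  # (2, 3)
print(index_key((2, 4, None)))     # (1, 4)

# Index keys are dict keys, so every value that ends up in one must be
# hashable; a list is not.
index = {}
try:
    index[index_key((1, []))] = {}
except TypeError as exc:
    print("TypeError:", exc)  # Python 3: unhashable type: 'list'
```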

To dispatch a message, the hub looks up each of the message's 
(position, value) pairs in this index, and tests the message only 
against the (hopefully small) subset of active patterns found there. 
For example, the contents of our sample hub's index show that the 
``(None, None, 3)`` match pattern is indexed under "position 2, value 
3", and the ``(2, 4, None)`` pattern under "position 1, value 4" 
(positions are counted from zero)::

     >>> hub._index
     {(2, 3): {(None, None, 3): 1}, (1, 4): {(2, 4, None): 1}}

This means that ``(2, 4, None)`` will only be checked for messages 
with a 4 in the second position, and ``(None, None, 3)`` will only be 
checked for messages with a 3 in the third position (which of course 
it will always match).
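The indexing and dispatch steps can be sketched together as a small pure-Python class.  This is a hypothetical, stripped-down model of the mechanism described above, not the Trellis ``Hub`` code.  ``TinyHubIndex``, ``add_pattern()``, ``dispatch()`` and the ``checks`` counter are invented for illustration, and the sketch assumes every pattern has at least one non-``None`` entry.

```python
from collections import defaultdict


def matches(pattern, message):
    # None is a wildcard; lengths must agree.
    return len(pattern) == len(message) and all(
        p is None or p == m for p, m in zip(pattern, message))


def index_key(pattern):
    # (position, value) of the rightmost non-None entry; assumes one exists.
    for position in range(len(pattern) - 1, -1, -1):
        if pattern[position] is not None:
            return (position, pattern[position])
    raise ValueError("this sketch requires at least one non-None entry")


class TinyHubIndex(object):
    """Hypothetical model of the index described above; not Trellis code."""

    def __init__(self):
        self.index = defaultdict(dict)  # (position, value) -> {pattern: 1}
        self.checks = 0                 # number of full pattern tests run

    def add_pattern(self, pattern):
        self.index[index_key(pattern)][pattern] = 1

    def dispatch(self, message):
        """Return the registered patterns that `message` matches."""
        matched = []
        for position, value in enumerate(message):
            # Only patterns filed under this exact (position, value) are tested.
            for pattern in self.index.get((position, value), ()):
                self.checks += 1
                if matches(pattern, message):
                    matched.append(pattern)
        return matched


hub_index = TinyHubIndex()
hub_index.add_pattern((None, None, 3))
hub_index.add_pattern((2, 4, None))
print(dict(hub_index.index))
# {(2, 3): {(None, None, 3): 1}, (1, 4): {(2, 4, None): 1}}
print(hub_index.dispatch((2, 4, 3)))  # [(2, 4, None), (None, None, 3)]
print(hub_index.dispatch((1, 2, 4)))  # []
print(hub_index.checks)               # 2
```

Because each pattern lives under exactly one key and a message has exactly one value per position, a pattern is tested at most once per message; ``(1, 2, 4)`` costs no tests at all.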

So, for best performance in high-volume applications, design your 
messages to place "more distinctive" fields further to the right. 
For example, if you have a small number of distinct message types, 
you should probably make the message type the first field.  Then a 
``get()`` that matches on both the message type and some 
more-distinctive field will be indexed only on the more-distinctive 
field, so it won't be tested against every message of the desired 
type.  (Unless, of course, the ``get()`` is *supposed* to return all 
messages of the desired type!)

In contrast, if you made the message type the last field, then 
any ``get()`` targeting a particular message type would incur a 
match-time penalty for *every* message of that type.  Thus, you 
should place fields with fewer possible values more to the left, and 
fields with a larger number of possible values more to the right.
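The cost difference can be made concrete by counting how many messages land in a pattern's index bucket, which is the number of full pattern tests an index like the one described above would run.  The ``count_checks()`` helper and the "order" traffic below are hypothetical, chosen only to illustrate the two layouts.

```python
def index_key(pattern):
    # (position, value) of the rightmost non-None entry; assumes one exists.
    for position in range(len(pattern) - 1, -1, -1):
        if pattern[position] is not None:
            return (position, pattern[position])
    raise ValueError("this sketch requires at least one non-None entry")


def count_checks(pattern, messages):
    """Count messages whose value at the pattern's key position equals the
    key value, i.e. how often the pattern would be fully tested."""
    position, value = index_key(pattern)
    return sum(1 for m in messages
               if len(m) > position and m[position] == value)


# Hypothetical traffic: 1000 "order" messages, each with a distinct id.
ids = range(1000)
type_first = [("order", n) for n in ids]
type_last = [(n, "order") for n in ids]

print(count_checks(("order", 42), type_first))  # 1    (indexed on the id)
print(count_checks((42, "order"), type_last))   # 1000 (indexed on the type)
```

Both patterns select the same single message, but with the type last, the pattern ends up indexed on the low-variety field and is tested against every "order" message.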