The PEAK Developers' Center
Event-Driven Collections with the Trellis

The peak.events.collections module includes some additional data structures for use with the Trellis:

>>> from peak.events import trellis, collections


SortedSet

trellis.List objects have some inherent inefficiencies due to the wide variety of operations supported by Python lists. While trellis.Set and trellis.Dict objects update themselves in place by applying change logs, trellis.List has to use a copy-on-write strategy to manage updates, because there isn't any simple way to reduce operations like sort(), reverse(), remove(), etc. to a meaningful change log. (That's why it only provides a simple changed flag.)

So if you need to use large lists in an application, you may be better off with a different data structure design. If you only need a subset of the list interface, you can often use a changelog-based structure instead. For example, the Trellis package includes a SortedSet type that maintains an index of items sorted by keys, with a cell that lists changed regions.

A collections.SortedSet is a specialized component that lets you wrap a trellis.Set (or anything similar that offers iteration plus added and removed cells) with a sort key, to get a list:

>>> myItems = trellis.Set([3,2,1])
>>> myIndex = collections.SortedSet(data=myItems)

>>> class Viewer(trellis.Component):
...     model = trellis.attr(None)
...
...     @trellis.perform
...     def view_it(self):
...         if self.model is not None:
...             print self.model

>>> view = Viewer(model=myIndex)
[1, 2, 3]

You can change the sort_key attribute to re-sort the items:

>>> myIndex.sort_key = lambda x: -x
[3, 2, 1]

SortedSet objects also have a discrete rule attribute called changes. It's normally empty, but during the recalculation triggered by a change to the underlying set, it will temporarily contain one or more (start, end, size) tuples, representing the effective changes to the old list. But as with all discrete or receiver attributes, you'll never see a value in it from non-rule code:

>>> myIndex.changes
[]

Only in rule code will you ever see it containing values, a moment before it becomes empty again:

>>> view.model = None   # quiet, please

>>> class Watcher(trellis.Component):
...     @trellis.perform
...     def dump(self):
...         print myIndex.changes

>>> watcher = Watcher()
[]

>>> myItems.remove(2)
[(1, 2, 0)]
[]
>>> myIndex
[3, 1]

The changed values describe the changes made to the index contents. In this case, (1, 2, 0) means that the list slice 1:2 was reduced to length zero. Compare with the effect of re-inserting 2:

>>> myItems.add(2)
[(1, 1, 1)]
[]
>>> myIndex
[3, 2, 1]

This value means the slice 1:1 was increased to length 1. In a sense, you can view the provided changes as being "set slice" operators. Note, too, that it's possible to have multiple slices if multiple items are added or deleted:

>>> def add_some():
...     myItems.add(0)
...     myItems.add(4)

>>> trellis.atomically(add_some)
[(3, 3, 1), (0, 0, 1)]
[]
>>> myIndex
[4, 3, 2, 1, 0]

>>> add_some()  # re-adding items already present produces no changes

As you can see, 1 item was inserted at position 3 in the old list, followed by 1 item being inserted at position 0. In other words, if you loop over the changes attribute and apply the slice operations in that order, you can successfully update another sequence to match the changed sequence.
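Such a mirroring loop can be sketched in plain Python. The apply_changes helper below is hypothetical (not part of peak.events.collections); it assumes, as in the examples above, that the change slices are listed from the end of the old list toward its front:

```python
def apply_changes(mirror, new, changes):
    # `changes` holds (start, end, size) slices expressed against the
    # *old* sequence, listed from the end of the list toward the front,
    # so earlier positions stay undisturbed while later ones are rewritten.
    # Net length change contributed by the slices not yet applied
    # (all of which sit at lower positions than the current one):
    pending = sum(size - (end - start) for start, end, size in changes)
    for start, end, size in changes:
        pending -= size - (end - start)
        # the replacement values live in the new sequence, shifted by
        # the length changes of the still-unapplied earlier slices
        mirror[start:end] = new[start + pending : start + pending + size]
    return mirror
```

For the atomic insertion above, apply_changes([3, 2, 1], [4, 3, 2, 1, 0], [(3, 3, 1), (0, 0, 1)]) rebuilds [4, 3, 2, 1, 0] from the old list and the change slices.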

Finally, note that adjacent operations may be merged into single slices:

>>> def del_some():
...     myItems.remove(2)
...     myItems.remove(3)

>>> trellis.atomically(del_some)
[(1, 3, 0)]
[]
>>> myIndex
[4, 1, 0]

Insertion and deletion at the same position can also lead to change slices that produce no net change in the number of rows:

>>> def add_and_del():
...     myItems.remove(1)
...     myItems.add(2)

>>> trellis.atomically(add_and_del)
[(1, 2, 1)]
[]
>>> myIndex
[4, 2, 0]

>>> def add_and_del():
...     myItems.remove(2)
...     myItems.add(1)

>>> trellis.atomically(add_and_del)
[(1, 2, 1)]
[]
>>> myIndex
[4, 1, 0]

And changing the sort key always results in a change slice for the entire index:

>>> myIndex.sort_key = lambda x:x
[(0, 3, 3)]
[]
>>> myIndex
[0, 1, 4]

As does setting or clearing the reverse flag (which reverses the effect of the sort key):

>>> myIndex.reverse = True
[(0, 3, 3)]
[]
>>> myIndex
[4, 1, 0]

>>> myItems.add(7)
[(0, 0, 1)]
[]
>>> myIndex
[7, 4, 1, 0]

>>> myItems.add(2)
[(2, 2, 1)]
[]
>>> myIndex
[7, 4, 2, 1, 0]

>>> myItems.remove(7)
[(0, 1, 0)]
[]

>>> myItems.remove(2)
[(1, 2, 0)]
[]

>>> myIndex
[4, 1, 0]

>>> myIndex.reverse = False
[(0, 3, 3)]
[]

>>> myIndex
[0, 1, 4]

TODO: test changes to .data

SubSet

The SubSet type lets you create a Set that is constrained to be a subset of another set. If items are removed from the base set, the same items are removed from the SubSet:

>>> s1 = trellis.Set([1,2,3])
>>> ss = collections.SubSet([2], base = s1)
>>> ss
SubSet([2])

>>> s1.remove(2)
>>> ss
SubSet([])

Adding an item to the SubSet when the item is not in the base set has no effect:

>>> ss.add(4)
>>> ss
SubSet([])

But you can add items that are currently in the base set:

>>> ss.add(3)
>>> ss
SubSet([3])

>>> s1.remove(3)
>>> ss
SubSet([])
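These membership rules can be sketched in plain Python. The class and method names below are hypothetical; the real collections.SubSet enforces the invariants automatically through trellis rules rather than via an explicit remove_from_base() call:

```python
class SubSetSketch(object):
    """Plain-Python sketch of SubSet's membership rules."""

    def __init__(self, items, base):
        self.base = base
        # only items currently in the base set are admitted
        self.items = set(i for i in items if i in base)

    def add(self, item):
        # adding an item that isn't in the base set has no effect
        if item in self.base:
            self.items.add(item)

    def remove_from_base(self, item):
        # removals from the base set propagate to the subset
        self.base.discard(item)
        self.items.discard(item)
```

With base = {1, 2, 3} and an initial subset of [2], removing 2 from the base empties the subset, add(4) is ignored, and add(3) succeeds — mirroring the doctest above.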

TODO: changes to base do not affect the current subset membership

Observing

The Observing type is designed to assist in sending change notifications to GUI components that need to be told when a portion of the display is out of date. It works by taking a set of keys and a lookup function, and providing a changes attribute. Usually the lookup function will be an accessor on some trellis-ized data structure:

>>> data = trellis.List([1,2,3,4,5])
>>> o = collections.Observing(lookup_func = data.__getitem__)

>>> def show():
...     print "Changes:", o.changes
>>> c = trellis.Performer(show)
Changes: {}

The changes attribute is a dictionary (a regular dictionary, not a Dict) that maps the observing instance's keys to (new, old) value tuples. Whenever a change occurs to either the keys or their corresponding values, it becomes non-empty, and then reverts to being empty:

>>> o.keys.update([1,2,4])
Changes: {1: (2, 2), 2: (3, 3), 4: (5, 5)}
Changes: {}

In the above example, the "new" and "old" values are the same, because the keys shown were all just added to the keys set. However, if a value changes for a key that's already present in keys, the "new" and "old" values will be different:

>>> data[1] = -2
Changes: {1: (-2, 2)}
Changes: {}

>>> o.keys.remove(2)    # removing key doesn't induce "changes"

>>> o.keys.add(3)       # but adding does
Changes: {3: (4, 4)}
Changes: {}

>>> data.reverse()      # and so does any change on the underlying data
Changes: {1: (4, -2), 3: (-2, 4), 4: (1, 5)}
Changes: {}
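The (new, old) bookkeeping shown above can be sketched in plain Python. The observed_changes function below is hypothetical (the real Observing computes its changes attribute via trellis rules); it only illustrates the mapping's semantics:

```python
def observed_changes(keys, lookup, previous):
    """Compute an Observing-style changes mapping for the watched keys.

    Freshly-watched keys report (new, new); keys whose looked-up value
    differs from `previous` report (new, old).
    """
    changes = {}
    for key in keys:
        new = lookup(key)
        if key not in previous:
            changes[key] = (new, new)               # key was just added
        elif previous[key] != new:
            changes[key] = (new, previous[key])     # value changed
    return changes
```

Replaying the first example: with data = [1, 2, 3, 4, 5] and no previous values, watching keys {1, 2, 4} yields {1: (2, 2), 2: (3, 3), 4: (5, 5)}, just like the doctest output.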

The basic idea for using an Observing object is to update the keys to reflect the currently-visible row or column numbers (or other keys) of a data structure being displayed in a GUI. Then, a @perform rule that reads the changes can notify the GUI of any changes that happen to the underlying data structure.

You can, of course, do this without an Observing instance, but recalculations would be prohibitively expensive if the underlying data set is large, compared to the portion being displayed. (E.g., when scrolling through an entire database.)

So GUI applications that display grids or tables will typically use a SortedSet to find out about rows or columns being inserted, deleted, or replaced, and an Observing instance to find out about changes within the rows or columns.

TODO: changes doesn't update if you initialize keys with a populated set.

Hub (NEW in 0.7a2)

A collections.Hub is used for loosely-coupled many-to-many communications with flexible pattern matching -- aka "publish/subscribe" or "pub/sub" messaging:

>>> hub = collections.Hub()

You can send messages into a hub by calling its put() method:

>>> hub.put(1, 2, 3)

However, this does nothing unless there are rules using the hub's get() method to receive these messages:

>>> def watch_3_3():
...     for message in hub.get(None, None, 3):
...         print message

>>> watch_3_3 = trellis.Performer(watch_3_3)

>>> hub.put(1, 2, 3)
(1, 2, 3)

The put() and get() methods both accept an arbitrary number of positional arguments, but get() will only match put() calls with the same number of arguments:

>>> hub.put('x', 'y')

>>> hub.put(1, 2, 3, 4)

And even then, only if the non-None arguments to get() match the corresponding arguments given to put():

>>> hub.put(1, 2, 4)

>>> hub.put(5, 4, 3)
(5, 4, 3)

You can of course have multiple rules monitoring the same hub:

>>> def watch_2_4():
...     for message in hub.get(2, 4, None):
...         print "24:", message
>>> watch_2_4 = trellis.Performer(watch_2_4)

>>> hub.put(2,4,3)
24: (2, 4, 3)
(2, 4, 3)

>>> hub.put(2, 4, 4)
24: (2, 4, 4)

And you can send more than one value in a single recalculation or atomic action, with the relative order of messages being preserved for each observer:

>>> def send_many():
...     hub.put(1, 2, 3)
...     hub.put(2, 4, 4)
...     hub.put(2, 4, 3)

>>> trellis.atomically(send_many)
24: (2, 4, 4)
24: (2, 4, 3)
(1, 2, 3)
(2, 4, 3)

Note, however, that all arguments to put() and get() must be hashable:

>>> hub.put(1, [])
Traceback (most recent call last):
  ...
TypeError: list objects are unhashable

>>> hub.get(1, [])
Traceback (most recent call last):
  ...
TypeError: list objects are unhashable

This is because hubs use a dictionary-based indexing system that avoids the need to test every message against every observer's match pattern. Each active get() pattern is saved in an index, keyed by the position and value of its rightmost non-None argument.

Each value in a message is then looked up in this index, and then tested against that (hopefully small) subset of active patterns. For example, if we look at the contents of our sample hub's index, we can see that the (None, None, 3) match pattern is indexed under "position 2, value 3", and the (2, 4, None) pattern is indexed under "position 1, value 4":

>>> hub._index
{(2, 3): {(None, None, 3): 1}, (1, 4): {(2, 4, None): 1}}

This means that (2, 4, None) will only be checked for messages with a 4 in the second position, and (None, None, 3) will only be checked for messages with a 3 in the third position (which of course it will always match).
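That lookup scheme can be sketched in plain Python. The helper names below are hypothetical, and the real Hub also tracks subscription counts and trellis recalculation, but the indexing idea is the same:

```python
def rightmost_key(pattern):
    # a pattern is filed under (position, value) of its rightmost
    # non-None field
    for i in range(len(pattern) - 1, -1, -1):
        if pattern[i] is not None:
            return (i, pattern[i])
    return None  # an all-None pattern would match everything

def matches(pattern, message):
    # same length, and every non-None field must match exactly
    return len(pattern) == len(message) and all(
        p is None or p == m for p, m in zip(pattern, message))

def deliver(index, message):
    # look up each (position, value) of the message, then test only
    # the (hopefully small) set of patterns filed under those entries
    hits = []
    for i, value in enumerate(message):
        for pattern in index.get((i, value), {}):
            if matches(pattern, message):
                hits.append(pattern)
    return hits

index = {}
for pattern in [(None, None, 3), (2, 4, None)]:
    index.setdefault(rightmost_key(pattern), {})[pattern] = 1
```

With the two sample patterns registered, deliver(index, (5, 4, 3)) matches only (None, None, 3) — the (2, 4, None) pattern is consulted (the message has a 4 in position 1) but fails the full match — while deliver(index, (2, 4, 3)) matches both, mirroring the hub behavior above.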

So, for best performance in high-volume applications, design your messages to place "more distinct" fields further to the right. For example, if you have a small number of distinct message types, you should probably make the message type the first field. That way, if a get() matches on both the message type and some more-distinctive field, it will be indexed only on the more-distinctive field, and won't be tested against every message of the desired type. (Unless, of course, the get() is supposed to return all messages of that type!)

In contrast, if you placed the message type as the last field, then any get() targeting a particular message type would incur a match-time penalty for every message of that type. Thus, you should place fields with fewer possible values more to the left, and fields with a larger number of possible values more to the right.


(last modified 2008-05-23 12:49:19)