[PEAK] Trellis API Sketches, part 2
Phillip J. Eby
pje at telecommunity.com
Fri Jul 6 22:45:00 EDT 2007
The examples we've looked at so far only use simple rule and value
cells, and we've avoided the use of mutable collections as monitoring targets.
The process monitor example used tuples so that value comparisons
between the old and new values of a set of processes would be able to
detect changes. For the process monitor application, this approach
is sufficient because of the relatively small number of items being
handled. For large and frequently-changing collections, however,
this will not suffice.
For example, suppose we wanted to maintain a to-do list of new
processes to be launched. We could implement a FIFO queue like this::
class FIFO(trellis.Component):
trellis.rules(
data = lambda self: []
)
changed = trellis.event()
def __len__(self):
self.changed # make callers depend on any changes
return len(self.data)
def __iter__(self):
while self:
yield self.pop()
def append(self, item):
self.changed = True
self.data.append(item)
def pop(self):
self.changed # make callers depend on any changes
self.changed = True
return self.data.pop(0)
Most of this should be obvious, except for one new feature: the
'changed' trellis.event(). Events are similar to other attributes,
*except* that they revert back to a default value (usually None) at
the end of each global update, and this reversion does not cause any
dependent rules to be updated.
So in this example, any rules that examine the contents of the FIFO
(whether via .pop() or len() or bool()) will be recalculated after
any items are added or removed. And immediately afterward, 'changed'
will revert to None, until the next time a change occurs.
Now, as for the lines that just say "self.changed" without any other
action, they are there simply to ensure that any rule using those
methods will be marked as depending on whether the FIFO's contents
have changed.
This is a fairly primitive data structure as these things go - about
the simplest possible useful mutable data structure you can
make. Obviously, you could extend this idea to creating a set or
sequence with all the trimmings, such that any change would set the
changed flag, and any reading of the contents would cause the rule
using them to be refreshed whenever the contents changed.
You could also get more specific, and log the actual changes
occurring, rather than just flag a specific change. One way to do
this, of course, is to use a FIFO to record the changes! For example...
class FancyFIFO(FIFO):
trellis.rules(
_added = lambda self: FIFO(),
_removed = lambda self: FIFO()
)
trellis.events(
added = lambda self: list(self._added),
removed = lambda self: list(self._removed),
)
def append(self, item):
FIFO.append(self, item)
self._added.append(item)
def pop(self):
item = FIFO.pop(self)
self._removed.append(item)
return item
Here, the 'added' and 'removed' attributes are always either
collections of the items added or removed in the last trellis pulse,
or None if the FancyFIFO was not changed.
More complex collections might use more sophisticated ways of
providing suitable change events for the application's needs. Also,
ideally such collections should not actually *expose* requested
changes to any code that might be directly reading the collection's
contents, since this could lead to unintentional ordering
dependencies between rules.
That is, if one rule makes a change to a collection, and another rule
reads its value, then it makes a difference what order these two
rules are calculated in. This can be prevented by the simple
expedient of storing the changer of the collection in an event attribute.
Attempting to set a controlled attribute to two different values
during a given pulse is automatically an error, so it's almost like
having a "lock". (Except that instead of a race or deadlock, you get
an exception, which then tells you where your code is flawed!)
The trellis library will probably provide some basic collection types
that implement appropriate "concurrency" controls.
In general, however, most programs will have few data structures
where such things are needed. If a data structure is only changed by
one rule, or only read by one rule, no additional protection is
required. And generally speaking, it's best for data structures to
be written to by only one rule.
In cases where multiple rules need to write something, the changes
can be serialized using a FIFO -- which is why I sketched the first
FIFO example in the first place.
Specifically, I discovered I needed FIFOs when I tried sketching this::
class Time(context.Service, trellis.Component):
trellis.rules(
now = lambda self: EventLoop.pulse and Timestamp(time()),
_added = lambda self: trellis.FIFO()
_schedule = lambda self: [NOT_YET],
_events = lambda self: weakref.WeakValueDictionary(),
)
@trellis.rule
def next_event_time(self):
while self._added:
heapq.heappush(self._schedule, self.added.pop())
while self.now >= self._schedule[0]:
key = heapq.heappop(self._schedule)
if key in self._events:
self._events.pop(key).value = True
return self._schedule[0]
def after(self, when):
if when not in self._events:
if when <= time():
return True
self._added.append(when)
self._events[when] = trellis.Cell(value=False)
return self._events[when].value
def delay(self, secs=0):
return self.after(self.now + secs)
This is a rough draft of how trellis's "Time" service might look. As
you can see, it doesn't even need any particularly fancy features;
the only ones we haven't seen yet are EventLoop.pulse and the
Timestamp and Cell constructors.
None of these features, however, are anything particularly special or
internal to the trellis system. If this were the supplied Time
service, and you didn't like it for some reason, you could trivially
write your own and replace the standard one. That will come in handy
for running tests using simulated time, or providing events based on
Twisted's scheduler instead!
However, it's illustrative of how open the framework is that not only
can you replace this, or have as many other similar services running
*simultaneously* as you like, but the implementation is just so darn simple.
But on with the explanation of it. The 'after()' method returns a
true or false value, telling you if a certain time has passed, and
makes any rule that calls this method dependent on an event linked to
that future time, causing recalculation when that future time arrives.
The 'next_event_time' attribute will always equal the time of the
next possible refresh of a rule, or NOT_YET if there are no remaining
future events. (NOT_YET is a timestamp that's greater than every
other possible timestamp, so it's never taken off the _schedule heap.)
Oh, and the EventLoop.pulse thing is effectively a way of asking for
your rule to be polled constantly. It's a cell that always contains
the current pulse count; it gets incremented whenever something is
changed imperatively (i.e. by assignment to a cell, as opposed to a
rule returning a new value). So, whenever "something happens",
Time.now and Time.next_event_time will be updated.
Building up from here, the trellis will also have service objects for
Signals and IO. These, along with the EventLoop and Time services,
will be replaced by Twisted-specific versions when using Twisted as
the main event loop.
However, unlike Twisted's singleton reactor, it will be possible to
have different threads running different event loops, if you need to,
or nested event loops, or darn near any other weirdness you'd like to
set up, so long as only one event loop is actually based on
Twisted. (In that respect, Trellis will be pretty much the same as
peak.events and peak.running.)
So far, I haven't found any application areas in my sketches where
I'd want to use peak.events-style microthreads. I suspect that this
is because those are most useful for stateful sequential processes,
like say, a web spider making outgoing HTTP requests. It's very
easy, however, to create a microthread wrapper that runs on the
trellis framework::
_sentinel = object()
class Microthread(trellis.Component):
def __init__(self, geniter, **kw):
self.stack = [geniter]
self.input = None
super(Microthread, self).__init__(**kw)
input = trellis.event(value=_sentinel)
@trellis.rule
def step(self):
if not self.stack:
return # we're done
if self.input is not _sentinel:
# don't resume unless there's input
return self.process(self.stack[-1].send(self.input))
return self.step
def process(self, retval):
if instance(retval, GeneratorType):
self.stack.append(retval)
self.input = None
elif isinstance(retval, Deferred):
@retval.addcallback
def on_fire(val):
self.input = val
elif isinstance(retval, Until):
waiting_past = Time.now
def wait_for(self):
if retval.active:
if waiting_past(retval.timeout):
self.input = False
retval.active = False
elif retval.condition():
self.input = True
retval.active = False
return Cell(rule=wait_for, init=True)
else:
self.stack.pop()
if self.stack:
self.input = retval
else:
return retval
``process()`` here could (and should!) be a generic function, but I
wrote it this way for illustration purposes, showing what we might do
with a generator ("call" it by putting it on the stack) or a Twisted
Deferred (register callbacks to feed the input back into the yield).
If it's a special "Until" object wrapping a condition function, say:
yield Until(lambda: something and something_else or foobar)
Then process() returns a new cell that will restart the task at the
right point in time.
Finally, if it's any non-special value, the stack gets popped, and
the value is fed back into self.input so the "calling" generator will
be resumed.
Actually, I'm leaving out a ton of other stuff, like error
handling. As written here, errors will blow the whole thing
apart. However, the basic structure here is *really* simple -- a
good bit simpler than the same thing in peak.events-land, where
there's a lot more explicit callback management. An error handling
version would need another event attribute to hold incoming errors,
and the 'step' rule would have to check whether 'error' was set and
use .throw() instead of .send(), and there needs to be an exception
handler that sets self.error to the exception data, etc.
Of course, as with the other mechanisms we've discussed, this is all
built on basic Trellis features, which means you're free to invent
alternative implementations of the concept and run them side-by-side
in the same system, simultaneously.
Whew. I think that this second set of sketches now shows that my
current API concepts will work at the services layer as well as at
the application layer, without needing to puncture any basic abstractions.
In particular, none of the code so far has had to mess with *any*
private parts of cells or of the event loop system itself -- which is
a *major* victory for the Trellis abstraction. It's very
"non-leaky", and so far all the examples have been very easy for me
to reason about and draft.
Of course, without actually *running* any of them, I can't truly
verify my intuitions as yet. For that, I'll have to actually
*implement* the Trellis, and in particular my new version of the
update algorithm that's designed to support the circularity and
orthogonality of features required by the API sketches so far.
But the algorithm details will have to wait for another post; I've
been at the computer almost continuously for the last 14 hours with
only one 2-hour break, and I definitely need to get away from this
for awhile. The algorithm itself is rather complex, with lots of
hairy edge bits and corner cases. My current pseudocode draft
includes a lot of statements like this:
if the cell is not a constant (and the reader isn't me):
make the reader a (weak) listener of this cell
make this cell a (strong) dependency of the reader
The actual implementation of each of those lines will involve quite a
few lines of code to do weakref munging, list maintenance, lookups of
contextual variables, and similar things. And that is only a small
excerpt from one of the smaller parts of the overall update
algorithm! So that is definitely a subject for another post.
More information about the PEAK
mailing list