[PEAK] Trellis API Sketches, part 2

Fri Jul 6 22:45:00 EDT 2007

The examples we've looked at so far only use simple rule and value 
cells, and we've avoided the use of mutable collections as monitoring targets.

The process monitor example used tuples so that value comparisons 
between the old and new values of a set of processes would be able to 
detect changes.  For the process monitor application, this approach 
is sufficient because of the relatively small number of items being 
handled.  For large and frequently-changing collections, however, 
this will not suffice.

For example, suppose we wanted to maintain a to-do list of new 
processes to be launched.  We could implement a FIFO queue like this::

class FIFO(trellis.Component):

     trellis.rules(
         data = lambda self: []
     )

     changed = trellis.event()

     def __len__(self):
         self.changed    # make callers depend on any changes
         return len(self.data)

     def __iter__(self):
         while self:
             yield self.pop()

     def append(self, item):
         self.changed = True
         self.data.append(item)

     def pop(self):
         self.changed    # make callers depend on any changes
         self.changed = True
         return self.data.pop(0)

Most of this should be obvious, except for one new feature: the 
'changed' trellis.event().  Events are similar to other attributes, 
*except* that they revert back to a default value (usually None) at 
the end of each global update, and this reversion does not cause any 
dependent rules to be updated.

So in this example, any rules that examine the contents of the FIFO 
(whether via .pop() or len() or bool()) will be recalculated after 
any items are added or removed.  And immediately afterward, 'changed' 
will revert to None, until the next time a change occurs.

Now, as for the lines that just say "self.changed" without any other 
action, they are there simply to ensure that any rule using those 
methods will be marked as depending on whether the FIFO's contents 
have changed.

This is a fairly primitive data structure as these things go - about 
the simplest possible useful mutable data structure you can 
make.  Obviously, you could extend this idea to creating a set or 
sequence with all the trimmings, such that any change would set the 
changed flag, and any reading of the contents would cause the rule 
using them to be refreshed whenever the contents changed.

You could also get more specific, and log the actual changes 
occurring, rather than just flag a specific change.  One way to do 
this, of course, is to use a FIFO to record the changes!  For example...

class FancyFIFO(FIFO):

     trellis.rules(
         _added = lambda self: FIFO(),
         _removed = lambda self: FIFO()
     )

     trellis.events(
         added = lambda self: list(self._added),
         removed = lambda self: list(self._removed),
     )

     def append(self, item):
         FIFO.append(self, item)
         self._added.append(item)

     def pop(self):
         item = FIFO.pop(self)
         self._removed.append(item)
         return item

Here, the 'added' and 'removed' attributes are always either 
collections of the items added or removed in the last trellis pulse, 
or None if the FancyFIFO was not changed.

More complex collections might use more sophisticated ways of 
providing suitable change events for the application's needs.  Also, 
ideally such collections should not actually *expose* requested 
changes to any code that might be directly reading the collection's 
contents, since this could lead to unintentional ordering 
dependencies between rules.

That is, if one rule makes a change to a collection, and another rule 
reads its value, then it makes a difference what order these two 
rules are calculated in.  This can be prevented by the simple 
expedient of storing the changer of the collection in an event attribute.

Attempting to set a controlled attribute to two different values 
during a given pulse is automatically an error, so it's almost like 
having a "lock".  (Except that instead of a race or deadlock, you get 
an exception, which then tells you where your code is flawed!)

The trellis library will probably provide some basic collection types 
that implement appropriate "concurrency" controls.

In general, however, most programs will have few data structures 
where such things are needed.  If a data structure is only changed by 
one rule, or only read by one rule, no additional protection is 
required.  And generally speaking, it's best for data structures to 
be written to by only one rule.

In cases where multiple rules need to write something, the changes 
can be serialized using a FIFO -- which is why I sketched the first 
FIFO example in the first place.

Specifically, I discovered I needed FIFOs when I tried sketching this::

class Time(context.Service, trellis.Component):

     trellis.rules(
         now        = lambda self: EventLoop.pulse and Timestamp(time()),
         _added     = lambda self: trellis.FIFO()
         _schedule  = lambda self: [NOT_YET],
         _events    = lambda self: weakref.WeakValueDictionary(),
     )

     @trellis.rule
     def next_event_time(self):
         while self._added:
            heapq.heappush(self._schedule, self.added.pop())

         while self.now >= self._schedule[0]:
             key = heapq.heappop(self._schedule)
             if key in self._events:
                 self._events.pop(key).value = True

         return self._schedule[0]

     def after(self, when):
         if when not in self._events:
             if when <= time():
                 return True
             self._added.append(when)
             self._events[when] = trellis.Cell(value=False)
         return self._events[when].value

     def delay(self, secs=0):
         return self.after(self.now + secs)

This is a rough draft of how trellis's "Time" service might look.  As 
you can see, it doesn't even need any particularly fancy features; 
the only ones we haven't seen yet are EventLoop.pulse and the 
Timestamp and Cell constructors.

None of these features, however, are anything particularly special or 
internal to the trellis system.  If this were the supplied Time 
service, and you didn't like it for some reason, you could trivially 
write your own and replace the standard one.  That will come in handy 
for running tests using simulated time, or providing events based on 
Twisted's scheduler instead!

However, it's illustrative of how open the framework is that not only 
can you replace this, or have as many other similar services running 
*simultaneously* as you like, but the implementation is just so darn simple.

But on with the explanation of it.  The 'after()' method returns a 
true or false value, telling you if a certain time has passed, and 
makes any rule that calls this method dependent on an event linked to 
that future time, causing recalculation when that future time arrives.

The 'next_event_time' attribute will always equal the time of the 
next possible refresh of a rule, or NOT_YET if there are no remaining 
future events.  (NOT_YET is a timestamp that's greater than every 
other possible timestamp, so it's never taken off the _schedule heap.)

Oh, and the EventLoop.pulse thing is effectively a way of asking for 
your rule to be polled constantly.  It's a cell that always contains 
the current pulse count; it gets incremented whenever something is 
changed imperatively (i.e. by assignment to a cell, as opposed to a 
rule returning a new value).  So, whenever "something happens", 
Time.now and Time.next_event_time will be updated.

Building up from here, the trellis will also have service objects for 
Signals and IO.  These, along with the EventLoop and Time services, 
will be replaced by Twisted-specific versions when using Twisted as 
the main event loop.

However, unlike Twisted's singleton reactor, it will be possible to 
have different threads running different event loops, if you need to, 
or nested event loops, or darn near any other weirdness you'd like to 
set up, so long as only one event loop is actually based on 
Twisted.  (In that respect, Trellis will be pretty much the same as 
peak.events and peak.running.)

So far, I haven't found any application areas in my sketches where 
I'd want to use peak.events-style microthreads.  I suspect that this 
is because those are most useful for stateful sequential processes, 
like say, a web spider making outgoing HTTP requests.  It's very 
easy, however, to create a microthread wrapper that runs on the 
trellis framework::

     _sentinel = object()

     class Microthread(trellis.Component):

         def __init__(self, geniter, **kw):
             self.stack = [geniter]
             self.input = None
             super(Microthread, self).__init__(**kw)

         input = trellis.event(value=_sentinel)

         @trellis.rule
         def step(self):
             if not self.stack:
                 return  # we're done
             if self.input is not _sentinel:
                 # don't resume unless there's input
                 return self.process(self.stack[-1].send(self.input))
             return self.step

         def process(self, retval):
             if instance(retval, GeneratorType):
                 self.stack.append(retval)
                 self.input = None
             elif isinstance(retval, Deferred):
                 @retval.addcallback
                 def on_fire(val):
                     self.input = val
             elif isinstance(retval, Until):
                 waiting_past = Time.now
                 def wait_for(self):
                     if retval.active:
                         if waiting_past(retval.timeout):
                             self.input = False
                             retval.active = False
                         elif retval.condition():
                             self.input = True
                             retval.active = False
                 return Cell(rule=wait_for, init=True)
             else:
                 self.stack.pop()
                 if self.stack:
                     self.input = retval
                 else:
                     return retval

``process()`` here could (and should!) be a generic function, but I 
wrote it this way for illustration purposes, showing what we might do 
with a generator ("call" it by putting it on the stack) or a Twisted 
Deferred (register callbacks to feed the input back into the yield).

If it's a special "Until" object wrapping a condition function, say:

      yield Until(lambda: something and something_else or foobar)

Then process() returns a new cell that will restart the task at the 
right point in time.

Finally, if it's any non-special value, the stack gets popped, and 
the value is fed back into self.input so the "calling" generator will 
be resumed.

Actually, I'm leaving out a ton of other stuff, like error 
handling.  As written here, errors will blow the whole thing 
apart.  However, the basic structure here is *really* simple -- a 
good bit simpler than the same thing in peak.events-land, where 
there's a lot more explicit callback management.  An error handling 
version would need another event attribute to hold incoming errors, 
and the 'step' rule would have to check whether 'error' was set and 
use .throw() instead of .send(), and there needs to be an exception 
handler that sets self.error to the exception data, etc.

Of course, as with the other mechanisms we've discussed, this is all 
built on basic Trellis features, which means you're free to invent 
alternative implementations of the concept and run them side-by-side 
in the same system, simultaneously.

Whew.  I think that this second set of sketches now shows that my 
current API concepts will work at the services layer as well as at 
the application layer, without needing to puncture any basic abstractions.

In particular, none of the code so far has had to mess with *any* 
private parts of cells or of the event loop system itself -- which is 
a *major* victory for the Trellis abstraction.  It's very 
"non-leaky", and so far all the examples have been very easy for me 
to reason about and draft.

Of course, without actually *running* any of them, I can't truly 
verify my intuitions as yet.  For that, I'll have to actually 
*implement* the Trellis, and in particular my new version of the 
update algorithm that's designed to support the circularity and 
orthogonality of features required by the API sketches so far.

But the algorithm details will have to wait for another post; I've 
been at the computer almost continuously for the last 14 hours with 
only one 2-hour break, and I definitely need to get away from this 
for awhile.  The algorithm itself is rather complex, with lots of 
hairy edge bits and corner cases.  My current pseudocode draft 
includes a lot of statements like this:

    if the cell is not a constant (and the reader isn't me):
        make the reader a (weak) listener of this cell
        make this cell a (strong) dependency of the reader

The actual implementation of each of those lines will involve quite a 
few lines of code to do weakref munging, list maintenance, lookups of 
contextual variables, and similar things.  And that is only a small 
excerpt from one of the smaller parts of the overall update 
algorithm!  So that is definitely a subject for another post.