[PEAK] A taxonomy of event sources: designing the events API
Phillip J. Eby
pje at telecommunity.com
Wed Jan 7 00:23:50 EST 2004
Just a few thoughts on the kinds of event sources that PEAK will deal
with... if anybody has thoughts/suggestions/issues beyond what's listed
here, please let me know. At the moment I'm at home sick with the flu or
something, so I'm not sure that all my thoughts will be coherent. Then
again, nobody may notice the difference from my usual writing. :)
Stream vs. Broadcast vs. Handled -- Streamed events are consumed by the
next registered callback. Broadcast events are sent to all registered
callbacks as of the time of the event. Handled events are passed to a
series of "handlers" until the event is "consumed". Handled events can't
be used to directly resume a thread, but the handler receiving them can of
course forward the event through another event source.
Edge vs. Level -- An edge-triggered event resumes threads only when the
event "occurs". A level-triggered event (condition) resumes threads so
long as the condition is present. Callbacks registered while the condition
is present are called immediately.
Examples:
Edge stream - queue-like data source, GUI clicks/commands.
Edge broadcast - set a value, like a model field for a GUI's view.
Level stream - token passing/semaphore: i.e., a thread can "take" it by
turning "off" the level, and then "release" it by turning "on" the level,
whereupon the next thread in line may "take" it. Also useful for I/O
conditions like "readable(stream)" and "writable(stream)", since the
condition may no longer hold once a thread receives the event and does
something about it.
Level broadcast - condition flag (possibly an automatic computation based
on other kinds of event sources).
Edge handled - GUI commands, logger events, timer events.
Level handled - no such thing, since handled events don't resume threads.
The above taxonomy is quite rough, but it seems to boil down into two basic
kinds of event sources: handled events and everything else. :) The
"everything else" then needs to know whether it's stream or broadcast, and
that seems like something that could live as a simple flag. The edge
vs. level thing would then branch out a bit lower in the class hierarchy,
with some sort of "condition" base class for the level-triggered
stuff. It would need to have a way to know if the "condition" in question
holds, in order to handle shouldSuspend(thread) differently than
edge-triggered sources. (Edge sources simply do
'self.addCallback(thread.step); return True', or some such, while level
sources check whether the condition is true first, and if so, just tell the
thread not to suspend.)
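To make that shouldSuspend() difference concrete, here's a rough sketch. (The class names, the thread.step protocol, and the one-shot callback semantics are all just illustrative assumptions, not the actual peak.events API.)

```python
class EdgeSource:
    """Edge-triggered (sketch): a waiting thread always suspends, and is
    resumed by the next occurrence of the event."""

    def __init__(self):
        self._callbacks = []

    def addCallback(self, callback):
        self._callbacks.append(callback)

    def fire(self, event):
        # Fire the callbacks registered as of this moment (one-shot).
        callbacks, self._callbacks = self._callbacks, []
        for callback in callbacks:
            callback(self, event)

    def shouldSuspend(self, thread):
        self.addCallback(thread.step)
        return True


class LevelSource(EdgeSource):
    """Level-triggered (sketch): if the condition already holds, the
    thread isn't suspended at all."""

    def __init__(self, holds=False):
        super().__init__()
        self.holds = holds

    def shouldSuspend(self, thread):
        if self.holds:
            return False        # condition present: keep running
        self.addCallback(thread.step)
        return True
```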
It's rather interesting, I think, that despite the simplicity of these five
categories, there is an immense amount of room for variation in actual
event source implementations. For example, one could create transformers
from one kind of source to another, like stream-to-broadcast or
edge-to-level (e.g. by testing whether the event value meets a condition),
or transformers that transform the events they forward in some way.
The difficult thing for me at this point, then, is to ferret out which are
the most basic forms of the fundamental five (four, really, since the
"handled" event source paradigm doesn't really have any fundamental
variations, though there are plenty of possible implementation variations
having to do with handler priorities). I could easily spend forever
creating variations on each theme.
I also don't want to force people to spell out all the options every time
they create an event source. It should be possible to just create a
Condition (level broadcast), Semaphore (level stream), Value (edge
broadcast), or Distributor (edge stream).
It'd be nice to have a way to apply transforms or dependencies, perhaps as
an input to these types. So that whenever the basic event source has its
addCallback() or shouldSuspend() methods called, the "dependencies" can be
linked. OTOH, transforms seem to be the most varied concept of all, since
there are any number of possible rules being applied, coming from any
number of event sources. Consider, for example, a Condition that is based
on the relationship between two or three Values, or a Distributor whose
events are being converted from XML-RPC to pickles (just to make up a
random nonsense conversion).
There are a few useful abstractions, though, like the notion of a
dependency to another event source, which means that while callbacks exist
on the dependent source, there must be a callback registered with the
depended-on source. In the simplest case this might be a single callback
function.
Another idea is that perhaps it would be useful to have operators
applicable to event sources: aValue * 3 might create a derived Value, and
aValue.gt(27) might produce a Condition tied to aValue.
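For instance, 'aValue.gt(27)' might be sketched like this (all names hypothetical; for brevity, callbacks here persist instead of firing one-shot, and the Condition is reduced to a bare boolean holder):

```python
class Condition:
    """Level broadcast (sketch): reduced here to just holding a boolean."""

    def __init__(self, value=False):
        self.value = bool(value)

    def __call__(self):
        return self.value

    def set(self, value):
        self.value = bool(value)


class Value:
    """Edge broadcast (sketch) with a derived-Condition operator."""

    def __init__(self, value=None):
        self._value = value
        self._callbacks = []

    def __call__(self):
        return self._value

    def set(self, value):
        self._value = value
        for callback in list(self._callbacks):
            callback(self, value)

    def addCallback(self, callback):
        self._callbacks.append(callback)

    def gt(self, bound):
        # Derived condition, kept in sync whenever this Value changes.
        condition = Condition(self._value is not None and self._value > bound)
        self.addCallback(lambda source, v: condition.set(v > bound))
        return condition
```

'aValue * 3' could be built the same way, returning a derived Value instead of a Condition.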
Hm. Actually, maybe the simplest thing of all would be to spawn event
threads to do transformations, as the only "primitive" transform needed
then would be an 'AnyOf(*eventSources)' object, thus allowing threads to
wait on multiple events simultaneously. It seems horribly wasteful to have
so many threads, as it's hard not to think of them as expensive
things. But event threads are scarcely more than a simple object holding a
list of generators. Not only are they trivial in memory usage, they don't
consume any CPU until/unless something happens to task switch to them.
If we take this approach, we can easily write any possible transform or
dependency by writing a generator function, as long as we have a suitable
implementation of 'AnyOf'. The trick to writing a decent 'AnyOf' event is
that it must resume a thread suspended on it once and *only* once, even if
more than one of the "any" events fires. I suppose the simplest way to do
this is for AnyOf.shouldSuspend() to register a special callback with each
of the "any" events, that will only call the thread once. Heck, I guess
actually that could be done by AnyOf.addCallback() - it would simply wrap
the supplied callback with a forwarder that fires only once, and then call
.addCallback() on each of the "any" events.
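A sketch of that addCallback() approach (assumed names throughout; the Distributor here is just a stand-in edge-stream source whose send() fires the next callback in line):

```python
class Distributor:
    """Edge stream stand-in (sketch): send() fires one callback, next in line."""

    def __init__(self):
        self._callbacks = []

    def addCallback(self, callback):
        self._callbacks.append(callback)

    def send(self, event):
        if self._callbacks:
            self._callbacks.pop(0)(self, event)


class AnyOf:
    """Forwards to a given callback once and *only* once, no matter how
    many of the underlying sources fire."""

    def __init__(self, *sources):
        self.sources = sources

    def addCallback(self, callback):
        fired = [False]                 # shared across all the forwarders

        def forwardOnce(source, event):
            if not fired[0]:
                fired[0] = True
                callback(source, event)

        for source in self.sources:
            source.addCallback(forwardOnce)
```

So if two of the "any" events fire in succession, only the first one is delivered.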
With this primitive available, we could then manage any sort of event
transformations we liked, using event threads. And since event threads are
based on functional operations, we can build up a library over time of
common functional patterns of transformation, such as "value matches
constant -> condition" and "value computed from values using
function". One would then simply spawn a thread on the function with
parameters, e.g.:
    self.scheduler.spawn( fireWhenEqual(self.aValue, 42, self.aCondition) )
This would then fire 'aCondition' when 'aValue' equals 42. Or perhaps:
    self.scheduler.spawn(
        computeValue(
            lambda: self.aValue * 3,
            self.tripleValue,
            self.aValue
        )
    )
to ensure that 'tripleValue' is always set to three times 'aValue',
whenever 'aValue' changes. Of course, it might be easier (and more
efficient) to simply write:
    def updateTriple(self):
        while True:
            yield self.aValue; value = events.resume()
            self.tripleValue.set(value * 3)

    updateTriple = binding.Make(
        events.threaded(updateTriple), uponAssembly=True
    )
So we'll have to see over time how that works out. (events.threaded will
be an advice wrapper that turns a generator function into a function that
returns a thread, so once this component is assembled, 'self.updateTriple'
will be an initialized events.Thread object that has already run up to its
first 'yield'. You can see how this will make using event threads
ridiculously easy.)
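As a sanity check of the whole idea, here's a toy trampoline, much simplified from what peak.events would actually do (and with a module-level last-event variable standing in for the real resume() machinery; all names are made up for illustration):

```python
_last_event = None

def resume():
    """Return the event that resumed the current thread (sketch)."""
    return _last_event


class Value:
    """Edge broadcast (sketch): one-shot callbacks, fired on set()."""

    def __init__(self, value=None):
        self._value = value
        self._callbacks = []

    def __call__(self):
        return self._value

    def set(self, value):
        self._value = value
        callbacks, self._callbacks = self._callbacks, []
        for callback in callbacks:
            callback(self, value)

    def addCallback(self, callback):
        self._callbacks.append(callback)


class Thread:
    """Scarcely more than an object advancing a generator: it suspends on
    whatever source the generator yields, and resumes when that fires."""

    def __init__(self, generator):
        self.generator = generator
        self.step(None, None)       # run up to the first yield

    def step(self, source, event):
        global _last_event
        _last_event = event
        try:
            yielded = next(self.generator)
        except StopIteration:
            return                  # generator finished; thread is done
        yielded.addCallback(self.step)


aValue, tripleValue = Value(), Value()

def updateTriple():
    while True:
        yield aValue; value = resume()
        tripleValue.set(value * 3)

Thread(updateTriple())
aValue.set(14)          # tripleValue becomes 42
```

Note that the thread consumes no CPU between set() calls; it's just a parked generator waiting on aValue's callback list.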
Hm. I just realized something rather interesting about this style of event
dependency. There's no way to end up in an infinite event loop if you
always use threads to handle events. By that I mean that if our
'updateTriple' method updated 'self.aValue' instead of 'self.tripleValue'
by mistake, it would not result in an infinite loop. The reason is that
while a thread is triggering an event, it cannot also be waiting on an
event. Thus, no such looping is possible.
Indeed, if you think about it, you'll see that any thread resumed by the
first thread has the same dilemma - it can't fire an event that will cause
the first thread to be resumed, because the first thread isn't suspended.
This doesn't mean that you can't create an infinite loop at all, just that
you can't do it in a single event cycle. For example, the second thread
could, upon waking, suspend itself for, let's say, 0.0001 seconds, to allow
the first thread to go back to sleep. Then it could fire an event that
would re-wake the first. Note, however, that 1) you have to go out of your
way to do it, 2) you have to wait for another event (thus setting a cap on
the speed of re-firing, and allowing the possibility of other threads
intervening), and 3) you can't cause a stack overflow due to recursive
invocation of threads. The maximum Python stack depth during any event
threading activity is equal to the number of simultaneously runnable threads.
(Of course, all that assumes that you don't do evil things like manually
registering a callback to run 'aThread.step()' while the thread is already
running, although we could certainly implement a re-entrancy check to
prevent that if we wanted to.)
Anyway, that's a bit of a digression, certainly from the event
taxonomy. But it does seem as though we now know what basic types are
required for non-handler events:
Condition
Semaphore
Value
Distributor
AnyOf
I've already mentioned that I'm dropping Queue, Subscription, and the
associated interfaces. They had circular reference issues and other
complications, and writing this message has proved to me that they're
entirely superfluous. The existing State and Value types (and interfaces)
will be merged into one 'Value' type; the distinction between them is
trivial. In the new Value, the 'set()' method will take an optional
'force' parameter to mean, "fire an event even if the new value set is
equal to the existing value". There doesn't seem to be any point in
maintaining different types for such a minor behavior variation.
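The merged Value's set() might then look roughly like this (a sketch, with one-shot callbacks assumed):

```python
class Value:
    """Merged State/Value (sketch): set() fires callbacks only when the
    value actually changes, unless force=True."""

    def __init__(self, value=None):
        self._value = value
        self._callbacks = []

    def __call__(self):
        return self._value

    def set(self, value, force=False):
        changed = value != self._value
        self._value = value
        if changed or force:
            # Fire (and clear) the callbacks registered as of now.
            callbacks, self._callbacks = self._callbacks, []
            for callback in callbacks:
                callback(self, value)

    def addCallback(self, callback):
        self._callbacks.append(callback)
```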
Semaphore should take a count (default of 1), and have
set(count)/put()/take() methods. put() would increment the count, and
take() would decrement it. Yielding to a semaphore suspends a thread
unless the semaphore has a positive count. A suspended thread will resume
as soon as the count rises above zero again.
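A sketch of those semantics, with a plain callback standing in for a waiting thread (the callback signature is an assumption for illustration):

```python
class Semaphore:
    """Level stream (sketch): a waiter is woken only while count > 0, and
    each wake-up goes to a single waiter, next in line."""

    def __init__(self, count=1):
        self.count = count
        self._waiters = []

    def set(self, count):
        self.count = count
        self._wake()

    def put(self):
        self.count += 1
        self._wake()

    def take(self):
        assert self.count > 0, "take() with nothing available"
        self.count -= 1

    def addCallback(self, callback):
        self._waiters.append(callback)
        self._wake()        # level-triggered: maybe resume right away

    def _wake(self):
        # Resume waiters one at a time, as long as the count stays positive.
        while self.count > 0 and self._waiters:
            self._waiters.pop(0)(self, self.count)
```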
Condition should have a set() method that takes a boolean, and it would
suspend threads yielding to it when the boolean is false. Unlike a
semaphore, however, it invokes all its callbacks when the condition becomes
true. (Note that a Condition is different from a boolean Value: a boolean
Value *always* suspends yielding threads until the next change of the
value, while a Condition only suspends when the value is false.)
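And Condition, sketched the same way (again with hypothetical names and a plain callback standing in for a suspended thread):

```python
class Condition:
    """Level broadcast (sketch): while true, callers aren't suspended at
    all; on becoming true, *all* waiting callbacks fire."""

    def __init__(self, value=False):
        self.value = bool(value)
        self._callbacks = []

    def __call__(self):
        return self.value

    def set(self, value):
        self.value = bool(value)
        if self.value:
            # Broadcast to everyone waiting, then clear the list.
            callbacks, self._callbacks = self._callbacks, []
            for callback in callbacks:
                callback(self, True)

    def addCallback(self, callback):
        if self.value:
            callback(self, True)    # already true: fire immediately
        else:
            self._callbacks.append(callback)
```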
Distributor would have just a 'send(event)' method, that would fire a
single callback, passing in the event. And AnyOf will be just a pure
IEventSource.
I think the interfaces will change a bit too. There should probably be an
'IStatefulEventSource', adding a no-arguments __call__ method to
IEventSource, to get the current "state" of the event source. This would
be implemented by all of the types previously listed, except for
Distributor and AnyOf. I suppose there could also be an
'ISettableEventSource', adding a 'set(value)' method to
'IStatefulEventSource'. This again could be implemented by everything but
Distributor and AnyOf, so perhaps it's sufficient to have a single
interface for get/settability. OTOH, functions like 'writable(socket)'
shouldn't have to guarantee they'll return a settable event source, so it
probably is best to keep the interfaces separate.
Finally, I suppose that Distributor and Semaphore should have interfaces to
cover their special methods (send, and put/take, respectively).
Hm. Well that doesn't sound too bad... five classes, maybe an abstract
base, and one more if you count Thread. Not bad at all for an event-driven
programming microkernel! Oh yeah, throw in the 'resume()' function, and a
'threaded()' advice wrapper.
There will also be the scheduler and I/O selector components, but
technically they're not part of the microkernel. You can actually use
event threads without any kind of scheduler or selector if you don't need
to wait for time or I/O events.
I also did a bit of research, looking at Twisted's half-dozen or so reactor
variants for supporting different GUI frameworks, and they all look like
variations of the idea I had the other day for simply adding a
periodically-scheduled event thread that asks the GUI framework for GUI
events, and/or using the GUI framework's I/O event support to schedule
I/O. We can interoperate with Twisted's support for these by simply adding
a thread to 'iterate()' the Twisted reactor repeatedly, or by adding GUI
threads on our own. Similarly, we can either have a raw I/O selector
manage I/O event sources, or we can implement an I/O selector that works by
delegating to a reactor. Heck, the scheduler can also be implemented as
either a standalone scheduler, or as a component that schedules threads
using a reactor. Either way, the interface can be the same.
So, the parts of peak.events that will lie beneath scheduling --
fundamental event source types and the event thread type -- really *are* a
microkernel that doesn't care much what architectures you build on top of it.
I think that about wraps this up. Unless somebody can see some gaping
design flaws that I've missed here, or has some use cases that aren't
covered, I think this represents a decent plan for the microkernel
API. Thoughts, anyone?