[PEAK] Connecting the Trellis to non-trellis systems

Mon Mar 10 21:05:58 EDT 2008

There are two types of connections to non-trellis systems I'd like to 
talk about for a bit: connections to callback-based systems, and 
connections to "pure imperative" systems.  For example, connecting to 
a socket using Twisted or receiving wxPython GUI events would be 
examples of the first kind of system, and connecting to a database via 
SQLAlchemy or controlling Ecco via DDE would be examples of the second.

The catch with the first kind of system is that in the Trellis, we 
don't like callbacks.  :)  More specifically, we don't like having to 
explicitly register them.  What we really want is to just do 
something like 'is_readable(socket)' in a rule, and have the 
callbacks automatically get set up -- and automatically disconnected 
when they're not needed.

One crude way of doing this is already implemented in Trellis "Time" 
rules.  There's a weak dictionary that discards the timed event cells 
if they are no longer being referenced.  This is fine for Trellis 
time events, as they don't have any reference cycles and the Trellis 
ignores discarded events.

But, for more complex applications, there can be undesirable 
consequences for leaving subscriptions in effect when they are not 
being used.  For example, consider the consequences of, say, capturing 
the mouse in a particular window and never releasing it!  (i.e., 
imagine you could have a cell like 'captured_mouse_position', and 
capture the mouse as a side effect of depending on its value).

(Also, when interacting with an external system, unless special 
precautions are taken, the callback from that system is going to hold 
a reference to the target cell...  meaning the cell can't be garbage 
collected, and thus we can't rely on garbage collection to clean up 
the subscription automatically.)

So, we really need an easy way to make or break callback arrangements 
when a cell gains -- or loses -- subscribers.  Perhaps a specialized 
cell type that's easily customized (either per-instance or 
per-subclass) to do the right sort of callback-making and callback-breaking.
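As a rough illustration of that idea, here is a minimal plain-Python sketch.  It is not Trellis code: the class name SubscribableCell and the add_listener/remove_listener methods are made up for this example, and it only shows the "connect on first subscriber, disconnect on last" bookkeeping.

```python
class SubscribableCell:
    """Hypothetical stand-in for a cell that tracks its listeners.

    `connect` and `disconnect` are caller-supplied functions that make
    and break the external callback arrangement.
    """

    def __init__(self, connect, disconnect):
        self._connect = connect
        self._disconnect = disconnect
        self._listeners = set()

    def add_listener(self, listener):
        # Going from zero listeners to one: set up the external callback.
        if not self._listeners:
            self._connect()
        self._listeners.add(listener)

    def remove_listener(self, listener):
        # Ignore listeners we never had, so we never disconnect twice.
        if listener in self._listeners:
            self._listeners.remove(listener)
            # Going from one listener to zero: tear the callback down.
            if not self._listeners:
                self._disconnect()
```

With something like this, a hypothetical 'captured_mouse_position' cell would capture the mouse in connect() and release it in disconnect(), so the capture lasts exactly as long as somebody is depending on the value.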

The second kind of interfacing is a bit different.  Instead of 
wanting to receive input via callbacks from the outside world, we 
want to simply *wrap* an externally-supplied data value, passing both 
reads and writes through to the underlying system (maybe with some 
caching), but notifying other cells when the value is changed.

Generally, these external systems may be costly to retrieve data 
from, so we probably want them to be "optional" attributes, i.e., not 
initialized until/unless you read their values.  Costly retrieval 
also implies that we may want to cache the read value, rather than 
passing through every read.

In fact, even if retrieval isn't costly, we *still* need to do 
caching, because within a given trellis recalculation, a cell's value 
is supposed to be stable.  It can't just go changing on its own.

If we're connecting to a system where values *can* change on the fly, 
but which doesn't offer notification callbacks, then we may need some 
way to set a "time to live" (TTL), after which the value is automatically 
refreshed.  And we can provide an explicit refresh operation as a @modifier.

Having the cached value is also handy for writes; we can compare the 
cached value to the written value in order to decide whether to pass 
on the write to the underlying system.

It seems like it might be possible to create a single cell type that 
handles all of these use cases.  There could be methods like:

_poll() -> return the current value from the outside system
_send(value) -> send the value to the outside system
_receive(value) -> receive a value from a callback

_subscribe() -> arrange for _receive() to be called on changes
_unsubscribe() -> stop calling _receive()

refresh() -> self._receive(self._poll())

With the exception of _receive() and refresh(), these methods would 
be specific to the type of thing being interfaced with.  When the 
cell is uninitialized -- or if its TTL has expired -- reading its 
value would do a refresh() first, before returning the cached value.

Whenever the cell is written with a changed value, it would pass the 
value through to _send(), as soon as the current recalculation was 
completed.  And when listeners are added or removed from the cell, it 
would make sure to arrange for the appropriate _subscribe() or 
_unsubscribe() operation to occur when the current recalculation commits.
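To make the read/write behavior described above a bit more concrete, here is a simplified plain-Python sketch.  It is not Trellis code, and it makes several assumptions: the class name ExternalValue, the 'value' property, and the 'listeners' list are invented for illustration; poll and send are passed in as functions rather than defined as methods; writes are sent immediately rather than deferred until the current recalculation completes; and TTL and subscribe/unsubscribe handling are left out.

```python
_UNSET = object()  # marks a cell that has not been read yet


class ExternalValue:
    """Hypothetical cached wrapper around an externally held value."""

    def __init__(self, poll, send=None):
        self._poll = poll      # returns the current outside value
        self._send = send      # pushes a value outside (None = read-only)
        self._cached = _UNSET
        self.listeners = []    # callables notified with each new value

    def _receive(self, value):
        # Update the cache and notify listeners only on a real change.
        if self._cached is _UNSET or value != self._cached:
            self._cached = value
            for listener in list(self.listeners):
                listener(value)

    def refresh(self):
        self._receive(self._poll())

    @property
    def value(self):
        # Lazy ("optional") initialization: poll only on first read.
        if self._cached is _UNSET:
            self.refresh()
        return self._cached

    @value.setter
    def value(self, new):
        if self._send is None:
            raise AttributeError("read-only connection")
        # Compare against the cache to skip redundant writes.
        if self._cached is _UNSET or new != self._cached:
            self._send(new)
            self._receive(new)
```

Note that reads after the first one never touch the outside system in this sketch, so an outside change stays invisible until refresh() is called.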

In practice, it will probably be best to have two cell types: one for 
read-only connections, and one for read-write connections.  For 
SQLAlchemy integration, we'll take the read-write version and set it 
up to talk through a delegated descriptor.  (Which will be an 
interesting sub-project in itself, I suspect.)

TTL is also an interesting subproject.  If reading the cell value 
checks the TTL using the standard Time service, then any rule that 
reads the cell will also implicitly depend on the TTL and be 
refreshed when the TTL expires.  In some respects this is reasonable 
and perhaps even desirable, except that it will cause rules to be 
re-run even when the value hasn't changed.  Perhaps it would be 
better to make the TTL system a part of the subscribe/unsubscribe 
mechanism, such that refresh() is called when the TTL expires, if and 
only if there are listeners still looking for the value.

Yes, that seems to make more sense.  In fact, if this cell type is a 
variation of the standard rule+value cell type, then it's even 
easier.  The rule will simply amount to something like:

     if ttl_has_expired:
         reset the ttl
         return _poll()
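
Spelled out in plain Python, that rule might look roughly like the sketch below.  This is only an illustration, not Trellis code: the class name PolledValue, the rule() method, and the injectable 'clock' argument are assumptions made for the example, and the real thing would use the Time service rather than reading a clock directly.  The sketch also makes explicit that the previous value is returned when the TTL has not expired.

```python
import time


class PolledValue:
    """Hypothetical rule+value pair that re-polls once its TTL expires."""

    def __init__(self, poll, ttl, clock=time.monotonic):
        self._poll = poll        # returns the current outside value
        self._ttl = ttl          # seconds a polled value stays fresh
        self._clock = clock      # injectable so it can be tested
        self._expires = None     # None means "never polled yet"
        self._cached = None

    def rule(self):
        now = self._clock()
        if self._expires is None or now >= self._expires:
            # TTL has expired (or first use): reset it and poll again.
            self._expires = now + self._ttl
            self._cached = self._poll()
        # Otherwise keep returning the previously polled value.
        return self._cached
```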

So that could work pretty decently, I think.  What's rather 
interesting about this is that it would make it *really* easy to 
interface to systems that have to be polled, such as inter-thread 
queues, filesystem directory contents, and so forth.  We could even 
have an API function like "sense(interval, func, *args)" so you could 
do something like::

     if trellis.sense(10, os.path.isfile, some_file):
         ...

in a rule, so as to automatically detect when some_file is created, 
checking every 10 seconds.  (The function would work much like the Time 
service, i.e., by referring to a service that holds cells in a weak 
value dictionary, keyed by the interval, function, and function 
arguments.  Thus, the same cell would be reused by any/every rule 
that's polling for the same thing.)

So, I'm thinking that the two cell types we need could be called a 
Sensor (read-only) and an Effector (read-write).  The only difference 
between the two would be that an Effector would be writable, and 
would need a _send() method.  (Well, and an Effector would probably mix 
in different classes to get write behavior, but that's an 
implementation detail.)

[insert **long** delay while I sketch lots of code for hours on end...]

So, I think I've got this figured out now.  Sensor and Effector will 
be generic cell types, and the standard Cell() constructor will be 
enhanced to automatically create them if needed.  There will be a 
'writer' keyword you can use to supply a 'write(value)' function, and 
if specified it will make your cell an Effector.

To implement subscription-based rules, you'll be able to subclass a 
base called AbstractConnector, implementing read(), subscribe(cell) 
and unsubscribe(cell, key) methods.  Using such a rule will make your 
cell a Sensor (unless you also specify a writer, in which case you'll 
get an Effector).  You'll also be able to use 
Connector(read,sub,unsub) to create a connector from three functions, 
without needing to make a subclass.
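Since none of this is implemented yet, the following is only a guess at the shape of that interface, written as plain Python.  The method names come from the description above; everything else is an assumption, in particular that subscribe(cell) returns a key which is later handed back to unsubscribe(cell, key).

```python
from abc import ABC, abstractmethod


class AbstractConnector(ABC):
    """Sketch of the connector interface (details are assumptions)."""

    @abstractmethod
    def read(self):
        """Return the current value from the outside system."""

    @abstractmethod
    def subscribe(self, cell):
        """Arrange for `cell` to be updated; return a key (assumed)."""

    @abstractmethod
    def unsubscribe(self, cell, key):
        """Undo the arrangement identified by `key`."""


class Connector(AbstractConnector):
    """Build a connector from three plain functions, no subclass needed."""

    def __init__(self, read, sub, unsub):
        self._read = read
        self._sub = sub
        self._unsub = unsub

    def read(self):
        return self._read()

    def subscribe(self, cell):
        return self._sub(cell)

    def unsubscribe(self, cell, key):
        return self._unsub(cell, key)
```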

The net result, once I do the implementation and testing, should be 
that connecting to read-only outside data sources will require only 
making appropriate connectors, or using a polling factory to wrap the 
rule.  Connecting a writer to a data source will require only that 
the write function be known.

There may also need to be some changes to the high-level API to allow 
specifying this sort of thing for rules in a class body.  More 
likely, however, connectors will get managed via services or explicit 
cell creation and manipulation.

For example, it's likely that testing a socket's readability or 
writability will occur through an API, rather than by having a cell 
attribute tied directly to this.  These APIs will simply create the 
appropriate cell and read its value, caching the cell according to 
its creation parameters (e.g., by using an add-on, or a weak 
dictionary).  A socket management service, for instance, would probably 
cache such cells by fileno(), while for a wx event listener, there 
would probably be a cache of cells by event ID attached as an add-on 
to the target window or other object.  Querying for these events is 
then reduced to a dictionary lookup followed by a .value access in 
the common case.

Whew!  I think that's more than enough for today.  :)