[PEAK] Connecting the Trellis to non-trellis systems
Phillip J. Eby
pje at telecommunity.com
Mon Mar 10 21:05:58 EDT 2008
There are two types of connections to non-trellis systems I'd like to
talk about for a bit: connections to callback-based systems, and
connections to "pure imperative" systems. Connecting to a socket
using Twisted or receiving wxPython GUI events would be examples of
the first kind, and connecting to a database via SQLAlchemy or
controlling Ecco via DDE would be examples of the second.
The catch with the first kind of system is that in the Trellis, we
don't like callbacks. :) More specifically, we don't like having to
explicitly register them. What we really want is to just do
something like 'is_readable(socket)' in a rule, and have the
callbacks automatically get set up -- and automatically disconnected
when they're not needed.
One crude way of doing this is already implemented in Trellis "Time"
rules. There's a weak dictionary that discards the timed event cells
if they are no longer being referenced. This is fine for Trellis
time events, as they don't have any reference cycles and the Trellis
ignores discarded events.
But, for more complex applications, there can be undesirable
consequences for leaving subscriptions in effect when they are not
being used. For example, consider the consequences of, say, capturing
the mouse in a particular window, and not releasing it! (i.e.,
imagine you could have a cell like 'captured_mouse_position', and
capture the mouse as a side effect of depending on its value).
(Also, when interacting with an external system, unless special
precautions are taken, the callback from that system is going to have
a reference to the target cell... meaning it won't be able to be
garbage collected, and thus won't be able to rely on getting
automatically cleaned up.)
So, we really need an easy way to make or break callback arrangements
when a cell gains -- or loses -- subscribers. Perhaps a specialized
cell type that's easily customized (either per-instance or
per-subclass) to do the right sort of callback-making and callback-breaking.
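To make that idea concrete, here is a minimal sketch of a cell that
registers its callback when the first listener arrives and removes it
when the last one leaves. Nothing here is the Trellis API:
SubscribingCell, add_listener(), remove_listener() and receive() are
hypothetical names, and a plain list stands in for the outside system.

```python
class SubscribingCell:
    """Toy stand-in for a cell; hypothetical, not the real Trellis API."""

    def __init__(self, subscribe, unsubscribe):
        self._subscribe = subscribe      # called when the first listener arrives
        self._unsubscribe = unsubscribe  # called when the last listener leaves
        self._listeners = []
        self.value = None

    def add_listener(self, listener):
        if not self._listeners:
            # Going from zero to one listener: hook up the external callback.
            self._subscribe(self.receive)
        self._listeners.append(listener)

    def remove_listener(self, listener):
        if listener in self._listeners:
            self._listeners.remove(listener)
            if not self._listeners:
                # Nobody is interested any more: break the arrangement.
                self._unsubscribe(self.receive)

    def receive(self, value):
        # Invoked by the outside system; only real changes are propagated.
        if value != self.value:
            self.value = value
            for listener in list(self._listeners):
                listener(value)


# A list of registered callbacks plays the role of the external system.
callbacks = []
cell = SubscribingCell(callbacks.append, callbacks.remove)
```

Because the callback is only registered while listeners exist, the
external system holds no reference to the cell once it is unused, which
sidesteps the garbage collection problem described above.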
The second kind of interfacing is a bit different. Instead of
wanting to receive input via callbacks from the outside world, we
want to simply *wrap* an externally-supplied data value, passing both
reads and writes through to the underlying system (maybe with some
caching), but notifying other cells when the value is changed.
Generally, these external systems may be costly to retrieve data
from, so we probably want them to be "optional" attributes, i.e., not
initialized until/unless you read their values. Costly retrieval
also implies that we may want to cache the read value, rather than
passing through every read.
In fact, even if retrieval isn't costly, we *still* need to do
caching, because within a given trellis recalculation, a cell's value
is supposed to be stable. It can't just go changing on its own.
If we're connecting to a system where values *can* change on the fly,
but which does not offer notification callbacks, then we may need some way
to use a "time to live" (TTL) before the value is automatically
refreshed. And we can provide an explicit refresh operation as a @modifier.
Having the cached value is also handy for writes; we can compare the
cached value to the written value in order to decide whether to pass
on the write to the underlying system.
It seems like it might be possible to create a single cell type that
handles all of these use cases. There could be methods like:
    _poll() -> return the current value from the outside system
    _send(value) -> send the value to the outside system
    _receive(value) -> receive a value from a callback
    _subscribe() -> arrange for _receive() to be called on changes
    _unsubscribe() -> stop calling _receive()
    refresh() -> self._receive(self._poll())
With the exception of _receive() and refresh(), these methods would
be specific to the type of thing being interfaced with. When the
cell is uninitialized -- or if its TTL has expired -- reading its
value would do a refresh() first, before reading the cached value.
Whenever the cell is written with a changed value, it would pass the
value through to _send(), as soon as the current recalculation was
completed. And when listeners are added or removed from the cell, it
would make sure to arrange for the appropriate _subscribe() or
_unsubscribe() operation to occur when the current recalculation commits.
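As a rough illustration of how the polling, caching and write-through
parts of that protocol could fit together, here is a self-contained
sketch. It is a toy model rather than Trellis code: ExternalCell,
DictCell, the 'listeners' list and the injectable 'clock' argument are
all assumptions made for the example, writes are sent immediately
instead of being deferred to the end of a recalculation, and the
_subscribe()/_unsubscribe() hooks are left out to keep it short.

```python
import time

_UNSET = object()  # marks a cell that has never been read or written


class ExternalCell:
    """Toy model of the proposed cell type; not the real Trellis API."""

    def __init__(self, ttl=None, clock=time.monotonic):
        self._ttl = ttl        # seconds until the cache is stale; None = never
        self._clock = clock    # injectable so the TTL can be exercised in tests
        self._cached = _UNSET
        self._expires = None
        self.listeners = []    # callables notified when the value changes

    # Hooks specific to the system being wrapped.
    def _poll(self):
        raise NotImplementedError

    def _send(self, value):
        raise NotImplementedError

    # Generic machinery shared by every wrapped system.
    def _receive(self, value):
        changed = self._cached is _UNSET or value != self._cached
        self._cached = value
        if self._ttl is not None:
            self._expires = self._clock() + self._ttl
        if changed:
            for listener in list(self.listeners):
                listener(value)

    def refresh(self):
        self._receive(self._poll())

    def _is_stale(self):
        if self._cached is _UNSET:
            return True
        return self._expires is not None and self._clock() >= self._expires

    @property
    def value(self):
        # Uninitialized or expired: refresh first, then serve from the cache.
        if self._is_stale():
            self.refresh()
        return self._cached

    @value.setter
    def value(self, new_value):
        # Only pass the write through if it differs from the cached value.
        if self._cached is _UNSET or new_value != self._cached:
            self._send(new_value)
            self._receive(new_value)


class DictCell(ExternalCell):
    """Wraps one key of a plain dict standing in for the outside system."""

    def __init__(self, store, key, **kw):
        super().__init__(**kw)
        self._store, self._key = store, key

    def _poll(self):
        return self._store[self._key]

    def _send(self, value):
        self._store[self._key] = value
```

With something like 'DictCell(store, "x", ttl=10)', repeated reads
within ten seconds are served from the cache, and a write of an
unchanged value never reaches the store.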
In practice, it will probably be best to have two cell types: one for
read-only connections, and one for read-write connections. For
SQLAlchemy integration, we'll take the read-write version and set it
up to talk through a delegated descriptor. (Which will be an
interesting sub-project in itself, I suspect.)
TTL is also an interesting subproject. If reading the cell value
checks the TTL using the standard Time service, then any rule that
reads the cell will also implicitly depend on the TTL and be
refreshed when the TTL expires. In some respects this is reasonable
and perhaps even desirable, except that it will cause rules to be
re-run even when the value hasn't changed. Perhaps it would be
better to make the TTL system a part of the subscribe/unsubscribe
mechanism, such that refresh() is called when the TTL expires, if and
only if there are listeners still looking for the value.
Yes, that seems to make more sense. In fact, if this cell type is a
variation of the standard rule+value cell type, then it's even
easier. The rule will simply amount to something like:
    if ttl_has_expired:
        reset the ttl
    return _poll()
So that could work pretty decently, I think. What's rather
interesting about this is that it would make it *really* easy to
interface to systems that have to be polled, such as inter-thread
queues, filesystem directory contents, and so forth. We could even
have an API function like "sense(interval, func, *args)" so you could
do something like::
    if trellis.sense(10, os.path.isfile, some_file):
        ...
in a rule, so as to automatically detect when some_file is created,
checking every 10 seconds. (The function would work similarly to the Time
service, i.e., by referring to a service that holds cells in a weak
value dictionary, keyed by the interval, function, and function
arguments. Thus, the same cell would be reused by any/every rule
that's polling for the same thing.)
So, I'm thinking that the two cell types we need could be called a
Sensor (read-only) and an Effector (read-write). The only difference
between the two would be that an Effector would be writable, and
would need a _send() method. (Well, an Effector would probably mix
in different classes to get write behavior, but that's an
implementation detail.)
[insert **long** delay while I sketch lots of code for hours on end...]
So, I think I've got this figured out now. Sensor and Effector will
be generic cell types, and the standard Cell() constructor will be
enhanced to automatically create them if needed. There will be a
'writer' keyword you can use to supply a 'write(value)' function, and
if specified it will make your cell an Effector.
To implement subscription-based rules, you'll be able to subclass a
base called AbstractConnector, implementing read(), subscribe(cell)
and unsubscribe(cell, key) methods. Using such a rule will make your
cell a Sensor (unless you also specify a writer, in which case you'll
get an Effector). You'll also be able to use
Connector(read,sub,unsub) to create a connector from three functions,
without needing to make a subclass.
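Since none of this is implemented yet, the following is only a guess
at the shape such classes might take, written as standalone Python
rather than against the Trellis. It assumes that subscribe(cell)
returns a key which is later handed back to unsubscribe(cell, key);
that is inferred from the signatures above, not confirmed. FakeSource
is a made-up event source, and a plain callable plays the part of the
cell.

```python
class AbstractConnector:
    """Hypothetical base class: subclasses supply all three methods."""

    def read(self):
        raise NotImplementedError

    def subscribe(self, cell):
        raise NotImplementedError

    def unsubscribe(self, cell, key):
        raise NotImplementedError


class Connector(AbstractConnector):
    """Builds a connector from three plain functions, no subclass needed."""

    def __init__(self, read, sub, unsub):
        self._read, self._sub, self._unsub = read, sub, unsub

    def read(self):
        return self._read()

    def subscribe(self, cell):
        # Assumed convention: the returned key identifies the subscription.
        return self._sub(cell)

    def unsubscribe(self, cell, key):
        self._unsub(cell, key)


class FakeSource:
    """Made-up callback-based system used only to exercise the sketch."""

    def __init__(self):
        self.level = 0
        self.handlers = {}
        self._next_key = 0

    def connect(self, handler):
        self._next_key += 1
        self.handlers[self._next_key] = handler
        return self._next_key

    def disconnect(self, key):
        del self.handlers[key]

    def emit(self, level):
        self.level = level
        for handler in list(self.handlers.values()):
            handler(level)


source = FakeSource()
level_connector = Connector(
    read=lambda: source.level,
    sub=lambda cell: source.connect(cell),
    unsub=lambda cell, key: source.disconnect(key),
)
```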
The net result, once I do the implementation and testing, should be
that connecting to read-only outside data sources will require only
making appropriate connectors, or using a polling factory to wrap the
rule. Connecting a writer to a data source will require only that
the write function be known.
There may also need to be some changes to the high-level API to allow
specifying this sort of thing for rules in a class body. More
likely, however, connectors will get managed via services or explicit
cell creation and manipulation.
For example, it's likely that testing a socket's readability or
writability will occur through an API, rather than by having a cell
attribute tied directly to this. These APIs will simply create the
appropriate cell and read its value, caching the cell according to
its creation parameters (e.g., by using an add-on, or a weak
dictionary). For example, a socket management service would probably
cache such cells by fileno(), while for a wx event listener, there
would probably be a cache of cells by event ID attached as an add-on
to the target window or other object. Querying for these events is
then reduced to a dictionary lookup followed by a .value access in
the common case.
Whew! I think that's more than enough for today. :)