[PEAK] Miscellaneous Trellis+GUI+Twisted+DB thoughts
Phillip J. Eby
pje at telecommunity.com
Thu May 1 21:13:24 EDT 2008
So, now that the new API is basically in place (minus
connect/disconnect, which I've got a partial version of in my
checkout), I'm turning my attention to making use of the new features
in practical apps, with GUIs, I/O, and databases.
Socket and other I/O with Twisted
---------------------------------
For I/O, I'd like it to be practical to implement internet protocols
with Trellis components, but *without* the protocol implementation
depending on any *actual* I/O. My first whack at this is in the
FreeSwytch library, which implements the protocol for talking to the
FreeSwitch VoIP system's event server using Trellis components. But
the implementation of the code that hooks to Twisted is a bit crude
and is entirely specific to FreeSwytch.
However, with a bit of cleanup, I think it would be possible to make
a general-purpose library for connecting pure-Trellis protocol
implementations to Twisted. In fact, a good portion of that code
would actually not be specific to Twisted, and could be used to
connect to any sufficiently-capable I/O loop.
The non-specific abstractions I have in mind are basically extensions
of the existing Pipe concept, to add the ability to detect whether
the connection is live, to request reconnection or closing, and
optionally, to have a "peer" associated with the object.
So, for a TCP connection, you would have an inbound pipe and an
outbound pipe, for reading and writing, respectively. And each would
have an open/close status (since some sockets are half-closable, as
used in HTTP).
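Sketched out, the abstract connection piece might look something like
this (none of these class or attribute names exist yet, and I'm leaving
out how the pipes actually get created):

    from peak.events import trellis

    class TCPConnection(trellis.Component):
        """Abstract TCP connection: no real I/O, just pipes plus status."""
        inbound    = trellis.attr(None)   # a Pipe of data read from the peer
        outbound   = trellis.attr(None)   # a Pipe of data to be written out
        read_open  = trellis.attr(True)   # the peer-to-us half of the socket
        write_open = trellis.attr(True)   # the us-to-peer half of the socket
        peer       = trellis.attr(None)   # optional address of the remote side

        @trellis.compute
        def connected(self):
            # "live" as long as at least one direction is still open
            return self.read_open or self.write_open
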
This would be fairly easy to hook up to a Twisted "protocol" object
as a bridge. But, you still have to connect it to your application's
protocol object(s), or vice versa. (This is where the awkwardness in
FreeSwytch comes in.)
After giving it some thought, though, it seems to me that the issues
of connecting an application protocol object to either a client or
server connection can be solved by the simple expedient of having
connect/listen APIs that accept a pipe that *new connections* should be fed into.
In other words, when calling Twisted's listenTCP() or connectTCP(),
you would pass in a protocol-factory wrapper around a Pipe instance
belonging to your application, and have a @maintain rule that reads
connection(s) from the Pipe and creates your actual application
protocol objects. That way, whatever natural factory exists in your
application isn't dependent on Twisted, and can be tested by feeding
it dummy connection objects.
(Actually - they're not dummy connection objects, because the
connection objects themselves will be abstract/inert, so they are
perfectly usable for testing by themselves. You just write test code
that puts data into the pipes, or checks the data coming out.)
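The application side might then be as simple as something like this
(purely illustrative; ChatService/ChatSession are made-up names, and I'm
assuming you can iterate a Pipe to see what was fed in during the
current recalc):

    class ChatService(trellis.Component):
        new_connections = trellis.attr(None)    # a Pipe the app creates and
                                                # hands to the Twisted wrapper
        sessions = trellis.make(trellis.Set)

        @trellis.maintain
        def accept_connections(self):
            # runs whenever new connection objects show up in the pipe
            for conn in (self.new_connections or ()):
                # ChatSession would be the app's protocol component (not shown)
                self.sessions.add(ChatSession(connection=conn))

A test just constructs a ChatService, pokes inert connection objects
into new_connections, and inspects sessions; no reactor required.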
So, we will need a couple of "connection" components that are
general-purpose, and a couple of Twisted-specific adapter objects to
turn a pipe into a Twisted protocol factory, and a connection object
into a Twisted protocol. The protocol factory adapter would create a
connection and then use the other adapter to turn the connection into
a protocol. Voila!
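In rough terms, the two adapters might look like this (class names made
up; error handling, half-close, and the outbound direction are all
glossed over):

    from twisted.internet import protocol

    class ConnectionProtocol(protocol.Protocol):
        """Bridges one abstract connection object to a real Twisted transport."""
        def __init__(self, connection):
            self.connection = connection

        def connectionMade(self):
            self.connection.peer = self.transport.getPeer()

        def dataReceived(self, data):
            self.connection.inbound.append(data)  # assuming Pipes grow by append()

        def connectionLost(self, reason):
            self.connection.read_open = self.connection.write_open = False

    class PipeFactory(protocol.Factory):
        """Turns a Pipe of new connections into a Twisted protocol factory."""
        def __init__(self, new_connections):
            self.new_connections = new_connections

        def buildProtocol(self, addr):
            conn = TCPConnection()             # the abstract connection sketched above
            self.new_connections.append(conn)  # hand it to the application
            return ConnectionProtocol(conn)

The application would then just call reactor.listenTCP(port,
PipeFactory(service.new_connections)), or the connectTCP() equivalent.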
It probably won't be *quite* that simple, since Twisted has various
kinds of protocols and factories, for handling subprocess
communications, Unix sockets, datagrams, etc. It's unlikely that
I'll implement all of them right away. AFAIK, Chandler only needs
the TCP variants, though for some of my hobby projects (such as a
TiVo protocols library), I'll need UDP and subprocess comms too.
At this point, I don't have a detailed design for the connection
objects. I plan to look over what Twisted's doing, as well as
Chandler and my own projects, and nail down the details soon. (I do
have a neat code name for the bridging library, though: "vine", as in
a thing that's Twisted but grows on a Trellis. :) )
GUI Layout
----------
So, it's been a long time since I thought much about how to do GUI
layout with Trellis, long enough that the Trellis's capabilities have
completely changed the picture... no pun intended. Back then, one
of the hardest parts of doing grid-like form layouts and other fancy
dynamic layouts with the Trellis was the up-and-down nature of
calculations, where a column's width depends on the width of its
contents, but the contents' *actual* width might depend on the width
of the column.
More specifically, the issue was that an object would need to know
its position relative to its parent, which meant adding an object to
a layout required updating both the parent and the child to know
about each other. This is a PITA when constructing objects in code,
as anyone who's built a wx layout by hand will know. :)
Specifically, you can't just create a parent and pass in a list of
children all in one giant expression, because the children have to be
created *already knowing* who their parent is.
Now, the simple layout and styling system I created a few months ago
can work around this to some extent, but it entirely relies on the
underlying GUI library to manage the actual layout, which makes
testing a bit more difficult.
However, that old problem is now solved, because a component that
wants to position its children can simply do so in a @maintain rule
that explicitly sets its children's positions. For example, you
could have a "column" component that sets all its children's left
positions and maximum widths, based on its own position and maximum
width, and a "columns" component that sets the left position and
maximum widths of its contained columns, based on their requested
widths and the available width of the "columns" object itself.
It's actually a bit more complex than I've described, in that spans
have to be dealt with for purposes of width
calculation. Essentially, each column's position is defined by the
maximum value of a set of constraints, where the principal constraint
is defined by the previous column's position plus its width.
More generally, you could say that an entire layout consists of
nothing but taking the minimums or maximums of various combined
constraints defined by the contents of the layout. And, as long as
those constraints don't change a lot, it's pretty efficient. I
wouldn't want to use it to lay out arbitrarily large tables, but for
form layouts it should be more than sufficient.
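For the simple no-spans case, the whole thing might boil down to
something like this (all names invented, and spans and padding are
ignored):

    class Column(trellis.Component):
        requested_width = trellis.attr(0)
        min_left        = trellis.attr(0)
        left            = trellis.attr(0)   # set by the parent's rule below
        max_width       = trellis.attr(0)   # likewise

    class Columns(trellis.Component):
        left      = trellis.attr(0)
        max_width = trellis.attr(0)
        columns   = trellis.make(trellis.List)

        @trellis.maintain
        def place_columns(self):
            # push positions and width limits down into the children; each
            # column's left edge is the max of its constraints (here, just
            # the previous column's right edge and its own minimum)
            right_edge = self.left
            for col in self.columns:
                col.left = max(col.min_left, right_edge)
                col.max_width = min(col.requested_width,
                                    self.left + self.max_width - col.left)
                right_edge = col.left + col.max_width
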
Layout mechanisms could also benefit from better methods of handling
aggregate functions in the Trellis. For example, being able to have
dynamically updated min(), max(), sum(), etc. of collections, that
don't require looping over the entire collection when a single value
is changed. That will probably come when I get to working on collection APIs.
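The kind of thing I have in mind is roughly this (assuming the Set type
exposes discrete added/removed changes to rules, and relying on a rule
being able to read its own previous value):

    class RunningTotal(trellis.Component):
        values = trellis.make(trellis.Set)

        @trellis.maintain(initially=0)
        def total(self):
            # adjust the previous total by just what changed, instead of
            # re-summing the whole collection
            return self.total + sum(self.values.added) - sum(self.values.removed)
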
It's probably a bit early to worry much about layout though, so this
part is on the back burner, relative to Twisted, GUI events, and collections.
GUI Events
----------
I've been surveying Chandler GUI code for its usage of wx events, and
examining how such code could be simplified and made more testable by
allowing event-based values to be queried. That is, imagine if, for
example, you could use rules saying things like 'if mouseover(self)', instead
of having to explicitly bind mouse enter/leave events to update
methods that then modify a variable for some other code to read.
So far, I haven't come up with any particularly brilliant way to
spell this, although just the ability to do this with wx-specific
events and capabilities would be nice in many places. My main
concern for the API design is that I'd like to allow for the kind of
separation that the socket connection stuff does. That is, much of
what a "controller" does (in the original MVC sense of controller) is
independent of the GUI technology used, and could be tested
independently. (i.e., things like mouseovers, doubleclicks, dragging, etc.)
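For instance, a little controller like the one below could be driven
either by a wx bridge or directly by test code (everything here is
invented for illustration; only the wx event names are real):

    import wx
    from peak.events import trellis

    class HoverController(trellis.Component):
        """GUI-independent: just reads and reacts to event-derived cells."""
        mouse_over = trellis.attr(False)

        @trellis.compute
        def highlighted(self):
            return self.mouse_over

    def bind_mouseover(widget, controller):
        """The wx-specific part: feed enter/leave events into the controller."""
        widget.Bind(wx.EVT_ENTER_WINDOW,
                    lambda evt: setattr(controller, 'mouse_over', True))
        widget.Bind(wx.EVT_LEAVE_WINDOW,
                    lambda evt: setattr(controller, 'mouse_over', False))

A test can then just set controller.mouse_over directly and check
highlighted, without any wx objects involved.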
And I rather like the idea of being able to maybe test a component by
saying something like someController.clickAt(x, y) and find out what
all it would do to its GUI, somewhat the way I can test TCP protocols
without needing to connect them to Twisted. It wouldn't prevent GUI
system bugs or quirks, of course, but once you found those bugs you
could incorporate the workarounds into your testing.
The main problem with the idea is that the communication channel
between app and GUI is a lot wider than that between two TCP
sockets! There's the mouse, keyboard, and the current focus, not to
mention all the possible drawing operations going the other
way. These factors seem to make it hard to have any generic notion
of such communication channels, the way it should be possible to do
with sockets.
So, in the short run, I expect I will just stick to making it
possible to easily monitor wx events (and event-driven values) in
Trellis rules. A more sophisticated or more-generic alternative can
wait until we get some experience with the first round of
simplification. Unfortunately, this may not provide any testability
benefits at first, but it's a walk vs. crawl thing.
Collections and Databases
-------------------------
After doing a good bit of work to get a "hello world" version of
using the SQLAlchemy ORM with the Trellis, I have something that
*appears* to work, but in actuality I have no real way to know if it
will actually work correctly in a real application. This is largely
due to the complex and unknown interactions between SQLAlchemy's
not-so-well-defined event and lifecycle model on the one hand, and
the Trellis' need for undoability and non-conflicting writes on the other.
I have no idea even how to begin *testing* it, and that bothers
me. SQLAlchemy seems like a nice way to access databases in a
(semi-)relational way, or to do an ORM on some non-Trellised data
structures. In fact, I'm looking forward to actually trying to use
SQLAlchemy's ORM on other projects of mine, where lack of
Trellis-ness won't be a problem. (E.g. command-line utilities and
"standard" web apps.) But for mixing with the Trellis, it doesn't
look like such a good idea after all.
Even if the base mapping were to work well (and if it does, it's
essentially accidental at this point), I haven't even tried to
integrate collections yet, and this is where all the interesting bits
for Trellis applications are likely to be. It's all very well and
good that you can make your model objects viewable and editable in a
GUI, but practical applications (especially Chandler) need to deal
heavily with collections of objects that are being dynamically
updated, and potentially need to be reflected in a GUI.
Chandler's use cases for collection updates actually have two
different levels of update granularity, too. Changes to the items in
a collection need to be reflected immediately, while changes in
collection *membership* may need to be deferred until a manual
refresh or timeout, whether those changes are the result of
background syncing operations or foreground changes made by the user.
As far as I can tell, SQLAlchemy's collections don't provide a way to
handle these use cases that I can cleanly integrate with the
Trellis. In contrast, integrating "gated refresh" in the Trellis is
straightforward, at least in principle. (I.e., using @maintain rules
that return their previous value unless a refresh event occurs or
it's the first run of the rule.)
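In sketch form, that's something like this (names invented, and I may be
fuzzy on the exact spelling for a discrete/resetting attribute):

    class GatedMembership(trellis.Component):
        live_members = trellis.make(trellis.Set)         # changes immediately
        refresh      = trellis.attr(resetting_to=False)  # discrete "refresh now" event

        @trellis.maintain(initially=None)
        def members(self):
            if self.members is None or self.refresh:
                return frozenset(self.live_members)      # take a fresh snapshot
            return self.members                          # otherwise keep the old value
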
So, what's needed to get a db-backed collections framework? At the
Trellis level, we now have the necessary raw materials to implement
ORM... ironically through one of the features that were added to
support SQLAlchemy ORM. (Specifically, the ability to test whether a
cell's value was set or computed.)
This means that it's possible to create a DB-backed version of a
Trellis component simply by replacing its cells with ones that lazily
retrieve data from a database, along with rules that write data back
when the cells are set. These operations could probably be managed
by a simple AddOn that holds the writeback rule(s) and caches any
relevant records.
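In very rough form (the AddOn base class is real, from the PEAK AddOns
library, but everything else here, including load_row() and how the
cells would call into it, is hand-waved):

    from peak.util.addons import AddOn

    class DBRecord(AddOn):
        """Caches one object's row and remembers which fields need writing back."""

        def __init__(self, subject):
            self.subject = subject
            self.row = None       # loaded lazily, on first read
            self.dirty = set()    # names of cells that were *set* rather than computed

        def get(self, name):
            if self.row is None:
                self.row = load_row(self.subject)  # placeholder for the real DB read
            return self.row[name]

        def set(self, name, value):
            if self.row is None:
                self.row = load_row(self.subject)  # ditto
            self.row[name] = value
            self.dirty.add(name)  # a write-back rule would flush these at commit
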
Or, alternatively, you could write mappers as AddOn classes, such
that each mapper manages the state for some portion of the data. For
example, an object with data in more than one record type/table could
have an add-on for each, tracking whether the associated cells need a
write-back.
(Nah, that's too complicated. Even if you need per-mapper state,
there's no reason to have multiple add-ons; they can just be objects
managed by a single add-on.)
Anyway, the resulting model would probably look a lot like the
peak.storage.data_managers module in the old PEAK core, except that
model objects won't subclass Persistent, and it will be possible in
principle for a single object to be managed by multiple DMs... while
also allowing a single DM to serve as a cache and state manager for
multiple object types, without subclassing.
That is, I think that we can actually implement persistence
strategies as DM add-ons keyed by type, and persistence states for
individual objects as add-ons keyed by DM. Thus, you could have an
object loaded from one DM, but saved to multiple DMs (e.g. for a
save to file or "export" operation). And, persistence strategy
implementers wouldn't need to know as much about the inner workings of DMs.
The old QueryLink and ListProxy types from PEAK would be replaced of
course with simple Trellis collection types, for the most
part. Since we'll be using lazy cells to create these objects only
when needed, the special wrappers we used in the past won't be needed.
We will, however, need to solve a somewhat trickier problem, which is
how to set both ends of a bidirectional link without incurring any
strangeness when we switch back and forth between db-backed
collections and pure in-memory ones.
One simple way to do this would be to have a "relationship manager"
that allows you to communicate link/unlink operations between
collections via pipes. This RM would get commands like
.link(fromkey, tokey) and .unlink(fromkey, tokey), then look up
fromkey and tokey in a pair of caches to see if it has a cell or pipe
for that key and, if so, write the command and the opposing key to
that cell or pipe. Collections would simply ask for a cell or pipe
that matches the key on their side of the relationship, and use it to
automatically incorporate those changes into their own adds and deletes.
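Roughly speaking (all names provisional, and assuming pipes can just be
appended to):

    class RelationshipManager(object):
        """Routes link/unlink commands to whichever collections are loaded."""

        def __init__(self):
            self.forward  = {}   # fromkey -> pipe/cell for one side, if loaded
            self.backward = {}   # tokey   -> pipe/cell for the other side, if loaded

        def subscribe(self, side, key, pipe):
            # a collection registers the pipe matching its side and key
            getattr(self, side)[key] = pipe

        def link(self, fromkey, tokey):
            self._send(self.forward,  fromkey, ('link', tokey))
            self._send(self.backward, tokey,   ('link', fromkey))

        def unlink(self, fromkey, tokey):
            self._send(self.forward,  fromkey, ('unlink', tokey))
            self._send(self.backward, tokey,   ('unlink', fromkey))

        def _send(self, cache, key, command):
            pipe = cache.get(key)
            if pipe is not None:   # unloaded collections just don't hear about it
                pipe.append(command)
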
As a consequence, if a collection isn't currently in use (due to lazy
loading), then it won't receive any change events... just the way
PEAK's old QueryLink class ignored updates for not-yet-loaded collections.
Oh, and of course these special bidirectional collection types will
need to send link/unlink commands to the relationship manager based
on whatever manual changes are made to their contents. And they'll
need to deal with conflicting updates, like explicitly deleting an
item and implicitly adding it at the "same time" (during the same recalc).
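One side of such a collection might sketch out like this (invented names
again; the conflicting-update handling is glossed over entirely, and I'm
assuming the Set type exposes its adds and removes to rules):

    class BirefSet(trellis.Set):
        """One side of a bidirectional set: mirrors its changes through the RM."""
        rm      = trellis.attr(None)   # the shared RelationshipManager
        own_key = trellis.attr(None)

        @trellis.perform
        def notify_rm(self):
            for item in self.added:
                self.rm.link(self.own_key, item.key)    # item.key is a placeholder
            for item in self.removed:
                self.rm.unlink(self.own_key, item.key)
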
But, it will be a nicely symmetrical solution, in that these biref
types and the relationship manager will be usable for in-memory
operations just as easily as for DB-backed ones. DB-backed ones may
be a bit more specialized, however, in that they may use the RM
directly to know about what to insert/update in the DB. Actually,
it's more likely that the only specialization will be needed for
collections whose expected size makes them impractical to be loaded
all-at-once, even lazily (as opposed to incrementally). For the most
part, we should be able to use the same "real" biref collection types
for most anything.
Type mappings, conversion, and validation for the DB can probably be
handled by specialized cells. That is, since we'll be specializing
the lazy-load-collection cells to some extent, we can possibly use
them to convert collections on assignment to biref-capable
versions. On the other hand, it might make more sense (and be easier
to implement) if we simply make biref-requiring collection attributes
read-only, and use slice updates (e.g. foo.bar[:] = baz ).
Of course, we'll also have to implement something like PEAK's
transaction system. Fortunately, it's not that complex, if we use a
Contextual service in place of PEAK's configuration
hierarchy. However, if we're going to use SQLAlchemy in place of
peak.storage.Connections and peak.storage.SQL, we'll have to factor
SA's transaction model into the picture, and I'm not sure what it
does with 2PC and such. So, might need to investigate a bit before
things get that far.
Another potential complication is that we want optimistic conflict
detection (long-running txns), where the peak.storage model was
geared to DB-scale txns and pessimistic conflict detection. This
will need some more thought, but I don't think it's
un-possible. :) The existing model rolls back all changes in the
event of an abort, but in an optimistic model you really want to know
what changed and ideally merge states wherever possible, then make
note of what's in conflict.
This might actually be doable in the persistence layer, without even
resorting to EIM-style update management. OTOH, it might be simpler
to put that sort of thing in the layer *below* the ORM, i.e. the
mapping strategies themselves.
Whew. A lot to do, and not a lot of time to do it in, and I'm now
out of time for today. More to come later.