[PEAK] Miscellaneous Trellis+GUI+Twisted+DB thoughts
Phillip J. Eby
pje at telecommunity.com
Thu May 1 21:13:24 EDT 2008
So, now that the new API is basically in place (minus
connect/disconnect, which I've got a partial version of in my
checkout), I'm turning my attention to making use of the new features
in practical apps, with GUIs, I/O, and databases.
Socket and other I/O with Twisted
---------------------------------
For I/O, I'd like it to be practical to implement internet protocols
with Trellis components, but *without* the protocol implementation
depending on any *actual* I/O. My first whack at this is in the
FreeSwytch library, which implements the protocol for talking to the
FreeSwitch VoIP system's event server using Trellis components. But
the implementation of the code that hooks to Twisted is a bit crude
and is entirely specific to FreeSwytch.
However, with a bit of cleanup, I think it would be possible to make
a general-purpose library for connecting pure-Trellis protocol
implementations to Twisted. In fact, a good portion of that code
would actually not be specific to Twisted, and could be used to
connect to any sufficiently-capable I/O loop.
The non-specific abstractions I have in mind are basically extensions
of the existing Pipe concept, to add the ability to detect whether
the connection is live, to request reconnection or closing, and
optionally, to have a "peer" associated with the object.
So, for a TCP connection, you would have an inbound pipe and an
outbound pipe, for reading and writing, respectively. And each would
have an open/close status (since some sockets are half-closable, as
used in HTTP).
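Sketched out, the abstract connection piece might look something like
this (none of these class or attribute names exist yet, and I'm leaving
out how the pipes actually get created):

    from peak.events import trellis

    class TCPConnection(trellis.Component):
        """Abstract TCP connection: no real I/O, just pipes plus status."""
        inbound    = trellis.attr(None)   # a Pipe of data read from the peer
        outbound   = trellis.attr(None)   # a Pipe of data to be written out
        read_open  = trellis.attr(True)   # the peer-to-us half of the socket
        write_open = trellis.attr(True)   # the us-to-peer half of the socket
        peer       = trellis.attr(None)   # optional address of the remote side

        @trellis.compute
        def connected(self):
            # "live" as long as at least one direction is still open
            return self.read_open or self.write_open
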
This would be fairly easy to hook up to a Twisted "protocol" object
as a bridge. But, you still have to connect it to your application's
protocol object(s), or vice versa. (This is where the awkwardness in
FreeSwytch comes in.)
After giving it some thought, though, it seems to me that the issues
of connecting an application protocol object to either a client or
server connection can be solved by the simple expedient of having
connect/listen APIs that accept a pipe that *new connections* should be fed into.
In other words, when calling Twisted's listenTCP() or connectTCP(),
you would pass in a protocol-factory wrapper around a Pipe instance
belonging to your application, and have a @maintain rule that reads
connection(s) from the Pipe and creates your actual application
protocol objects. That way, whatever natural factory exists in your
application isn't dependent on Twisted, and can be tested by feeding
it dummy connection objects.
(Actually - they're not dummy connection objects, because the
connection objects themselves will be abstract/inert, so they are
perfectly usable for testing by themselves. You just write test code
that puts data into the pipes, or checks the data coming out.)
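The application side might then be as simple as something like this
(purely illustrative; ChatService/ChatSession are made-up names, and I'm
assuming you can iterate a Pipe to see what was fed in during the
current recalc):

    class ChatService(trellis.Component):
        new_connections = trellis.attr(None)    # a Pipe the app creates and
                                                # hands to the Twisted wrapper
        sessions = trellis.make(trellis.Set)

        @trellis.maintain
        def accept_connections(self):
            # runs whenever new connection objects show up in the pipe
            for conn in (self.new_connections or ()):
                # ChatSession would be the app's protocol component (not shown)
                self.sessions.add(ChatSession(connection=conn))

A test just constructs a ChatService, pokes inert connection objects
into new_connections, and inspects sessions; no reactor required.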
So, we will need a couple of "connection" components that are
general-purpose, and a couple of Twisted-specific adapter objects to
turn a pipe into a Twisted protocol factory, and a connection object
into a Twisted protocol. The protocol factory adapter would create a
connection and then use the other adapter to turn the connection into
a protocol. Voila!
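In rough terms, the two adapters might look like this (class names made
up; error handling, half-close, and the outbound direction are all
glossed over):

    from twisted.internet import protocol

    class ConnectionProtocol(protocol.Protocol):
        """Bridges one abstract connection object to a real Twisted transport."""
        def __init__(self, connection):
            self.connection = connection

        def connectionMade(self):
            self.connection.peer = self.transport.getPeer()

        def dataReceived(self, data):
            self.connection.inbound.append(data)  # assuming Pipes grow by append()

        def connectionLost(self, reason):
            self.connection.read_open = self.connection.write_open = False

    class PipeFactory(protocol.Factory):
        """Turns a Pipe of new connections into a Twisted protocol factory."""
        def __init__(self, new_connections):
            self.new_connections = new_connections

        def buildProtocol(self, addr):
            conn = TCPConnection()             # the abstract connection sketched above
            self.new_connections.append(conn)  # hand it to the application
            return ConnectionProtocol(conn)

The application would then just call reactor.listenTCP(port,
PipeFactory(service.new_connections)), or the connectTCP() equivalent.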
It probably won't be *quite* that simple, since Twisted has various
kinds of protocols and factories, for handling subprocess
communications, Unix sockets, datagrams, etc. It's unlikely that
I'll implement all of them right away. AFAIK, Chandler only needs
the TCP variants, though for some of my hobby projects (such as a
TiVo protocols library), I'll need UDP and subprocess comms too.
At this point, I don't have a detailed design for the connection
objects. I plan to look over what Twisted's doing, as well as
Chandler and my own projects, and nail down the details soon. (I do
have a neat code name for the bridging library, though: "vine", as in
a thing that's Twisted but grows on a Trellis. :) )
GUI Layout
----------
So, it's been a long time since I thought much about how to do GUI
layout with Trellis, long enough that the Trellis's capabilities have
completely changed the picture... no pun intended. Back then, one
of the hardest parts of doing grid-like form layouts and other fancy
dynamic layouts with the Trellis was the up-and-down nature of
calculations, where a column's width depends on the width of its
contents, but the contents' *actual* width might depend on the width
of the column.
More specifically, the issue was that an object would need to know
its position relative to its parent, which meant adding an object to
a layout required updating both the parent and the child to know
about each other. This is a PITA when constructing objects in code,
as anyone who's built a wx layout by hand will know. :)
Specifically, you can't just create a parent and pass in a list of
children all in one giant expression, because the children have to be
created *already knowing* who their parent is.
Now, the simple layout and styling system I created a few months ago
can work around this to some extent, but it entirely relies on the
underlying GUI library to manage the actual layout, which makes
testing a bit more difficult.
However, that old problem is now solved, because a component that
wants to position its children can simply do so in a @maintain rule
that explicitly sets its children's positions. For example, you
could have a "column" component that sets all its children's left
positions and maximum widths, based on its own position and maximum
width, and a "columns" component that sets the left position and
maximum widths of its contained columns, based on their requested
widths and the available width of the "columns" object itself.
It's actually a bit more complex than I've described, in that spans
have to be dealt with for purposes of width
calculation. Essentially, each column's position is defined by the
maximum value of a set of constraints, where the principal constraint
is defined by the previous column's position plus its width.
More generally, you could say that an entire layout consists of
nothing but taking the minimums or maximums of various combined
constraints defined by the contents of the layout. And, as long as
those constraints don't change a lot, it's pretty efficient. I
wouldn't want to use it to lay out arbitrarily large tables, but for
form layouts it should be more than sufficient.
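For the simple no-spans case, the whole thing might boil down to
something like this (all names invented, and spans and padding are
ignored):

    class Column(trellis.Component):
        requested_width = trellis.attr(0)
        min_left        = trellis.attr(0)
        left            = trellis.attr(0)   # set by the parent's rule below
        max_width       = trellis.attr(0)   # likewise

    class Columns(trellis.Component):
        left      = trellis.attr(0)
        max_width = trellis.attr(0)
        columns   = trellis.make(trellis.List)

        @trellis.maintain
        def place_columns(self):
            # push positions and width limits down into the children; each
            # column's left edge is the max of its constraints (here, just
            # the previous column's right edge and its own minimum)
            right_edge = self.left
            for col in self.columns:
                col.left = max(col.min_left, right_edge)
                col.max_width = min(col.requested_width,
                                    self.left + self.max_width - col.left)
                right_edge = col.left + col.max_width
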
Layout mechanisms could also benefit from better methods of handling
aggregate functions in the Trellis. For example, being able to have
dynamically updated min(), max(), sum(), etc. of collections, that
don't require looping over the entire collection when a single value
is changed. That will probably come when I get to working on collection APIs.
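The kind of thing I have in mind is roughly this (assuming the Set type
exposes discrete added/removed changes to rules, and relying on a rule
being able to read its own previous value):

    class RunningTotal(trellis.Component):
        values = trellis.make(trellis.Set)

        @trellis.maintain(initially=0)
        def total(self):
            # adjust the previous total by just what changed, instead of
            # re-summing the whole collection
            return self.total + sum(self.values.added) - sum(self.values.removed)
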
It's probably a bit early to worry much about layout though, so this
part is on the back burner, relative to Twisted, GUI events, and collections.
GUI Events
----------
I've been surveying Chandler GUI code for its usage of wx events, and
examining how such code could be simplified and made more testable by
allowing event-based values to be queried. That is, imagine if, for
example, you could use rules saying things like 'if mouseover(self)', instead
of having to explicitly bind mouse enter/leave events to update
methods that then modify a variable for some other code to read.
So far, I haven't come up with any particularly brilliant way to
spell this, although just the ability to do this with wx-specific
events and capabilities would be nice in many places. My main
concern for the API design is that I'd like to allow for the kind of
separation that the socket connection stuff does. That is, much of
what a "controller" does (in the original MVC sense of controller) is
independent of the GUI technology used, and could be tested
independently. (i.e., things like mouseovers, doubleclicks, dragging, etc.)
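For instance, a little controller like the one below could be driven
either by a wx bridge or directly by test code (everything here is
invented for illustration; only the wx event names are real):

    import wx
    from peak.events import trellis

    class HoverController(trellis.Component):
        """GUI-independent: just reads and reacts to event-derived cells."""
        mouse_over = trellis.attr(False)

        @trellis.compute
        def highlighted(self):
            return self.mouse_over

    def bind_mouseover(widget, controller):
        """The wx-specific part: feed enter/leave events into the controller."""
        widget.Bind(wx.EVT_ENTER_WINDOW,
                    lambda evt: setattr(controller, 'mouse_over', True))
        widget.Bind(wx.EVT_LEAVE_WINDOW,
                    lambda evt: setattr(controller, 'mouse_over', False))

A test can then just set controller.mouse_over directly and check
highlighted, without any wx objects involved.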
And I rather like the idea of being able to maybe test a component by
saying something like someController.clickAt(x, y) and find out what
all it would do to its GUI, somewhat the way I can test TCP protocols
without needing to connect them to Twisted. It wouldn't prevent GUI
system bugs or quirks, of course, but once you found those bugs you
could incorporate the workarounds into your testing.
The main problem with the idea is that the communication channel
between app and GUI is a lot wider than that between two TCP
sockets! There's the mouse, keyboard, and the current focus, not to
mention all the possible drawing operations going the other
way. These factors seem to make it hard to have any generic notion
of such communication channels, the way it should be possible to do
with sockets.
So, in the short run, I expect I will just stick to making it
possible to easily monitor wx events (and event-driven values) in
Trellis rules. A more sophisticated or more-generic alternative can
wait until we get some experience with the first round of
simplification. Unfortunately, this may not provide any testability
benefits at first, but it's a walk vs. crawl thing.
Collections and Databases
-------------------------
After doing a good bit of work to get a "hello world" version of
using the SQLAlchemy ORM with the Trellis, I have something that
*appears* to work, but in actuality I have no real way to know if it
will actually work correctly in a real application. This is largely
due to the complex and unknown interactions between SQLAlchemy's
not-so-well-defined event and lifecycle model on the one hand, and
the Trellis' need for undoability and non-conflicting writes on the other.
I have no idea even how to begin *testing* it, and that bothers
me. SQLAlchemy seems like a nice way to access databases in a
(semi-)relational way, or to do an ORM on some non-Trellised data
structures. In fact, I'm looking forward to actually trying to use
SQLAlchemy's ORM on other projects of mine, where lack of
Trellis-ness won't be a problem. (E.g. command-line utilities and
"standard" web apps.) But for mixing with the Trellis, it doesn't
look like such a good idea after all.
Even if the base mapping were to work well (and if it does, it's
essentially accidental at this point), I haven't even tried to
integrate collections yet, and this is where all the interesting bits
for Trellis applications are likely to be. It's all very well and
good that you can make your model objects viewable and editable in a
GUI, but practical applications (especially Chandler) need to deal
heavily with collections of objects that are being dynamically
updated, and potentially need to be reflected in a GUI.
Chandler's use cases for collection updates actually have two
different levels of update granularity, too. Changes to the items in
a collection need to be reflected immediately, while changes in
collection *membership* may need to be deferred until a manual
refresh or timeout, whether those changes are the result of
background syncing operations or foreground changes made by the user.
As far as I can tell, SQLAlchemy's collections don't provide a way to
handle these use cases that I can cleanly integrate with the
Trellis. In contrast, integrating "gated refresh" in the Trellis is
straightforward, at least in principle. (I.e., using @maintain rules
that return their previous value unless a refresh event occurs or
it's the first run of the rule.)
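In sketch form, that's something like this (names invented, and I may be
fuzzy on the exact spelling for a discrete/resetting attribute):

    class GatedMembership(trellis.Component):
        live_members = trellis.make(trellis.Set)         # changes immediately
        refresh      = trellis.attr(resetting_to=False)  # discrete "refresh now" event

        @trellis.maintain(initially=None)
        def members(self):
            if self.members is None or self.refresh:
                return frozenset(self.live_members)      # take a fresh snapshot
            return self.members                          # otherwise keep the old value
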
So, what's needed to get a db-backed collections framework? At the
Trellis level, we now have the necessary raw materials to implement
ORM... ironically through one of the features that were added to
support SQLAlchemy ORM. (Specifically, the ability to test whether a
cell's value was set or computed.)
This means that it's possible to create a DB-backed version of a
Trellis component simply by replacing its cells with ones that lazily
retrieve data from a database, along with rules that write data back
when the cells are set. These operations could probably be managed
by a simple AddOn that holds the writeback rule(s) and caches any
relevant records.
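In very rough form (the AddOn base class is real, from the PEAK AddOns
library, but everything else here, including load_row() and how the
cells would call into it, is hand-waved):

    from peak.util.addons import AddOn

    class DBRecord(AddOn):
        """Caches one object's row and remembers which fields need writing back."""

        def __init__(self, subject):
            self.subject = subject
            self.row = None       # loaded lazily, on first read
            self.dirty = set()    # names of cells that were *set* rather than computed

        def get(self, name):
            if self.row is None:
                self.row = load_row(self.subject)  # placeholder for the real DB read
            return self.row[name]

        def set(self, name, value):
            if self.row is None:
                self.row = load_row(self.subject)  # ditto
            self.row[name] = value
            self.dirty.add(name)  # a write-back rule would flush these at commit
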
Or, alternatively, you could write mappers as AddOn classes, such
that each mapper manages the state for some portion of the data. For
example, an object with data in more than one record type/table could
have an add-on for each, tracking whether the associated cells need a
write-back.
(Nah, that's too complicated. Even if you need per-mapper state,
there's no reason to have multiple add-ons; they can just be objects
managed by a single add-on.)
Anyway, the resulting model would probably look a lot like the
peak.storage.data_managers module in the old PEAK core, except that
model objects won't subclass Persistent, and it will be possible in
principle for a single object to be managed by multiple DMs... while
also allowing a single DM to serve as a cache and state manager for
multiple object types, without subclassing.
That is, I think that we can actually implement persistence
strategies as DM add-ons keyed by type, and persistence states for
individual objects as add-ons keyed by DM. Thus, you could have an
object loaded from one DM, but saved to multiple DMs (e.g. for a
save to file or "export" operation). And, persistence strategy
implementers wouldn't need to know as much about the inner workings of DMs.
The old QueryLink and ListProxy types from PEAK would be replaced of
course with simple Trellis collection types, for the most
part. Since we'll be using lazy cells to create these objects only
when needed, the special wrappers we used in the past won't be needed.
We will, however, need to solve a somewhat trickier problem, which is
how to set both ends of a bidirectional link without incurring any
strangeness when we switch back and forth between db-backed
collections and pure in-memory ones.
One simple way to do this would be to have a "relationship manager"
that allows you to communicate link/unlink operations between
collections via pipes. This RM would get commands like
.link(fromkey, tokey) and .unlink(fromkey, tokey), then look up
fromkey and tokey in a pair of caches to see if it has a cell or pipe
for that key and, if so, write the command and the opposing key to
that cell or pipe. Collections would simply ask for a cell or pipe
that matches the key on their side of the relationship, and use it to
automatically incorporate those changes into their own adds and deletes.
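Roughly speaking (all names provisional, and assuming pipes can just be
appended to):

    class RelationshipManager(object):
        """Routes link/unlink commands to whichever collections are loaded."""

        def __init__(self):
            self.forward  = {}   # fromkey -> pipe/cell for one side, if loaded
            self.backward = {}   # tokey   -> pipe/cell for the other side, if loaded

        def subscribe(self, side, key, pipe):
            # a collection registers the pipe matching its side and key
            getattr(self, side)[key] = pipe

        def link(self, fromkey, tokey):
            self._send(self.forward,  fromkey, ('link', tokey))
            self._send(self.backward, tokey,   ('link', fromkey))

        def unlink(self, fromkey, tokey):
            self._send(self.forward,  fromkey, ('unlink', tokey))
            self._send(self.backward, tokey,   ('unlink', fromkey))

        def _send(self, cache, key, command):
            pipe = cache.get(key)
            if pipe is not None:   # unloaded collections just don't hear about it
                pipe.append(command)
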
As a consequence, if a collection isn't currently in use (due to lazy
loading), then it won't receive any change events... just the way
PEAK's old QueryLink class ignored updates for not-yet-loaded collections.
Oh, and of course these special bidirectional collection types will
need to send link/unlink commands to the relationship manager based
on whatever manual changes are made to their contents. And they'll
need to deal with conflicting updates, like explicitly deleting an
item and implicitly adding it at the "same time" (during the same recalc).
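One side of such a collection might sketch out like this (invented names
again; the conflicting-update handling is glossed over entirely, and I'm
assuming the Set type exposes its adds and removes to rules):

    class BirefSet(trellis.Set):
        """One side of a bidirectional set: mirrors its changes through the RM."""
        rm      = trellis.attr(None)   # the shared RelationshipManager
        own_key = trellis.attr(None)

        @trellis.perform
        def notify_rm(self):
            for item in self.added:
                self.rm.link(self.own_key, item.key)    # item.key is a placeholder
            for item in self.removed:
                self.rm.unlink(self.own_key, item.key)
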
But, it will be a nicely symmetrical solution, in that these biref
types and the relationship manager will be usable for in-memory
operations just as easily as for DB-backed ones. DB-backed ones may
be a bit more specialized, however, in that they may use the RM
directly to know about what to insert/update in the DB. Actually,
it's more likely that the only specialization will be needed for
collections whose expected size makes them impractical to be loaded
all-at-once, even lazily (as opposed to incrementally). For the most
part, we should be able to use the same "real" biref collection types
for most anything.
Type mappings, conversion, and validation for the DB can probably be
handled by specialized cells. That is, since we'll be specializing
the lazy-load-collection cells to some extent, we can possibly use
them to convert collections on assignment to biref-capable
versions. On the other hand, it might make more sense (and be easier
to implement) if we simply make biref-requiring collection attributes
read-only, and use slice updates (e.g. foo.bar[:] = baz ).
Of course, we'll also have to implement something like PEAK's
transaction system. Fortunately, it's not that complex, if we use a
Contextual service in place of PEAK's configuration
hierarchy. However, if we're going to use SQLAlchemy in place of
peak.storage.Connections and peak.storage.SQL, we'll have to factor
SA's transaction model into the picture, and I'm not sure what it
does with 2PC and such. So, might need to investigate a bit before
things get that far.
Another potential complication is that we want optimistic conflict
detection (long-running txns), where the peak.storage model was
geared to DB-scale txns and pessimistic conflict detection. This
will need some more thought, but I don't think it's
un-possible. :) The existing model rolls back all changes in the
event of an abort, but in an optimistic model you really want to know
what changed and ideally merge states wherever possible, then make
note of what's in conflict.
This might actually be doable in the persistence layer, without even
resorting to EIM-style update management. OTOH, it might be simpler
to put that sort of thing in the layer *below* the ORM, i.e. the
mapping strategies themselves.
Whew. A lot to do, and not a lot of time to do it in, and I'm now
out of time for today. More to come later.