[PEAK] Possible performance enhancements for the Trellis
Phillip J. Eby
pje at telecommunity.com
Mon Aug 6 13:46:02 EDT 2007
I've been thinking a bit about ways to further enhance Trellis
calculation performance, and have thought of two things that could
perhaps be added to take advantage of the fact that most Trellis
rules are not recalculated in a given pass, and the fact that most
Trellis rules have a very stable access pattern, in terms of which
cells they access.
The current algorithm schedules cells to be potentially recalculated,
and when a cell's value is requested, it looks "down" the dependency
hierarchy to see whether any of its dependencies have changed.
But, if we do things in a slightly different way, a cell should be
able to tell in most cases whether it's up-to-date without looking at
*any* of its dependencies.
Let's suppose that instead of tracking a "version" (what pulse the
cell is up-to-date as of) and "changed_as_of" (what pulse the cell
last changed), we simply track whether a cell is "dirty" (has had a
dependency change) and what "layer" it is in.
A cell is initially in layer zero, and its layer is reset to zero any
time a value is explicitly written to it. When a cell's rule is
calculated, its layer is increased until it is greater than the
highest layer of any of its dependencies. So, a rule that reads only
layer zero cells will have a layer of one. If it later reads a cell
from layer 2, its layer will become 3. After that, however, reading
more layer zero cells will not drop its layer to 1; it will remain a
layer-3 cell. (The idea here is to minimize the amount of
layer-changing a cell does in its lifetime.)
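In rough Python, the layer bookkeeping might look like this (a
minimal sketch only; the Cell class and its attribute names here are
illustrative, not the actual Trellis internals):

    class Cell:
        def __init__(self, rule=None, value=None):
            self.rule = rule
            self.value = value
            self.layer = 0            # every cell starts in layer zero
            self.dirty = False
            self.dependencies = []

        def set_value(self, value):
            # an explicit write resets the cell to layer zero
            self.layer = 0
            self.value = value

        def recalculate(self):
            self.value = self.rule()
            # raise (but never lower) the layer until it exceeds the
            # highest layer of any dependency
            for dep in self.dependencies:
                if self.layer <= dep.layer:
                    self.layer = dep.layer + 1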
When a cell's value changes, it sets the dirty flag of its observers,
and adds them to a priority heap (instead of the current simple
list), ordered by layer. The cleanup algorithm then runs in priority
order, so that recalculation is done breadth-first through the layers.
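A sketch of the scheduling side, using Python's heapq module to order
recalculation by layer (the counter is just a tie-breaker so that
entries on the same layer compare cleanly; again, the names are
illustrative):

    import heapq, itertools

    recalc_heap = []                  # entries are (layer, count, cell)
    _counter = itertools.count()

    def schedule_observers(cell):
        # a changed cell marks its observers dirty and queues them,
        # lowest layer first
        for obs in cell.observers:
            if not obs.dirty:
                obs.dirty = True
                heapq.heappush(recalc_heap, (obs.layer, next(_counter), obs))

    def cleanup():
        # recalculation proceeds breadth-first through the layers
        while recalc_heap:
            layer, _, cell = heapq.heappop(recalc_heap)
            if cell.dirty:
                cell.recalculate()    # notifies observers if it changed
                cell.dirty = False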
Thus, a cell knows that:
1. It is definitely dirty if its flag is set; it should recalculate
itself, notify its observers if needed, and then clear its dirty flag
2. It is definitely NOT dirty if its flag is not set, and the *first
item in the priority heap is on the same or higher layer* (or the
heap is empty). Since any dependency of a cell is always on a lower
layer than the cell itself, if there are no lower-layer cells on the
queue, then all of this cell's dependencies have already had their
chance to mark this cell dirty -- and didn't. Therefore, the cell is
clean, and doesn't need to make any changes. (The current algorithm
has to set the _version attribute equal to the current pulse to mark
itself clean.)
3. It is *maybe* dirty if its flag is not set, and the first item on
the priority heap is on a lower layer than the current cell's
layer. In this case, we must recursively check the dependencies,
just as in the current algorithm -- except that we only need to
recursively check dependencies that don't fall into one of the two
preceding categories, so there are fewer branches and shallower
tests. In addition, the recursion should very rarely be required,
since the only time this condition can occur is when a cell rule
requests a value that was not previously a dependency. (Because if
it was previously a dependency, the current cell would have a higher
layer number, and one of the other two conditions would have
applied.) A sketch of this three-way check follows.
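Assuming the heap entries from the earlier sketch, the three cases
might combine into something like this (illustrative only; a False
result just means the cell has to take the full recalculation path):

    def is_up_to_date(cell, heap):
        # Case 1: flag set -- definitely dirty
        if cell.dirty:
            return False
        # Case 2: flag clear, and nothing pending on a lower layer --
        # definitely clean, with no bookkeeping needed at all
        if not heap or heap[0][0] >= cell.layer:
            return True
        # Case 3: flag clear, but lower-layer cells are pending --
        # recursively check the dependencies, as the current
        # algorithm does
        return all(is_up_to_date(dep, heap) for dep in cell.dependencies)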
I think this approach will be a strong win for performance in large
Trellises, as the current algorithm *always* does a recursive
dependency scan when a rule reads a value that hasn't yet been
scanned for changes in the current pulse. That means that a lot of
cells get looked at -- almost as many as if you were actually
recalculating everything. This new approach would minimize pointless
scanning, and possibly open the door to a better approach to
"hub" cells that receive data from a large number of child cells (but
which you don't want to have looping over a whole bunch of inputs as
dependencies).
The second way to take advantage of a Trellis's stability is to
exploit the fact that most rules look at the same cells over and
over. But the current algorithm builds a new list of
dependencies (and weak observer links) every time there's a
recalc. As a side effect, this also means that various objects (such
as weakrefs and a "Mailbox" object) get created and discarded for each recalc.
However, if the dependencies are the *same*, then there is no point
in changing anything -- the existing dependencies could just be left
alone. In addition, the code that creates observer links could just
record the dependency links, and the recalc code could set up the
backlinks, deleting ones that are no longer needed, and adding the
ones that are new. So only the bare minimum number of weakrefs would
be created, and the elaborate current system of doubly-indirected
weakrefs could go away -- eliminating the need to have a separate
Mailbox object as a weakref target, and thereby saving both time and space.
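A sketch of that dependency-diffing idea, assuming each cell keeps
the dependency list from its previous recalculation (observer links
shown here as a plain set for simplicity; the real implementation
would presumably still use weakrefs):

    def update_dependencies(cell, new_deps):
        old_deps = cell.dependencies
        if new_deps == old_deps:
            return                        # the common case: no churn at all
        old_set, new_set = set(old_deps), set(new_deps)
        for dep in old_set - new_set:
            dep.observers.discard(cell)   # drop stale backlinks
        for dep in new_set - old_set:
            dep.observers.add(cell)       # create only the new ones
        cell.dependencies = new_deps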
Neither of these algorithm and data structure changes would affect
the public API, so I'm not sure if I'll bother with them for the
0.5b1 release, unless I end up reconsidering the way the Trellis
makes "future" changes.
Specifically, I came up with the idea of the priority heap while
thinking about ways to build a solid mechanism for TrellisDB to keep
track of changes to domain objects and map them back to a
record-oriented model.
I'm beginning to realize that the Trellis' current model for managing
writes in rules (i.e., writing to a cell causes a new time pulse to
occur) is a bit more limited than I'd like it to be. The advantage
of a write-oriented model is that it handles small changes to large
datasets better than the current Trellis does. However, you can get
similar effects by using a good "hub".
Consider a database service that maintains the mapping between domain
objects and their corresponding records. When a domain object
changes, you want the service to know what record changes will
correspond, and in turn the actual record-based services need to
update the query sets that contain the before and after versions of
those records.
The "normal" way to do this is that you'd write code to update the
record sets when a change occurs, but It's Not The Trellis
Way(TM). The Trellis way to do this would be that you have a rule
that loops over the domain objects in memory to create a record-level
diff of all the current changes. But that loop is very inefficient,
and gets more so as the number of objects grows.
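For illustration only, that naive rule might look like this, with
domain_objects and record_diff standing in for whatever TrellisDB
actually provides:

    def pending_record_changes():
        # re-runs in full on every pulse, touching every object in
        # memory whether it changed or not
        diff = {}
        for obj in domain_objects:
            diff.update(record_diff(obj))
        return diff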
Enter the Hub. I've talked about this idea before, and ended up
discarding it in favor of the todo/future approach that's used in the
current SVN version (and isn't yet fully documented in the
tutorial). However, I'd forgotten about the TrellisDB use cases
while I was working on that. (The current stuff is still useful,
just not as much so for TrellisDB.)
So, I'm thinking that a Hub cell will be something like a dictionary
of input cells, with an output cell whose value is a dictionary. You
add cells to the hub, and it registers them as dependencies of the
output cell. But instead of actually looping over those cells, it just
takes the notifications it gets, and uses them to assemble a
dictionary (or set, or list, or whatever) that contains the values of
the changed cells. You could then have rules that use the changes to
do things... like updating databases.
In a way, it's like writing callbacks to set values -- except that
there isn't really any value setting -- it's just a more efficient
way of doing the naive "loop over everything".
And of course, for this to work efficiently, asking for the output
of a hub shouldn't require looping over all its inputs, even under
the hood. Hence the priority heap idea. That way, as
long as you update the hub's layer whenever you add an input cell to
it, the hub will always live at a layer above its inputs, which means
it should never need to *actually* loop over its inputs.
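Here's a rough sketch of how such a Hub might hang together; none of
these names are existing Trellis API:

    class Hub:
        def __init__(self):
            self.layer = 0
            self.keys = {}        # input cell -> key in the output dict
            self.changes = {}     # the output value: just the changed items

        def add_input(self, key, cell):
            self.keys[cell] = key
            cell.observers.add(self)
            # keep the hub strictly above every input, so asking for
            # its output never requires scanning the inputs
            if self.layer <= cell.layer:
                self.layer = cell.layer + 1

        def notify(self, cell):
            # called only for inputs that actually changed this pulse
            self.changes[self.keys[cell]] = cell.value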
In practice, I suspect there may be occasional "devil in the details"
exceptions to that idea. More on this in another email, as I refine
the priority heap algorithm a bit. The main remaining hurdle is that
it's difficult to clear junk from the heap when a layer inversion
occurs -- that is, when a cell asks for a higher-layered value it's
never used before. If cell rules were required to be side-effect
free (or at least able to handle a "Rescheduling" exception), then we
could simply cancel the calculation of a layer-inverting rule, and
reschedule it to the layer above the value it's requesting.
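In sketch form, the recalculation loop could handle such an
inversion like this (Rescheduling is hypothetical; nothing like it
exists in the current Trellis):

    import heapq, itertools

    _counter = itertools.count()      # tie-breaker for equal layers

    class Rescheduling(Exception):
        """Raised when a rule reads a cell on its own layer or above."""
        def __init__(self, needed_layer):
            self.needed_layer = needed_layer

    def run_recalcs(heap):
        while heap:
            layer, _, cell = heapq.heappop(heap)
            try:
                cell.recalculate()
            except Rescheduling as e:
                # cancel the (side-effect free) calculation and retry
                # it on the layer above the value it tried to read
                cell.layer = e.needed_layer + 1
                heapq.heappush(heap, (cell.layer, next(_counter), cell))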
To avoid output rules being re-run, they'd need to be marked as
outputs, and placed at the highest possible layer. Depending upon an
output rule would then be an error, and outputs would once again be
guaranteed to see a fully consistent state.
In order for this to work, it would also be necessary to ban
non-output cells from making modifications to other cells. In other
words, non-output rules would never be able to change anything. A
bit of experimenting with the current SVN version of the Trellis
shows that only two very tricky test examples in Specification.txt
fail this requirement, and they could easily be discarded.
With this being the case, I'm thinking that in 0.5b1 I will go ahead
and implement an even stronger restriction: that *no* rule may make
modifications to *anything*. That way, if at a later point we go
with a heap-based recalc algorithm that can retry/reschedule rules,
it won't be necessary to go back and modify existing code. That is,
we'll already be used to the restrictions.
I suppose I should also allow you to mark rules as outputs in 0.5b1,
even if it doesn't do anything special right now. That way, the
tutorial and examples will all be suitable for the
re-entrant/non-recursive algorithm.
One final hurdle: generator/task rules. Such rules *can't* be
re-entered, so having a Rescheduling exception is a non-starter. That
means tasks would have to always be treated as "output" rules, and
thus can't be used as inputs to other rules. I think this is
probably an okay restriction, but I'll need to make some changes to
the current implementation so that it doesn't actually produce any
output values as a side effect of yielding.
I think that this limitation on tasks will be okay, because I was
already finding that the value-yielding aspect of tasks was going to
be difficult to explain and even more difficult to use
correctly. Restricting tasks to being only useful for their
side-effects (such as socket transmissions, database updates, etc.)
fixes both of those problems.