[PEAK] PEAK-Rules progress & plans
Phillip J. Eby
pje at telecommunity.com
Sat Jun 17 20:00:47 EDT 2006
At 01:28 AM 6/1/2006 -0400, Phillip J. Eby wrote:
>It's too bad Python bytecode doesn't have any kind of "computed jump"
>feature, otherwise such tables could be built right into the bytecode.
Actually, it turns out that Python bytecode *does* have a feature that can
be abused to perform repeated jumps:
http://mail.python.org/pipermail/python-dev/2006-June/066115.html
The demo code, using the latest version of my BytecodeAssembler package, is
here:
http://peak.telecommunity.com/DevCenter/BytecodeAssembler#demo-computed-goto-switch-statement
Unfortunately, the technique is not currently compatible with Psyco or
PyPy, although Armin Rigo suggests that Psyco might be fixable to support
the trick:
http://mail.python.org/pipermail/python-dev/2006-June/066133.html
For PyPy, it would probably be better to add a JUMP_TOP opcode or something
similar, although that would require getting approval for some kind of
"switch" statement in Python 2.6.
I would hope, however, that the trick can be accomodated in Psyco for older
versions of Python, as it would essentially allow a generic function's
dispatch table to be JIT-compiled.
Anyway, this development means that I don't really need to worry about
doing either recursive function calls or binary search trees for dispatch
nodes that are best expressed as N-way lookups in a hash table.
As a result, I can thus focus my attention on other aspects of the dispatch
algorithm, such as choosing expressions to dispatch on:
>The main parts remaining that I'd like to optimize have to do with the
>laziness of index generation and especially the computation of the "best"
>index for dispatching. RuleDispatch goes through a lot of convolutions to
>optimize the indices so that they can quickly compute their "goodness" for
>an arbitrary collection of active rules. However, I wonder if it wouldn't
>be easier and/or simpler to just recompute indexes on each subtree
>buildout, especially if the indexes' "goodness" measurements were
>conducted using a sampling approach.
After thinking about this some more, it seems to me that recomputing the
indexes on sub-dispatches doesn't make sense, as it's rather more complex
than just updating the indexes eagerly the way RuleDispatch does.
By the same token, I'm wondering how much less than perfectly optimal a
dispatch tree can be, before the dispatching cost exceeds the cost of
building a more efficient tree. RuleDispatch recomputes the "goodness" of
each index as a candidate for dispatching at *every* dispatch node, and
even using the wacky indexing techniques that I came up with, this is an
O(n*m) operation, where n is the number of cases remaining at that node,
and m is the number of possible dispatch candidates. The incremental cost
of building out those nodes is thus fairly high, albeit very lazy.
However, I'm not sure it's necessary to be so particular about which
indexes are used, because small differences in the "goodness" of indexes
aren't all that important once you get past the first lookup.
Also, the more I've thought about the actual use cases, it seems to me that
you can divide dispatch subtrees into one of two types: "wide" and
"deep". A wide subtree is one that has some indexes which are dramatically
better than others for that point in the dispatch. For example, a generic
function that dispatches on twenty classes in one argument position, and
then has a bunch of other odds and ends tested elsewhere, is "wide" because
that one index is way ahead of anything else.
A "deep" subtree, on the other hand, has lots of possible choices for what
to test on, and none of them are significantly better than the others. The
same generic function that dispatches on twenty classes, once you've picked
*which* class, will then not anything very obviously good to switch
on. Such subtrees are probably more suited to writing out a simple if-then
tree, rather than trying to do any kind of indexing, as the indexing will
probably cost more.
The children of a "wide" subtree may be either wide or deep, but the
children of a deep subtree are also deep. This characteristic suggests
that the best overall way to implement a generic function dispatch tree is
to have one level of indexed "wide" dispatching that does all of the "good"
lookups to find a child function that it then calls to do the "deep"
dispatching, if any.
This top-level dispatching can be made "wider" by using combined indexes --
for example, the PEAK-Rules core dispatcher currently works by doing a
single "wide" lookup on the types of all its arguments against a
cache. This approach could be extended to any cacheable lookup keys, and
it's also something that could be explicitly hinted by the programmer for
performance if necessary.
Identifying whether a particular subtree is wide or deep is
interesting. When the best available dispatching index has two possible
branches, it's quite clear that you've reached a "deep"
situation. However, it's probably better to look at it in terms of the
average number of remaining cases on all branches of the subtree. The
nearer this average is to 1, the "wider" the subtree/index. The nearer it
is to total the number of remaining cases, the "deeper" it is. And if it
is around N/2, then it is definitely deep.
If we look at this once, at the root of the dispatch tree, we can simply
walk down the indexes in order of this goodness value, and simply pick a
threshhold at which we make a division between wide indexes and deep ones,
and eagerly create high-level lookup tables for the wide indexes, and
lazily create functions that encode the corresponding "deep" subtrees.
Up to some maximum number of methods, it probably also makes sense to allow
including a "wide" dispatch in a single function. If a function has 10
methods and one wide index, it probably doesn't make sense to split the
dispatch into two levels. As an index's absolute shallowness approaches 1,
it makes more sense to just go ahead and code out its subtrees in-line,
even if it is fairly wide.
So, to clarify my terms a bit, let's say that an index's "depth" is the
average number of cases that will be remaining as possibilities once a
lookup has been done. If this depth is 1 (or less, although it can't
actually fall to zero), then the index is as shallow as it can possibly
be. If this depth is N/2 or greater, then the index is extremely deep.
Now, if we also measure how "wide" an index is by how many *distinct*
choices it has for branching, then we can determine its "area" by
multiplying the two. This "area" tells us roughly how large a generated
function would be. If an index is wide, but its total area is too large to
put into one function, we know we should create a top-level function that
lazily expands subtrees as child functions. (Note: by "distinct", I mean
how many distinct subtrees can occur, regardless of how many keys might
lead to a given subtree.)
The top-level function might include more than one level of dispatch,
however. If the next deeper index is close in "area" to the first, it may
make sense to make the top-level dispatch be multidimensional, as is always
done for the peak.rules.core dispatcher. This increases the effective
"width" (by multiplying the widths of the individual indexes) and decreases
the effective "depth" (by having fewer choices left on each possible branch).
So, to sum up the heuristics, it seems like we could simply look at all
candidate indexes for the first dispatch, ordered by increasing
depth. Create an N-dimensional dispatch index by taking each index whose
area is "too large", and spin those off into a top-level lookup
function. If there are no such indexes, then just write out a single,
fully expanded dispatch tree.
Structuring the heuristics this way means that I can treat this "leveling"
process as a feature to be added later. I can at first just implement the
"write it all out as bytecode" version and leave it at that. Only when the
functions get large enough in their total size will adding this "leveling"
optimization be necessary.
Interestingly, however, there doesn't seem to be any way to eliminate the
necessity of re-determining the index's depth at each new subtree, even in
the "write it all out as bytecode" case. However, now that I've gone
through this analysis, I understand better why the costs of determining
those depths aren't likely to be a problem in practice. If you're in a
subtree with a relatively deep indexes, the chances are good that most of
them aren't eligible candidate indexes anyway, due to other conditions
being prerequisites.
For example, pattern matching and parsing usually have deep conditions like
"object is of type x and attribute y of the object is type z", where you
can't test the second condition until the first is proven true. So, you
are likely only inspecting the "goodness" of a small number of candidate
conditions to begin with, and as long as the process of knowing whether an
index is eligible is fast, then you won't be checking out many indexes'
applicability. So the best bet for performance at this level is not going
to be so much in making it fast to find an index's average depth, but in
making sure that it's efficient to ignore indexes that aren't
eligible. RuleDispatch's current approach could perhaps use some small
improvements in that respect.
Whew. I think that's about all my brain can handle for today. :) It's
beginning to look like RuleDispatch is pretty close to as algorithmically
optimal as it's possible to be. Where it's not as optimal is in the
constant factors, like the speed of computing expressions and branching,
and those will be easily surpassed by bytecode generation in
PEAK-Rules. On the plus side, this means I'll be taking more from
RuleDispatch's indexing and tree building algorithms than I thought. On
the minus side, I don't see any simplifications taking place there, except
for those that come from having simple generic functions available for the
implementation, and from having a simpler way to encode expressions
(courtesy of BytecodeAssembler).
More information about the PEAK
mailing list