[PEAK] PEAK-Rules indexing performance
Phillip J. Eby
pje at telecommunity.com
Fri Jan 9 11:27:54 EST 2009
At 01:03 PM 1/9/2009 +0100, Alberto Valverde wrote:
>Hi,
>
>We were struggling to optimize the bad performance (well,
>horrible: about 4 seconds just to build the tree the first time a gf
>was called with a different type signature) of a function heavily
>extended with peak.rules. Finally, an "intuition driven"
>optimization [1] hit the nail on the head and performance improved
>*dramatically* (to < 0.1 s). This is the magic changeset:
>
>http://hg.python-rum.org/tw.rum-exp/rev/1446664f1423
>
>Basically, some rules which were checking for set inclusion of an
>argument in a set bound to "self" were changed so that the inclusion
>check was made against a list "inlined" into the same rule. Note
>that the performance improvement is not at runtime (the performance
>there is fine) but during the indexed tree building process when a gf
>is first called. The profiler reveals many, many calls to "implies"
>and "overrides". More details, wild guesses and profiler output are here:
>
>http://python-rum.org/ticket/53
>
>The thing is that we'd like to optimize further if possible,
>following the same pattern, or at least do it in a cleaner way.
>However, not knowing the real reasons doesn't make us feel too
>comfortable... So, please, can some light be shed on the reasons
>for this magical improvement?
The specific thing going on with your optimization is that your
original code was doing a lot of unindexable comparisons. When you
have an operation like 'x in y', and 'y' is not a registration-time
constant, it can't be indexed as a set of ORed == operations. It has
to be indexed as a single 'x in y: true' criterion, instead of as a
series of x==1: true, x==2: true operations.
Why this results in such a huge performance tank, I have no
idea. But you *can* do the optimization more cleanly. It's not
necessary to stringify the constants, they can just be module-global
values or even function-local values. (e.g. register_widget() in
your patch doesn't need to stringify 'actions' to get the benefit.)
They just have to be reachable in the declarer's namespace at the
time of the declaration, and the sequence will be treated as a
constant and unfolded into a series of ORed == operations.
(For that matter, you could use SomeClass.attribute, and it would
still be unfolded at compile time -- whether SomeClass is
module-global or function-local at the declaration point. It just
can't be an expression based on an *argument* to the target function.)
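To illustrate the difference, here's a plain-Python sketch of the unfolding (the function name and criterion representation are made up for illustration; this is not actual PEAK-Rules internals):

```python
# Hypothetical sketch: when a sequence is reachable as a constant at
# declaration time, a membership test can be unfolded into ORed ==
# criteria, each of which can go into a hash-based dispatch index.

EDIT_ACTIONS = ('edit', 'update')  # module-global, known at declaration time

def unfold_membership(var, seq):
    """Turn "var in seq" into a list of indexable == criteria."""
    return [(var, '==', item) for item in seq]

criteria = unfold_membership('action', EDIT_ACTIONS)
# -> [('action', '==', 'edit'), ('action', '==', 'update')]
# By contrast, "action in self.edit_actions" can't be unfolded this
# way, because self.edit_actions is only known at call time.
```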
As far as performance goes, the Chambers-and-Chen tree building
algorithm *is* subject to exponential blowup, but the way PEAK-Rules
does it, it should be amortized over a larger number of
calls. PEAK-Rules builds dispatch trees lazily; it doesn't build
them all at once.
So what I suspect is happening is actually expensive creation of
leaf nodes, *as different combinations of classes and actions are
invoked*. Looking at your profile, I see 13.7 seconds (?) spent in
build_leaf(), and build_leaf() is what constructs a combined method
for the final call operation.
Aha... yes, I see where the blowup would be 2**sequences... that is,
you have sequences like actions, new_actions, and
input_actions... which I would guess gives a 2**3 == 8X multiplier on
the number of required leaf nodes.
You see, when the tree is optimized, what you have is a dispatch
table that looks like:
root[class][action] == method
That is, the root has an entry for each target class, and then that
entry has entries for each action, which then leads directly to a
target method (possibly a combined method).
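In plain Python terms, the optimized table behaves like two nested hash lookups (a toy model with made-up names, not the real tree structure):

```python
# Toy model of the optimized dispatch table: two hash lookups lead
# straight to the target method, with no per-call membership tests.

class Widget:
    pass

def edit_method(obj, action):
    return 'editing'

def view_method(obj, action):
    return 'viewing'

root = {
    Widget: {
        'edit': edit_method,
        'view': view_method,
    },
}

def dispatch(obj, action):
    # root[class][action] == method
    return root[type(obj)][action](obj, action)
```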
When it's *not* optimized, however, it looks more like *this*:
root[class][action][action in self.new_actions]
     [action in self.edit_actions][action in self.input_actions] == method
Which is a crap-load more nodes to build out, a lot of duplication,
and extra runtime overhead, because all three of those sequence
lookups have to happen on *every* call (due to being dynamically
valued by way of 'self'). So 8X as many nodes to build, plus three
extra levels of dispatch -- and three extra indexes to prioritize in
best_expr()! -- probably accounts for all your performance loss. And
every additional condition you added of this nature would've halved
your speed again.
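A toy sketch of the unoptimized path (class and attribute names are invented; the point is only the shape of the problem):

```python
# After the class/action lookups, three dynamic membership tests still
# run on every call, because each depends on `self`.  Each test is a
# two-way branch, so k independent tests mean up to 2**k distinct leaf
# combinations to build -- here 2**3 == 8.

class Controller:
    new_actions = {'new'}
    edit_actions = {'edit'}
    input_actions = {'new', 'edit'}

def leaf_key(self, action):
    # The tuple of test outcomes selects a leaf; 8 possible keys.
    return (action in self.new_actions,
            action in self.edit_actions,
            action in self.input_actions)
```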
This 2X penalty applies whenever you use a criterion that's opaque to
optimization. It's rather like trying to do a join or select on a
function in a database without any ability to index functions: it
forces a table scan, so to speak. PEAK-Rules tries to confine the
effect by ensuring the test is only applied where it's applicable, so
if there are only a few types to which you're applying the
conditions, and the conditions *depend on each other*, it can prevent
the blowup. But in your case, the various 'action in' tests are
*independent*, so it can't trim the tree that way.
That is, if you have a pair of tests like 'A in B and A in C' and 'A
not in B and A in D', then PEAK-Rules can skip testing 'A in C' if 'A
in B' comes up false. So instead of a tree like:
root[A in B][A in C][A in D]
You get a tree where the "true" branch of 'A in B' goes to an 'A in
C' test, and where the false branch goes to an 'A in D' test. But in
your case, that optimization wasn't possible because PEAK-Rules
doesn't know the implication relationships between the sequences.
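Sketched as plain code (made-up names; just to show the shape of the trimmed tree), the implication-aware version never tests 'A in C' on the false branch of 'A in B', so the tree stays smaller than the full cross product of all three tests:

```python
# Trimmed tree for the rules "A in B and A in C" (rule1) and
# "A not in B and A in D" (rule2): each branch only runs the tests
# that can still matter on that branch.

def dispatch(A, B, C, D):
    if A in B:
        if A in C:
            return 'rule1'   # A in B and A in C
    else:
        if A in D:
            return 'rule2'   # A not in B and A in D
    return 'default'
```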
So, if there is a clear implication relationship of this sort, you
could explicitly write it into your rules (e.g. 'action not in
self.input_actions and action in self.new_actions') and that would
mitigate the blowup. However, I suspect from the code I see in your
patch that this isn't really possible, because there aren't any
mutually exclusive sequences in the code I saw.
In the long run, I'd like to handle the sort of use case you're doing
by allowing "predicate functions" to exist; then you could define a
GF like 'is_edit_action(ob, action)', and thus be able to redefine
the edit actions for a given class, but still get the optimization
benefits. The idea is that the predicate function would be expanded
into its full logical form inside the tree building of the outer function.
I have some notes on how to do this, but it requires a refactoring of
certain key logical functions in the core, to allow subscribing to
callbacks to recompute a logical expression. Otherwise, when you
added a new rule to 'is_edit_action' (or whatever), the GF's using
that function in their rules wouldn't get updated.