[PEAK] PEAK-Rules indexing performance
Phillip J. Eby
pje at telecommunity.com
Fri Jan 9 11:27:54 EST 2009
At 01:03 PM 1/9/2009 +0100, Alberto Valverde wrote:
>Hi,
>
>We were struggling to optimize the bad performance (well,
>horrible: about 4 seconds just to build the tree the first time a gf
>was called with a different type signature) of a function heavily
>extended with peak.rules. Finally, an "intuition driven"
>optimization [1] hit the nail on the head and performance improved
>*dramatically* (to < 0.1 s). This is the magic changeset:
>
>http://hg.python-rum.org/tw.rum-exp/rev/1446664f1423
>
>Basically, some rules which were checking for set inclusion of an
>argument in a set bound to "self" were changed so that the inclusion
>check was made against a list "inlined" into the same rule. Note
>that the performance improvement is not at runtime (the performance
>there is fine) but during the indexed tree building process when a gf
>is first called. The profiler reveals many, many calls to "implies"
>and "overrides". More details, wild guesses and profiler output are here:
>
>http://python-rum.org/ticket/53
>
>The thing is that we'd like to optimize further if possible,
>following the same pattern, or at least do it in a cleaner way.
>However, not knowing the real reasons doesn't make us feel too
>comfortable... So, please, can some light be shed on the reasons
>for this magical improvement?
The specific thing going on with your optimization is that your
original code was doing a lot of unindexable comparisons. When you
have an operation like 'x in y', and 'y' is not a registration-time
constant, it can't be indexed as a set of ORed == operations. It has
to be indexed as a single 'x in y: true' criterion, instead of as a
series of x==1: true, x==2: true operations.
Why this results in such a huge performance tank, I have no
idea. But you *can* do the optimization more cleanly. It's not
necessary to stringify the constants, they can just be module-global
values or even function-local values. (e.g. register_widget() in
your patch doesn't need to stringify 'actions' to get the benefit.)
They just have to be reachable in the declarer's namespace at the
time of the declaration, and the sequence will be treated as a
constant and unfolded into a series of ORed == operations.
(For that matter, you could use SomeClass.attribute, and it would
still be unfolded at compile time -- whether SomeClass is
module-global or function-local at the declaration point. It just
can't be an expression based on an *argument* to the target function.)
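To illustrate the difference, here's a plain-Python sketch of the unfolding (the function name and criterion representation are made up for illustration; this is not actual PEAK-Rules internals):

```python
# Hypothetical sketch: when a sequence is reachable as a constant at
# declaration time, a membership test can be unfolded into ORed ==
# criteria, each of which can go into a hash-based dispatch index.

EDIT_ACTIONS = ('edit', 'update')  # module-global, known at declaration time

def unfold_membership(var, seq):
    """Turn "var in seq" into a list of indexable == criteria."""
    return [(var, '==', item) for item in seq]

criteria = unfold_membership('action', EDIT_ACTIONS)
# -> [('action', '==', 'edit'), ('action', '==', 'update')]
# By contrast, "action in self.edit_actions" can't be unfolded this
# way, because self.edit_actions is only known at call time.
```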
As far as performance goes, the Chambers-and-Chen tree building
algorithm *is* subject to exponential blowup, but the way PEAK-Rules
does it, it should be amortized over a larger number of
calls. PEAK-Rules builds dispatch trees lazily; it doesn't build
them all at once.
So what I suspect is happening is actually expensive creation of
leaf nodes, *as different combinations of classes and actions are
invoked*. Looking at your profile, I see 13.7 seconds (?) spent in
build_leaf(), and build_leaf() is what constructs a combined method
for the final call operation.
Aha... yes, I see where the blowup would be 2**sequences... that is,
you have sequences like actions, new_actions, and
input_actions... which I would guess gives a 2**3 == 8X multiplier on
the number of required leaf nodes.
You see, when the tree is optimized, what you have is a dispatch
table that looks like:
root[class][action] == method
That is, the root has an entry for each target class, and then that
entry has entries for each action, which then leads directly to a
target method (possibly a combined method).
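In plain Python terms, the optimized table behaves like two nested hash lookups (a toy model with made-up names, not the real tree structure):

```python
# Toy model of the optimized dispatch table: two hash lookups lead
# straight to the target method, with no per-call membership tests.

class Widget:
    pass

def edit_method(obj, action):
    return 'editing'

def view_method(obj, action):
    return 'viewing'

root = {
    Widget: {
        'edit': edit_method,
        'view': view_method,
    },
}

def dispatch(obj, action):
    # root[class][action] == method
    return root[type(obj)][action](obj, action)
```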
When it's *not* optimized, however, it looks more like *this*:
root[class][action][action in self.new_actions]
     [action in self.edit_actions][action in self.input_actions] == method
Which is a crap-load more nodes to build out, a lot of duplication,
and extra runtime overhead, because all three of those sequence
lookups have to happen on *every* call (due to being dynamically
valued by way of 'self'). So 8X as many nodes to build, plus three
extra levels of dispatch -- and three extra indexes to prioritize in
best_expr()! -- probably accounts for all your performance loss. And
every additional condition you added of this nature would've halved
your speed again.
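A toy sketch of the unoptimized path (class and attribute names are invented; the point is only the shape of the problem):

```python
# After the class/action lookups, three dynamic membership tests still
# run on every call, because each depends on `self`.  Each test is a
# two-way branch, so k independent tests mean up to 2**k distinct leaf
# combinations to build -- here 2**3 == 8.

class Controller:
    new_actions = {'new'}
    edit_actions = {'edit'}
    input_actions = {'new', 'edit'}

def leaf_key(self, action):
    # The tuple of test outcomes selects a leaf; 8 possible keys.
    return (action in self.new_actions,
            action in self.edit_actions,
            action in self.input_actions)
```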
This 2X penalty applies whenever you use a criterion that's opaque to
optimization. It's rather like trying to do a join or select on a
function in a database without any ability to index functions: it
forces a table scan, so to speak. PEAK-Rules tries to confine the
effect by ensuring the test is only applied where it's applicable, so
if there are only a few types to which you're applying the
conditions, and the conditions *depend on each other*, it can prevent
the blowup. But in your case, the various 'action in' tests are
*independent*, so it can't trim the tree that way.
That is, if you have a pair of tests like 'A in B and A in C' and 'A
not in B and A in D', then PEAK-Rules can skip testing 'A in C' if 'A
in B' comes up false. So instead of a tree like:
root[A in B][A in C][A in D]
You get a tree where the "true" branch of 'A in B' goes to an 'A in
C' test, and where the false branch goes to an 'A in D' test. But in
your case, that optimization wasn't possible because PEAK-Rules
doesn't know the implication relationships between the sequences.
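Sketched as plain code (made-up names; just to show the shape of the trimmed tree), the implication-aware version never tests 'A in C' on the false branch of 'A in B', so the tree stays smaller than the full cross product of all three tests:

```python
# Trimmed tree for the rules "A in B and A in C" (rule1) and
# "A not in B and A in D" (rule2): each branch only runs the tests
# that can still matter on that branch.

def dispatch(A, B, C, D):
    if A in B:
        if A in C:
            return 'rule1'   # A in B and A in C
    else:
        if A in D:
            return 'rule2'   # A not in B and A in D
    return 'default'
```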
So, if there is a clear implication relationship of this sort, you
could explicitly write it into your rules (e.g. 'action not in
self.input_actions and action in self.new_actions') and that would
mitigate the blowup. However, I suspect from the code I see in your
patch that this isn't really possible, because there aren't any
mutually exclusive sequences in the code I saw.
In the long run, I'd like to handle the sort of use case you're doing
by allowing "predicate functions" to exist; then you could define a
GF like 'is_edit_action(ob, action)', and thus be able to redefine
the edit actions for a given class, but still get the optimization
benefits. The idea is that the predicate function would be expanded
into its full logical form inside the tree building of the outer function.
I have some notes on how to do this, but it requires a refactoring of
certain key logical functions in the core, to allow subscribing to
callbacks to recompute a logical expression. Otherwise, when you
added a new rule to 'is_edit_action' (or whatever), the GF's using
that function in their rules wouldn't get updated.