[PEAK] Generic Function API enhancement ideas

Fri Dec 31 01:46:48 EST 2004

Convenience APIs for Generic Functions
======================================

As I've started playing around with using generic functions for things that 
involve more than type and value tests, it seems to me that there are some 
things that really need to be added to the current API.  For example, 
yesterday I saw some examples in C# of using test-driven development to 
refine an algorithm, and I decided to take a whack at simply transcribing 
the requirements directly into a generic function.  The original code 
computed bowling scores, and this is what I came up with:

     import dispatch

     [dispatch.generic()]
     def score(rolls,frame=1):
         """Return score for bowling game, given list of a player's rolls"""

     [score.when("frame<11 and rolls and rolls[0]==10")]
     def score_strike(rolls,frame):
         return sum(rolls[:3]) + score(rolls[1:],frame+1)

     [score.when("frame<11 and rolls and rolls[0]<10 and sum(rolls[:2])==10")]
     def score_spare(rolls,frame):
         return sum(rolls[:3]) + score(rolls[2:],frame+1)

     [score.when("frame<11 and rolls")]
     def score_open(rolls,frame):
         return sum(rolls[:2]) + score(rolls[2:],frame+1)

     [score.when("not rolls or frame==11")]
     def scoring_done(rolls,frame):
         return 0

     assert score([0]*20)==0
     assert score([3]*20)==60
     assert score([4,6,5,3]+[0]*16)==23
     assert score([10,5,3,2,1]+[0]*14)==29
     assert score([10]*20)==300
     assert score([10,4,6]*5+[10])==200

Pseudo-Arguments/"Macros"
-------------------------

As you can see, the code suffers quite a bit from repetition, as the 
"frame<11 and rolls" condition (or its inverse) appear repeatedly.  Here's 
what I'm thinking I'd like the rules to look like instead:

     [dispatch.generic(
          finished = "not rolls or frame>10",
          strike   = "rolls[0]==10",
          spare    = "not strike and sum(rolls[:2])==10"
     )]
     def score(rolls,frame=1):
         """Return score for bowling game, given list of a player's rolls"""

     [score.when("not finished and strike")]
     def score_strike(rolls,frame):
         return sum(rolls[:3]) + score(rolls[1:],frame+1)

     [score.when("not finished and spare")]
     def score_spare(rolls,frame):
         return sum(rolls[:3]) + score(rolls[2:],frame+1)

     [score.when("not finished and not strike and not spare")]
     def score_open(rolls,frame):
         return sum(rolls[:2]) + score(rolls[2:],frame+1)

     [score.when("finished")]
     def scoring_done(rolls,frame):
         return 0

In other words, I'd like to be able to create "pseudo-arguments" for a 
generic function to represent commonly-used intermediate values or 
conditions that the individual methods can then use in their rules.  This 
would help reduce repetition in complex rules like these, as well as allow 
useful abstractions.  I'm thinking that this could also be allowed on the 
'.when()' decorators as well, except that there the definition would be 
local to that individual rule.  So, you could initially introduce 
abstractions on an individual rule, and then move them up to the generic 
function if you used them often.

I'm thinking that there could maybe be a Macro facility, e.g.::

      finished = Macro("not rolls or frame>10")

And then you could use it in an expression if it was present in your 
'locals()'.  The idea is that a Macro found in an expression just gets 
replaced with its expansion, as though it had been textually included in 
the expression using it.

Expression Binding
------------------

A related (but different) idea is expression binding.  Consider this 
snippet from dispatch.predicates::

     [expressionSignature.when(
         # matches 'isinstance(expr,Const)'
         "expr in Call and expr.function==isinstance"
         " and len(expr.argexprs)==2 and expr.argexprs[1] in Const"
     )]
     def convertIsInstanceToClassTest(expr,test):
         typecheck = _tupleToOrTest(expr.argexprs[1].value)

         if not test.truth:
             typecheck = NotTest(typecheck)

         return Signature([(expr.argexprs[0],typecheck)])

Notice that this code computes 'expr.argexprs[1]' twice -- not a big deal 
for this routine, but for compiler-like systems these things get 
progressively more complicated to compute.  Part of the issue is that a 
pattern matching facility is needed, such that the above could say 
something like:

     [expressionSignature.when(
         "match(expr,Call(isinstance,inst,cls))",
         inst="var()", cls="var(Const)"
     )]

     def convertIsInstanceToClassTest(inst,cls,expr,test):
         typecheck = _tupleToOrTest(cls.value)

         if not test.truth:
             typecheck = NotTest(typecheck)

         return Signature([(inst,typecheck)])

There are a couple of different ideas I'm presenting here, that I don't 
necessarily know how to fit together.  One is the pattern matching part, 
the other is binding variables from a test expression and supplying them as 
extra arguments to the method.  For example, above I've added 'inst' and 
'cls' parameters to the method at the beginning of the method list.  (If 
the generic function were a method of a class, these parameters would 
appear *before* 'self'.)

The idea here is that any parameter names of a method that match macros 
that were defined in the 'when()' or the original 'generic()', then they 
will receive the value of those expressions as of the current invocation.

That idea is pretty straightforward, and I'm fairly sure I know how I'd 
implement it.  My main question is whether that particular API would be 
confusing.  Here's another (silly) example of this idea:

     [dispatch.generic(
          finished = "not rolls or frame>10",
          strike   = "rolls[0]==10",
          spare    = "not strike and sum(rolls[:2])==10",
          next_frame = "frame + 1"
     )]
     def score(rolls,frame=1):
         """Return score for bowling game, given list of a player's rolls"""

     [score.when("not finished and strike")]
     def score_strike(next_frame,rolls,frame):
         return sum(rolls[:3]) + score(rolls[1:],next_frame)

     [score.when("not finished and spare")]
     def score_spare(next_frame,rolls,frame):
         return sum(rolls[:3]) + score(rolls[2:], next_frame)

     [score.when("not finished and not strike and not spare")]
     def score_open(next_frame,rolls,frame):
         return sum(rolls[:2]) + score(rolls[2:], next_frame)

     [score.when("finished")]
     def scoring_done(rolls,frame):
         return 0

Here, for no particular reason, we've made 'next_frame' a macro and then 
added it as a parameter to the methods that use it.  As you can see, it's 
optional -- scoring_done doesn't change its signature from the base 
signature.  The problem I see with it is that it's hard to see what 
parameter(s) are the extra ones.  Maybe if they had to be a tuple...

     [expressionSignature.when(
         "match(expr,Call(isinstance,inst,cls))",
         inst="var()", cls="var(Const)"
     )]

     def convertIsInstanceToClassTest( (inst,cls), expr,test):
         typecheck = _tupleToOrTest(cls.value)

         if not test.truth:
             typecheck = NotTest(typecheck)

         return Signature([(inst,typecheck)])

That seems a bit better, in that the extra args get visually grouped 
together.  I think I could live with that.

Pattern Matching
----------------

Anyway, the second idea here is pattern matching, represented by the 
"match()" and "var()" stuff.  We want a way to traverse a data structure, 
applying tests to it, and saving values that we find in variables for later 
use.  Ideally, this should be done in such a way that we end up generating 
an identical test structure for the generic function, as if we had coded it 
out the long way.

In the Cecil language's dispatch system (which pioneered predicate 
dispatching), one would render something like the above 'when' clause as:

     method expressionSignature(expr at Call{function=f, argexprs=ae}, test)
       when test(f==isinstance) and test(len(ae)==2)
        and let(inst,cls) = ae and cls at Const {
         ...
     }

which is cleaner than our rendering in some respects, but not in 
others.  (What I wrote is actually a strange blend of Python and Cecil; in 
Cecil you'd actually say something more like:

     method expressionSignature(expr at Call{function at isinstance, 
argexprs=ae}, test)
       when and test(len(ae)==2) and let(inst,cls) = ae and cls at Const {
         ...
     }

Because Cecil is a prototype-based language and '@' is sort of like a cross 
between 'is' and 'isinstance()' in Python.

Anyway, mostly what typing all of this is telling me is that pattern 
matching needs a lot more thought.  :)  I want a natural, Pythonic syntax 
for expressing patterns and causing variables to be bound while applying 
tests and traversing structure.  And preferably, it should be syntactically 
valid Python, so I don't have to write another parser.  :)  Then, there 
needs to be some way to integrate it with bindings, even if it's 
implemented via something like 'matches = aPattern.match(value)', and the 
variables are then retrieved via 'matches.x', where 'x' is the assigned 
variable name.  How about this:

     isA(Call, function=isinstance,
               argexprs=(let(inst=Any),let(cls=isA(Const)))
     )

Ugh.  Lots of Incredibly Superficial Parentheses, there.  And that's with a 
lot of DWIM-ming that has to be implemented by the parser (for example, the 
sequence of keyword arguments *counts*).

Here's something a bit closer to what I'd *like* the syntax to look like:

     Instance(Call, lambda
         function=isinstance,
         argexprs=Sequence(var.inst[object], var.cls[Const]): None
     ).matches(expr)

This is pure Python, but obfuscated to say the least, using tricks like a 
special 'var' object to create new variables, and 'lambda' just to obtain 
an ordered list of name/value pairs!  It's also pretty DWIMmish, guessing 
that classes or types are instance checks, and other non-special objects 
are equality tests.

Or how about:

     expr in Call(isinstance, var.inst, var.cls[Const])

This one would only work with special support from 'Call', of course, so 
it's not useful for all-purpose pattern matching.  Hm.  What if you could 
do something like:

     def isCall(function, *argexprs):
         """Match a 'Call' instance"""
         return Pattern(
             # direct test(s)
             [Call],
             # (traversalSpec,subpattern)...
             [("it.function", function), ("it.argexprs", argexprs)]
         )

And then you could use it like this:

     [expressionSignature.when(
         "match(expr)",
         match = isCall(isinstance, var.inst, var.cls[Const])
         inst  = "match(expr).inst",
         cls   = "match(expr).cls"
     )]

Which is a bit verbose, but it's getting better.  The ideal would be more like:

     [expressionSignature.when(
         "expr in isCall(isinstance, var.inst, var.cls[Const])"
     )]

As long as there was some way for the variables to propagate up so that 
'when' could mark them for binding to arguments of the method.  I'm going 
to have to think about the implementation a bit though.  There's going to 
have to be a way for signatures and expressions to expose their "bound 
variables", or at the least for the AST processing to propagate the bound 
variable map upwards, which could change the signatures of parsing methods 
throughout the system.  BUT, it would mean that our final form of "expr in 
isCall(...)" would create a decision tree identical to the one created by 
the original, highly-verbose spelling.

Ordered Classifiers
-------------------

One last feature that seems useful is ordered classifiers; as an example:

     frame_type = Classification()

     finished = frame_type("not rolls or frame>10")
     strike   = frame_type("rolls[0]==10")
     spare    = frame_type("sum(rolls[:2])==10")
     open     = frame_type.default()

     [dispatch.generic()]
     def score(rolls,frame=1):
         """Return score for bowling game, given list of a player's rolls"""

     [score.when("strike")]
     def score_strike(rolls,frame):
         return sum(rolls[:3]) + score(rolls[1:], frame+1)

     [score.when("spare")]
     def score_spare(rolls,frame):
         return sum(rolls[:3]) + score(rolls[2:], frame+1)

     [score.when("open")]
     def score_open(rolls,frame):
         return sum(rolls[:2]) + score(rolls[2:], frame+1)

     [score.when("finished")]
     def scoring_done(rolls,frame):
         return 0

The idea here is that a Classification is an *ordered* set of conditions, 
that automatically ensure that a later-defined classification will only 
occur if the earlier patterns all fail.  Once the 'default()' method is 
invoked, no further patterns can be created for that classification.

Of course, this particular example might as well be a bunch of if-elif's at 
this point, but there are certainly situations where you want to create a 
classification like this that can be applied to many things, sometimes in 
non-obvious combinations.  To be really useful, a typical classification is 
going to focus on one argument; it'd then be used like a class or interface 
test, e.g. 'win1 in Iconized and win2 in Maximized' where 'Iconized' and 
'Maximized' are classifications in a 'WindowState' ordered classifier.

Wrap-Up
-------

I'm about to fall asleep at the keyboard, so I'm going to wrap this up 
now.  I'm interested in hearing your feedback, as I am worried that some of 
the ideas in this thing get a little confusing as to when stuff should be 
quoted and when it shouldn't, and when it doesn't matter whether you quote 
it or not.  :)

Anyway, my general thoughts are that we need to add:

* Macros/pseudo-arguments (or some other form of abstraction for predicates)
* Expression bindings (to allow re-use of expressions computed during dispatch)
* Pattern matching (so tools like compilers can easily match complex patterns)
* Ordered classification groups (so that it's easy to do prioritized matches)

...but the syntax of these features is largely still up for grabs, and I'm 
open to suggestions.