[PEAK] Persistence styles, MDA, AOP, PyProtocols, and PEAK
Phillip J. Eby
pje at telecommunity.com
Wed Jul 7 12:58:43 EDT 2004
At 08:31 AM 7/7/04 -0700, Robert Brewer wrote:
>Phillip J. Eby wrote:
> > What good does that do? Well, think about storage. Suppose you
> > defined a generic function like this:
> >
> > def save_to(ob, db):
> >     ...
>
>Is this to support saving an object to potentially multiple stores?
Yes. For example, in database migration. That's actually a use case that
the current DM framework has support for. But also, the same application
may have multiple physical schemas for running on different platforms, so
really the *actual* API will have specialization for the schema as well as
physical database type and instance.
>Otherwise, I can't see the reason to make this a generic function, as
>opposed to a method of the db class. In other words, what does this buy
>you over:
>
>class MySQLManager(SQLManager):
>    def save(self, ob):
>        ....
Because of the ability to specialize on *both* parameters, and modularly
expand the domain model *or* the variety of databases supported.
The difficulty in the present system largely comes from having to create
all sorts of clever abstraction layers (e.g. SQL dialect abstraction,
schema abstraction, persistence abstraction, etc.) to get around not being
able to simply say, "when it's like this, do this."
Of course, you can do anything that GFs can do by creating enough
interfaces and adapters and registries and the like. I'm just getting
bloody tired of inventing new kinds of registries for this stuff, and
writing tests for them. A generic function with predicate dispatch is the
ultimate registry for this type of thing.
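To make that concrete, here's a minimal sketch of what such a registry
boils down to. This is illustrative only, *not* the PyProtocols API: it
uses plain callables as predicates and first-match resolution, where a
real predicate-dispatch engine parses predicate strings and selects the
most specific applicable rule.

class GenericFunction:
    def __init__(self, name):
        self.name = name
        self.rules = []                 # (predicate, implementation) pairs

    def when(self, predicate):
        def register(func):
            self.rules.append((predicate, func))
            return func
        return register

    def __call__(self, *args):
        for predicate, func in self.rules:
            if predicate(*args):
                return func(*args)
        raise TypeError("%s: no applicable rule" % self.name)

# Stub domain classes, just for the demo:
class Invoice: pass
class XMLDocument: pass
class PickleFile: pass

save_to = GenericFunction("save_to")

@save_to.when(lambda ob, db: isinstance(ob, Invoice)
                             and isinstance(db, XMLDocument))
def save_invoice_as_xml(ob, db):
    return "invoice as XML"             # real code would serialize here

@save_to.when(lambda ob, db: isinstance(db, PickleFile))
def save_as_pickle(ob, db):
    return "pickled object"             # real code would pickle 'ob'

save_to(Invoice(), XMLDocument())       # -> 'invoice as XML'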
> > And what if you could define implementations of this function for
> > different combinations of object type and database type? Maybe
> > something like:
> >
> > [when("isinstance(ob,Invoice) and isinstance(db,XMLDocument)")]
> > def save_to(ob,db):
> >     # code to write out invoice as XML
> >
> > [when("isinstance(db,PickleFile)")]
> > def save_to(ob,db):
> >     # code to write out arbitrary object as a pickle
> >
> > Doesn't this look a *lot* easier to you than writing DM classes?
>
>hmmm... not for the PickleFile, at least. ;)
Why not?
>I see the modularity being
>a big issue; in what module or layer would the first code example be
>placed? Invoice.py? XMLDocument.py? App/Model.py?
Probably something like 'my_app.xml_support', or
'accounting.xml_support'. If it was the 'bulletins' example, I'd probably
put any storage stuff in 'bulletins.storage', just where the DM's go now.
The point is, you can pick a place appropriate to the application or
library involved, and for simple things, it's quite simple. And, much more
complex things will be possible.
> > Which means that if you don't need, say, the pickle use case,
> > you could just not import that module, which means that
> > branch of our "virtual if statement" simply wouldn't exist,
> > thus consuming no excess memory or CPU time. So, they can be
> > "write anywhere, run any time". Now that's what I call
> > "modular separation of concerns". :)
>
>I've always found object composition (via dynamic imports at config-read
>time) to solve this quite well.
Yes, this is what PEAK mostly does now, and for simple (i.e.
single-dispatch) configuration it works quite well.
> I guess it depends again on where you
>see your "save_to" function being defined and invoked.
Let's say, for the sake of argument, that the usage API is going to be
something like:
ec = storage.EditingContext(self)
invoice = ec.find(Invoice, inv_num="12345")
invoice.status = "paid"
ec.save(invoice)
The 'save' method of the editing context will, as part of its actions,
invoke one or more generic functions like 'storage.save_to'. To define
implementations for it, you'll just have module-level code (in whatever
modules) that
says something like:
save_to = storage.save_to
[when("whatever")]
def save_to(ob, db, ...):
    ...
That is, your code gets added to one global generic function object that's
invoked by the editing context API.
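Sketched with the toy GenericFunction from above (names illustrative;
none of this is the settled API), that shape is just:

# EditingContext.save is a thin wrapper over the module-level generic
# function; whichever rule modules were imported determine what happens.

class EditingContext:
    def __init__(self, db):
        self.db = db

    def save(self, ob):
        save_to(ob, self.db)            # dispatch on (ob, db) happens here

ec = EditingContext(XMLDocument())
ec.save(Invoice())                      # runs the XML rule, if imported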
In practice, there'll probably be more than one generic function involved,
and it's unlikely that they'll specialize only on "ob" and "db". More
likely, they'll also specialize on a "concrete schema", or perhaps other
things I haven't thought of yet.
In other words, the "save_to" example was just that: an example. It wasn't
an API proposal; I'm not that far along yet. :) My point is mainly that
I'm really tired of designing new and exotic registries that are just
special cases of predicate dispatching, just like before PyProtocols I got
tired of writing the same type-checking if-then blocks all over the place
to allow various kinds of parameters to be accepted. (E.g. strings and
interfaces both being valid input to binding.Obtain.)
So these days, I've stretched single-dispatch adaptation about to its
limits, but there are still higher-order forms of dispatching that are
needed for PEAK's ultimate goals. So I'm extending PyProtocols to handle
dispatch of arbitrary complexity, so we can move full speed ahead on the
actual framework, instead of writing more kinds of registries.
Adaptation was a big step up in the programming paradigm, because it frees
you from thinking about different types to focus on a higher-level
abstraction of what you're trying to do. In the same way, generic
functions will free us from having to think about certain kinds of
structural composition issues that are *noise* in the programming process.
While I wouldn't replace all methods on objects with generic functions, I
*do* know that there are plenty of times when there really isn't a "right"
place to put a method, because it's really an operation that involves more
than one thing. And then it takes time and thought and sometimes a certain
amount of wizardry to get the thing right, by developing interfaces on both
sides, and working out a handshake mechanism between the two sides so that
object A checks with object B which checks back with object A.
Being able to specify symmetric operations really simplifies that. And
having predicate dispatch means I don't need to predict in advance what
kind of rules somebody may have for how something gets stored or what
should happen when it does. I just say, "here's the hook point", and the
application can do whatever it needs.
For example... suppose somebody *really* needs to hand-tune the SQL for
some query, perhaps to add optimizer hints or change the join order or some
of the other crap that you have to do with real-life SQL and real-life
databases. As long as the SQL generation function is generic, an
application *can* hand-tune the SQL for one query.
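Sticking with the toy registry from earlier (first-match, so the special
case is registered before the fallback; a real engine would pick the most
specific rule automatically), and with 'Oracle' and 'query.name' as
made-up stand-ins for whatever the real hook exposes, that hand-tuning
might look like:

class Oracle: pass

class Query:
    def __init__(self, name):
        self.name = name

sql_for = GenericFunction("sql_for")

@sql_for.when(lambda query, db: isinstance(db, Oracle)
                                and query.name == "past_due_report")
def tuned_past_due_sql(query, db):
    # optimizer hint the general SQL generator would never emit
    return "SELECT /*+ INDEX(invoices inv_status_ix) */ ..."

@sql_for.when(lambda query, db: True)   # fallback: general SQL generation
def general_sql(query, db):
    return "SELECT ..."                 # stand-in for the general case

sql_for(Query("past_due_report"), Oracle())   # -> the hand-tuned SQL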
And I don't have to worry at all about *how* they'll specify what the
conditions for tuning are! I don't need to decide in advance, "oh, we'll
register special conditions by thus and such classes/whatever". Not only
because I risk being wrong about what conditions an application is going to
need to test for, but because if I give it any sophistication at all, the
registration mechanism is going to be complex, placing a burden on *all*
applications using it, and on everybody who has to learn *how* to use it.
Thus, generic functions lift a huge weight off me as the designer, and a
lot of learning and runtime complexity off the developer. Indeed, a
developer can also feel confident about the framework's scalability,
knowing that if they have a special case, all they need to do is add a
special-case rule; they don't have to patch the framework to do it.
> > In addition to the functionality proof-of-concept, I've also got a
> > proof-of-concept Python expression parser (that so far handles
> > everything but list comprehensions and lambdas) for what's needed
> > to implement the fancy 'when/before/after()' API. And there's a
> > proof-of-concept for the "function decorator syntax" as well.
>
>FWIW, I handle when/before/after using triggers within attribute
>descriptors.
The when/before/after stuff is talking about CLOS-style method qualifiers,
where you specify whether a method runs "before", "after", or "around"
other methods with applicable rules. (See the Practical Common Lisp
chapter on generic functions for a nice, lucid explanation that's actually
much shorter than my article...)
It's not really to do with attributes, although obviously descriptors can
use generic functions as all-purpose static rule-based observer registries.
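For the curious, here's roughly what those qualifiers do, bolted onto the
toy registry from earlier. (A real implementation runs "before" methods
most-specific-first and "after" methods most-specific-last, and supports
"around" methods that wrap the whole chain; this sketch ignores ordering
and omits "around".)

class CombinedGeneric(GenericFunction):
    def __init__(self, name):
        GenericFunction.__init__(self, name)
        self.befores = []               # (predicate, func) pairs
        self.afters = []

    def before(self, predicate):
        def register(func):
            self.befores.append((predicate, func))
            return func
        return register

    def after(self, predicate):
        def register(func):
            self.afters.append((predicate, func))
            return func
        return register

    def __call__(self, *args):
        for p, f in self.befores:
            if p(*args):
                f(*args)                # side effects only; result ignored
        result = GenericFunction.__call__(self, *args)   # primary method
        for p, f in self.afters:
            if p(*args):
                f(*args)
        return result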
> > Further, the in-CVS prototype dispatcher automatically
> > recognizes common subexpressions between rules, so
> > that e.g. 'target.isDrinkable()' will get called only *once*
> > per call to the generic function, even if the expression
> > appears in dozens of rules. Also, the prototype dispatcher
> > automatically checks "more discriminating" tests first.
>
>Nice bit of coding, that. :)
The algorithm is by Messrs. Chambers and Chen; they get the credit for the
basic cleverness of the algorithm, although I've expanded their algorithm
to support comparison against constant inequalities, which is often needed
by business rules. A paper on their original algorithm can be found at:
http://citeseer.ist.psu.edu/chambers99efficient.html
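The common-subexpression trick is easy to show in miniature. (Illustrative
only: Chambers and Chen build a decision DAG ahead of time, rather than
memoizing at call time as this sketch does.)

# Tests are registered once, by name; rules refer to them by name, so a
# test shared by many rules still runs at most once per dispatch.

class Beer:
    def isDrinkable(self): return True

class Player:
    def isThirsty(self): return False

tests = {
    "drinkable": lambda target, actor: target.isDrinkable(),
    "thirsty":   lambda target, actor: actor.isThirsty(),
}

rules = [                               # most discriminating rules first
    (("drinkable", "thirsty"), lambda t, a: "drink it now"),
    (("drinkable",),           lambda t, a: "save it for later"),
]

def dispatch(target, actor):
    cache = {}                          # one evaluation per test per call
    def check(name):
        if name not in cache:
            cache[name] = tests[name](target, actor)
        return cache[name]
    for required, body in rules:
        if all(check(name) for name in required):
            return body(target, actor)
    raise TypeError("no applicable rule")

dispatch(Beer(), Player())              # isDrinkable() runs only once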
> > I haven't narrowed these down 100%, but here's what I think
> > so far. The query language is probably going to end up
> > being Python, specifically list or generator comprehensions, e.g.:
> >
> > [(invoice,invoice.customer) for invoice in Invoices
> > if status=="pastdue"]
> >
> > However, these will be specified as *strings*.
>
>Cheater. ;) Show an example using a date which has been obtained from
>the end-user.
>
>"""[(invoice, invoice.customer) for invoice in Invoices
>if status == %s and duedate > %s""" % (coerce(stat), coerce(due))
Nope, sorry, wrong answer. :) Try this:
"""[(invoice, invoice.customer) for invoice in Invoices
if invoice.status == stat and duedate > due]"""
Or, if you really need those 'coerce' calls in there for some reason...
"""[(invoice, invoice.customer) for invoice in Invoices
if invoice.status == coerce(stat) and duedate > coerce(due)]"""
IOW, the evaluation is within the current variable namespace, so local and
global variables are both available for use in the expression. No string
interpolation or separate "bind variables" required.
Also, I'll be developing constant-expression optimization as part of the
parser for predicate dispatching, so it won't be too hard to include it
here. That is, the 'coerce(stat)' and 'coerce(due)' would not need to be
calculated on each iteration of the query. Really, any part of the query
that doesn't depend on a loop variable can be computed just once, during
the initial processing of the query.
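Put another way, the optimizer would effectively rewrite the query the
same way you would by hand (a sketch, using the names from the example
above):

def past_due_query(Invoices, stat, due, coerce):
    # Subexpressions that don't depend on the loop variable are hoisted
    # and computed just once, instead of once per invoice:
    _stat = coerce(stat)
    _due = coerce(due)
    return [(invoice, invoice.customer) for invoice in Invoices
            if invoice.status == _stat and invoice.duedate > _due]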
Anyway, these queries will actually be easier to use than SQL. I don't
like using strings to represent code, but the alternative approach of
using expression proxies, the way SQLObject and some other frameworks do,
doesn't really work all that well for expressing arbitrary queries.
There's no way to specify arbitrary joins properly, for example. And you
can't call standard Python functions on things, you have to parenthesize
your comparisons (because the bitwise & and | operators have higher
precedence than the comparison operators), and so on.
But by using strings, I can access the full parse tree, and do things like
call a generic function to ask how to represent a given Python function in
the SQL dialect that the query is being rendered for... e.g. something like:
[when("function is int and isinstance(backend,Oracle)")]
def function_as_SQL(backend,function,args):
return "FLOOR(%s)" % args[0]
Hurray! Yet another kind of registry that I've wanted PEAK to have for a
while, but that I now won't have to design and code!
Of course, these ideas are all still only partly-baked, and the actual
arguments, interfaces, etc. will still need *some* definition. It's just
that the *mechanics* of implementing these kinds of lookups will become a
no-brainer. Even if we have convenience functions to simplify the
typing (e.g. 'register_sql_function(int,Oracle,lambda args: "FLOOR(%s)" %
args[0])'), the full extensibility will be there if and when it's needed.
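And such a convenience function would itself be just a few lines over the
generic function (again with the toy registry and the made-up 'Oracle'
stub from earlier):

# The wrapper just registers one more rule on the underlying generic
# function, so the full predicate power stays available alongside it.

function_as_SQL = GenericFunction("function_as_SQL")

def register_sql_function(py_func, backend_type, render):
    @function_as_SQL.when(
        lambda backend, function, args:
            function is py_func and isinstance(backend, backend_type))
    def rule(backend, function, args):
        return render(args)

register_sql_function(int, Oracle, lambda args: "FLOOR(%s)" % args[0])
function_as_SQL(Oracle(), int, ["total"])     # -> 'FLOOR(total)'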