[PEAK] Persistence styles, MDA, AOP, PyProtocols, and PEAK
Phillip J. Eby
pje at telecommunity.com
Wed Jul 7 12:58:43 EDT 2004
At 08:31 AM 7/7/04 -0700, Robert Brewer wrote:
>Phillip J. Eby wrote:
> > What good does that do? Well, think about storage. Suppose you
> > defined a generic function like this:
> >
> > def save_to(ob, db):
> >     ...
>
>Is this to support saving an object to potentially multiple stores?
Yes. For example, in database migration. That's actually a use case that
the current DM framework has support for. But also, the same application
may have multiple physical schemas for running on different platforms, so
really the *actual* API will have specialization for the schema as well as
physical database type and instance.
>Otherwise, I can't see the reason to make this a generic function, as
>opposed to a method of the db class. In other words, what does this buy
>you over:
>
>class MySQLManager(SQLManager):
>    def save(self, ob):
>        ....
Because of the ability to specialize on *both* parameters, and modularly
expand the domain model *or* the variety of databases supported.
The difficulty in the present system largely comes from having to create
all sorts of clever abstraction layers (e.g. SQL dialect abstraction,
schema abstraction, persistence abstraction, etc.) to get around not being
able to simply say, "when it's like this, do this."
Of course, you can do anything that GFs can do by creating enough
interfaces and adapters and registries and the like. I'm just getting
bloody tired of inventing new kinds of registries for this stuff, and
writing tests for them. A generic function with predicate dispatch is the
ultimate registry for this type of thing.
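To make that concrete, here's a minimal sketch of what such a registry
boils down to. This is illustrative only, *not* the PyProtocols API: it
uses plain callables as predicates and first-match resolution, where a
real predicate-dispatch engine parses predicate strings and selects the
most specific applicable rule.

class GenericFunction:
    def __init__(self, name):
        self.name = name
        self.rules = []                 # (predicate, implementation) pairs

    def when(self, predicate):
        def register(func):
            self.rules.append((predicate, func))
            return func
        return register

    def __call__(self, *args):
        for predicate, func in self.rules:
            if predicate(*args):
                return func(*args)
        raise TypeError("%s: no applicable rule" % self.name)

# Stub domain classes, just for the demo:
class Invoice: pass
class XMLDocument: pass
class PickleFile: pass

save_to = GenericFunction("save_to")

@save_to.when(lambda ob, db: isinstance(ob, Invoice)
                             and isinstance(db, XMLDocument))
def save_invoice_as_xml(ob, db):
    return "invoice as XML"             # real code would serialize here

@save_to.when(lambda ob, db: isinstance(db, PickleFile))
def save_as_pickle(ob, db):
    return "pickled object"             # real code would pickle 'ob'

save_to(Invoice(), XMLDocument())       # -> 'invoice as XML'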
> > And what if you could define implementations of this function for
> > different combinations of object type and database type? Maybe
> > something like:
> >
> > [when("isinstance(ob,Invoice) and isinstance(db,XMLDocument)")]
> > def save_to(ob,db):
> >     # code to write out invoice as XML
> >
> > [when("isinstance(db,PickleFile)")]
> > def save_to(ob,db):
> >     # code to write out arbitrary object as a pickle
> >
> > Doesn't this look a *lot* easier to you than writing DM classes?
>
>hmmm... not for the PickleFile, at least. ;)
Why not?
>I see the modularity being
>a big issue; in what module or layer would the first code example be
>placed? Invoice.py? XMLDocument.py? App/Model.py?
Probably something like 'my_app.xml_support', or
'accounting.xml_support'. If it was the 'bulletins' example, I'd probably
put any storage stuff in 'bulletins.storage', just where the DM's go now.
The point is, you can pick a place appropriate to the application or
library involved, and for simple things, it's quite simple. And, much more
complex things will be possible.
> > Which means that if you don't need, say, the pickle use case,
> > you could just not import that module, which means that
> > branch of our "virtual if statement" simply wouldn't exist,
> > thus consuming no excess memory or CPU time. So, they can be
> > "write anywhere, run any time". Now that's what I call
> > "modular separation of concerns". :)
>
>I've always found object composition (via dynamic imports at config-read
>time) to solve this quite well.
Yes, this is what PEAK mostly does now, and for simple (i.e.
single-dispatch) configuration it works quite well.
> I guess it depends again on where you
>see your "save_to" function being defined and invoked.
Let's say, for the sake of argument, that the usage API is going to be
something like:
ec = storage.EditingContext(self)
invoice = ec.find(Invoice, inv_num="12345")
invoice.status = "paid"
ec.save(invoice)
The 'save' method of the editing context will, as part of its actions,
invoke one or more generic functions like 'storage.save_to'. To define
implementations for it, you'll just have module-level code (in whatever
modules) that
says something like:
save_to = storage.save_to
[when("whatever")]
def save_to(ob, db, ...):
    ...
That is, your code gets added to one global generic function object that's
invoked by the editing context API.
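Sketched with the toy GenericFunction from above (names illustrative;
none of this is the settled API), that shape is just:

# EditingContext.save is a thin wrapper over the module-level generic
# function; whichever rule modules were imported determine what happens.

class EditingContext:
    def __init__(self, db):
        self.db = db

    def save(self, ob):
        save_to(ob, self.db)            # dispatch on (ob, db) happens here

ec = EditingContext(XMLDocument())
ec.save(Invoice())                      # runs the XML rule, if imported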
In practice, there'll probably be more than one generic function involved,
and it's unlikely that they'll specialize only on "ob" and "db". More
likely, they'll also specialize on a "concrete schema", or perhaps other
things I haven't thought of yet.
In other words, the "save_to" example was just that: an example. It wasn't
an API proposal; I'm not that far along yet. :) My point is mainly that
I'm really tired of designing new and exotic registries that are just
special cases of predicate dispatching, just like before PyProtocols I got
tired of writing the same type-checking if-then blocks all over the place
to allow various kinds of parameters to be accepted. (E.g. strings and
interfaces both being valid input to binding.Obtain.)
So these days, I've stretched single-dispatch adaptation about to its
limits, but there are still higher-order forms of dispatching that are
needed for PEAK's ultimate goals. So I'm extending PyProtocols to handle
dispatch of arbitrary complexity, so we can move full speed ahead on the
actual framework, instead of writing more kinds of registries.
Adaptation was a big step up in the programming paradigm, because it frees
you from thinking about different types to focus on a higher-level
abstraction of what you're trying to do. In the same way, generic
functions will free us from having to think about certain kinds of
structural composition issues that are *noise* in the programming process.
While I wouldn't replace all methods on objects with generic functions, I
*do* know that there are plenty of times when there really isn't a "right"
place to put a method, because it's really an operation that involves more
than one thing. And then it takes time and thought and sometimes a certain
amount of wizardry to get the thing right, by developing interfaces on both
sides, and working out a handshake mechanism between the two sides so that
object A checks with object B which checks back with object A.
Being able to specify symmetric operations really simplifies that. And
having predicate dispatch means I don't need to predict in advance what
kind of rules somebody may have for how something gets stored or what
should happen when it does. I just say, "here's the hook point", and the
application can do whatever it needs.
For example... suppose somebody *really* needs to hand-tune the SQL for
some query, perhaps to add optimizer hints or change the join order or some
of the other crap that you have to do with real-life SQL and real-life
databases. As long as the SQL generation function is generic, an
application *can* hand-tune the SQL for one query.
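Sticking with the toy registry from earlier (first-match, so the special
case is registered before the fallback; a real engine would pick the most
specific rule automatically), and with 'Oracle' and 'query.name' as
made-up stand-ins for whatever the real hook exposes, that hand-tuning
might look like:

class Oracle: pass

class Query:
    def __init__(self, name):
        self.name = name

sql_for = GenericFunction("sql_for")

@sql_for.when(lambda query, db: isinstance(db, Oracle)
                                and query.name == "past_due_report")
def tuned_past_due_sql(query, db):
    # optimizer hint the general SQL generator would never emit
    return "SELECT /*+ INDEX(invoices inv_status_ix) */ ..."

@sql_for.when(lambda query, db: True)   # fallback: general SQL generation
def general_sql(query, db):
    return "SELECT ..."                 # stand-in for the general case

sql_for(Query("past_due_report"), Oracle())   # -> the hand-tuned SQL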
And I don't have to worry at all about *how* they'll specify what the
conditions for tuning are! I don't need to decide in advance, "oh, we'll
register special conditions by thus and such classes/whatever". Not only
because I risk being wrong about what conditions an application is going to
need to test for, but because if I give it any sophistication at all, the
registration mechanism is going to be complex, placing a burden on *all*
applications using it, and on everybody who has to learn *how* to use it.
Thus, generic functions lift a huge weight off me as the designer, and a
lot of learning and runtime complexity off the developer. Indeed, a
developer can also feel confident about the framework's scalability,
knowing that if they have a special case, all they need to do is add a
special-case rule; they don't have to patch the framework to do it.
> > In addition to the functionality proof-of-concept, I've also got a
> > proof-of-concept Python expression parser (that so far handles
> > everything but list comprehensions and lambdas) for what's needed
> > to implement the fancy 'when/before/after()' API. And there's a
> > proof-of-concept for the "function decorator syntax" as well.
>
>FWIW, I handle when/before/after using triggers within attribute
>descriptors.
The when/before/after stuff is talking about CLOS-style method qualifiers,
where you specify whether a method runs "before", "after", or "around"
other methods with applicable rules. (See the Practical Common Lisp
chapter on generic functions for a nice, lucid explanation that's actually
much shorter than my article...)
It's not really to do with attributes, although obviously descriptors can
use generic functions as all-purpose static rule-based observer registries.
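For the curious, here's roughly what those qualifiers do, bolted onto the
toy registry from earlier. (A real implementation runs "before" methods
most-specific-first and "after" methods most-specific-last, and supports
"around" methods that wrap the whole chain; this sketch ignores ordering
and omits "around".)

class CombinedGeneric(GenericFunction):
    def __init__(self, name):
        GenericFunction.__init__(self, name)
        self.befores = []               # (predicate, func) pairs
        self.afters = []

    def before(self, predicate):
        def register(func):
            self.befores.append((predicate, func))
            return func
        return register

    def after(self, predicate):
        def register(func):
            self.afters.append((predicate, func))
            return func
        return register

    def __call__(self, *args):
        for p, f in self.befores:
            if p(*args):
                f(*args)                # side effects only; result ignored
        result = GenericFunction.__call__(self, *args)   # primary method
        for p, f in self.afters:
            if p(*args):
                f(*args)
        return result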
> > Further, the in-CVS prototype dispatcher automatically
> > recognizes common subexpressions between rules, so
> > that e.g. 'target.isDrinkable()' will get called only *once*
> > per call to the generic function, even if the expression
> > appears in dozens of rules. Also, the prototype dispatcher
> > automatically checks "more discriminating" tests first.
>
>Nice bit of coding, that. :)
The algorithm is by Messrs. Chambers and Chen; they get the credit for the
basic cleverness of the algorithm, although I've expanded their algorithm
to support comparison against constant inequalities, which is often needed
by business rules. A paper on their original algorithm can be found at:
http://citeseer.ist.psu.edu/chambers99efficient.html
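The common-subexpression trick is easy to show in miniature. (Illustrative
only: Chambers and Chen build a decision DAG ahead of time, rather than
memoizing at call time as this sketch does.)

# Tests are registered once, by name; rules refer to them by name, so a
# test shared by many rules still runs at most once per dispatch.

class Beer:
    def isDrinkable(self): return True

class Player:
    def isThirsty(self): return False

tests = {
    "drinkable": lambda target, actor: target.isDrinkable(),
    "thirsty":   lambda target, actor: actor.isThirsty(),
}

rules = [                               # most discriminating rules first
    (("drinkable", "thirsty"), lambda t, a: "drink it now"),
    (("drinkable",),           lambda t, a: "save it for later"),
]

def dispatch(target, actor):
    cache = {}                          # one evaluation per test per call
    def check(name):
        if name not in cache:
            cache[name] = tests[name](target, actor)
        return cache[name]
    for required, body in rules:
        if all(check(name) for name in required):
            return body(target, actor)
    raise TypeError("no applicable rule")

dispatch(Beer(), Player())              # isDrinkable() runs only once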
> > I haven't narrowed these down 100%, but here's what I think
> > so far. The query language is probably going to end up
> > being Python, specifically list or generator comprehensions, e.g.:
> >
> > [(invoice,invoice.customer) for invoice in Invoices
> > if status=="pastdue"]
> >
> > However, these will be specified as *strings*.
>
>Cheater. ;) Show an example using a date which has been obtained from
>the end-user.
>
>"""[(invoice, invoice.customer) for invoice in Invoices
>if status == %s and duedate > %s""" % (coerce(stat), coerce(due))
Nope, sorry, wrong answer. :) Try this:
"""[(invoice, invoice.customer) for invoice in Invoices
if invoice.status == stat and duedate > due]"""
Or, if you really need those 'coerce' calls in there for some reason...
"""[(invoice, invoice.customer) for invoice in Invoices
if invoice.status == coerce(stat) and duedate > coerce(due)]"""
IOW, the evaluation is within the current variable namespace, so local and
global variables are both available for use in the expression. No string
interpolation or separate "bind variables" required.
Also, I'll be developing constant-expression optimization as part of the
parser for predicate dispatching, so it won't be too hard to include it
here. That is, the 'coerce(stat)' and 'coerce(due)' would not need to be
calculated on each iteration of the query. Really, any part of the query
that doesn't depend on a loop variable can be computed just once, during
the initial processing of the query.
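Put another way, the optimizer would effectively rewrite the query the
same way you would by hand (a sketch, using the names from the example
above):

def past_due_query(Invoices, stat, due, coerce):
    # Subexpressions that don't depend on the loop variable are hoisted
    # and computed just once, instead of once per invoice:
    _stat = coerce(stat)
    _due = coerce(due)
    return [(invoice, invoice.customer) for invoice in Invoices
            if invoice.status == _stat and invoice.duedate > _due]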
Anyway, these queries will actually be easier to use than SQL. I don't
like using strings to represent code, but the alternative approach of
using expression proxies, the way SQLObject and some other frameworks do,
doesn't really work all that well for expressing arbitrary queries.
There's no way to specify arbitrary joins properly, for example. And you
can't call standard Python functions on things, you have to parenthesize
your comparisons (because the bitwise & and | operators have higher
precedence than the comparison operators), and so on.
But by using strings, I can access the full parse tree, and do things like
call a generic function to ask how to represent a given Python function in
the SQL dialect that the query is being rendered for... e.g. something like:
[when("function is int and isinstance(backend,Oracle)")]
def function_as_SQL(backend,function,args):
return "FLOOR(%s)" % args[0]
Hurray! Yet another kind of registry that I've wanted PEAK to have for a
while, but that I now won't have to design and code!
Of course, these ideas are all still only partly-baked, and the actual
arguments, interfaces, etc. will still need *some* definition. It's just
that the *mechanics* of implementing these kinds of lookups will become a
no-brainer. Even if we have convenience functions to simplify the
typing (e.g. 'register_sql_function(int,Oracle,lambda args: "FLOOR(%s)" %
args[0])'), the full extensibility will be there if and when it's needed.
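And such a convenience function would itself be just a few lines over the
generic function (again with the toy registry and the made-up 'Oracle'
stub from earlier):

# The wrapper just registers one more rule on the underlying generic
# function, so the full predicate power stays available alongside it.

function_as_SQL = GenericFunction("function_as_SQL")

def register_sql_function(py_func, backend_type, render):
    @function_as_SQL.when(
        lambda backend, function, args:
            function is py_func and isinstance(backend, backend_type))
    def rule(backend, function, args):
        return render(args)

register_sql_function(int, Oracle, lambda args: "FLOOR(%s)" % args[0])
function_as_SQL(Oracle(), int, ["total"])     # -> 'FLOOR(total)'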