[PEAK] Persistence styles, MDA, AOP, PyProtocols, and PEAK

John Landahl john at landahl.org
Thu Jul 8 19:50:28 EDT 2004


Phillip, thanks for the detailed update on the directions you'll be taking 
PEAK in the near future.  It's exciting to hear that you've come up with a 
way to bring efficient and effective multiple dispatch to Python.  I've also 
been learning a bit about CLOS lately, and have to agree that multiple 
dispatch can dramatically simplify the solutions to a number of problems.  
The traditional single-dispatch ("message passing") model favors 
encapsulation over flexibility: every operation has to live in exactly one 
class, which artificially restricts how problems that cut across several 
types can be expressed.
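
To make that concrete, here's a toy sketch of type-based multiple dispatch 
-- just a registry keyed by argument types, with names I made up for this 
message, not PEAK's actual dispatcher (which also handles inheritance and 
arbitrary predicates):

    registry = {}

    def defmethod(name, types, func):
        # Register an implementation for one exact combination of types.
        registry[(name,) + tuple(types)] = func

    def dispatch(name, *args):
        # Select the implementation by the types of *all* arguments,
        # not just the first one.
        return registry[(name,) + tuple(type(a) for a in args)](*args)

    class Asteroid: pass
    class Ship: pass

    defmethod('collide', (Asteroid, Ship), lambda a, s: "ship destroyed")
    defmethod('collide', (Ship, Ship), lambda a, b: "both damaged")

    print(dispatch('collide', Asteroid(), Ship()))  # "ship destroyed"

With single dispatch, 'collide' would have to live on one class and 
type-check the other argument by hand (the Visitor pattern in disguise).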

...
> The idea of the domain model is that you put an application's core 
> behaviors into objects that reflect the application domain. 
> Unfortunately, this idea scales rather poorly when dealing with large 
> numbers of objects.  Object-oriented languages are oriented towards 
> dealing  with individual objects more so than collections of them.  
> Loops over large collections are inefficient when compared to bulk
> operations on an RDBMS, for example.  This means that building practical
> applications requires these bulk operations to be factored out, somehow.

Indeed, you've really hit the nail on the head here.  The domain model 
emphasis on converting data to objects before performing any operations on 
them is *terrible* for working on large data sets, and completely negates the 
performance advantages of RDBMSs.  Without any standard way to get around 
this limitation, individual developers will come up with ad hoc solutions of 
varying efficiency and effectiveness.  Usually this means going directly to 
SQL, but then the solution can only work with a SQL-based storage medium.  
I'll be quite interested to see how your "fact base" ideas play out.
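
For contrast, here's the difference in miniature -- a contrived sketch 
assuming a DB-API connection and an 'invoices' table, with all other names 
made up:

    def discount_all_objects(ec, rate):
        # Domain-model style: materialize every row as an object first.
        # 'ec.fetch_all' is a hypothetical query API, one object per row.
        for inv in ec.fetch_all('Invoice'):
            inv.total = inv.total * (1 - rate)

    def discount_all_sql(conn, rate):
        # Bulk style: one statement, executed inside the database.
        # ('?' is the qmark paramstyle; other drivers use %s, etc.)
        conn.cursor().execute(
            "UPDATE invoices SET total = total * ?", (1 - rate,))

The first version pays object-construction and round-trip costs per row; 
the second lets the RDBMS do what it's good at.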

> Going Beyond AOP (with 40-Year Old Technology!)

Lisp is >40 years old, but isn't CLOS itself only about 20 years old?  Was 
there a much older implementation of multiple dispatch somewhere?

> "Greenspun's Tenth Rule of Programming: any sufficiently complicated C or
> Fortran program contains an ad hoc informally-specified bug-ridden slow
> implementation of half of Common Lisp."
>
>    -- Phil Greenspun
>
> Apparently, this rule applies to Java as well.

And apparently Python, too. :)  And of course Paul Graham likes to apply it to 
any language that isn't Lisp.

> I recently ran across the 
> book-in-progress, "Practical Common Lisp" (available at
> http://www.gigamonkeys.com/book/ ), only to discover a surprising
> similarity between this chapter:
>     
> http://www.gigamonkeys.com/book/object-reorientation-generic-functions.html

Another useful introduction to generic functions and multimethods in CLOS is 
in another upcoming Lisp book ("Successful Lisp"):

http://www.psg.com/~dlamkins/sl/chapter14.html

> Now, before anybody panics, I am *not* planning to rewrite PEAK in Common
> Lisp!  

Funny, I've been thinking how CL could be improved if it had a number of 
PEAK's features. :)

...
> And what if you could define implementations of this function for different
> combinations of object type and database type?  Maybe something like:
>
>      [when("isinstance(ob,Invoice) and isinstance(db,XMLDocument)")]
>      def save_to(ob,db):
>          # code to write out invoice as XML
>
>      [when("isinstance(db,PickleFile)")]
>      def save_to(ob,db):
>          # code to write out arbitrary object as a pickle
>
> Doesn't this look a *lot* easier to you than writing DM classes?  It sure
> does to me.

Yes, *much* nicer.  I'm wondering where the metadata mapping hooks would go, 
though.  Perhaps as additional generic functions called from these 
implementations?

The decorator syntax seems a bit clunky to me, but there's probably no better 
way to do it without introducing new syntax to Python.

In any case, the overall effect is pleasantly reminiscent of function guards 
(and, to a lesser extent, pattern matching) in functional programming 
languages.  The upside is that with all of the parameter-checking logic 
moved out of the function itself, function bodies become very clear and 
concise, and hence easier to understand and maintain.
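
To show what I mean by the guard analogy, here's a throwaway 
predicate-dispatch sketch in the spirit of the quoted save_to example; 
this 'when' registry is my own stand-in, not the PyProtocols API:

    _rules = []

    def when(predicate, func):
        # Register a guarded implementation; the guard stays outside
        # the function body, as with guards in functional languages.
        _rules.append((predicate, func))

    def save_to(ob, db):
        # First rule whose guard passes wins.  (A real dispatcher
        # would pick the *most specific* applicable rule instead.)
        for predicate, func in _rules:
            if predicate(ob, db):
                return func(ob, db)
        raise TypeError("no applicable save_to method")

    class PickleFile:
        def write(self, data): pass   # stub for illustration

    when(lambda ob, db: isinstance(db, PickleFile),
         lambda ob, db: db.write(repr(ob)))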

...
> So, actually, all of the major technical pieces needed to make this happen
> (expression parser, decorator syntax, and dispatch algorithm) have been
> developed to at least the proof-of-concept stage.  The parser will reduce
> normal Python expressions to fast-executing objects, so there's no need to
> eval() expression strings at runtime.

At what point does this reduction occur?  At import time?

> Further, the in-CVS prototype 
> dispatcher automatically recognizes common subexpressions between rules, so
> that e.g. 'target.isDrinkable()' will get called only *once* per call to
> the generic function, even if the expression appears in dozens of rules.

Very nice!
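
To picture the sharing, here's a deliberately naive model of it -- the real 
dispatcher parses expressions ahead of time and builds this into its 
dispatch structure, but a per-call memo dict shows the idea:

    def evaluate(expr, env, memo):
        # Each distinct expression string is evaluated at most once
        # per call, no matter how many rules mention it.  (eval() here
        # is purely illustrative; the prototype avoids it.)
        if expr not in memo:
            memo[expr] = eval(expr, env)
        return memo[expr]

    def applicable(rules, **args):
        memo, env = {}, dict(args)
        return [func for exprs, func in rules
                if all(evaluate(e, env, memo) for e in exprs)]

    # With rules like
    #     (("target.isDrinkable()", "target.volume > 0"), drink)
    #     (("target.isDrinkable()",), sip)
    # 'target.isDrinkable()' runs once per call, not once per rule.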

> Also, the prototype dispatcher automatically checks "more discriminating"
> tests first.  So, for example, if one of the tests is on the type of an
> argument, and there are methods for lots of different types, the type of
> that argument will be checked first, to narrow down the applicable methods
> faster.  Only then will pure boolean tests (like 'target.isDrinkable()') be
> checked, in order to further narrow down the options.

Very very nice!
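
For my own understanding, here's roughly how I picture that ordering -- one 
hash probe on an argument's type prunes the candidates before any boolean 
guards run (the table layout is my guess, not the prototype's):

    def choose(methods_by_type, guards, target):
        # The most discriminating test first: a dict lookup on type...
        for method in methods_by_type.get(type(target), []):
            # ...then the cheap-but-unselective boolean tests last.
            guard = guards.get(method)
            if guard is None or guard(target):
                return method
        raise TypeError("no applicable method")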

...
> The net result is that the production version of this code should be
> roughly comparable in speed to a series of hand-written 'if:'
> statements!  For some tests, like type/protocol tests and range/equality
> tests, it's likely to actually be *faster* than 'if:' statements, due to
> the use of hash tables and binary searches.  (But this will likely be
> balanced out by other overhead factors.)

This is very impressive.  Perhaps this belongs in Python core, but then there 
would no doubt be a battle over how multiple dispatch just isn't 
"Pythonic"...

...
> The data management API and mapping mechanisms will probably end up being
> mostly generic functions, possibly accessed via methods on an "editing
> context" object (as I've mentioned in previous mailing list posts), but the
> back-end implementation code that you write will look more like the
> 'save_to' function examples I gave earlier in this article, rather than
> looking anything like today's DM objects.  Indeed, this will ultimately be
> the death of DM's, and I believe everyone will be happy to see them
> go.  Overall, the storage API is probably going to end up looking somewhat
> like that of Hibernate, a popular Java object-relational mapper.  (The main
> difference being that instead of their specialized object query language,
> we'll just use Python.)

I've been looking into Hibernate lately as well, along with Modeling.  Will 
the PEAK "editing context" concept be anything like Modeling's?  Will there 
be support for nested editing contexts, as in Modeling?  Will the 
"storage.beginTransaction(X)" approach be a thing of the past?

...
> Currently, my intent is to put generic functions in a 'protocols.generic'
> subpackage, which you'll use in a way that looks something like:
>
>      from protocols.generic import when, before, after
>
>      [when("something()")]
>      def do_it(...):
>          ...
>
> However, I'm at least somewhat open to the possibilities of:
>
> * Making a separate top-level package for generics (e.g. 'from generics
> import when') instead of a subpackage of 'protocols'

This seems like a good approach, since multiple dispatch functionality is 
generally useful on its own, regardless of whether other PyProtocols features 
are being used.  You might also get a lot more adoption (and 
testers/feedback) if it were available separately to the Python community.

At the beginning of the article you said:
> This is a rather long article ... it addresses a coming "sea change" 
> that will affect PEAK at many levels, including at least:
> 
> * PyProtocols
> * peak.model and peak.storage
> * peak.metamodels
> * the AOP/module inheritance facilities in peak.config.modules

Some general thoughts/concerns:

peak.model: You didn't get into specifics here, but you mention the 
possibility of destabilizing changes.  Since changes to the model layer of 
an application can have far-reaching effects throughout its code, I'd like 
to hear much more about this.  Do you have specific changes in mind?  How 
safe is it to continue using peak.model as it is today?

peak.storage: Because of the separation of model and storage layers, I'm less 
concerned about the impact of peak.storage changes.  The gains in flexibility 
will more than make up for any short-term inconvenience.  How far along are 
your thoughts on the new peak.storage API?

A customer is quite interested in using PEAK for general application 
development, with a specific interest in flexible object persistence.  On the 
RDBMS side, they would like some of the higher-level ORM features of Modeling 
and Hibernate.  The ideas you have in mind for queries look quite good.  Do 
you have any specifics in mind yet for how the physical-to-logical schema 
mapping features will work?
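
For what it's worth, here's the sort of declarative physical-to-logical 
mapping I'm hoping for, sketched with made-up names (nothing here is PEAK 
API):

    INVOICE_MAP = {
        "table": "invoices",
        "columns": {             # logical attribute -> physical column
            "number": "inv_no",
            "total": "total_amt",
        },
    }

    def select_sql(mapping, attrs):
        cols = [mapping["columns"][a] for a in attrs]
        return "SELECT %s FROM %s" % (", ".join(cols), mapping["table"])

    # select_sql(INVOICE_MAP, ["number", "total"])
    #   -> 'SELECT inv_no, total_amt FROM invoices'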

peak.metamodels: Here I'm primarily interested in the XMI-based code 
generation capabilities.  Will this feature be retained?  It's one that 
might help distinguish PEAK (and Python generally) as a platform for serious 
enterprise application development.

PyProtocols: You mention there will be changes to PyProtocols for 1.0 -- no 
doubt you'll be using multiple dispatch quite a bit behind the scenes, but 
will there be much change at the API level?

AOP/module inheritance: I've not used these features, and have no plans for 
them, so no opinions or concerns here.


