[PEAK] Progress on peak.schema
Phillip J. Eby
pje at telecommunity.com
Mon Feb 28 22:38:40 EST 2005
Just an FYI for anybody who's wondering whatever happened to peak.schema,
the future "workspace" persistence API, and conceptual queries. I have in
fact made some great progress in the last month.
OSAF (www.osafoundation.org) has a short-term consulting contract with my
corporation that involves, among other things, designing a developer
platform API for the Chandler open-source PIM project. This month I've
been working on Spike, which is a Chandler sandbox project at:
http://cvs.osafoundation.org/viewcvs.cgi/internal/Spike/
I've been keeping (somewhat) quiet about it because it wasn't clear what
its future with respect to Chandler was; things have now settled a little
bit on that front. That's not to say that it's going into Chandler, but
it's no longer a completely speculative project and there's now been an
official announcement regarding it:
http://lists.osafoundation.org/pipermail/dev/2005-February/002482.html
Anyway, all that aside, the relevant bit for PEAK users is that Spike's
'spike.schema' module is in fact a rough draft of many of the ideas I had
for peak.schema and SOAR (Simple Objects Archived Relationally). And,
Spike's architecture overview (src/spike/overview.txt) now contains a rough
plan of what the workspace API will end up looking like.
As you probably know, these are both APIs that I've been babbling about for
years without ever quite getting around to making them a reality. Working on
a tight deadline for OSAF with a focused set of requirements has helped
tremendously in narrowing down my vision to concrete APIs.
Some cool features of the implementation and architecture include:
* Fully event-driven model (i.e. change events for every attribute)
* Completely ZODB-free
* Monkey-typing for data (you can ask for 'SomeClass.someAttr.of(anyObject)';
see the sketch after this list)
* Relationships can exist independently of entity types (i.e. you can
create and persist relationships between types without those types needing
to know about it)
* Compact schema notation compared to peak.model (see src/spike/schema.txt
for examples)
* Schema objects have UUIDs for database synch and schema evolution (and
there's a tool to automatically edit your model files and tack UUIDs on at
the end for you, so they don't clutter up the schema definition and you
don't have to do it by hand)
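To illustrate that monkey-typing bullet, here's a minimal sketch of the
idea (the Attribute class and its use of the instance dict are purely
illustrative assumptions on my part, not Spike's actual implementation)::

    class Attribute(object):
        """Illustrative attribute object with an '.of()' accessor"""

        def __init__(self, name, default=None):
            self.name = name
            self.default = default

        def of(self, obj):
            # Read the value from any object's instance dict, whether
            # or not its class ever declared this attribute
            return obj.__dict__.get(self.name, self.default)

    class User(object):
        loginId = Attribute('loginId')

    class Anything(object):
        pass

    thing = Anything()
    thing.loginId = 'joe'
    print(User.loginId.of(thing))   # -> 'joe'

The point is that the attribute object itself is the access API, so any
object can carry the data, whether or not its class knows about it.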
I've also already figured out a process for automatically mapping the
schema to a relational database using the SOAR patterns that Ty and I
developed, so applications that don't need access to a legacy database
schema will be able to have worry-free automatic persistence. The
persistence API will be quite simple too, something like:
from protocols import Interface  # PEAK's PyProtocols interface package

class ISet(Interface):
    """Set with change events"""

    def add(item): pass
    def remove(item): pass
    def reset(iterable=()): pass
    def subscribe(receiver, hold=False): pass

    # ... a bunch of other methods, including query support

class IWorkspace(ISet):
    # add a bunch of undo/redo/flush stuff
    pass
That is, a workspace is logically just a set of objects with the same query
capabilities as any other set, and undo/redo/commit/rollback
support. Multi-valued attributes are modelled as sets, so you can perform
queries starting from any object, not just from "the database".
My earlier conception of workspaces was that they would be used to access
specially-altered classes using dotted names, but I've pretty much tossed
that out now. Class-replacement wiring can be managed a bit more easily
via command objects anyway.
Queries are now getting clearer, too, and I just drafted a rough API that
covers pretty much any kind of "single object per row" query, which is to
say it doesn't handle aggregation except of the IN (...) and EXISTS (...)
varieties. I do expect to be able to pick those other features up later,
but they're out of scope for Chandler in the near future so for now I've
got to think about them on my own time. ;)
Anyway, here are a couple of ways to express the same simple query using the
"bulletins" example schema::
aUser = ws.groupBy(User.loginId)['joe']
aUser = ws[User.loginId.eq('joe')]
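To make those two spellings concrete, here's a toy, purely in-memory
sketch of how such an interface might behave (the QuerySet, AttributeRef,
and Filter classes are illustrative assumptions, not the real
spike.schema API)::

    class Filter(object):
        """Illustrative filter object produced by 'attr.eq(value)'"""
        def __init__(self, attr, value):
            self.attr, self.value = attr, value
        def matches(self, obj):
            return getattr(obj, self.attr.name, None) == self.value

    class AttributeRef(object):
        """Illustrative stand-in for an attribute like User.loginId"""
        def __init__(self, name):
            self.name = name
        def eq(self, value):
            return Filter(self, value)

    class QuerySet(object):
        """Toy set supporting ws[filter] and ws.groupBy(attr)"""
        def __init__(self, items=()):
            self.items = list(items)
        def __getitem__(self, criterion):
            # filter-as-subscript: keep only the matching items
            return QuerySet(o for o in self.items if criterion.matches(o))
        def groupBy(self, attr):
            # map each attribute value to the subset carrying it
            groups = {}
            for o in self.items:
                groups.setdefault(getattr(o, attr.name),
                                  QuerySet()).items.append(o)
            return groups

    class User(object):
        loginId = AttributeRef('loginId')
        def __init__(self, loginId):
            self.loginId = loginId

    ws = QuerySet([User('joe'), User('ann')])
    joes = ws.groupBy(User.loginId)['joe']   # toy version yields a QuerySet
    alsoJoes = ws[User.loginId.eq('joe')]    # same selection via a filter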
There will presumably also be some sort of string syntax for queries, but
I'm not settled yet on what it will look like, except that it will be
syntactically valid Python. The intermediate form of filter objects,
however, will be objects like those in 'peak.model.query', only they won't
be lambdas. Instead, they will be introspectable so that for example the
SOAR backend can convert them into SQL, the Chandler backend can convert
them into repository-speak, etc.
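The value of making filters introspectable is that a backend can walk them
as plain data. Here's a minimal sketch of what such a walk might look like
(the Eq and And node classes and the to_sql function are illustrative
assumptions only)::

    class Eq(object):
        """Illustrative filter node: column = value"""
        def __init__(self, column, value):
            self.column, self.value = column, value

    class And(object):
        """Illustrative filter node: all criteria must hold"""
        def __init__(self, *criteria):
            self.criteria = criteria

    def to_sql(node, params):
        # Walk the filter tree, emitting an SQL fragment plus bind params
        if isinstance(node, Eq):
            params.append(node.value)
            return "%s = ?" % node.column
        if isinstance(node, And):
            return " AND ".join(to_sql(c, params) for c in node.criteria)
        raise TypeError("unknown filter node: %r" % (node,))

    params = []
    where = to_sql(And(Eq("login_id", "joe"), Eq("is_active", 1)), params)
    # where  == "login_id = ? AND is_active = ?"
    # params == ['joe', 1]

A lambda can only be called; a node tree like this can be translated into
SQL, repository-speak, or anything else a backend understands.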
*All* sets will have the query interface, not just workspaces, so you could
do this (again using the bulletins schema):
someBulletins = aCategory.bulletins[
    Bulletin.postedBy[User.loginId.eq('joe')]]
This would retrieve all posts in 'aCategory' that were posted by users
whose login ID equals 'joe'. In the case of a SOAR backend, all queries
end up getting pushed all the way down to the backend and implemented as
SQL, and there should be a generic function somewhere to allow custom query
tuning where needed.
Anyway, in case it wasn't clear, this model involves *no DMs* (data
managers), so you don't
have to write custom ones for every class. Instead, the backend of a
workspace just has to be able to map from class and attribute objects to
whatever storage mechanism it uses. (Probably using a generic function, so
that you can use mixins to do cross-database stuff.)
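To sketch that generic-function idea, here's a hand-rolled stand-in built
on a type registry (everything here, including the schema classes and the
table/column mapping, is an illustrative assumption rather than PEAK's
actual dispatch machinery)::

    _storage_rules = {}

    def when(schema_type):
        """Register a storage-mapping rule for a schema object type"""
        def register(func):
            _storage_rules[schema_type] = func
            return func
        return register

    def storage_for(schema_obj):
        # "Generic function": dispatch on the schema object's type,
        # walking the MRO so mixin classes can contribute rules too
        for cls in type(schema_obj).__mro__:
            if cls in _storage_rules:
                return _storage_rules[cls](schema_obj)
        raise TypeError("no storage rule for %r" % (schema_obj,))

    class ClassSchema(object):
        def __init__(self, table):
            self.table = table

    class AttributeSchema(object):
        def __init__(self, column):
            self.column = column

    @when(ClassSchema)
    def map_class(schema):
        return schema.table        # a class maps to a table

    @when(AttributeSchema)
    def map_attribute(schema):
        return schema.column       # an attribute maps to a column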
PEAK's overall requirements are a lot broader and deeper than I currently
know how to handle with the architecture I'm building for Spike, but the
really nice thing is that the API is now much more concrete. Most of
PEAK's specialty requirements (like cross-database storage and joins,
legacy DB support, workspaces that represent files or documents, etc.) can
be handled on the back-end in a way that's relatively invisible to the
API. PEAK also has lots of related APIs that have to be integrated, like
peak.events and the binding package metadata facilities.
So, to sum up... there's still no concrete timeframe for anything
regarding peak.schema, but you can see many hints of coming attractions in
spike.schema and related modules. Right now my days are occupied with OSAF
work, and my nights and weekends, when I have any time to spare, are spent
trying to finish my PyCon stuff. After PyCon, though, I'll hopefully have a
bit more time to start actually implementing peak.schema and SOAR.