The PEAK Developers' Center   ChandlerSharingModel

Proposal: A Logical Format API for the Chandler Sharing Framework

This document describes a proposed API for the Chandler sharing framework to allow individual parcels to support backward- and forward-compatible sharing, even when their domain model changes between parcel versions and the clients doing the sharing do not all have the same version of the parcel installed.

The proposed API does this by allowing parcel developers to specify "sharing schemas" for their items. A sharing schema is a kind of logical transmission format: it breaks items down into simple records containing elementary data types, which are easy to store or to transmit for use by other programs.

Sharing schemas defined using this API will also be used to implement "dump and reload" including schema evolution during upgrades or downgrades. As a parcel's item schema changes, its sharing schema(s) must be modified so that data produced by previous versions of the parcel can still be imported. A parcel can also optionally provide support for exporting data in such a way that it can be read by older versions.

Typically, a parcel will provide its own sharing schema for the Kinds and Annotations it contains. However, it's also possible for a parcel to define one or more sharing schemas for other parcels that it depends on.

Parcel developers define a sharing schema by defining one or more record types (using the @sharing.recordtype decorator), and one or more sharing.Schema subclasses. The record types define the format of the data to be shared, and the sharing.Schema classes provide code that converts items to records and vice versa. The sharing.Schema base class will provide many utility methods to automatically handle common mapping patterns, so that most schemas will include relatively little code.

Records and Record Types

The API treats all data as "records", similar to the rows of a table in a relational database. Each record is of some "record type", and contains a fixed number of fields. As in a relational database, each field can hold at most one value of one elementary data type, such as a number, string, date/time value, etc. A field may also hold a value of None, which is conceptually similar to the "null" value in a relational database. There is also a second kind of "null" value, called sharing.NoChange, that can be used to create "diff" or "delta" records that indicate only certain parts of the record are changed.

To define a record type, a parcel developer will write a short function using the @sharing.recordtype decorator. For example:

@sharing.recordtype("...")   # the argument is this record type's unique URI
def itemrecord(itsUUID, title, body, createdOn, description, lastModifiedBy):
    """Record type for content items; note lastModifiedBy is a UUID"""

The above defines a record type with 6 fields, named by the arguments to the function. The string passed to recordtype() must be a unique URI; other programs (such as Cosmo) will use it to determine whether they know or understand a particular record type.

(Note that any unique URI is acceptable, including URIs of the form "uuid:...". That is, you need not have control of a domain name in order to create your own unique URI, as you can use a UUID to create one.)

Type Checking and Conversion

In the simplest case, a recordtype function need not contain any code or return any value. In such a case, the argument names (and default values, if any) are sufficient to describe how the resulting record type should behave. However, if you wish to provide type checking or conversion of arguments, you will need to write a bit more code in a record type. For example, here's a new version of the example above that does a bit more work to ensure it is used correctly:

@sharing.recordtype("...")   # unique URI for this record type
def itemrecord(itsUUID, title, body, createdOn, description, lastModifiedBy):
    """Record type for content items"""
    # Accept an item for convenience, but store only its UUID
    if isinstance(lastModifiedBy, schema.Item):
        lastModifiedBy = lastModifiedBy.itsUUID
    if not isinstance(lastModifiedBy, UUID):
        raise TypeError("lastModifiedBy must be an item or a UUID")
    return itsUUID, title, body, createdOn, description, lastModifiedBy

Note, however, that although a recordtype function can accept items as input, it cannot return items as output. They must be converted to UUIDs, strings, numbers, or other elementary values. The EIM API is record-oriented, not object-oriented.
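A minimal sketch, under stated assumptions, of how calling a record type might behave: the arguments are bound (filling in defaults), the function's return value replaces them if there is one, and anything that is not an elementary value is rejected, which is how an item accidentally left in a record would be caught. The URI argument is omitted for brevity. NoChange, ELEMENTARY, and the toy Item class are hypothetical stand-ins, and the set of "elementary" types in particular is a guess, since the proposal does not yet define one.

```python
import datetime
import decimal
import inspect
import uuid


class _NoChangeType:
    """Hypothetical stand-in for sharing.NoChange."""
    def __repr__(self):
        return "NoChange"


NoChange = _NoChangeType()

# ASSUMPTION: the legal field types are not yet defined; this is only a guess.
ELEMENTARY = (str, bytes, int, float, bool, decimal.Decimal,
              datetime.date, datetime.datetime, datetime.time, uuid.UUID)


def recordtype(func):
    """Simplified decorator (no URI) that builds type-checked tuples."""
    sig = inspect.signature(func)

    def make_record(*args, **kw):
        bound = sig.bind(*args, **kw)
        bound.apply_defaults()
        result = func(*bound.args, **bound.kwargs)
        # A body that returns nothing means "use the arguments unchanged".
        values = tuple(bound.arguments.values()) if result is None else tuple(result)
        if len(values) != len(sig.parameters):
            raise TypeError("wrong number of fields")
        for v in values:
            if v is not None and v is not NoChange and not isinstance(v, ELEMENTARY):
                raise TypeError("not an elementary value: %r" % (v,))
        return values
    return make_record


class Item:
    """Toy stand-in for schema.Item."""
    def __init__(self):
        self.itsUUID = uuid.uuid4()


@recordtype
def itemrecord(itsUUID, title, body, createdOn, description, lastModifiedBy):
    if isinstance(lastModifiedBy, Item):
        lastModifiedBy = lastModifiedBy.itsUUID  # items become UUIDs
    return itsUUID, title, body, createdOn, description, lastModifiedBy
```

Passing an Item as lastModifiedBy yields a record holding its UUID, while a record type whose body returned the Item itself would raise TypeError.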

Inter-Record Dependencies

Just as in a relational database, records may contain references to other records. For example, suppose that we want a record type to record "tags" associated with a content item, and we also want tags to be a kind of content item themselves. Here's what we would do:

@sharing.recordtype("...", itsUUID = itemrecord.itsUUID)
def tag(itsUUID, tagname):
    """Record type for a tag; "inherits" from itemrecord"""

@sharing.recordtype("...", item = itemrecord.itsUUID, tag = tag.itsUUID)
def tagging(item, tag):
    """Record type linking tags to items"""
    # Accept items for convenience, but store only their UUIDs
    if isinstance(item, schema.Item):
        item = item.itsUUID
    if isinstance(tag, schema.Item):
        tag = tag.itsUUID
    if not isinstance(item, UUID):
        raise TypeError("must be an item or a UUID", item)
    if not isinstance(tag, UUID):
        raise TypeError("must be an item or a UUID", tag)
    return item, tag

Keyword arguments passed to the recordtype() decorator allow you to define relationships between the fields of the record type being defined and the fields of existing record types. As you can see above, we use itemrecord.itsUUID and tag.itsUUID to refer to the itsUUID fields of the itemrecord and tag record types. This creates a dependency between the record types, which affects the order in which records will be imported or exported.

In the examples above, the order of record processing will always begin with itemrecord, followed by tag and tagging records. More specifically, before a tagging record is processed, any tag and itemrecord records that have matching itsUUID fields will be processed. And before a tag record is processed, any itemrecord with the same itsUUID will be processed first.
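The ordering rule can be sketched at the level of whole record types with Python's standard graphlib module (Python 3.9 or later). This is a simplification: the text above orders individual records by matching itsUUID values, whereas this sketch only orders record types. The DEPENDS_ON mapping is a hypothetical, hand-written version of what the decorator keywords would declare.

```python
from graphlib import TopologicalSorter

# Hypothetical: record type -> record types it depends on, mirroring the
# keyword arguments given to @sharing.recordtype in the example above.
DEPENDS_ON = {
    "itemrecord": set(),
    "tag": {"itemrecord"},              # itsUUID = itemrecord.itsUUID
    "tagging": {"itemrecord", "tag"},   # item = itemrecord.itsUUID, tag = tag.itsUUID
}


def processing_order(depends_on):
    """Return record type names so that every dependency comes first."""
    return list(TopologicalSorter(depends_on).static_order())
```

For the example, processing_order(DEPENDS_ON) gives itemrecord, then tag, then tagging. A circular dependency makes static_order() raise graphlib.CycleError, which is one reasonable way for a framework to report such a mistake.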

Recordtype Evolution

As an application's schema changes, it may be necessary to add new fields to existing record types. This can be done, as long as:

  1. New fields are added to the end of the existing fields in the record type function.
  2. New fields must have a default value defined, and import code for the record type must be able to handle a value of sharing.NoChange. (This allows two Chandlers with different versions of a parcel to interoperate, even if one supports fields that the other does not.)
  3. The record type's URI must not change, and all existing fields' names must remain the same and stay in the same order.

In other words, if you want to change the name, meaning, or position of an existing field (or remove fields), you must create a new recordtype with a new URI to replace the old one. Such replacement also means that you must create a new sharing.Schema in order to retain backward compatibility with older sharing clients. (This topic will be covered in more detail below, since we haven't talked about Schema classes yet.)
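The first two rules, and the naming part of the third, can be checked mechanically by comparing the signatures of the old and new record type functions. The sketch below does this with inspect.signature; check_evolution, tag_v1, tag_v2, and the color field are hypothetical, and the URI part of rule 3 is not checked here.

```python
import inspect

_EMPTY = inspect.Parameter.empty


def check_evolution(old_func, new_func):
    """Raise ValueError if new_func is not a legal evolution of old_func."""
    old = list(inspect.signature(old_func).parameters.values())
    new = list(inspect.signature(new_func).parameters.values())
    # Rule 3: existing field names stay the same and in the same order.
    if [p.name for p in new[:len(old)]] != [p.name for p in old]:
        raise ValueError("existing fields were renamed, reordered or removed")
    # Rules 1 and 2: anything extra is at the end and has a default.
    for p in new[len(old):]:
        if p.default is _EMPTY:
            raise ValueError("new field %r needs a default value" % p.name)


def tag_v1(itsUUID, tagname):
    """Record type for a tag (first version)"""


def tag_v2(itsUUID, tagname, color=None):
    """Hypothetical later version: 'color' appended with a default"""


check_evolution(tag_v1, tag_v2)  # legal: passes silently
```

Renaming tagname, dropping a field, or adding color without a default would each raise ValueError, signalling that a new record type URI (and a new Schema version) is needed instead.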

Defining a Sharing Schema

By themselves, record types only define a format for sharing and import/export. To complete a parcel's sharing definition, it must also define how to convert between items and records, by creating a sharing.Schema subclass. At minimum, such a subclass must include a unique URI, a version number, and a user-visible description:

class ContentSchema(sharing.Schema):
    uri = ""
    version = 1
    description = _("Core content items")

The sharing system will use these attributes to determine what formats it "understands", and to allow users to select what version of a particular format should be used for a particular "share", if applicable. (This is so that users can choose an older version in order to collaborate with users who don't have the latest version.)
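The proposal does not say how two clients learn which versions of a schema the other supports. Assuming each side can report, per schema URI, the set of versions it understands, the choices offered to a user could be computed as below. shared_versions, default_version, and the example: URI are hypothetical.

```python
# Hypothetical schema URI used only for this example.
CONTENT = "example:content-schema"


def shared_versions(mine, theirs):
    """Map each schema URI to the versions both peers support, newest first."""
    result = {}
    for uri, versions in mine.items():
        both = sorted(versions & theirs.get(uri, set()), reverse=True)
        if both:
            result[uri] = both
    return result


def default_version(mine, theirs, uri):
    """Suggest the newest common version of one schema, or None."""
    options = shared_versions(mine, theirs).get(uri)
    return options[0] if options else None


new_client = {CONTENT: {1, 2}}
old_client = {CONTENT: {1}}
```

Two up-to-date clients would default to version 2, while sharing with old_client would fall back to version 1; the user could still pick any version in the shared list.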

It's important to note that, unlike the Chandler application schema, not every change to a parcel's schema will require a change in its sharing schema's version number. A sharing.Schema version number only needs to change when a record type is to be replaced. That is, as long as you are only adding new record types, or adding new fields to existing record types (as described in the previous section), there is no need for the version number to change. That's because older code will still be able to read the records and fields that it understands, and ignore the new record types and fields that it does not.
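The reader's side makes this concrete. The sketch below is a hypothetical older client that knows only some record types, and only the leading fields of each: it skips unknown record types and drops trailing fields it does not understand. This works only because new fields are always appended at the end. KNOWN, readable_part, and the example: URIs are illustrative assumptions.

```python
# What a hypothetical older client understands:
# record type URI -> number of leading fields it knows about.
KNOWN = {"example:itemrecord": 6, "example:tag": 2}


def readable_part(records, known=KNOWN):
    """Yield (uri, values) pairs trimmed to what the older client knows."""
    for uri, values in records:
        if uri not in known:
            continue  # record type added in a later version: ignore it
        # New fields are only ever appended, so the known ones come first.
        yield uri, tuple(values[:known[uri]])
```

A tag record carrying an extra trailing field, followed by a record of an unknown type, would reduce to just the two-field tag record.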

When a schema gets a new version number, you will often want to create a second sharing.Schema subclass, to keep backward compatibility. For example, we might have:

class OldContentSchema(sharing.Schema):
    uri = ""
    version = 1
    description = _("Core content items")
    # code to read/write old format here
    # ...

class NewContentSchema(OldContentSchema):
    version = 2
    # code to read/write new format here
    # ...

This allows the parcel to support sharing (or import/export and dump/reload) of older formats. Any aspects of the old schema that are retained by the new one can potentially be inherited, eliminating the need for duplicate code. (Notice that in the above example we're also inheriting the uri and description attributes.)

Export Methods

In order to function, a sharing.Schema subclass must define "exporter" and "importer" methods. Continuing our simple item/tags example, let's look at some exporters:

class ContentSchema(sharing.Schema):
    uri = ""
    version = 1
    description = _("Core content items")

    @sharing.exporter(pim.ContentItem)
    def export_contentitem(self, item):
        yield itemrecord(
            item.itsUUID, item.title, item.body, item.createdOn,
            item.description, item.lastModifiedBy
        )
        # one tagging record for each tag applied to this item
        for t in item.tags:
            yield tagging(item, t)

    @sharing.exporter(pim.Tag)
    def export_tag(self, item):
        yield tag(item.itsUUID, item.tagname)

An exporter method is declared using @sharing.exporter(cls, ...), to indicate what class or classes of items are handled by that method. Methods may be generators that yield records, or they can just return a list or other iterable object that yields records.

More than one exporter can be called for the same item. In the example above, since pim.Tag is assumed to be a subclass of pim.ContentItem, the export_contentitem() method will be called before export_tag() for each pim.Tag item being exported. The same principle applies to export methods for annotation classes: the export method for each applicable annotation class will be called. All of the records supplied by the various export methods are then output.

Notice that this means export methods must be written so that they do not produce duplicate records. Each export method should therefore confine itself to writing records specific to the class(es) it is registered for, and leave the base classes' data to the base class export methods.

If you subclass your sharing.Schema, the subclass inherits all of the export methods defined by the base class. If you wish to redefine the export handling for some particular item or annotation class, you must do so by explicitly using a new @sharing.exporter() decoration; it is not sufficient to just override a method with the same name. (This is because for performance reasons, the lookup mechanism is not based on method names.)

Finally, you can declare more than one exporter for the same type in the same sharing.Schema class; both will be called for items they apply to.
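The sketch below is one possible reading of these exporter rules, using plain Python classes instead of Chandler kinds: exporters are found through a per-class table built when a Schema subclass is created (not by method name), base-class exporters run first, several exporters per class are allowed within one Schema class, and a subclass that declares an exporter for a class replaces the inherited ones for that class. exporter, Schema, ContentItem, Tag, and the tuple records are hypothetical stand-ins for sharing.exporter, sharing.Schema, and the pim classes; annotation classes are left out.

```python
def exporter(*classes):
    """Mark a method as an exporter for the given item classes."""
    def mark(func):
        func._export_for = classes
        return func
    return mark


class Schema:
    """Stand-in for sharing.Schema, showing only exporter dispatch."""

    _exporters = {}  # item class -> list of exporter functions

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        own = {}
        for func in vars(cls).values():  # only attributes defined in this class
            for klass in getattr(func, "_export_for", ()):
                own.setdefault(klass, []).append(func)
        # Start from the inherited table; classes declared here replace entries.
        cls._exporters = {**cls._exporters, **own}

    def exportItem(self, item):
        records = []
        # Walk the MRO from the most basic class down, so base data comes first.
        for klass in reversed(type(item).__mro__):
            for func in self._exporters.get(klass, ()):
                records.extend(func(self, item))
        return records


class ContentItem:
    def __init__(self, title, tags=()):
        self.title = title
        self.tags = list(tags)


class Tag(ContentItem):
    pass


class ContentSchema(Schema):
    @exporter(ContentItem)
    def export_contentitem(self, item):
        yield ("itemrecord", item.title)
        for t in item.tags:
            yield ("tagging", item.title, t.title)

    @exporter(Tag)
    def export_tag(self, item):
        yield ("tag", item.title)
```

Under this reading, exporting a Tag yields the itemrecord record before the tag record, and redefining export_tag by name in a subclass without a new decorator has no effect, matching the rule stated above.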

Importer Methods

Each sharing.Schema must declare "importer" methods to handle each record type that it outputs. Here are some importers for the record types we defined previously:

class ContentSchema(sharing.Schema):

    # ...

    @sharing.importer(itemrecord)
    def import_contentitem(self, record):
        self.loadItemByUUID(
            record.itsUUID, pim.ContentItem,
            title = record.title,
            body = record.body,
            createdOn = record.createdOn,
            description = record.description,
            lastModifiedBy = self.loadItemByUUID(record.lastModifiedBy)
        )

    @sharing.importer(tag)
    def import_tag(self, record):
        self.loadItemByUUID(record.itsUUID, pim.Tag, tagname=record.tagname)

    @sharing.importer(tagging)
    def import_tagging(self, record):
        the_item = self.loadItemByUUID(record.item)
        the_tag = self.loadItemByUUID(record.tag)
        the_item.tags.add(the_tag)   # tags are only ever added, never removed

Notice that importer methods do not need to return a value; their sole purpose is to do whatever processing is required for the received records.

Only one importer can be registered for a given record type in a particular Schema subclass. Importers registered by base classes are inherited in subclasses, unless overridden using the appropriate decorator in the subclass. If you don't want to inherit or override support for a particular record type, the record type can be listed in the do_not_import attribute of the class, e.g.:

do_not_import = sometype, othertype, ...
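A sketch of these importer rules, under the same kind of assumptions as an exporter table: one importer per record type per class (a duplicate is an error), inherited entries overridden by the subclass, and do_not_import dropping inherited support. importer, Schema, BaseSchema, NewSchema, and the log list are hypothetical; record types are plain strings here, and importRecord takes the record type as an explicit argument for simplicity.

```python
def importer(rtype):
    """Mark a method as the importer for one record type."""
    def mark(func):
        func._import_for = rtype
        return func
    return mark


class Schema:
    """Stand-in for sharing.Schema, showing only importer registration."""

    _importers = {}      # record type -> importer function
    do_not_import = ()   # record types to drop from the inherited table

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        own = {}
        for func in vars(cls).values():
            rtype = getattr(func, "_import_for", None)
            if rtype is None:
                continue
            if rtype in own:
                raise TypeError("more than one importer for %r in %s"
                                % (rtype, cls.__name__))
            own[rtype] = func
        table = {**cls._importers, **own}  # subclass entries override
        for rtype in cls.do_not_import:
            table.pop(rtype, None)
        cls._importers = table

    def importRecord(self, rtype, record):
        try:
            func = self._importers[rtype]
        except KeyError:
            raise LookupError("no importer for %r" % (rtype,)) from None
        func(self, record)


log = []


class BaseSchema(Schema):
    @importer("tag")
    def import_tag(self, record):
        log.append(("base tag", record))

    @importer("tagging")
    def import_tagging(self, record):
        log.append(("base tagging", record))


class NewSchema(BaseSchema):
    do_not_import = ("tagging",)  # neither inherit nor override this one

    @importer("tag")
    def import_tag_v2(self, record):
        log.append(("new tag", record))
```

NewSchema would use its own tag importer and refuse tagging records entirely, while BaseSchema keeps both of its importers.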

Utility Methods

The loadItemByUUID() method shown in the importer examples above is a utility method provided by the sharing.Schema base class. It takes a UUID, an optional item or annotation class, and keyword arguments for attributes to set. The return value is an item of the specified class, or a plain schema.Item if no class was specified and the item didn't already exist.

If an item with the given UUID already exists, it's returned. If a class was specified, the item's kind is upgraded if necessary. For example, the importer for the tag recordtype above invokes it like this:

self.loadItemByUUID(record.itsUUID, pim.Tag, tagname=record.tagname)

If a pim.ContentItem with the right UUID exists, its kind is upgraded to pim.Tag. If it does not exist, it is created as a pim.Tag. If an item exists and already has a kind that is a subclass of pim.Tag, its kind will not be changed. This algorithm allows items' types to be upgraded "just in time" as information becomes available.

If any of the attribute values supplied to loadItemByUUID() are sharing.NoChange, no change is made to the attribute. Similarly, if the UUID supplied to loadItemByUUID() is sharing.NoChange, sharing.NoChange is returned instead of an item.
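Here is a toy model of the lookup, create, and upgrade behavior described above, using plain Python classes and a dict in place of Chandler kinds and a repository view. Upgrading by assigning __class__ is a shortcut that works for simple Python classes; it is not how the repository changes an item's kind. ToySchema, Item, ContentItem, Tag, and NoChange are hypothetical, and raising TypeError for an unrelated kind is an assumption the proposal does not specify.

```python
class _NoChangeType:
    def __repr__(self):
        return "NoChange"


NoChange = _NoChangeType()  # hypothetical stand-in for sharing.NoChange


class Item:  # stand-in for schema.Item
    pass


class ContentItem(Item):
    pass


class Tag(ContentItem):
    pass


class ToySchema:
    def __init__(self):
        self.items = {}  # UUID -> item; plays the role of the repository view

    def loadItemByUUID(self, uuid, cls=None, **attrs):
        if uuid is NoChange:
            return NoChange
        item = self.items.get(uuid)
        if item is None:
            item = self.items[uuid] = (cls or Item)()
        elif cls is not None and not isinstance(item, cls):
            if not issubclass(cls, type(item)):
                # ASSUMPTION: the proposal does not say what happens here.
                raise TypeError("cannot turn %s into %s"
                                % (type(item).__name__, cls.__name__))
            item.__class__ = cls  # upgrade the kind "just in time"
        for name, value in attrs.items():
            if value is not NoChange:  # NoChange leaves the attribute alone
                setattr(item, name, value)
        return item
```

Loading a UUID first as a ContentItem and later as a Tag returns the same object, now a Tag, with its earlier attributes intact; a later request for the more general ContentItem never downgrades it.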

Over time, there will be additional utility methods added to sharing.Schema as common usage patterns are identified, to help reduce the amount of boilerplate code that needs to be written.

The Sharing Interface

For each import or export operation to be performed, the sharing framework will create instances of the appropriate sharing.Schema subclasses, passing in a repository view. So in our running example, the sharing framework would invoke ContentSchema(rv) to get a ContentSchema instance with an itsView of rv.

Then, depending on the operation(s) to be performed, the sharing framework will call some of the following methods, which all have reasonable default implementations provided by sharing.Schema:

beginExport()
  Called before an export process begins, to allow the Schema instance to do any pre-export setup operations. The default implementation does nothing, but can be overridden to initialize any data structures that might be needed during the export operation.

exportItem(item)
  Called to export an individual item; it should return a sequence of the relevant records for the supplied item, or be a generator yielding them. The default implementation automatically looks up the registered export methods and calls them, combining their results for the return value. This method can be overridden if you have a sufficiently complex special case to need it, or if you want to create a different way of registering exporters. Note also that it's okay for this method to return an empty sequence.

  (Note: the sharing framework must not make any assumptions about a relationship between the records returned and the item passed in, since some of the records may be for related items. Also, a schema can choose not to export records for individual items, but instead just track which items are to be exported and then provide all of the records when finishExport() is called.)

finishExport()
  Called after an export operation is completed; it should return a sequence of records, or be a generator yielding them. These records will be exported along with any that were yielded by calls to exportItem(). The default implementation just returns an empty list, but can be overridden to return or yield records, and perhaps to tear down any temporary data structures created by beginExport() or exportItem().

beginImport()
  Called before an import operation begins. The default implementation does nothing, but can be overridden to initialize any data structures that might be needed during the import operation.

importRecord(record)
  Called for each record to be imported, in an order determined automatically by the declared inter-recordtype dependencies. (That is, this method will not be passed a record until all the records it depends on have been imported.) The default implementation simply looks up and calls the relevant importer method.

finishImport()
  Called after an import operation is completed. The default implementation does nothing, but can be overridden to do any necessary cleanup or finish-out of the import process.

Notice that for both importRecord() and exportItem(), there is no requirement that all processing for the given item or record take place immediately. Some complex schema changes (or complex schemas) may need or want to simply keep track of what items are being exported or what records are being imported, and then do the actual importing or exporting in finishImport() or finishExport().

Thus, the sharing framework must not assume that it has seen all records until all finishExport() methods (for each schema being exported) have been called. Similarly, it cannot assume that items in the repository are in their finished state until all of the active schemas' finishImport() methods have been called.
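To make the calling sequence concrete, the sketch below shows how a framework might drive the export hooks, and why the record set is complete only after every finishExport() has run. run_export, ImmediateSchema, and DeferringSchema are hypothetical; DeferringSchema returns nothing from exportItem() and emits everything at the end, as the text allows.

```python
def run_export(schemas, items):
    """Drive one export across several schema instances (sketch)."""
    for s in schemas:
        s.beginExport()
    records = []
    for item in items:
        for s in schemas:
            records.extend(s.exportItem(item))
    # The record set is only complete after every finishExport() has run.
    for s in schemas:
        records.extend(s.finishExport())
    return records


class ImmediateSchema:
    """Hypothetical schema that exports each item as soon as it is seen."""

    def beginExport(self):
        pass

    def exportItem(self, item):
        yield ("tag", item)

    def finishExport(self):
        return []


class DeferringSchema:
    """Hypothetical schema that postpones all output to finishExport()."""

    def beginExport(self):
        self.pending = []

    def exportItem(self, item):
        self.pending.append(item)
        return []  # an empty sequence is allowed

    def finishExport(self):
        for item in sorted(self.pending):
            yield ("itemrecord", item)
        del self.pending  # tear down the temporary structure
```

Exporting the items "b" and "a" through both schemas would produce the two tag records first, in item order, and the deferred itemrecord records last, sorted.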

Implementation Details and Open Issues

Processing "Diffs"

Most of the API and examples above are written in terms that assume a more-or-less "complete" and "additive" transfer of records, rather than being difference-oriented.

It is assumed that sharing.NoChange will be used in record fields to indicate that the field's value has not changed, and that the sharing framework will be responsible for replacing records appropriately. Record objects will probably support subtraction to produce diffs, e.g. diffRecord = newRecord - oldRecord. It's possible that the sharing API will do this by exporting both old and new versions of the same collection, and then differencing the records that are in common, and perhaps creating some kind of "deletion" record for records found in the old, but not the new.
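A minimal sketch of record subtraction, assuming records are tuples and that certain positions form the record's key, which a diff must keep so the receiver can tell what it applies to. Record, its key attribute, apply_to, and NoChange are hypothetical names, and deletion is not handled.

```python
class _NoChangeType:
    """Hypothetical stand-in for sharing.NoChange."""
    def __repr__(self):
        return "NoChange"


NoChange = _NoChangeType()


class Record(tuple):
    """Toy record supporting `new - old`; not the proposed Record class."""

    key = (0,)  # ASSUMPTION: positions of the primary-key fields

    def __sub__(self, old):
        if len(self) != len(old):
            raise ValueError("records must have the same fields")
        if any(self[i] != old[i] for i in self.key):
            raise ValueError("can only diff two versions of the same record")
        # Key fields are always kept; other fields become NoChange
        # unless their value actually changed.
        return Record(
            new if (i in self.key or new != prev) else NoChange
            for i, (new, prev) in enumerate(zip(self, old)))

    def apply_to(self, old):
        """Receiver side: rebuild the full record from a diff and the old one."""
        return Record(prev if new is NoChange else new
                      for new, prev in zip(self, old))
```

With diffRecord = newRecord - oldRecord, only the changed title survives alongside the key, and diffRecord.apply_to(oldRecord) reconstructs newRecord.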

At present, however, the API as designed has no support for deletion as such. For well-defined collections (such as the .tags attribute in the examples), this could be handled by clearing the collection when the first record is received, at the cost of re-transmitting all members of the collection. The alternative possibility is to never delete items from collections, only add. (Which is what the above examples do; i.e., tags are always added, and items are always created or updated, but nothing is ever deleted.)

Key Management

The proposed API doesn't have a way to specify what fields of a record are "keys" or are expected to be unique, except indirectly. Inter-record dependencies define some keys by implication, in that the depended-on field must be unique in order for a dependency to have meaning.

However, producing diffs for a record requires that the record know of one or more fields that produce a "primary key" in database terminology, because a difference record must always contain enough information for the receiver to identify what the difference is to be applied to!

At this point, it's not clear to me if we will need some special way to designate a primary key. One obvious way to do it would be to assume that the first field is always the primary key, except that this doesn't work for records like the tagging example, which effectively have all their fields as part of the primary key.
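The following sketch compares two exports of the same record type and shows why the choice of key matters: with the first field as key, item records diff cleanly, while tagging records only make sense when the whole record is the key. diff_exports and its (changed, deleted) result are hypothetical; returning deleted keys is one possible form of the "deletion" record mentioned earlier, not a settled design.

```python
def diff_exports(old_records, new_records, key):
    """Compare two exports of one record type (sketch).

    key(record) returns the record's primary key. Returns a list of new or
    changed records and a list of keys present only in the old export.
    """
    old = {key(r): r for r in old_records}
    new = {key(r): r for r in new_records}
    changed = [r for k, r in new.items() if old.get(k) != r]
    deleted = [k for k in old if k not in new]
    return changed, deleted


def first_field(r):
    return r[0]   # works for itemrecord-like records


def whole_record(r):
    return r      # needed for tagging-like records
```

With first_field as the key, two tagging records for the same item would collide on the item's UUID, so one of them would silently be lost; whole_record avoids that.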

Type Information

Currently, there is no way to define or look up what types are used in what fields, nor is there any formal definition of what types are acceptable. This is a big, gaping hole in the current proposal that must be remedied before we can expect any sort of dependable interoperability (e.g. with Cosmo). For now, we are punting on this until we get a better idea of what's actually needed.

This gap in the proposal also means that we aren't in a position to e.g. define a bunch of record types to describe other record types. This kind of meta-description is important for being able to define an extensible/discoverable sharing format between Chandler and Cosmo.

Multiple Inheritance

There are a few quirks regarding multiple inheritance. First, I think that we're going to have to prohibit a sharing.Schema class from inheriting from more than one other sharing.Schema class, in order to avoid possible ambiguities as to what inherited importers or exporters should be invoked when both base classes have different ones defined, and the subclass doesn't override them.
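If that prohibition is adopted, one simple way to enforce it is an __init_subclass__ hook on the base class. In this sketch Schema stands in for sharing.Schema and the subclass names are hypothetical; allowing ordinary mixins that are not Schema classes is an assumption beyond what the text says.

```python
class Schema:
    """Stand-in for sharing.Schema that rejects multiple Schema bases."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        schema_bases = [b for b in cls.__bases__ if issubclass(b, Schema)]
        if len(schema_bases) > 1:
            raise TypeError("%s may inherit from only one Schema class, not %s"
                            % (cls.__name__,
                               ", ".join(b.__name__ for b in schema_bases)))


class OldContentSchema(Schema):
    pass


class CalendarSchema(Schema):
    pass


class LoggingMixin:  # not a Schema, so combining it is still allowed
    pass


class NewContentSchema(OldContentSchema, LoggingMixin):
    pass
```

A class deriving from both OldContentSchema and CalendarSchema would fail at definition time with TypeError, before any ambiguous importer or exporter lookup could happen.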

Second, there is a peculiar corner case that can arise when sharing data between two machines, when multiple parcels and multiple inheritance are involved. Suppose that there are two parcels "a" and "b" containing classes "A" and "B" respectively, both of which are subclasses of pim.Item. And then there is a parcel "c", containing class "C", which inherits from both "A" and "B".

Let us further say that machine 1 has all three parcels installed, but machine 2 has only parcels "a" and "b". As long as these two machines are only sharing instances of "A" and "B", everything will be fine, but if machine 1 transmits a "C" instance to machine 2 there will be a problem.

When machine 2 tries to process the records related to pim.Item or to "A" instances, everything will work correctly. However, the "C" instance will have created both "A" and "B" records, making it impossible for loadItemByUUID() to find a suitable kind. Morgen and I discussed the possibility of having it simply synthesize one, but this could produce some problems of its own, in that the Chandler UI might not know how to correctly display this peculiar A/B hybrid, without additional information that can only be found in parcel "c" -- which machine 2 does not have.

For the first version, we will probably have to have some kind of kludge to detect this situation and handle it -- but precisely how we will handle it is still open to investigation. We may have to create the problem first in order to get a better handle on it.
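The detection half of the problem is straightforward to sketch: given the classes an item's records say it must be, look for a locally installed class that is a subclass of all of them. pick_kind and the classes Item, A, B, and C are hypothetical, mirroring the parcels in the scenario; what to do when it returns None is exactly the open question.

```python
class Item:
    pass


class A(Item):  # from parcel "a"
    pass


class B(Item):  # from parcel "b"
    pass


class C(A, B):  # from parcel "c", installed only on machine 1
    pass


def pick_kind(required, known):
    """Most general known class that is a subclass of every required class,
    or None when no installed class can represent the item."""
    candidates = [k for k in known if all(issubclass(k, r) for r in required)]
    for k in candidates:
        if all(issubclass(other, k) for other in candidates):
            return k
    return None
```

On machine 1 (Item, A, B, and C installed) an item with both "A" and "B" records maps to C; on machine 2 (no C) the same records yield None, which is the situation that needs the kludge.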

Schema Registry and Selection

sharing.Schema classes and @sharing.recordtype objects will have to be part of a parcel's persistent data, stored in the repository at the same time that the parcel's kinds and annotations and so forth are initialized. The sharing parcel will probably have some kind of persistent object(s) stored in the repository that reference schemas and index them by their supported record types and kinds, so that the sharing framework can look them up.

The exact nature of these data structures is currently undefined. The data structures needed depend on how the sharing framework will need to select schemas, so it's likely that a first-cut implementation of the API won't actually create any, and will instead rely on the sharing framework to explicitly select what schema(s) to use for a particular share.

The selection strategy is further complicated by the possibility that more than one schema might be offering to produce or consume records of the same record type.
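Whatever persistent structure is chosen, the core lookup could resemble the index below, which also makes the ambiguity concrete by listing record types that more than one schema offers to handle. The INSTALLED list, its example: URIs, and the calendar schema are made up for illustration.

```python
from collections import defaultdict

# Hypothetical installed schemas: (schema URI, version, record types handled).
INSTALLED = [
    ("example:content", 1, ["itemrecord", "tag", "tagging"]),
    ("example:content", 2, ["itemrecord", "tag", "tagging2"]),
    ("example:calendar", 1, ["event", "itemrecord"]),
]


def index_by_record_type(installed):
    """Map each record type to the (uri, version) schemas that handle it."""
    index = defaultdict(list)
    for uri, version, rtypes in installed:
        for rtype in rtypes:
            index[rtype].append((uri, version))
    return dict(index)


def contested(index):
    """Record types that more than one schema offers to handle."""
    return sorted(r for r, providers in index.items() if len(providers) > 1)
```

Here itemrecord and tag are each claimed by several schemas, so a selection strategy (or an explicit choice per share) is needed before any of them can be used unambiguously.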

And last, but not least, due to the persistent nature of schema classes and recordtype objects, it's likely that the Chandler application will need to either set aside another parcel to contain the core types' sharing schema, or else define that schema within the sharing parcel itself. (Otherwise, we would be introducing circular parcel dependencies between the core types and the sharing parcel.)

(Page last modified 2007-08-11 11:02:27)