[TransWarp] XMI Persistence Requirements

Sun Jun 30 16:55:35 EDT 2002

One unusual application of the jars model is persistence in XML form.  One 
could create a jar which took an XML document as a parameter, and then 
returned persistent objects from it.  I'd like to use this to resurrect 
TransWarp's XMI support in a more-usable form.  Since XMI can be used to 
represent data according to any model which can be expressed in terms of 
the MOF, it's more than adequate to represent application data for 
persistence.  XMI is also a good format for metadata interchange, 
especially with today's UML tools.

Why use XMI as a persistence format?  I see it as a vehicle for the 
following, in decreasing priority order:

1. Expressing metadata: data models, object models, workflow models, etc.

2. Permitting easy tests of domain-level operations against data that can 
be read and edited by humans, as well as easily compared against "correct" 
result texts.

3. An XML pickling format for archiving or moving objects or transactions 
between databases.

So here's what I'd like to see PEAK support in the way of XMI persistence 
capabilities:

CRITICAL: support XMI 1.1, the current standard, which is *much* easier for 
humans to read and edit than XMI 1.0.

CRITICAL: retrieve model elements from an XMI file using persistence and 
ghosts, so that an entire model need not be immediately marshalled into 
objects from the XML.
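To make the ghost requirement a bit more concrete, here is a minimal Python sketch. It is not PEAK's jar or ghost API: `Ghost`, `XMIDocument`, `load_state` and the `loads` counter are hypothetical names, and an element's "state" is assumed to be nothing more than its XML attributes. The XML is still parsed up front; what is deferred is turning an element into object state until one of its attributes is first touched.

```python
import xml.etree.ElementTree as ET


class Ghost:
    """Stand-in for a model element whose state is loaded on first use.

    Hypothetical sketch, not PEAK's actual ghost implementation.
    """

    def __init__(self, loader, oid):
        self._loader = loader   # callable: oid -> dict of attribute values
        self._oid = oid
        self._state = None      # None means "still a ghost"

    def __getattr__(self, name):
        # Only reached when normal lookup fails, i.e. for model attributes.
        if self._state is None:
            self._state = self._loader(self._oid)
        try:
            return self._state[name]
        except KeyError:
            raise AttributeError(name) from None


class XMIDocument:
    """Parses the XMI text once and decodes elements only on demand."""

    def __init__(self, text):
        self.root = ET.fromstring(text)
        # Only elements carrying xmi.id are addressable in this sketch.
        self.by_id = {e.get("xmi.id"): e for e in self.root.iter()
                      if e.get("xmi.id") is not None}
        self.loads = 0          # how many elements have been decoded so far

    def load_state(self, oid):
        self.loads += 1
        elem = self.by_id[oid]
        return {k: v for k, v in elem.attrib.items() if k != "xmi.id"}

    def get(self, oid):
        return Ghost(self.load_state, oid)
```

Asking the document for an object costs nothing until an attribute is read; after that, the decoded state is cached on the ghost.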

CRITICAL: the mapping between the objects to be loaded and the XMI format 
should be specifiable in terms of a MOF model loaded from another XMI 
document.  That is, we shouldn't have to custom-write mappings for UML and 
CWM documents, since they have XMI-format MOF models.  And any new UML 
structural models created for applications can be translated to MOF models 
and hence saved as an XMI-format model.  In other words, we shouldn't have 
to hand-write XMI persistence mappings ever, except for maybe one to 
bootstrap the MOF metamodel in the first place.
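A sketch of what "no hand-written mappings" might look like in practice: the loader below is generic, and everything model-specific lives in a data structure. `METAMODEL` is a hypothetical, hand-written stand-in for what would really be derived from a MOF model read from XMI. It covers only single-valued attributes stored as XML attributes (no references or compositions), and tags are unprefixed for brevity where real UML XMI would use `UML:` prefixes.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for data that would be read from a MOF model in XMI:
# for each metamodel class, the XML tag it uses and its attribute features.
METAMODEL = {
    "Class": {"tag": "Class", "attributes": ["name", "isAbstract"]},
    "Attribute": {"tag": "Attribute", "attributes": ["name"]},
}


def build_classes(metamodel):
    """Generate one Python class per metamodel class, keyed by XML tag."""
    return {
        spec["tag"]: type(name, (object,), {"_features": tuple(spec["attributes"])})
        for name, spec in metamodel.items()
    }


def load(text, metamodel):
    """Generic loader: nothing in here is specific to UML, CWM, or MOF."""
    classes = build_classes(metamodel)
    objects = []
    for elem in ET.fromstring(text).iter():
        cls = classes.get(elem.tag)
        if cls is None:
            continue  # not a model element, e.g. XMI.header or XMI.content
        obj = cls()
        for feature in cls._features:
            setattr(obj, feature, elem.get(feature))  # None if absent
        objects.append(obj)
    return objects
```

Supporting a new metamodel would then mean supplying a new data structure (ideally read from its MOF XMI), not writing new loader code.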

CRITICAL: support saving modified objects as a new XMI document, with 
whitespace and formatting optional.
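For the "new document" case, a minimal sketch using the standard library's ElementTree. The input here is a hypothetical list of `(tag, attributes)` pairs rather than real persistent objects, and `ET.indent` requires Python 3.9 or later; the point is only that formatting whitespace can be an output option rather than part of the data.

```python
import xml.etree.ElementTree as ET


def save_xmi(objects, pretty=False):
    """Write a fresh XMI 1.1-style document from (tag, attrs) pairs.

    Hypothetical sketch: formatting whitespace is added only if pretty=True.
    """
    root = ET.Element("XMI", {"xmi.version": "1.1"})
    content = ET.SubElement(root, "XMI.content")
    for tag, attrs in objects:
        ET.SubElement(content, tag, attrs)
    if pretty:
        ET.indent(root)  # adds newlines and indentation (Python 3.9+)
    return ET.tostring(root, encoding="unicode")
```

Both outputs parse to the same elements; only the whitespace between them differs.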

CRITICAL: support the XMI document itself being a persistent object whose 
state (text) is saved in another database, such as ZODB, an RDBMS, or even 
in the filesystem.  (This actually shouldn't require any special actions, 
as long as the XMI document object itself is written to be 
persistent.  Committing the objects loaded from the XMI document will 
modify the XMI document, causing it to get committed as well, which will 
write it to the underlying DB, which will then get committed...)
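The "commit cascades downward" behavior can be sketched with a toy transaction that keeps committing until nothing is dirty. `Transaction`, `PersistentXMIDocument` and `ModelElement` are hypothetical names, the underlying database is just a dict, and the document's XML is faked from a name table; this shows only the propagation (object, then document, then store), not PEAK's real transaction machinery.

```python
from xml.sax.saxutils import quoteattr


class Transaction:
    """Commits participants repeatedly until none of them is dirty."""

    def __init__(self):
        self.participants = []

    def join(self, obj):
        if obj not in self.participants:
            self.participants.append(obj)

    def commit(self):
        # Committing one participant may dirty another (object -> document),
        # so loop until everything is clean.
        while any(p.dirty for p in self.participants):
            for p in list(self.participants):
                if p.dirty:
                    p.commit()


class PersistentXMIDocument:
    """The XMI document as a persistent object whose state is its text."""

    def __init__(self, txn, store, key):
        self.txn, self.store, self.key = txn, store, key
        self.names = {}          # xmi.id -> name; a stand-in for the real XML
        self.dirty = False

    def update(self, oid, name):
        self.names[oid] = name
        self.dirty = True
        self.txn.join(self)

    def text(self):
        body = "".join(f"<Class xmi.id={quoteattr(oid)} name={quoteattr(name)}/>"
                       for oid, name in sorted(self.names.items()))
        return f"<XMI><XMI.content>{body}</XMI.content></XMI>"

    def commit(self):
        self.store[self.key] = self.text()   # write into the underlying DB
        self.dirty = False


class ModelElement:
    """An object loaded from the document."""

    def __init__(self, txn, doc, oid, name):
        self.txn, self.doc, self.oid, self.name = txn, doc, oid, name
        self.dirty = False

    def rename(self, name):
        self.name = name
        self.dirty = True
        self.txn.join(self)

    def commit(self):
        self.doc.update(self.oid, self.name)   # this dirties the document
        self.dirty = False
```

Nothing reaches the store until `commit()`, and the document never needs to know it is being driven by a change to one of its objects.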

IMPORTANT: support saving as a modified in-place XMI document, with any 
vendor XMI extensions/annotations remaining in place.

HELPFUL: support saving a modified in-place XMI document with all comments, 
existing whitespace, etc. intact.

HELPFUL: support XMI 1.0 (this might move up to CRITICAL if the UML tools 
Ty and I use don't end up supporting XMI 1.1 very well).

NICE-TO-HAVE: support external references to model elements outside the XMI 
document, via appropriate plug-ins to look them up with.

NICE-TO-HAVE: support writing a commit transaction as an "XMI.diff"-format 
file, and support reading and applying it to the *objects* (not the XMI 
document) it refers to.  This would give us a kind of domain-level 
"transaction log" or "record and play back commands" capability.  But it'll 
probably be a long while before we can even figure out *how* to do this.

Open issues in the design:

* Some kind of "metamodel registry" is needed, so the XMI jar can find the 
right metamodel mapping, based on the header info in the XML file.

* What data structure should be used for the document itself?  One of the 
Python DOMs (big and slow)?  The RXP "lightweight" DOM (small and 
superfast, but lacking in backpointers, and I believe it drops out comments 
and whitespace)?  A custom construct of our own, based on SAX/SOX?

* What should be used as persistent IDs?  XMI elements aren't required to 
have an ID, so both the xmi.id and xmi.uuid attributes can only be used as 
alternate keys.  XPath is probably too heavy; Python id() would only work 
if we kept some kind of mapping back to the nodes; and using the nodes 
themselves as IDs won't work for the RXP lightweight DOM if we need to get 
to a parent object.  It's probably important to note that it's not just XML 
elements that can represent an object; attributes in XMI can effectively 
represent a PersistentList that has to be considered an object with a 
persistent ID for our purposes.  That is, attribute nodes need to be 
referenceable too.

* Reference counting/GC.  Objects are written in an XMI file as nested 
within their containing objects if the relationship is compositional, or on 
a first-reference basis if they have no composition container.  This means 
that if the relationship that caused them to be written in full in the 
source document is removed in the transaction, the object's XML elements 
must be moved to another location, if one exists, or deleted if no other 
reference to the object exists.  If we are saving objects by creating a new 
document, this isn't a big deal because we'll just write what we need.  If 
we're trying to simply *modify* a DOM rather than writing a new document, 
we'll need to know where all the references to an item are and do some 
serious editing.

* The precise nature of the metamodel mapping.  I have some uncertainties 
right now about whether an XMI DTD can be unambiguously decoded using data 
which is strictly in the MOF model from which the DTD was generated.  It 
seems to me there are places where the implementer of the DTD has a choice 
in how to express items.  :(  This is not a problem for our own DTDs and 
custom metamodels, but is an issue in reading and writing intelligible UML 
models for tools that might only be able to handle reading something that's 
written the way they write it, and not another way that's perfectly valid 
according to spec.  :(
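For the metamodel-registry issue above, a minimal sketch. It assumes the XMI 1.x header convention of an `XMI.metamodel` element with `xmi.name` and `xmi.version` attributes; `MetamodelRegistry`, `register` and `lookup_for` are hypothetical names, and the registered "mapping" can be any object.

```python
import xml.etree.ElementTree as ET


class MetamodelRegistry:
    """Hypothetical registry: (metamodel name, version) -> mapping object."""

    def __init__(self):
        self._mappings = {}

    def register(self, name, version, mapping):
        self._mappings[(name, version)] = mapping

    def lookup_for(self, xmi_text):
        root = ET.fromstring(xmi_text)
        # Assumes the header names its metamodel via XMI.metamodel.
        mm = next(root.iter("XMI.metamodel"), None)
        if mm is None:
            raise LookupError("no XMI.metamodel declared in header")
        key = (mm.get("xmi.name"), mm.get("xmi.version"))
        try:
            return self._mappings[key]
        except KeyError:
            raise LookupError(f"no mapping registered for {key[0]} {key[1]}") from None
```

This parses the whole document just to read the header; a real jar would more likely stop after the header, e.g. with `ET.iterparse`.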
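On the document data-structure question above, one possible shape for "a custom construct of our own, based on SAX": a tiny tree with parent backpointers that keeps comments and whitespace as nodes. `Node`, `TreeBuilder` and `parse` are hypothetical names; the comment callbacks rely on the standard SAX lexical-handler property, and the expat driver expects all five lexical methods to exist, so the unused ones are no-ops.

```python
import io
import xml.sax
import xml.sax.handler


class Node:
    """Lightweight node with a parent backpointer (hypothetical design)."""
    __slots__ = ("kind", "name", "attrs", "text", "parent", "children")

    def __init__(self, kind, name=None, attrs=None, text=None):
        self.kind, self.name, self.text = kind, name, text
        self.attrs = attrs or {}
        self.parent = None
        self.children = []


class TreeBuilder(xml.sax.handler.ContentHandler):
    def __init__(self):
        super().__init__()
        self.root = Node("document")
        self.current = self.root

    def _add(self, node):
        node.parent = self.current
        self.current.children.append(node)
        return node

    def startElement(self, name, attrs):
        self.current = self._add(Node("element", name, dict(attrs.items())))

    def endElement(self, name):
        self.current = self.current.parent

    def characters(self, content):
        # Whitespace is kept as text nodes so it can be written back as-is.
        self._add(Node("text", text=content))

    # Lexical-handler callbacks; only comments are recorded.
    def comment(self, content):
        self._add(Node("comment", text=content))

    def startDTD(self, name, public_id, system_id):
        pass

    def endDTD(self):
        pass

    def startCDATA(self):
        pass

    def endCDATA(self):
        pass


def parse(text):
    builder = TreeBuilder()
    parser = xml.sax.make_parser()
    parser.setContentHandler(builder)
    parser.setProperty(xml.sax.handler.property_lexical_handler, builder)
    parser.parse(io.BytesIO(text.encode("utf-8")))
    return builder.root
```

Expat may split a run of text into several `characters` calls, so adjacent text nodes are possible; a real version would probably merge them.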
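For the persistent-ID issue above, one lightweight alternative to XPath is a positional ID: the child indices from the root, plus an optional feature name so that an XMI attribute (such as a multi-valued list) gets an ID of its own. This is a hypothetical scheme, not something from the XMI spec; it uses an ElementTree side table for parent pointers, and its obvious weakness is that IDs shift when earlier siblings are inserted or removed, which is why xmi.id would still be wanted as an alternate key.

```python
import xml.etree.ElementTree as ET


def build_parent_map(root):
    # ElementTree has no parent pointers, so keep a side table.
    return {child: parent for parent in root.iter() for child in parent}


def oid_for(elem, parents, feature=None):
    """Positional ID: (child indices from the root, optional feature name)."""
    path = []
    while elem in parents:
        parent = parents[elem]
        path.append(list(parent).index(elem))
        elem = parent
    return (tuple(reversed(path)), feature)


def resolve(root, oid):
    """Inverse of oid_for: an element, or (element, feature) for attributes."""
    path, feature = oid
    elem = root
    for i in path:
        elem = elem[i]
    return elem if feature is None else (elem, feature)
```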
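For the reference-counting/GC issue above, the "write a new document" case can be sketched as a placement pass: each object is nested in its composition container if it has one, otherwise in whatever first refers to it, and anything never reached is garbage and simply not written. `plan_placement` and its dict-based inputs are hypothetical stand-ins for what would come from the metamodel and the live objects.

```python
def plan_placement(roots, composite_owner, references):
    """Decide where each object is written in full in a new document.

    roots: objects written at the top level of XMI.content, in order.
    composite_owner: object -> its composition container (absent if none).
    references: object -> objects it refers to, in document order.
    Returns object -> container it is nested in (None for top level).
    """
    parts = {}
    for obj, owner in composite_owner.items():
        parts.setdefault(owner, []).append(obj)
    placement = {}

    def visit(obj, container):
        if obj in placement:
            return  # already written in full; later uses are just references
        placement[obj] = container
        for part in parts.get(obj, []):
            visit(part, obj)        # compositional: always nested in its owner
        for target in references.get(obj, []):
            if target not in composite_owner:
                visit(target, obj)  # no container: written at first reference

    for root in roots:
        visit(root, None)
    return placement
```

Removing the first reference to an object moves it to the next referrer, and removing every reference drops it, which is exactly the editing an in-place DOM update would have to reproduce by hand.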
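As an illustration of the kind of choice mentioned in the metamodel-mapping issue above: in XMI 1.1 a single-valued attribute can be written either as an XML attribute or as a nested element named after the owning class and feature. A reader can accept both; a writer has to pick one, and which one a given tool expects is not determined by the MOF model. `read_feature`, `write_feature` and the `as_element` flag are hypothetical, and tags are unprefixed for brevity.

```python
import xml.etree.ElementTree as ET


def read_feature(elem, feature, owner_class):
    """Accept both name="x" and <Owner.name>x</Owner.name> spellings."""
    if feature in elem.attrib:
        return elem.attrib[feature]
    tag = f"{owner_class}.{feature}"
    for child in elem:
        if child.tag == tag:
            return (child.text or "").strip()
    return None


def write_feature(elem, feature, value, owner_class, as_element=False):
    # Both spellings are valid; the flag stands in for a per-tool preference.
    if as_element:
        ET.SubElement(elem, f"{owner_class}.{feature}").text = value
    else:
        elem.set(feature, value)
```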

Summary: It looks like there are many issues that need research before work 
on the design itself can proceed.  I need to look more thoroughly at both 
the DOM implementations that are out there, and the XMI specs.