[TransWarp] Mapping Elements to/from records in tables
Phillip J. Eby
pje at telecommunity.com
Sun Jun 30 21:07:37 EDT 2002
Background: for large and complex data models, it may be best to use a lot
of simple, automatically generated jars whose contents simply represent
rows, or collections thereof. But domain-level objects will often have
data from more than one table, and of course they will need to be instances
of domain level classes, rather than of a generic "row" class. So, we need
a "dispatch" layer which maps between domain and data-level concepts.
We could implement the dispatch layer using plain AbstractJars, defining
load() and save() methods that retrieved and mapped and saved the necessary
attributes. However, this would force loading state from all of the tables
that made up the object, even if they were never used. It would also force
writing back to all the tables, even if only one of them had changed data.
In principle, that doesn't sound like a bad thing, especially since Ty and
I already have apps that effectively do that. Where this *won't* work
well, is multi-row queries, where the query doesn't join in all the tables
that provide the object's state. Either the ghost() method will have to
reject these partial states (except perhaps to determine a ghost's new
class), or it will have to issue a single-row query to get the rest of the
data. Either way, the result won't be efficient for any kind of
multi-object processing.
I believe it would be more modular and more efficient (in terms of DB
retrievals) to consider the "state" of an Element as being mapped to some
number of "records", where a record is a row from one of the tables that
make up the Element's state. If an Element's "state" is just a dictionary
of names to records, then its state can be loaded without necessarily
loading the state of any of its records, except to the extent needed to
determine the Element's class. Any state preloading done by multi-row
queries will be on the records themselves, but of course the element can
take advantage of it.
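To make the lazy-records idea concrete, here is a minimal sketch (the class names, the dict-of-records representation, and the fetch callback are all my own illustration, not TransWarp's actual API) of an Element whose state is just a mapping of names to records that load themselves on first access:

```python
# Hypothetical sketch: an Element whose "state" is a mapping of names
# to record objects that fetch their row data lazily on first access.
# LazyRecord, Element, and the fetch callable are illustrative only.

class LazyRecord:
    """A row whose data is fetched only when first accessed."""
    def __init__(self, fetch):
        self._fetch = fetch      # zero-argument callable returning a dict
        self._data = None

    def __getitem__(self, field):
        if self._data is None:   # load on demand, exactly once
            self._data = self._fetch()
        return self._data[field]

class Element:
    """Domain object whose state is a dict of named records."""
    def __init__(self, records):
        self._records = records  # e.g. {'person': LazyRecord(...), ...}

    def record(self, name):
        return self._records[name]

# Creating the element touches no tables; only reading a field below
# triggers the single fetch for the 'person' record.
loads = []
def fetch_person():
    loads.append('person')
    return {'name': 'Alice', 'dept_id': 7}

e = Element({'person': LazyRecord(fetch_person)})
assert loads == []                            # nothing loaded yet
assert e.record('person')['name'] == 'Alice'  # first access loads
assert e.record('person')['dept_id'] == 7     # second access is free
assert loads == ['person']                    # exactly one fetch
```

Multi-row query preloading would then amount to handing a LazyRecord its data up front instead of a fetch callback.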
Now the issue is that either 1) the element classes need to be changed to
access their state from records, or 2) there needs to be some kind of
intervening dispatch object which the element can use to interact with its
records.
The first approach is certainly possible, since one can simply use module
inheritance and add extra information to the element classes' features to
tell them what attribute of what record they should use, and any mapping
information.
But the second approach may have some added benefits. Notably, it might be
possible to define the mapping in a UML model, or to use an automated
mapping for data models that follow standardized mapping patterns.
Either way, we need to consider the types of structural features that
elements can have (fields, references, collections, and sequences), and
look at the options for implementing them in common storage mechanisms (SQL
and LDAP).
Fields
------
Fields contain immutable values, whose contents, if any, are also
immutable. Strings, numbers, and tuples consisting only of strings,
numbers, or tuples are obvious examples. There are only two possible
mappings for these:
* One-to-one -- the Element field maps to one field in an underlying
record. It may or may not have the same name, and there may be read and
write expressions to convert the data type or value in some way. So this
mapping's knowledge needs to include: which element field, which record,
which record field, and any transformation expressions.
* One-to-many -- the Element field corresponds to more than one
field. This might occur, for example, when the Element wants a date/time
value, but the DB has separate fields for the date and the time. This is a
more complex mapping, in that the write transformation probably needs to be
a function rather than just an expression. But it's still pretty
straightforward.
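A rough sketch of both field-mapping kinds, using plain dicts for records and invented class names (TransWarp's actual mapping interfaces may differ); the one-to-many case uses the date/time split mentioned above:

```python
# Hypothetical sketch of the two field-mapping kinds: records are
# plain dicts keyed by field name, state maps record names to records.
import datetime

class OneToOneField:
    """Element field <-> one record field, with optional transforms."""
    def __init__(self, record, field, read=lambda v: v, write=lambda v: v):
        self.record, self.field = record, field
        self.read, self.write = read, write   # transformation expressions

    def load(self, state):
        return self.read(state[self.record][self.field])

    def save(self, state, value):
        state[self.record][self.field] = self.write(value)

class DateTimeField:
    """One-to-many example: Element wants a datetime, the DB stores
    separate date and time fields, so the write side is a function."""
    def __init__(self, record, date_field, time_field):
        self.record = record
        self.date_field, self.time_field = date_field, time_field

    def load(self, state):
        rec = state[self.record]
        return datetime.datetime.combine(rec[self.date_field],
                                         rec[self.time_field])

    def save(self, state, value):
        rec = state[self.record]
        rec[self.date_field] = value.date()
        rec[self.time_field] = value.time()

name = OneToOneField('person', 'name', read=str.title, write=str.lower)
state = {'person': {'name': 'alice'},
         'events': {'when_date': datetime.date(2002, 6, 30),
                    'when_time': datetime.time(21, 7, 37)}}
assert name.load(state) == 'Alice'
name.save(state, 'Bob')
assert state['person']['name'] == 'bob'

when = DateTimeField('events', 'when_date', 'when_time')
dt = when.load(state)
assert dt == datetime.datetime(2002, 6, 30, 21, 7, 37)
when.save(state, dt + datetime.timedelta(hours=1))
assert state['events']['when_time'] == datetime.time(22, 7, 37)
```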
References
----------
References are a (possibly None) pointer to another Element. At the
Element level, they're actually implemented as a special form of
Collection. At the database level, they really only have one kind of mapping:
* Foreign key reference -- A field in a record contains a value that can be
used as an object ID for another jar to retrieve the referenced
object. The mapping for this needs to know what jar the object should be
gotten from, and an expression to compute a new foreign key value when the
reference is changed. Of course, it also needs to know the element's name
for the reference feature and the underlying record and field. (The
foreign key expression will typically be just the referenced object's oid
as given by the referenced Element jar, since for simplicity you'll
normally want the Element-level jars to use the same primary oids as the
tables they're based on. Alternate key jars will suffice to get you to any
domain-level keys that aren't database oids.)
It would probably be more correct to say that this mapping has two
expressions: one to compute the oid for lookup in the referenced element
jar, and one to compute the value placed in the record when the reference
is changed. By default, both of these expressions won't do any
transformation. I keep thinking that it's also possible you might have a
1-1 query jar to implement something like this, but I can't for the life of
me think of a use case. So YAGNI. Heck, having expressions at all for
this mapping is probably YAGNI. (YAGNI=You Ain't Gonna Need It.)
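The two-expression version of the foreign-key mapping might look like this sketch, with jars modeled as plain dicts and both expressions defaulting to no transformation (all names here are illustrative):

```python
# Hypothetical sketch of a foreign-key reference mapping with the two
# (default no-op) expressions: one turning the stored value into an oid
# for lookup in the referenced jar, and one computing the stored value
# from a newly assigned element. A jar is modeled as a dict oid->object.

class ReferenceMapping:
    def __init__(self, record, field, jar,
                 to_oid=lambda value: value,
                 from_element=lambda element: element.oid):
        self.record, self.field, self.jar = record, field, jar
        self.to_oid, self.from_element = to_oid, from_element

    def load(self, state):
        value = state[self.record][self.field]
        if value is None:               # references may be None
            return None
        return self.jar[self.to_oid(value)]

    def save(self, state, element):
        state[self.record][self.field] = (
            None if element is None else self.from_element(element))

class Dept:
    def __init__(self, oid, name):
        self.oid, self.name = oid, name

dept_jar = {7: Dept(7, 'Engineering')}
ref = ReferenceMapping('person', 'dept_id', dept_jar)
state = {'person': {'dept_id': 7}}
assert ref.load(state).name == 'Engineering'
ref.save(state, None)                   # clear the reference
assert state['person']['dept_id'] is None
```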
Collections
-----------
Collections are (possibly ordered) sequences of related Elements. There
are two ways to store them in a database:
* Pointer/ID list -- this is the way LDAP often does it. You'll have a
multi-valued attribute that contains DN's, which are effectively the OIDs
of the targeted objects. This mapping needs to know the name of the
underlying field, and the jar to retrieve the items from. This is where
"lazy" state management of a dispatch layer really shines. If we were
loading state for one of these without a dispatch layer, we'd have to get
ghosts for all of these references right away. Which would be a big
headache, because guess what, LDAP objects are polymorphic! That means
that to load a ghost, you need to know the class, and to know the class,
you have to get the state, which means that just to load the state of an
LDAP group object you'd have to load the entire freaking state of every
member and admin of the group. Anyway, the mapping here is still pretty
straightforward. To load, it creates a list of ghosts from the ID values,
and to save, it creates a list of IDs from the ghosts. (Note: the
model.Collection feature implementation flags the Element as having changed
when the collection changes, so we don't need to have a persistent list or
anything like that. The downside is that it'll flag a change for the other
kind of collection mapping, too, even though it's not needed... Probably
all of these mappings should check when saving things that they're actually
changing the underlying values!)
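A sketch of the pointer/ID-list mapping, including the change check suggested in the note above (Ghost and the jar placeholder are invented stand-ins, not real TransWarp classes):

```python
# Hypothetical sketch of the pointer/ID-list mapping: load turns a
# multi-valued field of oids into cheap ghosts, without loading any
# member state; save writes oids back only when they actually changed.

class Ghost:
    """Stand-in for a not-yet-loaded persistent object."""
    def __init__(self, jar, oid):
        self.jar, self.oid = jar, oid

class IDListCollection:
    def __init__(self, record, field, jar):
        self.record, self.field, self.jar = record, field, jar

    def load(self, state):
        # No member states are loaded here -- just ghosts by oid.
        return [Ghost(self.jar, oid)
                for oid in state[self.record][self.field]]

    def save(self, state, members):
        new_ids = [m.oid for m in members]
        if state[self.record][self.field] != new_ids:
            state[self.record][self.field] = new_ids  # only real changes

member_jar = object()   # placeholder for the jar members come from
state = {'group': {'member': ['uid=alice', 'uid=bob']}}
coll = IDListCollection('group', 'member', member_jar)
ghosts = coll.load(state)
assert [g.oid for g in ghosts] == ['uid=alice', 'uid=bob']
coll.save(state, ghosts + [Ghost(member_jar, 'uid=carol')])
assert state['group']['member'] == ['uid=alice', 'uid=bob', 'uid=carol']
```

Note that loading the group never touches the members themselves, which is exactly what saves us from the polymorphic-LDAP-ghost cascade described above.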
* Inverse Foreign Key relationship -- this is the way SQL does it. There's
no field on any of the records that comprise the Element; instead, you need
to use a query jar that returns a ghosted PersistentList of the objects
that point to the Element. This mapping just needs to know the jar to use,
and how to determine the OID. Normally, you'll just use the OID of the
Element itself. This mapping is the only one that doesn't have to
implement any kind of "save" capability, because it's the sole
responsibility of the other side of the relationship.
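The inverse foreign-key mapping can be sketched with the query jar reduced to a callable (again, an illustrative stand-in for an actual query jar); note the deliberate absence of any save():

```python
# Hypothetical sketch of the inverse foreign-key mapping: nothing is
# stored on the Element's own records; loading asks a query jar for
# "everything pointing at this oid", and there is no save() at all,
# because the pointing side of the relationship owns the data.

class InverseFKCollection:
    def __init__(self, query_jar, oid_of=lambda element: element.oid):
        self.query_jar = query_jar   # callable: oid -> list of elements
        self.oid_of = oid_of         # normally just the Element's own oid

    def load(self, element):
        return self.query_jar(self.oid_of(element))
    # No save(): the referencing records are saved by their own mappings.

class Person:
    def __init__(self, oid, dept_id):
        self.oid, self.dept_id = oid, dept_id

class Dept:
    def __init__(self, oid):
        self.oid = oid

people = [Person(1, 7), Person(2, 7), Person(3, 9)]

def members_query(dept_oid):
    """Stands in for a query jar's ghosted PersistentList result."""
    return [p for p in people if p.dept_id == dept_oid]

members = InverseFKCollection(members_query)
assert [p.oid for p in members.load(Dept(7))] == [1, 2]
assert members.load(Dept(9))[0].oid == 3
```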
Sequences
---------
A sequence is an ordered Collection. It has the same mapping types as a
Collection, but with some added twists. For pointer/ID list mappings,
there's nothing really special to do. The order is the order; either the
underlying DB supports ordering or not. But for foreign key relationships,
the query jar needs to have a sort key, *and* it needs to actually
implement a "save()" operation on its PersistentList! That save()
operation will go through the list of referencing objects and ensure that
their sort keys match the sequence in the list. This is a highly
application- and data model-specific operation. But STASCTAP. (Simple
Things Are Simple, Complex Things Are Possible.)
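The sort-key-syncing save() could be as simple as this sketch (field names and the dict-as-record representation are invented; a real implementation would mark the changed records dirty in their own jars):

```python
# Hypothetical sketch of a sequence's save() for the inverse
# foreign-key case: walk the list of referencing records and make each
# one's sort key match its list position, touching only rows whose key
# actually differs so unchanged rows aren't rewritten.

def save_sequence(members, sort_field='position'):
    """Rewrite sort keys so the DB order matches the list order."""
    changed = []
    for index, record in enumerate(members):
        if record.get(sort_field) != index:
            record[sort_field] = index
            changed.append(record)
    return changed            # rows that need writing back

steps = [{'id': 'a', 'position': 0},
         {'id': 'b', 'position': 1},
         {'id': 'c', 'position': 2}]
reordered = [steps[2], steps[0], steps[1]]   # user moved 'c' to the front
dirty = save_sequence(reordered)
assert [r['id'] for r in dirty] == ['c', 'a', 'b']   # all keys shifted
assert [r['position'] for r in reordered] == [0, 1, 2]
assert save_sequence(reordered) == []        # second save is a no-op
```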
Wrap-Up, and Implementation Notes
=================================
It sounds like an application will have a minimum of: one jar per table,
plus one jar per root Element type (root types are ones with no DB-level
types they inherit from), and one jar per inverse foreign key reference
type. Plus of course any actual *query* jars, and any DB-level jars that
the foreign key and query jars rely on.
For a WarpCORE-based DB, there will be few Element jars, because WarpCORE
has few root types (objects and events are the main ones). And if the
table jars and DB-level foreign and alternate key jars are instantiated
from metadata, there's little programming to be done there. Probably some
of the Element-level alternate key and foreign key jars could be generated
from metadata as well. A set of mapping classes that implemented the
roughly 5-7 mapping strategies described in this posting would probably
make specifying the rest pretty straightforward coding. Probably
the design of a compact notation for expressing the parameters of the
mappings will be the most interesting part of developing them.
Let's look again at the issue of dispatch layer code vs. dispatch
objects. Both still seem like practical approaches to take; none of the
mapping styles inherently rules out being used with either. However,
many of the mappings are computationally intensive, implying that values
requiring transformation should be cached. But cached values can become
out-of-date, and if the lower level objects aren't directly updated on
writes, they can become out-of-date as well. I'm going to have to think
about this issue some more, as there are different ways to handle it with
each dispatch approach.
Another open issue is record format: a standardized way of representing
DB-level states. We may wish to devise classes that have tuple-like
storage requirements, ala ZODB's Record type. But this is partly dependent
on how we end up addressing the caching/transformation issues.