[PEAK] The path of model and storage

Robert Brewer fumanchu at amor.org
Wed Jul 28 17:21:52 EDT 2004


Me:
> Are you just forcing the mutexing down to consumer code then?

Phillip:
> Not if you don't need to share the objects across threads, no.

But if you _wish_ to share them, it looks like the client is
responsible. 'Need' is a wonderfully relative term. ;)

> ...conflicts will be resolved via the physical database's 
> locking facilities.
> 
> So, there's absolutely no need to duplicate that logic, either at the 
> consumer code level or in the framework, unless for some 
> reason (that I can't presently fathom) you *want* to.

To save processor time. I'd say about half of my data benefits from
being retained in memory, as fully-formed objects, in a single process
throughout the lifetime of the app. At the same time, it's the sort of
data which can't simply be hard-coded--it needs to be user-configurable
like other domain objects, and benefits greatly from the same framework
tools that other domain objects enjoy. I'm talking about things like
system tables for populating lists, which have lots of reads (sometimes
every page view) but aren't _quite_ immutable; the users might have some
interface for occasionally mutating the list. MRUs also fall in here,
as does most of the content on any random Amazon page.
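
As a rough sketch of what I mean (the names are mine, not dejavu's): a
read-mostly holder whose reads are cheap and whose rare writes just
invalidate, leaving real conflict resolution to the storage underneath:

    import threading

    class SystemTableCache:
        """Read-mostly data: lots of cheap reads, rare writes."""

        def __init__(self, loader):
            self._loader = loader          # a callable that hits the DB
            self._lock = threading.Lock()
            self._items = None

        def all(self):
            items = self._items
            if items is None:
                self._lock.acquire()
                try:
                    if self._items is None:
                        self._items = self._loader()
                    items = self._items
                finally:
                    self._lock.release()
            return items

        def invalidate(self):
            # The rare write path: a user edits the list via some UI.
            self._lock.acquire()
            try:
                self._items = None
            finally:
                self._lock.release()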

> (Note that even if what you want is an in-memory database of 
> some kind, you still just implement a workspace wrapper for it,
> and each thread's access to it is mediated by that thread's
> private workspace instance.  Conflict resolution is managed
> by the workspace and the DB, not the client.)

I'm not so much interested in an in-memory DB as the endpoint (you
could just use a null-file bsddb, for example) as in a chain from
client->cache->DM.
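
Something shaped like this, say (just a sketch; load/save and obj.key
stand in for whatever the real DM interface happens to be):

    class CachingDM:
        """Sits between client code and a real Data Manager.
        Repeated loads come from memory; saves write through."""

        def __init__(self, backend):
            self.backend = backend         # the real DM endpoint
            self._cache = {}

        def load(self, key):
            try:
                return self._cache[key]
            except KeyError:
                obj = self.backend.load(key)
                self._cache[key] = obj
                return obj

        def save(self, obj):
            self.backend.save(obj)         # storage resolves conflicts
            self._cache[obj.key] = obj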

> It's a rare business application that has a use case for actual 
> concurrency (as opposed to asynchrony).

It's not so much a question of concurrency vs. asynchrony as it is one
of overhead; plenty of enterprise apps could benefit from having a
variety of lifecycle* mechanisms available. When 100 users request the
same dynamic page (which takes 1 second to build) within a 1-minute
window, one can't help but wonder whether there's a way to avoid
repeatedly paying for 1) database reads, 2) ORM DB-to-object creation
and coercion overhead, and 3) workspace creation and population
overhead. In Python, the options are limited, because calls to DBs
(written in static languages) often still outperform naive caching
implemented in pure Python. So sharing such intermediate caches isn't
feasible for a good chunk of your data; a tool like the one I described
would go a long way toward helping that situation. And I won't even
mention the Prevalence folks, who have use cases where *all* your data
fits in memory. Oops, I just mentioned them. Dang.
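
To make the arithmetic concrete, here's the sort of thing I mean
(names mine; locking elided for brevity): if the built page can be
reused for even a minute, 100 one-second builds collapse into one.

    import time

    class ExpiringPage:
        """Reuse an expensive result for ttl seconds."""

        def __init__(self, builder, ttl=60):
            self.builder = builder     # the 1-second page-build function
            self.ttl = ttl
            self._value = None
            self._built = 0

        def render(self):
            now = time.time()
            if self._value is None or (now - self._built) > self.ttl:
                self._value = self.builder()
                self._built = now
            return self._value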

I've tried to write dejavu in such a way that such lifecycle decisions
are deferred from developers to deployers, because:

1) The number of possible combinations of DBs (or other storage),
webservers, GUIs, platforms, distributed-object utopian nightmares(!),
and plain ol' app-specific needs is vast, and
2) Most developers who use your framework won't notice they have
concurrency issues until their app has been deployed.

I've tried to give deployers the tools to quickly and easily test which
concurrency/lifecycle mechanisms work best for their particular
combination of components; significantly, this is configurable per
domain object, not per app.
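
For example (reusing the CachingDM sketch above; dejavu's actual
configuration is spelled differently, so treat this as the shape of
the idea only):

    # Hypothetical deployer config: per domain class, not per app.
    LIFECYCLE = {
        "Invoice":   "direct",    # always go to storage
        "StateCode": "cached",    # keep in memory per process
    }

    _caches = {}

    def storage_for(classname, backend):
        """Choose a storage path for a class from deployer config."""
        if LIFECYCLE.get(classname, "direct") == "cached":
            if classname not in _caches:
                _caches[classname] = CachingDM(backend)
            return _caches[classname]
        return backend

Flipping one config entry moves a class between mechanisms without
touching app code, which is what makes deployment-time testing cheap.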

Long diatribe. I'll stop now. But it's a fun discussion. :)


Robert Brewer
MIS
Amor Ministries
fumanchu at amor.org

* I think I've been using the word 'persistence' in both memory and DB
contexts; I'll use 'lifecycle' from now on to mean the in-memory
management of objects, as opposed to on-disk storage mechanisms.


