[PEAK] Diffs, caching, and constraints in peak.query
Phillip J. Eby
pje at telecommunity.com
Tue May 11 17:33:58 EDT 2004
Enforcing constraints is a tricky business. In principle, it's as simple
as defining queries that identify constraint violations, and executing
those queries at a desired point in time. In practice, that only works if
performance doesn't matter. :)
So, to efficiently verify constraints, it's necessary to:
1. Verify only facts that are added or removed
2. Verify only constraints that can be economically verified, given the
current execution context.
Item 2 matters because the cost of checking constraints is quite
variable. Many kinds of constraints are trivial to check, such as
one-to-one, NOT NULL, etc. The simplest of arity constraints can often be
checked without the need for even any database access. Uniqueness and
referential integrity constraints require database access to verify, but
are usually simple and relatively efficient to query, and for some kinds of
data (such as a table of country names) can even be cached across
transactions. But for constraints that involve transitive relationships,
acyclic graphs, etc., the queries may be complex and time-consuming.
A simple approach to dealing with this would be to assign a "cost factor"
to constraints, and then have an option to request verifying constraints up
to a given cost, with "zero cost" constraints being checked
immediately. Typical cost thresholds would be:
* Instantaneous (in-memory, simple computation, check immediately, even
while typing)
* GUI Edit (fast enough to check between fields)
* Background (needs to run in background on a GUI so as not to slow response)
* Transaction (must wait till user clicks OK, or use to validate a web form
submission)
* Audited (after-the-fact business rules enforcement)
When committing a transaction, an editing context would force the cost
threshold to the "transaction" level, to ensure that no transaction-level
constraints are left unchecked.
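
To make the cost-threshold idea concrete, here is a minimal sketch. It is not
peak.query code: the names Cost, Constraint, and EditingContext are
hypothetical, and a constraint is assumed to be a plain callable that returns
True when the facts satisfy it.

```python
from enum import IntEnum


class Cost(IntEnum):
    """Hypothetical cost levels, ordered from cheapest to most expensive."""
    INSTANT = 0
    GUI_EDIT = 1
    BACKGROUND = 2
    TRANSACTION = 3
    AUDITED = 4


class Constraint:
    def __init__(self, name, cost, check):
        self.name = name
        self.cost = cost
        self.check = check  # callable(facts) -> True if satisfied


class EditingContext:
    def __init__(self, constraints, threshold=Cost.INSTANT):
        self.constraints = list(constraints)
        self.threshold = threshold

    def violations(self, facts):
        # Only constraints at or below the current threshold are checked.
        return [c.name for c in self.constraints
                if c.cost <= self.threshold and not c.check(facts)]

    def commit(self, facts):
        # Committing forces the threshold to the "transaction" level.
        self.threshold = Cost.TRANSACTION
        failed = self.violations(facts)
        if failed:
            raise ValueError("constraint violations: " + ", ".join(failed))
```

With a threshold of Cost.INSTANT, violations() skips anything rated higher;
commit() raises the threshold first, so transaction-level constraints cannot
slip through. Audited constraints stay out of the commit path in this sketch.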
The threshold could simply control the distribution of change events to
the constraint fact types. In essence, a constraint would subscribe to
change event sources that are rated by cost, and that buffer change events
whenever the editing context's current cost threshold is too low. So, if
there is a lot of thrashing of a fact's values (e.g. because the user keeps
changing it in a GUI), these changes will get consolidated to just one fact
change before the change is seen by any expensive-to-check constraints.
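
A rough sketch of such a buffering event source follows. BufferedSource and
its methods are hypothetical names, costs are plain integers, and the caller
is assumed to pass in the editing context's current threshold with each
change.

```python
class BufferedSource:
    """Hypothetical change-event source rated at a fixed cost level."""

    def __init__(self, cost):
        self.cost = cost
        self.subscribers = []
        self.pending = {}  # fact key -> (original value, latest value)

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def changed(self, key, old, new, threshold):
        # Keep the value from before the first buffered change, so a run
        # of edits collapses into a single old -> new event.
        first_old = self.pending.get(key, (old, None))[0]
        if first_old == new:
            self.pending.pop(key, None)  # net effect: no change at all
        else:
            self.pending[key] = (first_old, new)
        if threshold >= self.cost:
            self.flush()

    def flush(self):
        # Deliver each consolidated change once, then start a fresh buffer.
        pending, self.pending = self.pending, {}
        for key, (old, new) in pending.items():
            for callback in self.subscribers:
                callback(key, old, new)
```

If a field goes 30 -> 31 -> 32 -> 35 while the threshold is below the source's
cost, subscribers see a single (30, 35) change once the buffer is flushed, and
a field that ends up back at its original value produces no event.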
How to apply diffs to complex queries? (map so as to include key identity
data in joins?)
If we can apply diffs to complex queries, this is potentially the same as
constraint checking.... but maybe not efficiently. :(
Need to evaluate different kinds of constraints...
* Mandatory role - object exists and not role exists
* Uniqueness - self-join on key
* Mutual exclusion - join on excluded role pairs (factorial!)
* External uniqueness
* Ring constraints
* Subtyping?
* Derivation?
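
As a starting point for that evaluation, the first two entries can be sketched
as violation queries over in-memory collections. This is only an illustration
with hypothetical function names, assuming facts are hashable tuples; the last
function checks only newly added facts, in the spirit of item 1 above.

```python
def mandatory_role_violations(objects, role_facts):
    """Objects that exist but never appear in the role."""
    playing = {obj for obj, _value in role_facts}
    return set(objects) - playing


def uniqueness_violations(facts, key):
    """Self-join on the key: pairs of distinct facts sharing a key value."""
    facts = list(facts)
    return {(a, b)
            for i, a in enumerate(facts)
            for b in facts[i + 1:]
            if key(a) == key(b)}


def uniqueness_violations_for_added(existing, added, key):
    """Check only the added facts, using an index over the existing keys."""
    index = {}
    for fact in existing:
        index.setdefault(key(fact), []).append(fact)
    found = set()
    for fact in added:
        for other in index.get(key(fact), []):
            found.add((other, fact))
        # Later additions must also be checked against this one.
        index.setdefault(key(fact), []).append(fact)
    return found
```

The full self-join compares every pair, while the diff-based version does one
index lookup per added fact. Whether the same trick carries over to the other
constraint kinds is exactly the open question.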
(To be continued at a later date...)