[PEAK] Diffs, caching, and constraints in peak.query

Phillip J. Eby pje at telecommunity.com
Tue May 11 17:33:58 EDT 2004

Enforcing constraints is a tricky business.  In principle, it's as simple 
as defining queries that identify constraint violations, and executing 
those queries at a desired point in time.  In practice, that only works if 
performance doesn't matter.  :)

So, to efficiently verify constraints, it's necessary to:

1. Verify only facts that are added or removed

2. Verify only constraints that can be economically verified, given the 
current execution context.

Item 2 matters because the cost of checking constraints is quite 
variable.  Many kinds of constraints are trivial to check, such as 
one-to-one, NOT NULL, etc.  The simplest arity constraints can often be 
checked without any database access at all.  Uniqueness and 
referential integrity constraints require database access to verify, but 
are usually simple and relatively efficient to query, and for some kinds of 
data (such as a table of country names) can even be cached across 
transactions.  But for constraints that involve transitive relationships, 
acyclic graphs, etc., the queries may be complex and time-consuming.

A simple approach to dealing with this would be to assign a "cost factor" 
to constraints, and then have an option to request verifying constraints up 
to a given cost, with "zero cost" constraints being checked 
immediately.  Typical cost thresholds would be:

* Instantaneous (in-memory, simple computation, check immediately, even 
while typing)
* GUI Edit (fast enough to check between fields)
* Background (needs to run in background on a GUI so as not to slow response)
* Transaction (must wait till user clicks OK, or used to validate a web form)
* Audited (after-the-fact business rules enforcement)

When committing a transaction, an editing context would force the cost 
threshold to the "transaction" level, to ensure that no transaction-level 
constraints are left unchecked.
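
As a rough illustration of the cost-threshold idea, here is a minimal sketch.  None of these names (Cost, Constraint, EditingContext) are part of peak.query; they are hypothetical, and the "facts" are just (key, name) tuples in a set.  Only constraints at or below the context's current threshold get checked, and commit() temporarily forces the threshold up to the transaction level:

```python
from enum import IntEnum


class Cost(IntEnum):
    """Hypothetical cost levels, mirroring the list above."""
    INSTANT = 0
    GUI_EDIT = 1
    BACKGROUND = 2
    TRANSACTION = 3
    AUDITED = 4


class Constraint:
    def __init__(self, name, cost, violations):
        self.name = name
        self.cost = cost
        self.violations = violations  # callable: facts -> list of violations


class EditingContext:
    def __init__(self, constraints, threshold=Cost.INSTANT):
        self.constraints = list(constraints)
        self.threshold = threshold
        self.facts = set()  # (key, name) tuples, purely for illustration

    def check(self):
        """Run every constraint whose cost is within the current threshold."""
        problems = []
        for c in self.constraints:
            if c.cost <= self.threshold:
                problems.extend((c.name, v) for v in c.violations(self.facts))
        return problems

    def commit(self):
        """Force the threshold to TRANSACTION so nothing cheaper is skipped."""
        saved, self.threshold = self.threshold, Cost.TRANSACTION
        try:
            problems = self.check()
        finally:
            self.threshold = saved
        if problems:
            raise ValueError(problems)


def missing_name(facts):
    # NOT NULL style check: needs no database access
    return sorted(key for key, name in facts if name is None)


def duplicate_names(facts):
    # uniqueness check: would normally need a query, so it is rated higher
    seen, dupes = set(), set()
    for _, name in facts:
        if name is not None and name in seen:
            dupes.add(name)
        seen.add(name)
    return sorted(dupes)


not_null = Constraint("name NOT NULL", Cost.INSTANT, missing_name)
unique_name = Constraint("name unique", Cost.TRANSACTION, duplicate_names)
```

With a context at the default INSTANT threshold, check() runs only the NOT NULL constraint; the uniqueness violation surfaces only when commit() is called.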

The threshold could simply control the distribution of change events to 
the constraint fact types.  In essence, a constraint would subscribe to 
change event sources that are aware of the editing context's current cost 
threshold, and that buffer change events whenever the threshold is too low 
for that constraint.  So, if there is a lot of thrashing of a fact's 
values (because e.g. the user keeps changing it in a GUI), these changes 
will get consolidated to just one fact change before the change is seen by 
any expensive-to-check constraints.
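
The buffering could look something like the following sketch.  BufferedChangeSource is a made-up name, not an existing PEAK class, and costs are plain integers here.  Changes to the same key are consolidated into a single (old, new) pair, and a fact that thrashes back to its original value produces no event at all:

```python
class BufferedChangeSource:
    """Hypothetical event source that holds back changes until the editing
    context's threshold reaches the cost of the subscribing constraint."""

    def __init__(self, cost, subscriber):
        self.cost = cost
        self.subscriber = subscriber  # callable(key, old, new)
        self.pending = {}  # key -> (value before first buffered change, latest value)

    def changed(self, key, old, new, threshold):
        # Keep the oldest "old" value so repeated edits collapse into one change
        if key in self.pending:
            old = self.pending[key][0]
        self.pending[key] = (old, new)
        if threshold >= self.cost:
            self.flush()

    def flush(self):
        pending, self.pending = self.pending, {}
        for key, (old, new) in pending.items():
            if old != new:  # net no-op changes are dropped entirely
                self.subscriber(key, old, new)


seen = []
source = BufferedChangeSource(3, lambda key, old, new: seen.append((key, old, new)))

# The user keeps editing "x" while the threshold is low: nothing is delivered
source.changed("x", 1, 2, threshold=0)
source.changed("x", 2, 5, threshold=0)
source.changed("x", 5, 7, threshold=0)

# A change arriving at a high enough threshold flushes everything buffered
source.changed("y", None, 1, threshold=3)
```

After the last call, the subscriber has seen exactly one change for "x" (from 1 to 7) and one for "y".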

How to apply diffs to complex queries?  (map so as to include key identity 
data in joins?)

If we can apply diffs to complex queries, this is potentially the same as 
constraint checking....  but maybe not efficiently.  :(

Need to evaluate different kinds of constraints...

* Mandatory role - object exists and role does not exist

* Uniqueness - self-join on key

* Mutual exclusion - join on excluded role pairs (factorial!)

* External uniqueness

* Ring constraints

* Subtyping?

* Derivation?
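
For the first two kinds, the "violation query" can be sketched over plain in-memory sets.  This is only an illustration under assumed data shapes (facts as (key, value) tuples); the function names are hypothetical and this is not how peak.query represents anything.  The last function shows the diff-based variant, which only examines facts that were added:

```python
def mandatory_role_violations(objects, role_facts):
    """Objects that exist but play no part in the role (object AND NOT role)."""
    players = {obj for obj, _ in role_facts}
    return set(objects) - players


def uniqueness_violations(facts):
    """Self-join on key: pairs of distinct facts that share the same key."""
    return {(a, b) for a in facts for b in facts if a[0] == b[0] and a < b}


def uniqueness_violations_for_diff(facts, added):
    """Same check, but joining only the added facts against the full set.
    Assumes `facts` already includes everything in `added`."""
    by_key = {}
    for fact in facts:
        by_key.setdefault(fact[0], set()).add(fact)
    return {
        (min(a, b), max(a, b))
        for a in added
        for b in by_key.get(a[0], ())
        if a != b
    }


facts = {("k1", "a"), ("k1", "b"), ("k2", "c")}
full = uniqueness_violations(facts)
incremental = uniqueness_violations_for_diff(facts, added={("k1", "b")})
```

Here the full self-join and the diff-restricted check find the same violating pair, but the second one does work proportional to the size of the diff rather than to the whole fact set (given an index on the key).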

(To be continued at a later date...)
