[TransWarp] More configuration design: the "schema sandwich"

Thu Jul 4 21:23:06 EDT 2002

One fundamentally frustrating aspect of designing a good configuration
system for our purposes is that the system must make it possible to
specify, without programming, things that are hard to do without
programming: referencing pre-existing objects, performing calculations
based on other settings, and so on.  Most of the handy sources of
configuration data are text-based, and offer little in the way of data
structure support.

Configuration consumers want objects, not text.  But consumers don't know 
anything about the underlying configuration providers, and shouldn't have 
to.  It's up to the providers to supply information that is of the correct 
type, and it's up to the people who are setting configuration policies and 
assembling the configuration stack to do so in such a way that the correct 
data is provided.

However, if an administrator fails to set a configuration value correctly,
a system may fail with an unhelpful error message that does not show the
source of the bad data.  Further, the failure may be difficult to reproduce
if the configuration setting in question is infrequently accessed by the 
running program.  Better to at least have a TypeError or 
"ConfigurationError" raised with the configuration variable name and a 
description of the problem, ideally with some kind of "reverse traceback" 
that shows where the incorrect setting came from.
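As a rough illustration of what such an error could carry, here is a
minimal sketch.  The constructor arguments, the "origins" list, and the
example setting name and file are all made up for illustration; nothing
here is settled API.

```python
class ConfigurationError(Exception):
    """A bad setting: records its name and where the value came from."""

    def __init__(self, name, problem, origins=()):
        self.name = name
        self.problem = problem
        # Hypothetical "reverse traceback": descriptions of the providers
        # involved, starting with the one that supplied the value.
        self.origins = tuple(origins)
        super().__init__(self._format())

    def _format(self):
        lines = ["%s: %s" % (self.name, self.problem)]
        lines.extend("  supplied by %s" % origin for origin in self.origins)
        return "\n".join(lines)


def describe_bad_port():
    """Raise and catch an example error, returning its message text."""
    try:
        raise ConfigurationError(
            "smtp.port",
            "expected an integer, got 'abc'",
            ["config file 'app.ini', line 12", "environment defaults"],
        )
    except ConfigurationError as err:
        return str(err)
```

The point is only that the message names the setting and lists its
sources, so the administrator doesn't have to guess which file to fix.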

To do the checking and calculations, we need schemas for our settings.  A 
schema really just consists of a callable that's passed a name, a value, 
and the place the value was found.  It validates or transforms the value, 
and returns it or raises an error.  The value passed in may be a singleton, 
"UNDEFINED", to indicate that no value for the name was found.  In that 
event, the schema may provide a default value, by looking up other settings 
in the same "place" where the setting was considered undefined.
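To make that concrete, a schema might look something like the sketch
below.  It assumes the "place" behaves like a mapping with a get()
method, and the setting names ("use_ssl" and the port being checked) are
invented; the real interface isn't decided yet.

```python
class _Undefined:
    """Type of the UNDEFINED marker; only one instance is ever made."""

    def __repr__(self):
        return "UNDEFINED"


UNDEFINED = _Undefined()


class ConfigurationError(Exception):
    pass


def port_schema(name, value, place):
    """Validate or convert a port setting, or compute a default."""
    if value is UNDEFINED:
        # Default is calculated from another setting in the same place.
        return 443 if place.get("use_ssl") else 80
    try:
        port = int(value)
    except (TypeError, ValueError):
        raise ConfigurationError(
            "%s: %r is not an integer" % (name, value))
    if not 0 < port < 65536:
        raise ConfigurationError(
            "%s: %d is out of range" % (name, port))
    return port
```

Text from a config file ("8080") comes back as an int, a missing value
gets a default derived from "use_ssl", and garbage raises an error that
names the setting.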

We'll implement the configuration stack as a chain of filters, each of
which can introduce a collection of schemas (or values, or both).  We'll
probably also have the schema-introducing filters automatically interpret
any LinkRef() instances as names to be looked up, with the configuration
stack from that filter down serving as the configuration for the
InitialContext used in the lookup.  This ensures that the schema checking
will run against objects, not names.
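A minimal sketch of the filter chain idea follows.  The class names, the
get() method, and the choice to hand the schema the stack below the
filter as its "place" are all assumptions for illustration, and the
LinkRef() handling is left out entirely.

```python
UNDEFINED = object()  # stand-in for the "UNDEFINED" singleton


class ValueFilter:
    """Supplies plain values, deferring to the layer below for the rest."""

    def __init__(self, values, below=None):
        self.values = values
        self.below = below

    def get(self, name):
        if name in self.values:
            return self.values[name]
        if self.below is not None:
            return self.below.get(name)
        return UNDEFINED


class SchemaFilter:
    """Runs a schema over whatever the layers below supply for a name."""

    def __init__(self, schemas, below):
        self.schemas = schemas
        self.below = below

    def get(self, name):
        value = self.below.get(name)
        schema = self.schemas.get(name)
        if schema is None:
            return value
        # Assumption: the "place" given to the schema is the stack below.
        return schema(name, value, self.below)


def as_int(name, value, place):
    """Example schema: require a value and convert it to an integer."""
    if value is UNDEFINED:
        raise LookupError("no value configured for %s" % name)
    return int(value)


defaults = ValueFilter({"workers": "4", "log.level": "info"})
overrides = ValueFilter({"workers": "8"}, below=defaults)
stack = SchemaFilter({"workers": as_int}, below=overrides)
```

Here stack.get("workers") passes the overriding text value through the
schema, while names without a schema fall straight through the chain.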

A typical configuration stack will probably start with a collection of 
schemas representing meta-configuration policies.  These schemas will 
simply be algorithms for producing base configuration defaults, using 
environment variables or other contextual information to "prime the pump" 
for loading more complex configuration files or other systems.

A second-tier configuration class or function will take this starter 
meta-configuration and look things up in it, following the directions 
supplied to build up the stack with configuration providers for config 
files, lookups from databases, etc.  Then, it'll finish off the top of the 
stack with a schema filter based on a global schema registry, into which 
imported modules will place their property schema definitions.  Thus, any 
access to configuration settings will pass through the appropriate schema 
checking and calculation of defaults.  Caching can occur at all levels,
even if the stack becomes a tree because different parts of the system
tee off their own top-level filters.  But under most circumstances, few
if any levels will exist beyond the top of the "schema sandwich" 
(meta-config schemas on the bottom, app-config schemas on the top, 
configuration providers in the middle).
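Here is one way the sandwich might be assembled, as a sketch only.  The
Layer class, build_stack(), register_schema(), the "config.file" setting,
and the APP_CONFIG variable are hypothetical names, the provider is
faked with a callable returning a dict, and caching is omitted.

```python
UNDEFINED = object()   # stand-in for the "UNDEFINED" singleton
SCHEMA_REGISTRY = {}   # global registry that imported modules add to


def register_schema(name, schema):
    SCHEMA_REGISTRY[name] = schema


class Layer:
    """One layer: optional values and schemas on top of a lower layer."""

    def __init__(self, below=None, values=None, schemas=None):
        self.below = below
        self.values = values or {}
        self.schemas = schemas or {}

    def get(self, name):
        if name in self.values:
            value = self.values[name]
        elif self.below is not None:
            value = self.below.get(name)
        else:
            value = UNDEFINED
        schema = self.schemas.get(name)
        return schema(name, value, self) if schema else value


def build_stack(environ, load_file):
    """Second-tier function: meta-config, then providers, then schemas."""

    def config_file(name, value, place):
        # Meta-config schema: "prime the pump" from the environment.
        if value is UNDEFINED:
            return environ.get("APP_CONFIG", "app.conf")
        return value

    meta = Layer(schemas={"config.file": config_file})
    providers = Layer(below=meta, values=load_file(meta.get("config.file")))
    return Layer(below=providers, schemas=dict(SCHEMA_REGISTRY))


def pool_size(name, value, place):
    """App-level schema: default to 5, otherwise convert to int."""
    return 5 if value is UNDEFINED else int(value)


register_schema("db.pool_size", pool_size)
```

A caller would pass something like os.environ and a function that reads
the named file into a dict; every lookup through the returned stack then
goes through the registered app schemas on its way out.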

Most components in a tree will simply share their root object's 
configuration stack, but there will be a facility whereby they can define a 
method that will add schema information or other filters to the top of the 
stack for their own use.  The filtered stack will then be inherited by 
their child components.
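The inheritance part could work roughly as sketched below.  It uses the
standard library's collections.ChainMap purely as a stand-in for the
configuration stack, and the Component class, the extend_stack() hook,
and the setting names are hypothetical.

```python
from collections import ChainMap


class Component:
    """Shares its parent's stack unless extend_stack() adds to it."""

    def __init__(self, parent=None, root_stack=None):
        self.parent = parent
        inherited = parent.stack if parent is not None else root_stack
        self.stack = self.extend_stack(inherited)

    def extend_stack(self, stack):
        # Default: no additions, so the very same stack is shared.
        return stack


class Mailer(Component):
    def extend_stack(self, stack):
        # Add a layer on top for this component and its children only.
        return stack.new_child({"smtp.timeout": 30})


root = Component(root_stack=ChainMap({"smtp.host": "localhost"}))
mailer = Mailer(parent=root)
child = Component(parent=mailer)     # inherits the mailer's filtered stack
sibling = Component(parent=root)     # still shares the root's stack
```

The child of the mailer sees "smtp.timeout", while the root and the
sibling are unaffected, since new_child() layers a new mapping on top
without modifying the stack it was called on.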

Wow.  That seemed easy enough to explain, considering it took me all day to 
work it out.  I decided to spare the list all the text I wrote in the 
process of figuring it out, and just write a new letter (this one) 
supplying the answer.  :)  It was a very complex process to work out 
exactly where to place each of the functions of name resolution, type 
checking/conversions, default calculations, etc.  The end result is very 
STASCTAP, though.

At this point there are still some questions in my mind about how certain 
things will look, what they'll be called, etc., but I think I'm going to 
have to actually start writing code to sort those issues out.  My wrists 
are starting to act up from all this typing over the last week or two, so I 
think it's best if I stop for tonight.  It may be a couple days before I'll 
be able to resume posting or coding in volume.