[TransWarp] Configuration System Overview

Thu Jul 4 13:16:23 EDT 2002

The purpose of the PEAK configuration system is to allow components to 
expose hook points for binding to configuration sources from outside 
themselves.  PEAK's binding machinery already provides an 
acquisition-by-name mechanism for such binding, but this is primarily 
intended for relatively-local sharing of components that are configured in 
source code.

The peak.running.config package, possibly in connection with functions or 
classes made available via the binding package, is intended to allow hook 
points to be configured from outside of program source code, through 
external configuration files, OS environment variables, or other sources 
such as LDAP directories, databases, CORBA services, etc.

A key issue in using these external resources for configuration is that 
the resources available for doing configuration vary significantly between 
platforms and operating environments.  Further, the security issues 
involved in using these resources can vary between platforms as well, and 
the tools available for working with the configuration resources can be 
equally varied.

Therefore, PEAK should not impose its own special set of configuration 
mechanisms upon the application developer or administrator.  Instead, it 
should be possible for the application developer to specify configuration 
policies that will be used.  Ideally, these policies (which represent *how* 
configuration data is obtained) will be specifiable in a very brief form, 
suitable for adding to the beginning of a startup script, or to be 
incorporated in a module that will be imported for all programs requiring 
that configuration policy.  Administrators should then have maximum 
flexibility in configuring an application using the tools they have available.

Apart from a few useful "standard" configuration sources, the configuration 
package will not focus so much on getting configuration data, as on 
establishing an architecture within which configuration data can be 
funnelled from configuration providers to configuration consumers.

Configuration Policy
--------------------

Configuration policy (or "meta configuration", as it's called in Zope 3) 
consists of providing information about where and how to retrieve other 
configuration information.  In the case of PEAK, the configuration policies 
will determine how the default configuration stack will be arranged.  That 
is, what sources will provide configuration information, and what 
precedence order they will have relative to one another.

The basic API will be something like 'config.setup(namespace, 
**keywordArgs)'.  The namespace represents a prefix that will be added to 
keywordArgs before they are saved in the root configuration space.  So 
you'd do something like:

from peak.api import *
import os

config.setup('peak.config.policy',
     config_file_name = 'app.cfg',
     config_file_home = os.environ.get('HOME', os.getcwd())
)

This would set the configuration properties 
'peak.config.policy.config_file_name' and 
'peak.config.policy.config_file_home'.  Which brings us to the next point...

Configuration Namespaces
------------------------

Configuration variables must be named in a way that allows multiple 
configuration namespaces to co-exist without interference.  A logging 
package and a python import utility may both want a 'path' configuration 
variable that means something completely different.  This means that any 
usage of a configuration variable must be using a qualified name of some 
sort.  We will follow the Java (and, I believe, X Window System) approach of using 
'.'-qualified names, since this also allows for resonance with Python 
package names, where relevant, and is easily distinguished from component 
path lookups (which use '/'-separated names).

Most configuration sources, however, will probably not have a notion of 
namespaces, and many will not support '.'-qualified names at all.  For 
example, POSIX/Win32 environment variables have poor or non-existent 
ability to handle '.' in a variable name.  Qualified names are also tedious 
to deal with in Python syntax, since keyword argument names can't contain 
'.', and it would be tedious to type out the qualifications even if they 
could be used.  So the config system will need some utility methods to deal 
with namespaces, similar to the previous example where a qualifier string 
was used in conjunction with keyword arguments.
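
As a rough illustration of the kind of utility meant here, the sketch below 
qualifies keyword arguments with a namespace prefix before storing them.  The 
name 'qualify' and the plain dictionary standing in for the root 
configuration space are assumptions made for the example; nothing like this 
is settled API.

```python
def qualify(namespace, **keyword_args):
    """Return a dict whose keys are '.'-qualified with the given namespace.

    Hypothetical helper: it only illustrates the prefixing idea.
    """
    return {
        f"{namespace}.{name}": value
        for name, value in keyword_args.items()
    }


# A plain dict stands in for the root configuration space (an assumption).
root = {}
root.update(qualify("peak.config.policy", config_file_name="app.cfg"))
```

The caller types short keyword names, and only the helper needs to know the 
full qualification.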

Configuration source objects will need namespace mapping capabilities, to 
express that, say, an environment variable should be used to set certain 
properties, or that a property should be looked up in a series of 
environment variables until it is found.  (In other words, namespace 
mapping is potentially many-to-many.)

Such mappings are themselves tedious to produce and maintain, and so should 
also be reusable, for example in a package.  They should also not be part 
of code, unless it is to provide defaults or set core policy.  Namespace 
mappings instead should themselves be defined via configuration namespaces.

Yes, my head is starting to spin, too.  In practice, you won't need all 
this flexibility - at least, not very often.  Environment variables are 
kind of a worst-case scenario: we need to support them, but they're a huge 
shared namespace filled with all sorts of things, from the wonderful, to 
the useless, to the downright dangerous.  You don't want a broad mapping of 
any configuration namespace, no matter how qualified, directly onto 
environment variables, for security reasons if nothing else.

For dealing with environment variables, it may be best to simply not 
support them directly at all!  Instead, as in the example above, 
configuration policy code could be used to explicitly read environment 
variables and place them in the configuration stack.  Or, perhaps an 
"EnvironmentVars" configuration class could be used, as follows:

ev = config.EnvironmentVars(
     PYTHONPATH = [
         'something.that.wants.python.path',
         'something.else'
     ],
     HOME = ['where.config.files.go']
)

The resulting object could be placed in the configuration stack, and any 
access attempt to a property in the string lists would be looked up from 
the specified environment variable.  I think this is about as far as we 
can/should go with environment support.
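
To make the idea a little more concrete, here is a minimal sketch of how 
such a class might behave.  Everything in it is an assumption for 
illustration: the '_environ' parameter (there only so the class can be tried 
without touching the real environment), the dictionary-style lookup, and the 
choice to raise KeyError for anything not explicitly listed.

```python
import os


class EnvironmentVars:
    """Hypothetical sketch: expose selected environment variables as
    configuration properties, and nothing else."""

    def __init__(self, _environ=None, **mapping):
        # mapping is ENV_VAR_NAME=[property names that variable supplies]
        self._environ = os.environ if _environ is None else _environ
        self._var_for_property = {}
        for var_name, property_names in mapping.items():
            for prop in property_names:
                self._var_for_property[prop] = var_name

    def __getitem__(self, prop):
        # Unlisted properties raise KeyError, so the environment is never
        # exposed wholesale.
        var_name = self._var_for_property[prop]
        return self._environ[var_name]

    def get(self, prop, default=None):
        try:
            return self[prop]
        except KeyError:
            return default


# Usage with a fake environment, so the example does not depend on the host.
ev = EnvironmentVars(
    {"HOME": "/home/example"},
    HOME=["where.config.files.go"],
)
home_dir = ev["where.config.files.go"]
```

Because only the listed properties are reachable, the rest of the 
environment stays out of the configuration stack.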

The other kind of mapping (search in multiple places) might be spelled 
like this:

newConfig = config.Remapper(searchConfig, 'some.prefix',
     prop1 = ['place.1', 'place.2', ...],
     prop2 = ['place.2', 'place.1', ...]
)

Items like this would be first-class configuration objects, capable of 
being used to compose a configuration stack.  Looking for 
newConfig["some.prefix.prop1"] would cause the remapper to look for 
searchConfig["place.1"], searchConfig["place.2"], and so on.
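
A minimal sketch of that lookup behaviour follows.  It assumes that 
searchConfig supports 'in' and item access like a dictionary, and that a 
failed search raises KeyError; neither is decided, so treat this as one 
possible reading rather than the design.

```python
class Remapper:
    """Hypothetical sketch: look a property up under several alternative
    names in another configuration object, in order."""

    def __init__(self, search_config, prefix, **places):
        self._search_config = search_config
        # Store the fully qualified property name -> ordered candidate keys.
        self._places = {
            f"{prefix}.{name}": list(keys) for name, keys in places.items()
        }

    def __getitem__(self, name):
        # Names that were never remapped raise KeyError here.
        for key in self._places[name]:
            if key in self._search_config:
                return self._search_config[key]
        raise KeyError(name)


# Usage: only 'place.2' is defined, so prop1 falls through to it.
search_config = {"place.2": "second choice"}
new_config = Remapper(
    search_config,
    "some.prefix",
    prop1=["place.1", "place.2"],
    prop2=["place.2", "place.1"],
)
value = new_config["some.prefix.prop1"]
```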

This second kind of mapping is probably more likely to be used on the 
component side (configuration consumer), rather than on the administration 
side (configuration provider).  It's more likely that as an administrator, 
you'll want to have the variables and properties that make sense to your 
operating environment, application model, etc., and have policy settings 
that "push" them towards the namespaces of things that want them.  There 
will likely be two levels of policy here: "system-wide" type policies, and 
app-specific policies.  Probably the system-wide policy will be used to 
specify where app-specific policies will be looked for, e.g. in the 
application's home directory, or using the name of the __main__ script, 
etc.  App-specific policies may refer to system-level policies to set 
defaults for their policies.

(I keep thinking this is all far too complicated, and it 
is.  Unfortunately, it's because the needs are complicated.  Luckily, we 
can apply heavy doses of STASCTAP via sensible defaults.  We just need to 
have the hook points available.)

Configuration Stacking
----------------------

It should be possible to assemble configuration objects in an ordered way, 
such that higher-precedence configuration settings override 
lower-precedence settings.  Or, depending on your viewpoint, such that 
lower-precedence settings provide defaults for higher-precedence 
settings.  This should be done through a simple API like:

lowConfig.withOverrides(highConfig), or

highConfig.withDefaults(lowConfig)

which both produce an identical configuration stack.

Originally, we thought of using addition ('+' operator) to assemble stacks 
from configuration objects, but there is a certain ambiguity to the meaning 
of the ordering.  The notation above, while more verbose, is unequivocal 
about what is happening.  It can also be chained, as in:

medium.withDefaults(low).withOverrides(high).withDefaults(extraLow)

The order of the resulting stack will be: high, medium, low, 
extraLow.  'withDefaults()' always adds to the *bottom* of the stack, and 
'withOverrides()' always adds to the top of the stack.
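
The ordering rules can be checked with a small sketch.  The ConfigStack 
class, its 'layers' attribute, and the dictionary-style lookup are all 
assumptions for illustration; only the two method names come from the 
proposal above.

```python
class ConfigStack:
    """Hypothetical sketch of an ordered configuration stack.
    Layers are kept highest-precedence first."""

    def __init__(self, *layers):
        self.layers = tuple(layers)

    def withOverrides(self, other):
        # The other configuration goes on top of the stack.
        return ConfigStack(*(_layers_of(other) + self.layers))

    def withDefaults(self, other):
        # The other configuration goes on the bottom of the stack.
        return ConfigStack(*(self.layers + _layers_of(other)))

    def __getitem__(self, name):
        # The first layer that defines the name wins.
        for layer in self.layers:
            if name in layer:
                return layer[name]
        raise KeyError(name)


def _layers_of(config):
    """Accept either another stack or a single mapping-like layer."""
    if isinstance(config, ConfigStack):
        return config.layers
    return (config,)


high = ConfigStack({"who": "high"})
medium = ConfigStack({"who": "medium", "m": 1})
low = ConfigStack({"who": "low", "l": 2})
extraLow = ConfigStack({"who": "extraLow", "x": 3})

stack = medium.withDefaults(low).withOverrides(high).withDefaults(extraLow)
order = [layer["who"] for layer in stack.layers]
```

Each call returns a new stack rather than modifying its receiver, which is 
one way to keep chained expressions free of surprises.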

There is much more to say about configuration stacks, including how their 
"write many, read once" semantics will work, caching, configuration 
schemas, "deferred" settings, etc., but I will leave that to another post, 
as I think this is good enough for an initial overview.