[PEAK] Re: Proposal for another Wiki "tutorial"
Phillip J. Eby
pje at telecommunity.com
Wed Jul 14 16:59:20 EDT 2004
At 06:42 PM 7/14/04 +0100, Paul Moore wrote:
>Paul Moore <pf_moore at yahoo.co.uk> writes:
>
> > I'll start an initial Wiki page in the next day or two. Have a look
> > and see what you think.
>
>OK, I've put up some initial stuff. All talk and no code so far, but
>maybe it gives a flavour of where I intend to go. Comments gratefully
>accepted (here, on the Wiki, or via email).

A few design comments... hope you don't mind.
If you are going to develop a framework like this with PEAK, the place to
start is with "enduring abstractions". That is, things that will always be
a part of the application. Or, to put it another way, what is the
application's "domain"?
In the case of your monitoring system, the application domain is the status
of systems. Or perhaps more precisely, the services provided by those
systems. Thus, systems and services are your application domain. Services
have a status. Services are offered by systems, which may be in groups
such as clusters or networks.
Although the mechanics of what you *do* with them may change over time,
your application(s) are going to always be dealing with services, systems,
groups, and statuses, so there are your enduring abstractions.
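
To make the "nouns" concrete, here is a minimal plain-Python sketch of those enduring abstractions. It is only an illustration: the class names, the fields, and the string-valued 'status' are assumptions of mine, and nothing here uses PEAK's own model classes.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Service:
    """Something a system offers, e.g. HTTP or SMTP (hypothetical model)."""
    name: str
    status: str = "unknown"  # placeholder; a richer status type could replace it


@dataclass
class System:
    """A host that offers one or more services."""
    name: str
    services: List[Service] = field(default_factory=list)


@dataclass
class Group:
    """A cluster, network, or other collection of systems."""
    name: str
    systems: List[System] = field(default_factory=list)

    def services(self) -> Iterator[Service]:
        """Yield every service offered by any system in the group."""
        for system in self.systems:
            yield from system.services


web = Service("http", status="up")
mail = Service("smtp", status="down")
cluster = Group("cluster-a", [System("host1", [web]), System("host2", [mail])])

# Walk the group to collect the names of all services it contains.
names = [service.name for service in cluster.services()]
```

The point is that these four classes stay the same no matter how the monitoring, storage, or reporting mechanics change later.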
So, what do you *do* with these things? The kinds of functionality you can
have are:
* Report on the current status of a service, system, or group, either in
summary or detail
* Report on historical statuses (e.g. average uptime %)
* React to a change in status, e.g. send an e-mail
So, now that we know both the "nouns" and the "verbs" of our enduring
application domain, we can create interfaces for these things. For
example, we could define the interface for services so that they expose an
'events.IValue' for their current status, making it easy for reactive
systems to "listen" to that event.
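
As a rough illustration of a service exposing a "listenable" status, here is a plain-Python sketch. The 'ObservableValue' class is a hypothetical stand-in I made up for this example; it is not the 'events.IValue' interface, and its method names are assumptions.

```python
class ObservableValue:
    """Hypothetical stand-in for an event-like value (NOT PEAK's events.IValue)."""

    def __init__(self, initial=None):
        self._value = initial
        self._listeners = []

    def subscribe(self, callback):
        """Register callback(old, new), called whenever the value changes."""
        self._listeners.append(callback)

    def get(self):
        return self._value

    def set(self, new_value):
        # Only fire listeners on an actual change of value.
        if new_value == self._value:
            return
        old, self._value = self._value, new_value
        for callback in list(self._listeners):
            callback(old, new_value)


class Service:
    """A service whose current status can be listened to."""

    def __init__(self, name):
        self.name = name
        self.status = ObservableValue("unknown")


sent = []  # stands in for an outgoing e-mail queue


def notify_on_down(old, new):
    if new == "down":
        sent.append(f"ALERT: status changed from {old} to {new}")


svc = Service("http")
svc.status.subscribe(notify_on_down)
svc.status.set("up")
svc.status.set("down")
svc.status.set("down")  # no change, so no second alert
```

The reactive code never polls; it just attaches itself to the domain object and waits to be called.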
To go much further, though, we really have to flesh out what a "status"
is. We could look at it as a simple "upness" or "downness" status, and
that might be useful for some things. More relevantly, we could view a
status as a metric, or collection of metrics, that apply to a particular
service (or aggregation thereof), at a particular point in time. (This
last item is important for historical analysis, reacting to events, and
perhaps even for service monitors to decide whether they should "ping" a
service again.)
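
One possible shape for such a status, sketched in plain Python. The field names, the metric names, and the 60-second freshness threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Union

MetricValue = Union[bool, str, float]


@dataclass(frozen=True)
class Status:
    """A collection of named metrics for one service at one point in time."""
    service: str
    taken_at: datetime
    metrics: Dict[str, MetricValue]

    def age(self, now: datetime) -> timedelta:
        """How old this reading is relative to 'now'."""
        return now - self.taken_at


def needs_recheck(status: Status, now: datetime,
                  max_age: timedelta = timedelta(seconds=60)) -> bool:
    """A monitor can use the timestamp to decide whether to 'ping' again."""
    return status.age(now) > max_age


t0 = datetime(2004, 7, 14, 16, 0, 0)
reading = Status("http", t0, {"up": True, "response_ms": 120.0})

fresh = needs_recheck(reading, t0 + timedelta(seconds=30))
stale = needs_recheck(reading, t0 + timedelta(seconds=90))
```

Because every reading carries its own timestamp, the same objects can feed historical reports, reactions, and the monitor's own scheduling.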
You'll notice here that I have not addressed controllers or repositories or
anything like that. Those are what we call "solution-domain" components,
as opposed to "problem-domain" components. In general, solution-domain
components are much less enduring and reusable than problem-domain
components. Also, if we design our problem-domain components well, often
the solution-domain components evaporate into little more than glorified
startup scripts!
Consider this: if you had objects to represent services, hosts, and so
forth, that offered current status info and could trigger callbacks to
systems that recorded history or took action, and they automatically
handled the monitoring, what would be left to write? Two things:
* "plug-ins" to perform specific tasks like monitoring events of a
particular type and sending notifications
* reporting scripts to walk the domain objects and generate output of an
appropriate format
These are specific day-to-day programming tasks to be accomplished with
your framework's components, rather than parts of the framework
itself.
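
For instance, a reporting script of the second kind might be little more than the following sketch. Nested dicts stand in for the real group/system/service objects here, and the names and the indented text format are purely hypothetical.

```python
def report(groups):
    """Walk group -> system -> service and return an indented text report."""
    lines = []
    for group_name, systems in groups.items():
        lines.append(group_name)
        for system_name, services in systems.items():
            lines.append(f"  {system_name}")
            for service_name, status in services.items():
                lines.append(f"    {service_name}: {status}")
    return "\n".join(lines)


# Hypothetical data; in a real setup these would be loaded domain objects.
domain = {
    "cluster-a": {
        "host1": {"http": "up"},
        "host2": {"smtp": "down", "imap": "up"},
    },
}

text = report(domain)
```

A different output format (HTML, CSV, whatever) would just be another small function walking the same objects.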
Also, the idea of a "repository" isn't really that useful either. This is
just an issue of storage, and that can and should be abstracted away. In
the case of reactive plugins, they won't care because they will just get
attached to the right domain objects. In the case of your scripts, they
can simply reference the specific DM they want to load from, or use ZConfig
to load a specified configuration file containing all the
service/system/whatever objects.
So, now we begin to see that we actually have/want some sort of monitoring
"server", in the sense that we don't want every little script doing its own
status testing. This is sort of like your "controller" concept, only much
simpler. All the "server" really is, is a script that loads up the domain
objects, attaches reaction and reporting plugins to the domain objects, and
runs the system's event loop.
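
In plain Python, such a "server" could be sketched roughly as follows. This is an assumption-laden toy: the 'probe' callables, the plugin signature, and the bounded polling loop are all made up for illustration, and a real server would use a proper event loop and run indefinitely.

```python
class Service:
    """Domain object that knows how to check itself and notify listeners."""

    def __init__(self, name, probe):
        self.name = name
        self.probe = probe      # hypothetical callable returning current status
        self.status = None
        self.listeners = []

    def check(self):
        new = self.probe()
        if new != self.status:
            old, self.status = self.status, new
            for listener in self.listeners:
                listener(self, old, new)


def run_server(services, plugins, cycles, sleep=lambda: None):
    """Attach plugins to the domain objects, then run a simple polling loop."""
    for service in services:
        service.listeners.extend(plugins)
    for _ in range(cycles):     # bounded here so the sketch terminates
        for service in services:
            service.check()
        sleep()


log = []


def history_plugin(service, old, new):
    """Example plugin: record every status change."""
    log.append((service.name, old, new))


readings = iter(["up", "up", "down"])  # canned probe results for the demo
http = Service("http", probe=lambda: next(readings))
run_server([http], [history_plugin], cycles=3)
```

Note that 'run_server' contains no monitoring or reporting logic of its own; it only wires things together, which is the "glorified startup script" idea.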
Now, it may be that I have just designed something more like John Landahl
wants than what you want. :) But, my main intent is to show how in
designing with PEAK, you can start with what you "really want", focusing on
the essentials of the problem rather than on accidents of
implementation. In essence, "repository" and "controller" are just
computer words that aren't part of what you're really trying to do. PEAK
shoves these considerations off to the side using generic abstractions like
DMs, commands, event loops, and executable configuration. Thus, the bulk
of the code that *you* write is about the problem domain, i.e. services and
statuses, reports and reactions.
Does that make sense to you?
Anyway, the most interesting part, I think, of designing a framework like
this, is the status metrics, because they represent a point of change over
time. Metrics may be discrete (e.g. booleans or enumerations) or numeric
with units. And they need names. Some metrics may be derived from other
metrics. With an appropriate design for this part of the system, it should
be possible to make fairly generic reporting and reaction tools, as well as
developing advanced metrics that summarize various aspects of system state,
such as a color-coding scheme that takes various other
measurements into consideration.
Actually, measurements should probably not just be point-in-time, but also
support across-time measurements. E.g. a metric for "% uptime over period".
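
Here is a small plain-Python sketch of what named metrics, a derived metric, and an across-time metric might look like. The 'Metric' class, the metric names, and the 500 ms "slow" threshold are all assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Metric:
    """A named measurement; 'unit' is None for discrete metrics."""
    name: str
    value: object
    unit: Optional[str] = None


def uptime_percent(samples: Sequence[bool]) -> Metric:
    """Across-time metric: share of samples in which the service was up."""
    if not samples:
        raise ValueError("need at least one sample")
    return Metric("uptime", 100.0 * sum(samples) / len(samples), "%")


def color_code(up: Metric, response_ms: Metric, slow_ms: float = 500.0) -> Metric:
    """Derived discrete metric computed from two other metrics."""
    if not up.value:
        color = "red"
    elif response_ms.value > slow_ms:
        color = "yellow"
    else:
        color = "green"
    return Metric("color", color)


uptime = uptime_percent([True, True, True, False])
color = color_code(Metric("up", True), Metric("response_ms", 750.0, "ms"))
```

Since every metric has a name and (optionally) a unit, a generic report or reaction tool can handle derived metrics exactly like raw ones.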
Hm. Anyway, I'd better stop now, because at this point I'm halfway to making
your framework into a generalized enterprise management reporting system
that could just as easily report on people or departments and products as
it could on systems... :)