[PEAK] Re: Proposal for another Wiki "tutorial"

Paul Moore pf_moore at yahoo.co.uk
Thu Jul 15 14:29:12 EDT 2004


On Wed, 14 Jul 2004 16:59:20, Phillip J. Eby wrote:
> If you are going to develop a framework like this with PEAK, the place to 
> start is with "enduring abstractions".  That is, things that will always be 
> a part of the application.  Or, to put it another way, what is the 
> application's "domain"?

I think I am coming at the whole problem from a different angle. It's worth
stating that I have already written this suite of applications, a number of
times now.  The core logic is very simple, and frankly far below the level of
complexity that I suspect PEAK is aimed at. (For example, something like your
"monitoring server").

However, what has bitten me over and over is the problem of coupling between
parts of the system (specifically, the 4 "components" I identify). I'm looking
at PEAK as a way of reducing that coupling.

For example, the repository component. I've oversimplified here (as we're
just at the start of the tutorial...) in that my "real life" repository
doesn't only deliver a list of systems, but also stores results from some of
the more substantial "testers". So I fully expect the IRepository interface to
grow additional methods as the requirements expand. But the key point in terms
of "low-level PEAK" is that my current scripts are quite tightly coupled to
the repository implementation. The main app needs to query the repository for
systems. Some of the scripts have testers that need to send data to the
repository - this complicates the tester interface, as I have to pass the
repository object in, "just in case". And when I multi-thread all this, it
gets worse again, as the repository access must be serialised somewhere. By
using PEAK, I can have the repository as a component, and then testers that
need it, *and only they*, can do a binding.Obtain(IRepository). And if I need
to change the repository implementation, none of the other components need
change.
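
To make that concrete, here's roughly what I'm picturing (just a sketch - I'm
working from my reading of the binding docs, so the API spelling may not be
exact, and names like getSystems, storeResult and checkConnection are purely
illustrative; storeResult is one of those "additional methods" that doesn't
exist yet):

    from peak.api import binding, protocols

    class IRepository(protocols.Interface):
        """Where the list of systems to test comes from (and, later on,
        where some testers will store their results)."""

        def getSystems():
            """Return the systems that should be tested."""

        def storeResult(system, result):
            """Record a result against a system (a future addition)."""

    class OracleTester(binding.Component):
        # Only testers that actually need the repository declare this;
        # the component hierarchy supplies whichever implementation is
        # configured, so swapping implementations touches nothing here.
        repository = binding.Obtain(IRepository)

        def test(self, system):
            ok = self.checkConnection(system)   # stand-in for the real check
            self.repository.storeResult(system, ok)
            return ok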

That is what I'm trying to get from these first stages of the tutorial.
Decoupling of components, so that they can be replaced at will (my first-cut
code uses a repository which just returns a static list of "servers", and my
tester just says "OK" or not based on the server name - unrealistic, but a
great way to get working code from square one).
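
For what it's worth, that first cut is about as small as it sounds -
something along these lines (a sketch of the idea rather than my actual code,
and the method names are just the ones I happen to use):

    class DummyRepository:
        """Just returns a static list of 'servers'."""
        def getSystems(self):
            return ["prod1", "prod2", "test1"]

    class DummyTester:
        """Says 'OK' or not purely on the basis of the server name."""
        def test(self, system):
            return system.startswith("prod")

    class DummyReporter:
        """One line per result on stdout."""
        def report(self, system, ok):
            if ok:
                print "%s: OK" % system
            else:
                print "%s: not OK" % system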

The next stage of the tutorial will develop this by using the naming and
config modules to bind the components at runtime, taking the "what repository
implementation do I use" decision out of the Python code, and into the config
file. At this stage, the value of this is a little hidden (why would I have 2
repository implementations at any one time?) but I think (and I stress that I
haven't got there yet, so I don't know!) that some of the later developments
(repository in a SQL database, where what composes a "server" depends on the
query I supply at runtime) will make the value of this more apparent.

So, a key part of the point I want to make (or, conversely, the benefit I'm
looking for from PEAK) is the ability to improve an existing application by
increasing the decoupling and configurability incrementally. Ground-up design
isn't what I'm looking at right now (and it's covered by other documentation,
anyway). The biggest learning issue for me with PEAK was the "how do I get
there from here" question. I have working code, I want to improve it. I can't
see how to switch to PEAK ini-style files, DMs, etc without rewriting the
whole thing from scratch (and reworking how I think about the problem at the
same time). I'm looking for baby steps here :-)

> In the case of your monitoring system, the application domain is the status 
> of systems.

I'm not sure it is. Or maybe the phrase "application domain" is the stumbling
block. I tend to think procedurally about this - one script is "check all the
production servers, and report which ones are down", another is "query free
space on all the servers for customer X and log the results in the
repository", etc, etc. Very "do it, and do it now".

For background again, our systems monitoring is covered by an existing product
(Oracle Enterprise Manager, for what it's worth), but for non-interesting
reasons we have to supplement this with some pretty basic query scripts, which
are run on a regular basis from the OS scheduler. I don't want to reimplement
OEM, and nor do I particularly want to build a scheduler within the
application. That's why I've got some fairly hard limits on the complexity I
want to include. My thought is that by having such a limited scope, a tutorial
remains accessible. If readers look at it and think "but this is the basis of
a full-scale monitoring system", and go on to think about how the ideas I
present can be taken that step further, then great - I've communicated the
basics and sparked some interest. But *I* don't want (need) to take that extra
step, either for the learning benefit, or for my real-world application needs.

> Although the mechanics of what you *do* with them may change over time, 
> your application(s) are going to always be dealing with services, systems, 
> groups, and statuses, so there are your enduring abstractions.

This is very hard to argue with, but it still "feels wrong" to me. It's
pushing straight down the "monitoring server" route that I want to avoid. It
may also be that your interpretation of my application domain is off target to
an extent, and I'm not understanding your terminology well enough to
articulate a counter-proposal.

> To go much further, though, we really have to flesh out what a "status" 
> is.

This is an area where I'm conscious of some definite holes in my thinking. At
present, I have 3 components (ignore the controller, it *is* a red herring -
see below) which effectively sit in a "sea" of data. The exact data defining a
"server", or a "status" are highly unclear. And in many ways, I don't want
them to get pinned down. There's another level of decoupling here in my mind -
I want to be able to plug together a tester and a reporter as long as they can
agree on a common structure for the "status" data. But as my only current data
models are simple booleans and "tables" (results from a SQL query) - and at
the level I have reached in the tutorial I haven't got past booleans yet! -
I'd rather defer setting things in stone right now.

It's the same theme of incremental improvement and deferring design choices
until they become imperative that I've focused on throughout. Call it an XP
methodology if you like, modified by using strong decoupling as a tool for
easing the refactoring process.

> You'll notice here that I have not addressed controllers or repositories or 
> anything like that.  Those are what we call "solution-domain" components, 
> as opposed to "problem-domain" components.

I believe that the repository is "application domain" in your sense; I've just
glossed over some of the details at this stage (as I describe above). But the
controller is almost certainly a mistake. The reason I identified it was that
much of my current application design is coloured by the details of the
controller. A basic design would just test each server in turn. This is
trivial, but slow when testing 200 servers, with an "I can't get through"
response having a fixed 30-second timeout (Oracle connections, not network
pings, in this case). So I switched to a threaded approach. But for that, I
need to "schedule" the tasks, and then "wait" for them to complete. And
communication changes, to use Queues rather than just returned values. Going
beyond that, I wanted to add thread timeouts, to reduce the 30-second timeout
to something more realistic (2 seconds or so...)
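
If it helps, the shape of the threaded version is roughly this (a simplified
sketch rather than my actual code - in particular, the real thing needs more
careful error handling):

    import threading, Queue

    def run_tests(systems, tester, timeout=2.0):
        """Run tester.test() against each system in its own thread,
        collecting (system, result) pairs via a Queue."""
        results = Queue.Queue()

        def worker(system):
            results.put((system, tester.test(system)))

        threads = [threading.Thread(target=worker, args=(s,))
                   for s in systems]
        for t in threads:
            t.setDaemon(True)   # a hung connection shouldn't block exit
            t.start()

        # Wait for the workers, but give up after 'timeout' seconds each
        # rather than sitting through the full 30-second connect timeout.
        for t in threads:
            t.join(timeout)

        collected = []
        while not results.empty():
            collected.append(results.get())
        return collected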

I now think the controller is the wrong abstraction. But there is probably a
"scheduler" concept that I need to tease out here. I can't use PEAK's event
system, as true threading is crucial to my needs, so the PEAK co-operative
iterator-based threading doesn't work here. Also, I want to avoid PEAK's
higher level modules for now, to fit in with my "bottom-up" tutorial approach.

Whether a "scheduler" component is application-domain or solution-domain, I
can't really say. OK, it's an implementation issue, but it's also fundamental
to what I'm trying to design, so the solution-domain/application-domain
distinction doesn't alter its importance.

> * "plug-ins" to perform specific tasks like monitoring events of a 
>   particular type and sending notifications

Your plug-in is roughly my tester, I think.

> * reporting scripts to walk the domain objects and generate output of an 
>   appropriate format

So your solution to my "sea of data" issue is for the tester/plug-in to store
its result data on the server object itself, and then for the reporter (script
or component) to walk the set of server objects. OK, but doesn't that just
couple the plug-in to the server? My testers (your plug-ins) could, in theory,
be running a completely ad-hoc query. So there would have to be an attribute
on the server of the form "arbitrary chunk of data".
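
In other words, something like this, which is exactly what makes me uneasy
(names invented purely for illustration):

    class Server:
        def __init__(self, name):
            self.name = name
            # The only way I can see to keep the plug-in generic is an
            # untyped "whatever the last test produced" slot like this,
            # which the reporter then has to know how to interpret:
            self.last_result = None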

Maybe if I offer (in my model) a more complex use case, to help you see where
I'm going?

Repository: a SQL database holding details of customer, server, database ID,
username, password, etc. At run-time, pass in a query that selects all
databases for a particular customer, whose status is "Production".

Server: server, database ID, username & password

Tester: Connects to the DB, queries it for datafile, size, free space. Passes
this table of results to the reporter.

Reporter: Collects the query results, adds server and "when collected" info,
and then produces a tabular display on stdout.

This is, in effect, a SQL query utility, across multiple databases. The query
is arbitrary, as are the conditions used to select the list of databases. But
the framework (get list of services, execute a tester against each, which
passes its results to a reporter, which produces a final output) is entirely
generic. And many of the components (e.g. the reporter) can be completely
generic as well.
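
Put another way, the skeleton I keep coming back to is no more than this
(method names are illustrative):

    def run(repository, tester, reporter):
        # Entirely generic: nothing here depends on what the tester
        # actually does, or on the shape of the results it produces.
        for system in repository.getSystems():
            reporter.collect(system, tester.test(system))
        reporter.finish()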

> Also, the idea of a "repository" isn't really that useful either.  This is 
> just an issue of storage, and that can and should be abstracted away.  In 
> the case of reactive plugins, they won't care because they will just get 
> attached to the right domain objects.  In the case of your scripts, they 
> can simply reference the specific DM they want to load from, or use ZConfig 
> to load a specified configuration file containing all the 
> service/system/whatever objects.

My repository is something like your "fact base" form of DM, containing facts
I want to query and subset at runtime. So maybe you're right that it's "just
storage", but if the existing DM is (as you say) a bad fit for fact-base data,
I'd possibly be better off holding off on using the DM stuff until a
better-fitting replacement is available. I'll think on this, though.

> So, now we begin to see that we actually have/want some sort of monitoring 
> "server", in the sense that we don't want every little script doing its own 
> status testing.  This is sort of like your "controller" concept, only much 
> simpler.  All the "server" really is, is a script that loads up the domain 
> objects, attaches reaction and reporting plugins to the domain objects, and 
> runs the system's event loop.

Hmm. Maybe so. I still have the "can't get there from here" issue to address,
but I *think* I see how your model and mine mesh.

> Now, it may be that I have just designed something more like John Landahl
> wants than what you want.  :)  But, my main intent is to show how in
> designing with PEAK, you can start with what you "really want", focusing on
> the essentials of the problem rather than on accidents of implementation.

It may be that some of the issues here come from the fact that I don't know at
this stage what I "really want". I'm not keen on the "design up front"
approach, preferring to take a more XP-oriented route, implementing "just
enough to work" at each stage, and incrementally adding complexity.

> In essence, "repository" and "controller" are just computer words that
> aren't part of what you're really trying to do.  PEAK shoves these
> considerations off to the side using generic abstractions like DMs,
> commands, event loops, and executable configuration.  Thus, the bulk of the
> code that *you* write is about the problem domain, i.e. services and
> statuses, reports and reactions.

> Does that make sense to you?

To an extent. It still feels like I have to build a lot to get a working
sample. Last night, I built a 47-line script, tying a dummy repository, tester
and reporter together, in 10 minutes or so. I'm not sure I could do that with
your design.

> Anyway, the most interesting part, I think, of designing a framework like 
> this, is the status metrics, because they represent a point of change over 
> time.  Metrics may be discrete (e.g. boolean or enumerations) or numeric 
> with units.  And they need names.  Some metrics may be derived from other 
> metrics.  With an appropriate design for this part of the system, it should 
> be possible to make fairly generic reporting and reaction tools, as well as 
> developing advanced metrics that summarize various aspects of system state, 
> like for example a color-coding scheme that takes various other 
> measurements into consideration.

But you're basing all this on an implicit assumption of fixed, well-defined
metrics. My "metrics" in reality are much more ad-hoc queries.

Thanks for your comments. They have certainly made me think. I'll see if I can
get a quick proof-of-concept implementation of my dummy sample based around
your model. If nothing else, it will be interesting to assess how difficult
that is to do.

I'll stop now, as I'm starting to rival you for the "longest message in this
list" award :-)

Paul
-- 
The only reason some people get lost in thought is because it's
unfamiliar territory -- Paul Fix



