[PEAK] PROPOSAL: Performance tracking service for PEAK

Thu Dec 18 20:11:53 EST 2003

Sometimes it's important to be able to measure performance of a system, 
perhaps one that's in "active duty" like a web application.  I'd like to 
add a 'running.IPerformanceService' interface (and implementation) to PEAK 
for this.

The basic idea is that the service will offer the ability to create 
timers.  A timer has a dotted name (key), just like a logger or a 
property.  The timer has the ability to add arbitrary key/value pairs to 
help identify what is being measured (such as a hit number, user ID, SQL 
snippet, etc.), and a 'stop()' method to record the stop time.

So, the interfaces might be something like:

class IPerformanceService(Interface):

     def getTimer(key):
         """Return timer named by 'key'"""

     def addListener(key,listener):
         """Add a listener for measurements of 'key' (may be wildcard)"""

     def addStartListener(key,listener):
         """Add a listener for *start* events on 'key' (may be wildcard)"""

class IPerformanceTimer(Interface):

     def start(**info):
         """Start timing after setting info (like 'reset(); resume()')"""

     def resume(**info):
         """Resume timing (i.e. don't reset) after adding info"""

     def reset():
         """Reset the timer to zero and clear info"""

     def stop():
         """Stop timing"""

     def addInfo(**info):
         """Update info"""

     def addListener(listener):
         """Add a listener for measurements of this timer"""

     def addStartListener(listener):
         """Add a listener for *start* events on this timer"""

class IPerformanceListener(Interface):

     def timerStopped(service,key,elapsed,processor,info):
         """Timer 'key' of 'service' stopped with specified performance+info"""

class IPerformanceStartListener(Interface):

     def timerStarting(service,key,elapsed,processor,info):
         """Timer 'key' of 'service' about to start"""
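
The "(may be wildcard)" part of 'addListener()' could be sketched roughly as 
follows.  This is only an illustration: the 'ListenerRegistry' class, its 
'listenersFor()' method, and the wildcard rule ('*' matches everything, and 
a trailing '.*' matches any key under that dotted prefix) are assumptions, 
not part of the proposal.

```python
class ListenerRegistry:
    """Hypothetical helper: maps dotted keys (or wildcards) to listeners."""

    def __init__(self):
        self.rules = []  # list of (pattern, listener) pairs

    def addListener(self, key, listener):
        # 'key' may be an exact dotted name or a wildcard pattern
        self.rules.append((key, listener))

    def listenersFor(self, key):
        # Return every listener whose pattern matches the timer key
        return [l for pattern, l in self.rules if self.matches(pattern, key)]

    @staticmethod
    def matches(pattern, key):
        # Assumed wildcard rule, not specified by the proposal
        if pattern == "*":
            return True
        if pattern.endswith(".*"):
            # Keep the trailing dot so 'web.*' does not match 'webapp.hit'
            return key.startswith(pattern[:-1])
        return pattern == key


registry = ListenerRegistry()
registry.addListener("web.*", "web-listener")
registry.addListener("sql.query", "sql-listener")
```

Under these assumed rules, a timer named 'web.hit' would be seen by the 
first listener only, and 'sql.query' by the second only.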

The idea here is that you'll define listeners to do things with the 
performance counts.  If your program doesn't need instrumentation, there 
will be no listeners, and the timer methods will be no-ops.  However, by 
changing configuration and restarting, you'll be able to have listeners 
receive performance data and "do something" about it.
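
To make the no-op behavior concrete, here is a minimal sketch of a timer, 
not a definitive implementation.  The class name 'SketchTimer' is 
hypothetical, and I'm assuming that 'elapsed' means wall-clock seconds and 
'processor' means CPU seconds, which the interfaces above don't spell out.

```python
import time


class SketchTimer:
    """Hypothetical stand-in for an IPerformanceTimer implementation."""

    def __init__(self, service, key):
        self.service = service
        self.key = key
        self.info = {}
        self.listeners = []       # stop listeners
        self.startListeners = []
        self._elapsed = 0.0       # assumed: accumulated wall-clock seconds
        self._processor = 0.0     # assumed: accumulated CPU seconds
        self._started = None      # (wall, cpu) while running, else None

    def addListener(self, listener):
        self.listeners.append(listener)

    def addStartListener(self, listener):
        self.startListeners.append(listener)

    def addInfo(self, **info):
        self.info.update(info)

    def reset(self):
        self._elapsed = self._processor = 0.0
        self._started = None
        self.info = {}

    def start(self, **info):
        if not self.listeners:
            return                # nobody is measuring: do nothing
        self.reset()
        self.resume(**info)

    def resume(self, **info):
        if not self.listeners:
            return
        self.info.update(info)
        for listener in self.startListeners:
            listener.timerStarting(self.service, self.key, self._elapsed,
                                   self._processor, dict(self.info))
        self._started = (time.perf_counter(), time.process_time())

    def stop(self):
        # Read the clocks before anything else to limit measurement error
        wall, cpu = time.perf_counter(), time.process_time()
        if self._started is None or not self.listeners:
            return
        self._elapsed += wall - self._started[0]
        self._processor += cpu - self._started[1]
        self._started = None
        for listener in self.listeners:
            listener.timerStopped(self.service, self.key, self._elapsed,
                                  self._processor, dict(self.info))
```

Application code would call 'start(hit=42)' and 'stop()' unconditionally; 
whether anything happens depends only on whether a listener was configured.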

For our current web application monitoring tool, our apps write out a 
special file on each web hit that indicates the status of that application 
and logs SQL and other performance-related parameters.  The mechanism is 
awkward, and hardwired into both the monitoring tool and the 
applications.  The interfaces above are intended to let us decouple 
this.  What we should be able to do is write listeners that write out the 
existing file format upon the starting or completion of various designated 
timer keys.  Then, we could create additional listeners to do other 
things.  For example, we could create listeners that would dump performance 
data to a database, so we could report on and analyze the 
information.  Such listeners could wait until the process is between web 
requests to dump out the data, so that a request in progress is only slowed 
down during the time it takes to accumulate the data in memory.
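
A database-dumping listener of that kind might look something like the 
sketch below.  Everything specific here is an assumption for illustration: 
the 'BufferingListener' name, the 'flush()' method (which the application 
would call between web requests), the 'timings' table layout, and the use 
of SQLite as the database.

```python
import sqlite3


class BufferingListener:
    """Hypothetical stop listener: buffer in memory, write out later."""

    def __init__(self, connection):
        self.connection = connection
        self.pending = []
        connection.execute(
            "CREATE TABLE IF NOT EXISTS timings "
            "(key TEXT, elapsed REAL, processor REAL, info TEXT)"
        )

    def timerStopped(self, service, key, elapsed, processor, info):
        # Cheap: only an in-memory append happens during the request
        self.pending.append(
            (key, elapsed, processor, repr(sorted(info.items())))
        )

    def flush(self):
        # Meant to be called between requests; returns rows written
        rows, self.pending = self.pending, []
        if rows:
            with self.connection:  # commits on success
                self.connection.executemany(
                    "INSERT INTO timings VALUES (?, ?, ?, ?)", rows
                )
        return len(rows)
```

The request itself pays only for the list append; the database round trip 
is deferred until 'flush()' runs.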

There are still a number of open issues here.  For example, what happens if 
a timer isn't stopped, due to an error?  I don't think we want to force the 
use of try-finally blocks, but the alternative requires that we simply 
ignore timing in progress whenever we resume or reset an already-running 
timer.  The majority of listeners will listen to stop events, not start 
events, so this is no big deal.  We'll simply have to say that receiving a 
start event doesn't guarantee that the timer will also produce a stop event.
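
A tiny sketch of that rule, with a fake clock so the numbers are 
predictable.  The 'RestartableTimer' class and its 'starts' and 
'measurements' attributes are hypothetical, used here only to show a start 
that never gets a matching stop.

```python
class RestartableTimer:
    """Hypothetical timer that drops in-progress timing on restart."""

    def __init__(self, clock):
        self.clock = clock
        self.started_at = None
        self.starts = 0           # how many start events were produced
        self.measurements = []    # one entry per stop event

    def start(self):
        # Any timing already in progress is silently discarded
        self.started_at = self.clock()
        self.starts += 1

    def stop(self):
        if self.started_at is None:
            return
        self.measurements.append(self.clock() - self.started_at)
        self.started_at = None


ticks = iter([0.0, 10.0, 12.5])   # fake clock readings
timer = RestartableTimer(lambda: next(ticks))
timer.start()   # first request starts, then dies before calling stop()
timer.start()   # next request restarts the same timer
timer.stop()    # only this second run is ever measured
```

Here two start events were produced but only one measurement, so a start 
listener must not assume every start is paired with a stop.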

Note that a timer should decide whether it's a no-op based solely on whether 
it has any *stop* listeners.  Start listeners don't count, for the 
simple reason that stops are what do the "measuring".  So, if a timer has 
no stop listeners, its start listeners will receive no messages either, and 
in fact the start() and resume() methods will be complete no-ops in that 
case.  But the stop() method will always check the current time values 
immediately, so as to minimize measurement error in the event that it is 
actually going to do something with the measurements.  (Although I suppose 
there are ways to deal with that, too, like shunting the methods to 
non-empty versions as soon as a stop listener is added to the timer.  Or, 
the timer class could be written in Pyrex for the absolute minimum overhead.)
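
The "shunting" idea could be sketched like this, assuming plain Python 
instance attributes are used to swap the methods.  The 'ShuntingTimer' 
class is hypothetical, and for brevity its listeners are plain callables 
taking the elapsed time rather than full IPerformanceListener objects.

```python
import time


class ShuntingTimer:
    """Hypothetical timer whose methods are empty until someone listens."""

    def __init__(self):
        self.listeners = []
        self._started = None
        # Until a stop listener exists, start/stop do no work at all
        self.start = self._noop
        self.stop = self._noop

    def _noop(self):
        pass

    def addListener(self, listener):
        self.listeners.append(listener)
        # Shunt to the real versions as soon as a stop listener is added
        self.start = self._real_start
        self.stop = self._real_stop

    def _real_start(self):
        self._started = time.perf_counter()

    def _real_stop(self):
        now = time.perf_counter()  # read the clock first
        if self._started is None:
            return
        elapsed = now - self._started
        self._started = None
        for listener in self.listeners:
            listener(elapsed)
```

With this arrangement an uninstrumented 'stop()' doesn't even read the 
clock; the cost is one extra attribute assignment in 'addListener()'.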

The only other significant issue remaining, I think, is how to 
configure the listeners.  Ty and I have some ideas about that, too, but I'm 
going to save them for another e-mail, as it's getting quite 
late.  (Similar issues exist for configuring other event-like mechanisms, 
like logging, so hopefully we'll be able to resolve the issue across a 
range of such kinds of services.)

In the meantime, your feedback on the proposal so far would be appreciated.