[TransWarp] PROPOSAL: refactoring of peak.running.daemons
Phillip J. Eby
pje at telecommunity.com
Tue Apr 15 18:13:16 EDT 2003
This week I'm working on basic infrastructure to run PEAK apps under Zope
X3. One such infrastructure piece is the "plumbing" from the web server to
zope.publisher. I have some code I wrote previously to do this for
ZPublisher, that runs under both plain CGI and FastCGI. It includes
support for things like idle timeout, maximum number of requests, etc. I'd
like to migrate this code to use the PEAK configuration system, and
integrate with the planned 'peak.running.app' subsystem.
While contemplating possible design approaches, it occurred to me that the
structure of the "plumbing" code is rather similar to the daemons in
peak.running.daemons. It has many similar parameters, such as maximum time
the process can live. It even has a similar fetch-work/do-work event loop.
I thought, "Cool. So we can combine daemons and a FastCGI server in the
same process, and perhaps have some 'idle tasks' that automatically run
between web hits." But I ran into a little snag. The FastCGI library
we'll probably be using has a blocking 'Accept()' function, so there's no
way to actually get it to idle between requests. And we probably don't
want to sit in a polling loop for web hits anyway; it should be a
select()-based loop instead.
My first thought was to define a PEAK interface for event loop services,
and provide a default implementation based on the standard library's
'asyncore' module, while making it possible to create adapters to use
Twisted, Tkinter, wxPython, or other GUI toolkits' event loops where
appropriate.
So I went to do a little research on such things, and found that Twisted
has already pretty much done that. 'from twisted.internet import reactor'
is all you need to do to get an object that handles scheduled calls,
select() loops, startup/shutdown events and a lot more. It includes
support for the event loops of Qt, wxWindows, Tk, GTK, and even
Jython. These are wheels that I really don't want to reinvent. But...
First, the external-ish issues...
* PEAK now has a dependency on Twisted. This isn't such a big deal, since
it's only for the parts we need, and only from daemons and Zope X3
publishing. We can define our own interfaces for the parts we need, and
wrap all our contact with Twisted into a PEAK component, set up as a
utility available via 'peak.ini'.
* Twisted doesn't use Zope Interface objects. Instead, it defines an
abstract base class, 'Interface', and a few API functions to deal with
interfaces. But, it *does* use '__implements__', in a way that's thus
completely incompatible with both PEAK and Zope X3. That, as they say,
"sux0rs". But, I believe I can fix this with PEAK's new 'declareModule()'
facility. I should be able to write replacement API functions based on the
Zope Interface package, and install them in
'twisted.python.component'. Then, all of Twisted will be forced to do my
bidding! Mwahahahaha! Okay, I got a little carried away there. Anyway,
it looks like it might be as simple as replacing the Interface class, and
three short functions: implements(), getInterfaces(), and superInterfaces().
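The patching idea itself is plain Python: swap replacement functions into
an already-imported module, and every later caller inside the package gets
the new behavior. Here's a sketch against a stand-in module
('fakecomponents' and both function bodies are mine, purely to show the
mechanism; the real patch would target Twisted's own module):

```python
# Sketch of monkeypatching a module's API functions in place.
# 'fakecomponents' is a hypothetical stand-in, not a real Twisted module.
import sys, types

fakecomponents = types.ModuleType("fakecomponents")

def old_implements(obj, iface):
    obj.__implements__ = iface          # the incompatible legacy behavior

fakecomponents.implements = old_implements
sys.modules["fakecomponents"] = fakecomponents

def new_implements(obj, iface):
    # Replacement based on a common interface declaration (sketch).
    obj.__provides__ = getattr(obj, "__provides__", ()) + (iface,)

# The patch: anyone who calls 'fakecomponents.implements(...)' from now
# on, or does 'from fakecomponents import implements' after this point,
# gets the replacement.
fakecomponents.implements = new_implements
```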
Okay, so those are issues, but they're pretty tractable. You want to run
web apps, you gotta install Zope and Twisted, and you have to import
something from PEAK first that monkeypatches Twisted to use the common
interface type of PEAK and Zope. On to the PEAK-specific design issues...
'peak.running.daemons' will need to get refactored, which is a fancy way of
saying "rewritten from the ground up". This is not such a bad thing, since
I don't expect anybody has code based on the existing module. But it does
bring up some interesting design questions...
* Simple daemons would look a lot like they do now. The main changes would
be in the '__call__' method, setting it up to be able to reschedule itself
at the next poll interval. There would be no run() loop. Instead, there'd
be some kind of 'start()' method to schedule the daemon's first
invocation. Also, we'd move the "idle backoff" algorithm to the scheduling
of the daemon's next invocation, instead of leaving it as part of a
top-level daemon scheduling process.
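A minimal sketch of such a daemon (every name here is an assumption, not
actual PEAK API; 'schedule' stands in for the event loop's delayed-call
facility, i.e. something like reactor.callLater):

```python
class PollingDaemon:
    """Sketch: a daemon that reschedules itself instead of owning a
    run() loop. The backoff lives in the rescheduling, not in a
    top-level scheduler."""

    def __init__(self, schedule, pollInterval=30, idleIncrement=10,
                 maxIdle=60):
        self.schedule = schedule          # schedule(delay, callable)
        self.pollInterval = pollInterval
        self.idleIncrement = idleIncrement
        self.maxIdle = maxIdle
        self.delay = pollInterval

    def start(self):
        # Schedule the daemon's first invocation.
        self.schedule(self.delay, self)

    def __call__(self):
        if self.getWork():
            self.delay = self.pollInterval    # found work: reset backoff
        else:
            # Idle: back off, up to the maximum idle interval.
            self.delay = min(self.delay + self.idleIncrement, self.maxIdle)
        self.schedule(self.delay, self)       # reschedule ourselves

    def getWork(self):
        return False   # subclasses override; True means work was done
```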
* MultiDaemons don't make a lot of sense any more. Priority-driven
scheduling can move to a global priority mechanism (see
later). Round-robin scheduling of items with a given polling interval is
automatic. The only part of MultiDaemon that still seems to make sense is
the 'daemonSeed' daemon that determines what daemons should exist in that
group.
We use daemon seeds to do things like scan a set of queue directories and
create a daemon for each queue. This would be both more and less complex
than in the old scheme. The old scheme required a multidaemon to manage
scheduling and a seed daemon to manage the list of sub-daemons. In the new
scheme, the seed daemon would simply create any new daemons and '.start()'
them, and perhaps '.stop()' ones that are no longer needed.
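The queue-directory case might look something like this (a sketch under
assumptions: 'QueueSeedDaemon', 'makeDaemon', and the start()/stop()
protocol are all names I'm making up here, not existing PEAK API):

```python
import os

class QueueSeedDaemon:
    """Sketch: a 'seed' that keeps one daemon alive per queue directory,
    starting daemons for new queues and stopping ones that vanish."""

    def __init__(self, queueRoot, makeDaemon):
        self.queueRoot = queueRoot
        self.makeDaemon = makeDaemon   # factory: name -> daemon
        self.active = {}               # directory name -> running daemon

    def __call__(self):
        seen = set(os.listdir(self.queueRoot))
        for name in seen - set(self.active):
            # New queue directory: create and start a daemon for it.
            self.active[name] = self.makeDaemon(name)
            self.active[name].start()
        for name in set(self.active) - seen:
            # Queue directory is gone: stop its daemon.
            self.active.pop(name).stop()
```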
* The main loop of an all-daemons application might be a bit different than
now, as well. Instead of looping and waiting for its "hard deadline", it
could simply schedule a call to the reactor's 'stop()' method at the "hard
deadline", and start the reactor's main loop. Handling the "soft deadline"
is a bit trickier. Currently, the "soft deadline" is the time at which a
daemon stops running, if it has no work to do. This could be implemented
by a recurring call (first scheduled to run at the soft deadline) that
checks whether the time until the next scheduled event (reactor.timeout())
is greater than some desired threshold of idleness, or is None (meaning
nothing else is scheduled to run!) and there are no sockets being monitored
by the reactor.
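That recurring check might boil down to a predicate like this (a
pure-Python sketch; 'timeToNextEvent' plays the role of the
reactor.timeout() value, and all the names are mine):

```python
def shouldStop(timeToNextEvent, idleThreshold, socketsMonitored):
    """Sketch of the soft-deadline idle check: True if the app is idle
    enough to shut down once the soft deadline has passed.

    timeToNextEvent: seconds until the next scheduled event, or None
    if nothing else is scheduled to run at all."""
    if socketsMonitored:
        return False    # still watching sockets: not idle
    return timeToNextEvent is None or timeToNextEvent > idleThreshold
```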
* Our signal handling code goes away; Twisted's reactor classes have a lot
more thorough signal management than anything we're doing.
* In principle, all daemons in an app could decide to shut down, leaving
the reactor loop running until one of the deadlines kicked in. So there
might need to be a watchdog that ensured application shutdown if the
reactor had nothing to do, before the soft deadline. Nah. It's a
YAGNI. If we interpret "soft deadline" as "minimum lifetime", then the
previously described handling for soft deadline suffices.
* It sounds like there are more parameters to tune, all in all, than before:
- Hard/soft deadlines (per app, already existed)
- initial poll period (per daemon, already existed)
- Idle increment, idle maximum (per daemon, used to be per app)
- soft deadline idle threshold, soft deadline check interval (per app, new)
I guess that's not too bad, since most of the new settings are per app, and
all can be assigned some pretty reasonable defaults. For example, the idle
increment for daemons can just be 0 by default, and the maximum idle can
default to the same as the initial poll period plus three times the idle
increment. Thus, setting a poll period of 30 seconds and an increment of
10 would be a 60 second maximum idle. I suppose we could also expose the
three as an "idling backoff count", or maybe even allow an exponential backoff.
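Spelled out, the default works out like this (names are hypothetical, just
to pin down the arithmetic):

```python
# Default maximum idle = initial poll period + backoffCount * idle
# increment, where backoffCount defaults to the three steps above.
def defaultMaxIdle(pollPeriod, idleIncrement, backoffCount=3):
    return pollPeriod + backoffCount * idleIncrement
```

So a 30-second poll period with a 10-second increment gives the 60-second
maximum idle, and a zero increment leaves the poll period unchanged.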
We probably don't even need a check interval for the soft deadline. We
could compute the time to do the next check automatically, based on the
current outstanding schedule. The check should take place at the next
moment where a schedule gap exists that is equal to or greater than the
idle threshold. So, we only need the idle threshold, not a check interval.
Okay, so this actually sounds pretty good. A lot of stuff that was part of
BaseDaemon now moves to a DaemonApp class, that will probably derive from a
peak.running.app base class. That may mean DaemonApp will live in
running.app. All the deadline processing would be methods of DaemonApp.
This is probably a good time to add ZConfig support as well, in the form of
ZConfig schema containing "section type" descriptions for the configuration
data needed by an individual daemon and by a DaemonApp. DaemonApp would
probably take parameters like:
StayAliveFor 15m # formerly SOFT_DEADLINE
ShutdownAfter 30m # formerly HARD_DEADLINE
ShutdownIfIdle 1m # stop after StayAliveFor if nothing scheduled for this long
# ... any other parameters here
IncreaseIdleBy 10s # if nothing to replicate, wait 40s, then 50s...
MaximumIdle 2m # up to a maximum of 2m between checks
RecordCount 10 # up to 10 records at a time
# ... etc.
So, there'd be a section type for a daemon, and the DaemonApp section type
would include an attribute to collect all the specified daemons. When the
DaemonApp started, it would invoke each daemon's 'start()' method, and drop
its reference to the daemon.
Each daemon would expose a "priority" and "desired next run time" value to
the daemon app. The daemon app would at all times maintain a priority
queue of active daemons, and take the highest priority one for
processing. After running the daemon, the app would check the daemon's
"desired next run time" and schedule an application method to re-add the
daemon to the priority queue at that time. Then, if the priority queue
were not empty, the app would schedule itself to be re-run after some small
time interval. (If the queue were empty, that would be the perfect time to
do "soft deadline" idle checking if the "StayAliveFor" time has passed.)
The net result of this behavior is that daemons would be scheduled as they
desired, except that they will not actually *run* until any active higher
priority daemons have "yielded the floor".
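The dispatch loop described above could be sketched with the standard
library's heapq (the 'priority' and 'nextRunTime' attributes and the
'readd' callback are assumptions on my part, not settled API):

```python
import heapq

class DaemonScheduler:
    """Sketch of the app's priority-queue dispatch: always run the
    highest-priority active daemon, then reschedule its re-addition."""

    def __init__(self):
        self.queue = []     # heap of (priority, tiebreak, daemon)
        self.counter = 0    # insertion order breaks priority ties stably

    def add(self, daemon):
        # Lower number == higher priority, heapq-style.
        heapq.heappush(self.queue, (daemon.priority, self.counter, daemon))
        self.counter += 1

    def runOne(self, readd):
        """Run the highest-priority daemon, then hand it to 'readd' with
        its desired next run time (readd stands in for scheduling a
        reactor call that re-adds the daemon to this queue later)."""
        if not self.queue:
            return False    # queue empty: a good moment for the idle check
        _, _, daemon = heapq.heappop(self.queue)
        daemon()
        readd(daemon.nextRunTime, daemon)
        return True
```

Note how a low-priority daemon that's already "due" still waits in the
heap until the higher-priority ones have been popped and run.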
So. That's about all I can squeeze out of my brain at this
sitting. Comments, anyone?