[TransWarp] Pre-forking event-driven services
Phillip J. Eby
pje at telecommunity.com
Fri Aug 15 08:58:35 EDT 2003
Inspired by a recent post on the PyWeb list about high-performance Python
web services (specifically, the eGroups/Yahoo!Groups implementation), it
seems to me it would be useful to add a "pre-forking" process manager tool
to PEAK.
The process manager in Apache's mod_fastcgi has many problems controlling
multi-process servers, especially ones that start slowly. For example, if
an application process takes a while to start, it's nearly impossible to
tune mod_fastcgi such that it is both responsive when an application is not
currently running, and yet doesn't slam the box with process startup
overheads as soon as *any* load appears.
Meanwhile, if you configure it so that you can shut down running processes
without mod_fastcgi starting them again, it will play a fun game called
"start processes and then kill them before they do anything" whenever
processes are started just before its periodic "kill idle processes"
event. (The slower the process startup, the worse this hurts, because the
window in which it can occur gets larger, and because more work is
wasted when it does happen.)
I believe we can eliminate most of these problems with a generic process
manager tool. The idea is that you'd use a ZConfig file that looks
something like this:
========================================================================
#!invoke peak procman
Command CGI MyPackage:MyApp # Run this in subprocesses
PidFile /var/run/myApp.pid # Shell-lock this file/use as pid
StdErr /var/log/httpd/error_log # Point sys.stderr here
Prefork 3 # start 3 processes
MinProcesses 1 # always have at least 1
MaxProcesses 5 # but no more than 5
# ensure that these modules/classes are loaded before forking
#
import zope.publisher.browser
import zope.publisher.publish
========================================================================
The "command" directive specifies a 'peak' command line to be inserted
before any subsequent command line arguments. Thus, if the above ZConfig
file were named "foo", running "foo bar" would be equivalent to starting 3
copies of "peak CGI MyPackage:MyApp bar".
The idea here is that the main program would load up and create the object
specified by the "Command" directive (as well as processing the 'import'
directives). It would then fork off child processes to actually do
whatever the application does. The parent process would loop forever,
restarting children whenever their number drops below MinProcesses. It would also
start up new processes when application-specific conditions were
met. These conditions might be driven by periodic tasks, or perhaps by
some IPC mechanism between parent and children. In any case, they would be
configured via sections in the ZConfig file, e.g.:
<StartOnSemaphore>
Priority 1
Semaphore /var/tmp/MyApp.busy
RunEvery 5
</StartOnSemaphore>
The example above means that every 5 seconds, new processes are started if
the semaphore '/var/tmp/MyApp.busy' has a count equal to the current number
of child processes, and there are fewer than MaxProcesses running.
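To make that concrete, here's a very rough sketch of the kind of loop the
parent might run. None of this is real PEAK API; 'start_child()' and
'semaphore_count()' in particular are hypothetical placeholders for however
the forking and the semaphore check end up being implemented:
========================================================================
# Sketch only -- 'start_child' and 'semaphore_count' are hypothetical
# placeholders, not actual PEAK (or OS) APIs.

import os, time

def manager_loop(children, min_procs, max_procs, sem_path, run_every=5):
    """'children' is a set of child process ids."""
    while True:
        # Reap any children that have already exited
        for pid in list(children):
            done, status = os.waitpid(pid, os.WNOHANG)
            if done:
                children.discard(pid)

        # MinProcesses: restart children if we've dropped below the floor
        while len(children) < min_procs:
            children.add(start_child())

        # StartOnSemaphore: if the semaphore count equals the number of
        # running children (i.e. they're all busy), add one more child,
        # but never exceed MaxProcesses
        if len(children) < max_procs and semaphore_count(sem_path) == len(children):
            children.add(start_child())

        time.sleep(run_every)
========================================================================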
The process manager could also support various useful signals. For example,
SIGHUP might indicate that the manager should stop its children, wait for
them to exit, and then re-"exec" the manager. To stop the children, the
manager would SIGHUP them, then issue a SIGTERM after a time period, and
finally SIGKILL. (The intervals for this process could be set in the
ZConfig file as well.) Normal PEAK event-driven processes should treat
SIGHUP as meaning "issue a 'reactor.stop()', so that we will exit as soon
as the current task is completed".
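Just to make that stop-and-escalate sequence concrete, here's a rough
sketch; the hard-coded grace periods below stand in for whatever intervals
the ZConfig file would supply:
========================================================================
# Sketch of the shutdown escalation: SIGHUP, then SIGTERM, then SIGKILL.
# The grace periods are placeholders for ZConfig-supplied settings.

import os, signal, time

def stop_children(pids, hup_grace=30, term_grace=30):
    """'pids' is a set of child process ids."""
    for sig, grace in ((signal.SIGHUP, hup_grace),
                       (signal.SIGTERM, term_grace),
                       (signal.SIGKILL, 0)):
        for pid in list(pids):
            try:
                os.kill(pid, sig)
            except OSError:
                pids.discard(pid)          # child already gone
        deadline = time.time() + grace
        while pids and time.time() < deadline:
            try:
                done, status = os.waitpid(-1, os.WNOHANG)
            except OSError:                # no children left at all
                pids.clear()
                break
            if done:
                pids.discard(done)
            else:
                time.sleep(0.5)
        if not pids:
            break
========================================================================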
Pre-fork preparations are likely to be rather interesting. Some
applications may want to preload DB caches, for example. To support this,
the process manager should initialize its subcommand component inside a
transaction. On the application side, initialization code would use
peak.binding "assembly events" in order to be invoked. If the
initialization code uses database connections, it should invoke the
'closeASAP()' method on those connections when done. This will ensure that
the DB connections will close at the end of the initialization transaction,
before the process is forked. This would avoid any unwanted
connection-sharing between forked processes. (Of course, once a child
process uses the DB connection object again, it will automatically reconnect.)
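In very rough terms, such an initialization hook might look like the
following. How it gets wired up as an assembly event is glossed over here,
and the attribute and helper names ('dbConnection', 'loadLookupTable') are
made up; only 'closeASAP()' is the actual connection method involved:
========================================================================
# Schematic pre-fork initialization hook; attribute and helper names
# ('dbConnection', 'loadLookupTable') are hypothetical.

def preloadCaches(self):
    # Runs inside the process manager's initialization transaction,
    # before any children are forked.
    conn = self.dbConnection
    self.lookupCache = loadLookupTable(conn)

    # Close the connection at the end of the initialization transaction,
    # so forked children don't end up sharing one database socket; each
    # child will transparently reconnect the first time it uses it.
    conn.closeASAP()
========================================================================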
All in all, having such a process manager should make FastCGI applications
much easier to handle than they are currently. We could start such
applications in rc.local, or better yet, continue to use the mod_fastcgi
process manager, but insist that no more than one instance of an
application be started. (We should also investigate developing patches for
the mod_fastcgi problems not fixed by this, like the "start 'em up and kill
'em off" bug.)
Anyway, to support the process manager in PEAK, it's going to be necessary
to implement a few things first:
* A "signals stack", similar to Ty's "rlhist" module in N2. This will
probably be done in e.g. 'peak.util.signal_stack', with two functions
'pushSignals(ob)' and 'popSignals()'. You'll call 'pushSignals()' with an
object that has methods named for the signals it wants to handle. So, if
it has a SIGHUP method, that method will be set as the handler for SIGHUP,
and so on. Any signals that lack a method on the pushed object, will
retain their previous handlers. 'popSignals()' will restore the changed
signal handlers. The idea here is to make it easy for the parent and child
processes to "keep their signals straight".
* Refactor peak.running.commands so that interpreters are factories rather
than being runnables in themselves. That is, if you invoke 'peak runIni
foo', the 'Bootstrap' command should get back the 'foo' object, instead of
an 'IniInterpreter'. The idea is to make command determination more
"eager". If we didn't do this, then running the process manager on an .ini
file wouldn't actually pre-load anything before forking. Each child
process would read the .ini file, instead of it being read once by the
parent process. In addition to doing this with .ini/ZConfig files,
something similar should be done by other "modifier" factories such as
'CGIInterpreter', in that they should ensure that their subject application
components are created when the command line is initially parsed, rather
than waiting for the 'run()' call to occur.
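Here's the sketch promised above of what 'peak.util.signal_stack' might
look like; take it as a first cut, not a spec:
========================================================================
# First-cut sketch of the proposed peak.util.signal_stack module.

import signal

_stack = []   # each entry maps signal number -> the handler it replaced

def pushSignals(ob):
    """Install ob's SIGxxx methods as handlers, remembering what they replace."""
    saved = {}
    for name in dir(ob):
        if name.startswith('SIG') and hasattr(signal, name):
            signum = getattr(signal, name)
            saved[signum] = signal.getsignal(signum)
            signal.signal(signum, getattr(ob, name))
    _stack.append(saved)

def popSignals():
    """Restore the handlers that the most recent pushSignals() replaced."""
    for signum, handler in _stack.pop().items():
        signal.signal(signum, handler)
========================================================================
The parent would push its own handler object before forking; each child
would then 'popSignals()' right after the fork and push a handler object of
its own, which is what keeps the two sets of signals straight.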
We'll probably also want to incorporate some basic signal handling into our
running.IMainLoop implementation, perhaps via a configuration property that
specifies a signals handler for the mainloop to push. If we want the
process manager to be able to define signal handling for its children, then
we'll need its ZConfig schema to include a "ChildConfig" directive that
specifies an .ini file to load when creating the root component for the child
processes.
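For instance, the handler object the mainloop pushes might be no more than
this (a made-up class, just to show the SIGHUP policy described earlier):
========================================================================
# Hypothetical handler object for pushSignals(); it only implements the
# "finish the current task, then exit" SIGHUP policy described above.

class MainLoopSignals:

    def __init__(self, reactor):
        self.reactor = reactor

    def SIGHUP(self, signum, frame):
        # Stop the event loop cleanly instead of dying mid-task
        self.reactor.stop()
========================================================================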