[PEAK] Process management and forking
Phillip J. Eby
pje at telecommunity.com
Wed Nov 19 14:44:08 EST 2003
This is a follow-up on the plan to build a pre-forking process manager, as
described in:
http://www.eby-sarna.com/pipermail/peak/2003-August/000697.html
At this point, the prerequisites are largely met. As it turned out, the
signal stack wasn't really the right idea, and I will probably end up
getting rid of it. It'll be replaced by a global 'running.ISignalManager'
object. It needs to be a global singleton, because signal handlers are
singletons, too. The signal stack had a problem of figuring out when to
activate or deactivate a particular handler, and the activations or
deactivations had to happen in a nested fashion. But the global
SignalManager can have handlers added or removed at any time and in any
order, and multiple handlers can in fact be active simultaneously.
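As a rough illustration of that shape (a sketch only, not the actual 'running.ISignalManager' API -- the method names here are assumptions), a signal manager can install one OS-level handler per signal and fan the signal out to however many callbacks are currently registered:

```python
import signal

class SignalManager:
    """Illustrative sketch only -- not PEAK's ISignalManager. One OS-level
    handler per signal number, dispatching to any number of registered
    callbacks, which may be added or removed at any time, in any order."""

    def __init__(self):
        self.handlers = {}   # signal number -> list of callbacks

    def addHandler(self, signum, callback):
        if signum not in self.handlers:
            self.handlers[signum] = []
            signal.signal(signum, self._dispatch)
        self.handlers[signum].append(callback)

    def removeHandler(self, signum, callback):
        self.handlers[signum].remove(callback)

    def _dispatch(self, signum, frame):
        # every currently-registered handler sees the signal
        for callback in list(self.handlers.get(signum, ())):
            callback(signum, frame)
```

Because the OS-level handler is registered only once per signal, handlers can be active simultaneously and removed in any order without the nesting problem the stack had.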
The new draft ProcessManager takes advantage of this to scan for terminated
child processes, by registering itself for SIGCHLD. The idea of the
ProcessManager is to have a component that tracks active child
processes. You tell the ProcessManager to 'invoke()' an object that
implements IProcessFactory (not yet defined). The ProcessManager performs
an os.fork(), and in the parent process, the ProcessManager asks the
factory to create an IProcessProxy that will be used to observe the status
of the child. Then, in the child process, the ProcessManager asks the
factory to perform its child task.
So far, so good. Now if our child process is just going to 'exec*()'
another program, everything's fine since the child task will never return
to the caller of invoke(). However, if we really wanted our fork to
continue in the child, there are a number of potential issues that come up.
For the prefork manager, we don't really want its children to have the
prefork code active. We'd like the child process command to use a
different configuration root, for example, and its own reactor, if
any. This is particularly important if the prefork manager needs a reactor
of its own. For example, the prefork manager may need to schedule periodic
tasks to check on its subprocesses. But, the child process shouldn't run
these tasks, or even be able to access them!
This separation is simple at first. We design the prefork manager
component so it embeds its own reactor, mainloop, and task queue, e.g.:
    reactor = binding.Make(
        'peak.running.scheduler.UntwistedReactor',
        offerAs=[running.IBasicReactor]
    )

    taskQueue = binding.Make(
        'peak.running.daemons.TaskQueue',
        offerAs=[running.ITaskQueue]
    )

    # etc...
In this way, any subcomponents will use the prefork manager's components
for these purposes, rather than default ones supplied by the configuration
root. The application being run in the child processes, however, will be
given its own separate configuration root, where it will be free to use any
defaults it wants. Thus, the forked process will have complete
independence from the parent's control mechanisms.
Sounds good so far. But there's an interesting hitch. In the child
process, when we return from 'ProcessManager.invoke()', we'll still be
*inside* the parent's reactor loop!
It works something like this... let's assume that there's a periodic task
in the parent that spawns new children. The reactor invokes the task, the
task runs invoke(), and invoke() returns in both the parent and the
child. In the parent, we just return to the reactor, and everything's
fine; we proceed normally. But what about the child? If we simply return,
then the child's copy of the parent reactor will keep running right where
it left off, possibly spawning children from the children, in an
ever-multiplying cascade until the box dies. (Bad sorcerer's apprentice;
no cookie!)
Even if we raise StopRunning to cancel the task, other tasks will keep on
running, unless we call reactor.stop(). And even if we do call
reactor.stop(), the way things are now, previously-scheduled 'callLater()'
operations may still occur, which includes the possibility of executing a
previously-scheduled task. What to do?
We probably need to introduce a reactor.crash() capability, as Twisted
has. Then, we could stop the reactor immediately without attempting to
maintain any invariants. Control would then return to the prefork
manager's IMainLoop, and from there to the prefork manager's '_run()'
method. The tricky bit at that point would be for the '_run()' method to
know that "hey, I'm a child process, so don't *do* anything except run the
child process".
We could change the IMainLoop interface to add a method that indicates a
fork has occurred with the current process being the child process. The
mainloop could save this information, stop the reactor, and exit
ASAP. _run() methods calling mainloop.run() would do something like:
    self.mainLoop.run(...)

    if self.mainLoop.childProcess:
        self.mainLoop.childProcess.runChild()
So, we'd now be running the child process code outside the reactor loop, a
lot closer to the desired exit point. That's better, but not perfect. At
the point where this code runs, 'self' (the component holding the reactor,
mainloop, process manager, etc.) is still "live". So all those components
would hang around, uncollectable and useless, taking up memory until the
process is ready to exit. Not good.
But what can we do? Assuming the parent is running under the 'peak'
script, then we could possibly make a modification there to cause the child
process to be invoked *after* the parent process' configuration root and
command are discarded.
Suppose we changed the 'peak' script to read:
#-----
from peak.running.commands import Bootstrap
from peak.api import config
import sys
sys.exit(
    int(
        Bootstrap(config.makeRoot()).run()
    )
)
#-----
(The change is the 'int()' wrapped around the Bootstrap().run()
call.) Now, here's what we could do. If the 'run()' method returned an
object with an '__int__()' method, that method would be called *after* the
last reference to the Bootstrap object (and all objects reachable from it)
goes away.
Now we can refactor the existing commands such that IMainLoop.run() will
return either 0 or a "process wrapper" with an __int__ method, and anything
using IMainLoop returns the value of IMainLoop.run() to its calling
command. IMainLoop will also grow a 'childForked(child)' method, to accept
the child process info and crash the reactor, if the mainloop and reactor
are running. If 'childForked()' is called when the reactor and mainloop
are not running, an error should occur.
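Put together, the proposed IMainLoop behavior might sketch out like this (a rough sketch following the discussion above, not the actual implementation):

```python
class MainLoop:
    """Sketch of the proposed IMainLoop changes; the names here follow
    the discussion, not the real PEAK code."""

    def __init__(self, reactor):
        self.reactor = reactor
        self.childProcess = None
        self.running = False

    def childForked(self, child):
        if not self.running:
            raise RuntimeError("childForked() called outside a running mainloop")
        # remember the child's stub, then abandon the parent's loop at once
        self.childProcess = child
        self.reactor.crash()

    def run(self):
        self.running = True
        try:
            self.reactor.run()
        finally:
            self.running = False
        # parent: 0; child: a process wrapper with an __int__() method
        return self.childProcess or 0
```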
All in all, it sounds like the prefork manager can be made to work,
although there will need to be considerable care taken in building the
process factory classes so that they don't keep dangling references to
objects that are part of the parent process' component tree. When the
'int()' call takes place, the parent process' objects should be unreachable
garbage as far as the interpreter is concerned.
For other uses of fork() than the prefork manager, similar care will be
needed. Commands that perform activities *after* mainloop exit, or which
don't use a mainloop, will have to come up with alternative approaches for
performing the child's side of the fork. Maybe we will have something like
this as the standard way of forking:
    proxy, stub = processMgr.invoke(factory)

    if stub:
        self.mainLoop.childForked(stub)
        return  # or otherwise exit
    else:
        # do things w/proxy
The idea here is that 'invoke()' returns either 'proxy, None' or 'None,
stub', depending on whether you're in the parent or child process. The
'stub' is the object to be invoked by the child, and the 'proxy' is a
component used by the parent to monitor and/or communicate with the
child. Depending on the 'factory' used, the 'proxy' might have attributes
representing the parent's side of various pipes. (And similarly for the stub.)
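A rough sketch of that calling convention, assuming POSIX os.fork(); since IProcessFactory isn't defined yet, the makeProxy()/makeStub() method names below are placeholders:

```python
import os

def invoke(factory):
    """Sketch only: fork, then hand back (proxy, None) in the parent,
    or (None, stub) in the child."""
    pid = os.fork()
    if pid:
        # parent: a component for monitoring/communicating with the child
        return factory.makeProxy(pid), None
    else:
        # child: the object the child process should go on to invoke
        return None, factory.makeStub()
```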
So, if the caller is responsible for invoking IMainLoop.childForked(),
there's no coupling between the ProcessManager and the architecture of the
app (e.g. whether it has an event loop). It seems a minor inconvenience,
all in all, for being able to be so flexible in terms of forked children.
And what happens if a child process needs to fork? Implicitly, this is
taking place inside the previous child process' '__int__()' method, so
there is now no longer an opportunity to escape it in the same way as we
did for the first process. This means that the 'peak' script will have to
become slightly more complicated. Instead of making a simple int() call,
it will need to either call __int__() directly and see if it gets an
integer, looping until it does, OR, it will need to inspect the return
value from Bootstrap().run() and call some method on it repeatedly until
the nth child's completion has occurred.
Yuck. I really wanted to keep the 'peak' script simple; adding a loop is
not exactly "simple". Part of the reason I wanted to keep it simple was so
that people who want to *not* use the 'peak' script can easily emulate it
and do so without error.
I guess what I'll do is this. I'll write a 'runMain(cmd)' function that
takes an ICommandLineApp and invokes its 'run()' method. If the return
value is another ICommandLineApp, it'll invoke *that*, and so on until it
gets a non-ICommandLineApp return value. At that point, it'll call
sys.exit() on the return value. The 'peak' script will then change to:
#-----
from peak.api import config
from peak.running import commands
commands.runMain(
    commands.Bootstrap(config.makeRoot())
)
#-----
Which is now even shorter than it was when we started. But... there is
still a hitch. Now, the Bootstrap object passed to runMain() still has a
reference held to it in the calling frame, even after runMain() discards
it. The only way around this, then, is to do:
#-----
from peak.api import config
from peak.running import commands
commands.runMain(
    lambda: commands.Bootstrap(config.makeRoot()).run()
)
#-----
The idea here is now that you pass 'runMain()' a callable that it will
invoke. This will result in 'runMain()' having the only reference to the
original Bootstrap() object. Whew. I think that's about it.
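In sketch form (again, an assumption about the eventual code, using hasattr as a stand-in for a real ICommandLineApp check):

```python
import sys

def runMain(factory):
    """Sketch of the planned commands.runMain(): it receives a
    zero-argument callable, so runMain itself ends up holding the only
    reference to each successive command's return value."""
    result = factory()                # e.g. lambda: Bootstrap(root).run()
    while hasattr(result, 'run'):     # stand-in for an ICommandLineApp test
        result = result.run()         # the prior command is now unreferenced
    sys.exit(int(result))
```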
I probably won't make these changes today, but they will take place very
soon unless there are comments or objections. Note that none of these
changes affects Windows compatibility. The ProcessManager and the pre-fork
manager will only be useful on Unix, at least at first. It's possible that
ProcessManager could later grow additional capabilities for managing
spawn*() on a cross-platform basis, but that's not a priority right
now. But existing parts of the framework are not going to be affected by
these changes if you're not using the ProcessManager, and if you're on
Windows, you won't be.
To recap the planned changes:
* Add 'runMain()' to peak.running.commands
* Revise 'peak' script to use it
* Add IBasicReactor.crash(), and fix UntwistedReactor.iterate() so it'll work
* Add IMainLoop.childForked(), with appropriate doc change for IMainLoop.run()
* Fixup ProcessManager to return a 'proxy,stub' pair to both parent and child
* Make ProcessManager clear its process table and unregister its signal
handler in the child process
* Begin work on IProcessFactory and on the prefork manager tool
Comments or questions, anyone?