[TransWarp] PROPOSAL: Use PEP 282 'logging' as the core of PEAK's log management

Fri Apr 18 19:10:07 EDT 2003

Overview
========

Why the PEP 282 'logging' package?

* Logging isn't a core competency for PEAK; let somebody else handle it

* Zope Pope-tested, BDFL-approved

* AFAICT, it does everything we could do with 'peak.running.logs' or 
'AppUtils.Logging', as well as many things we thought about but never 
implemented, or implemented but never used.

Why NOT the PEP 282 'logging' package?

* When we first reviewed it, the weird configuration subsystem gave us more 
than a little pause (perhaps because it seemed like overkill for just a 
logging system).  Apparently, many other people felt the same way, and in 
later versions the configuration subsystem was split into its own module, 
and one need not ever import it.  Therefore, we won't.

* Switching cost?  (Not very high, unless there are a lot of people out 
there using PEAK's LOG_DEBUG, etc. calls.  PEAK doesn't do very many of 
those itself.  Switching from AppUtils.Logging was going to incur a similar 
cost anyway, and PEP 282 includes some methods that have an almost 
identical signature to the AppUtils.Logging.LoggingInterface, so conversion 
should be fairly mechanical.)

* Ummm...  anything else?

All that having been said, on to the gory details...

Architectural Comparisons
=========================

AppUtils.Logging
----------------

AppUtils' Logging package is heavily based on 'syslog' conventions.  It 
uses priority levels imported straight from the 'syslog' module, it uses 
syslog's logging format, and it offers the ability to send data to syslog 
(again via the syslog module).  (I don't know if we've ever actually used 
that ability.)

Usage of AppUtils.Logging is via simple calls on objects that implement 
'LoggingInterface', that then either do something with the message, or not, 
depending on its priority level.  Usually these would be 'LogFile' objects 
that simply write directly to a file, opening and closing it after every 
message, in order to support external log rotation.

peak.running.logs
-----------------

PEAK's logging package is a bit more sophisticated, and in fact borrowed 
some aspects of its design from PEP 282.  In particular, PEAK uses "event" 
objects that represent a log message, and these event objects "bubble up" 
from a source location until they find a suitable destination.  However, 
unlike PEP 282, PEAK logging events handle their own formatting, rather 
than having a separate "formatter" object, and they compute a variety of 
data values lazily and once, rather than eagerly or more than once as the 
PEP 282 implementation sometimes does.

The current PEAK logging architecture also has some potential performance 
issues that tend to stop me from using logging as much as perhaps I 
should.  Although PEAK logging events compute most of their contents 
lazily, the logging events themselves have to be created before there's a 
check to see if the event will actually be published.  Also, if you're 
going to have a formatted message with string interpolation, this also must 
be done in advance of knowing whether the event will be published.  PEP 282 
doesn't create a LoggingRecord or interpolate your message arguments until 
it has verified that the logger or one of its parents will at least 
consider publishing it (subject to filters on criteria other than the 
event's priority level).

PEAK was attempting to unite the PEP 282, zLOG, and AppUtils.Logging 
priority schemes.  This is already silly because zLOG is not used by Zope 
X3; PEP 282 and AppUtils.Logging are the only things we need support.  To 
handle the full syslog priority set with PEP 282, we need only make a few 
'logging.addLevelName()' calls as follows:

logging.addLevelName('EMERG', 70)
logging.addLevelName('ALERT', 60)
logging.addLevelName('NOTICE', 25)

Once this is done, even PEP 282's 'SysLogHandler' will instantly support 
all of the standard 'syslog' priorities (although they will only be 
available via 'log(level,...)' calls, not as 'notice()', 'alert()', etc. 
methods unless we also define and install a custom Logger class, which 
doesn't seem worth it).  The new 'whenImported()' hook in PEAK will let us 
configure these levels to be installed as soon as the 'logging' module is used.

PEP 282
-------

The PEP 282 logging package is based on a notion of "loggers".   A logger 
is a thing that accepts and distributes log messages.  Loggers are arranged 
in a conceptual hierarchy using dotted names, similar to Python package 
names or PEAK property names.  By default, loggers propagate messages to 
their "parent" loggers, so sending a message to logger 'foo.bar' will also 
send it to logger 'foo'.  Loggers inherit their priority filter from parent 
loggers, so if you don't specify a priority level for 'foo.bar' it will 
accept messages of the same priority as 'foo' will accept.

Conceptually, an infinite number of loggers exist.  Practically, they don't 
exist until you access them, and unless you explicitly configure them, they 
will just send things to their parents.

Loggers don't actually do anything with log messages except decide whether 
they should be published (by checking the priority), then creating a 
LogRecord and passing it through any Filters.  If the record is acceptable 
to the filters, the record is then forwarded to the appropriate Handlers 
(their own and those of their parents, if propagating to the parents).

Handlers actually do something with the message, like write it to a file, 
send it on a socket, etc.  Handlers can also have their own filters, and 
each logger can have multiple handlers.  It's one big happy 
pipes-and-filters system.  PEP 282 offers *many* fancy handlers, including 
one that does automatic logfile rollover and renaming.  (No compression, 
but we could probably subclass for that if we needed it...)  There's even 
an SMTPHandler that mails out log messages!

All of this flexibility implies a certain complexity of 
configuration/setup.  But, if no options are specified, the defaults are 
fairly reasonable: messages at WARNING or higher priority are written to 
sys.stderr using the default format, all others are discarded.

The big architectural difference between 'logging' and 'peak.running.logs' 
is that 'logging' is optimized for a *static* choice of "logger", while 
PEAK assumes that this is dynamic, based on a component's position in a 
component hierarchy.  I see two ways to deal with this:

1. Use a static approach to the logs, putting 'logger = 
logging.getLogger("foo.bar")' calls into a module's code and then just 
using 'logger.warning(...)' calls in the code.  This is how most examples 
using the 'logging' package do it.  Pro: output by a particular subsystem 
is together.  Con: might be a bit harder to control things related to a 
specific component.

2. Use a dynamic approach using lazy binding.  Add a 'logger:' URL scheme 
to 'peak.naming', and use 'log=binding.bindTo("logger:foo.bar")' as a 
component attribute.  Calls would be made with 'self.log.warning(...)' et 
al.  Pro: fine-grained control can be had.  Con: now you have to decide for 
every darn component what log you want things to go to...  but you could 
still opt for a module-based scheme by default, and override just one 
component if you need to.

I think I'm leaning towards approach 2, because approach 1 doesn't lend 
itself to being changed by anything short of module inheritance.  However, 
any given module can make its own decision on this, and can change its mind 
later.  Also, the ability to have a variety of custom handlers, each with 
their own filters, may be more useful than per-component message 
routing.  With our existing packages, per-component routing was the only 
option.

One other issue we need to address is formatting.  The PEP 282 default 
format just dumps raw message text, with no timestamps or other info.  We 
could use a 'whenImported()' hook to change the format, but I'm not sure 
everyone will agree with our format choice.  Also, formatters are set per 
output handler, not systemwide, so its somewhat moot anyway.

What I think we really need for dealing with logging configuration is a 
ZConfig file format, and a PEAK configuration property for a ZConfig file 
to be loaded at the time the logging package is imported.  We can supply a 
default version of that configuration file that sets up a few things for 
PEAK, like a syslog-like default format.  To do all this, we'll need to 
create a ZConfig schema for the logging package, and write a few helper 
routines.  However, this is probably easier than creating our own scheme 
and implementing and maintaining the rest of PEP 282.  Also, Chris 
McDonough's recent PyCon presentation on ZConfig included some examples 
based on configuring the 'logging' package, so there's something to start 
from.  Who knows, maybe if we wait a little bit, somebody else will create 
a ZConfig schema for 'logging'.  Certainly, Zope X3 could use such a 
thing.  Perhaps we can share the effort with the Zope 3 developers.

Interestingly, because the 'logging' package is designed to have 
configuration changes "on-the-fly", it's possible to meaningfully load 
multiple configuration files during application startup or even later, so 
our loading a single configuration file when logging is first used is 
unlikely to create a problem -- except that multiple loads of the same 
file, or the same information from two files, will result in duplicate 
logging (if the same handler appears more than once).  Of course, it should 
probably be easy to notice that this is happening!

twisted.python.log
------------------

I almost forgot Twisted.  Twisted has its own quirky little logging system, 
very minimalistic and evidently designed with raw performance in 
mind.  Instead of having different logs or loggers, it instead has the 
notion of who's the current "owner" of *the* log.  The "owner" gets to 
determine precisely what bytes will be written to the log, given an input 
string.  This is used mainly to prefix output lines with timestamp info, etc.

There is no priority scheme.  You can write stuff to the log.  Exceptions 
go to the log.  That's about it.  Clearly, this architecture isn't even in 
the same ballpark as PEAK, AppUtils, and PEP 282.  Apparently, Twisted 
automatically sets the "owner" of the log whenever it task-switches between 
protocol handlers, so that each protocol handler doesn't have to know where 
to find its log.  Given Twisted's emphasis on persistent protocol handlers, 
this makes a certain amount of sense.  Putting the log in the object would 
have added that to its persistent state.

Integrating this with *anything* else is going to be tricky, and I don't 
think it's going to happen without the Twisted developers doing a 
significant part of it.  There isn't enough information (AFAICT) in the 
Twisted log API to create PEP 282 logging records *from*, so although in 
principle one could create a pseudo-logfile object to forward everything 
from Twisted to a specified PEP 282 Logger, in practice all it would do is 
give you the ability to "tee" Twisted log data to a variety of places, and 
maybe do some regex filtering.

On the plus side, developers of both Twisted and Zope want those two 
systems to share code and interoperate, so perhaps they'll be motivated to 
address this.  As a BDFL-blessed standard for Python, PEP 282 carries more 
weight than anything we might develop as a logging package.

Anyway, PEAK doesn't rely on Twisted for anything as yet, so this is not an 
issue we need to care very strongly about.  If at a later time we start 
doing networking code with PEAK based on Twisted, we may care a bit more -- 
or we may find we'd rather have Twisted's output going to a separate file 
anyway.

Migration Plan
==============

So what does the move to PEP 282 entail for PEAK?

* We need to distribute the 'logging' package so that PEAK has "batteries 
included" on Python 2.2 in the absence of a Zope X3 installation.

* The current contents of 'peak.running.logs' needs to be replaced with:

   1. a Formatter implementation that supports the PEAK/AppUtils log format.

   2. support code for setting up the "missing" log levels in the 'logging' 
package, and exporting them for use by PEAK code.

   3. support code for loading ZConfig files into the 'logging' package, 
automatically and manually

   4. any PEAK-specific Handler or Filter objects (probably none to start with)

* We add a 'logger:' naming scheme to make it easy to refer to 
loggers.  (We could make ones for handlers, too, but this seems better 
"handled" directly by the config file format.)

* We create a ZConfig schema for 'logging', and add it to a peak.running 
'component.xml' file.  (This is actually a bit tricky, because we need two 
schema components: one for all the built-in objects supplied by 'logging', 
and one for PEAK-specific formatters, handlers, and filters.)

* The current LOG_XXX() functions in 'peak.api' need to be removed, and 
their use expunged from the codebase.  (So if you're using them now, get 
started on moving to using the 'logging' package instead!)

Notice that for the most part, we won't really *do* anything with logging 
in PEAK any more, except provide some useful front-end bits.  Sounds like a 
big win to me.  What do you think?