[PEAK] Organizing documentation of Python code
Phillip J. Eby
pje at telecommunity.com
Wed Sep 22 16:09:21 EDT 2004
At 02:20 PM 9/22/04 -0500, Ian Bicking wrote:
>It also keeps track of all the ordering of the module. I used it at one
>point to do separators, like:
> """These functions are only for use in subclasses"""
> def ...
Yep, that's certainly handy.
>Unfortunately it can't read comments at all. But otherwise all the
>interesting information is there. Right now it is creating a DOM, which
>is the docutils DOM plus some source-code-specific nodes. It needs
>something to translate that DOM into a standard docutils DOM, which can
>then be rendered. It also needs a lot of work and thought to deal with
>inter-module references, but it would still be useable without that. Oh,
>and subclassing -- that's a hard one too (being able to display a picture
>of a class's methods, including all superclass definitions). And, for
>PEAK, you'll probably want to be able to extend the whole thing for
>special PEAK constructs.
Right - and that's *really* hard to do with a source processing tool,
because it can't figure out that
'protocols.advise(instancesProvide=[IFoo])' actually means something we
want to document specially.
The advantage of object-extraction tools (in principle) is that this kind
of information is extractable, as long as you have an extensible way to
control its extraction. The current crop of object extractors isn't
capable of that, and I had been putting off implementing my own because of
the sequencing/categorization issues. As I said, this idea was just the
thought that I could get sequencing and categorization by "decorating"
modules with calls to a small documentation API.
In thinking further about what that API would look like, I think it's going
to actually consist of just a couple of classes: Symbol and Category, where
both are hierarchical. Symbols will know their "true name", docstring,
categories, a sequence/mapping of contained names, and a mapping for
annotations and relationship info. Categories will know the sequence of
their contained categories, and have a docstring. Categories will be
compared by object identity, so you'll actually want to import them. For
example, I might have a 'peak.binding._docs' module that defines the
category objects for that package, and then the other modules would import
them to use in symbol definitions. Or, perhaps more likely, I'd define the
doc categories in the __init__.py of the package, rather than using a
separate module just for that.
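
To make the shape of that API concrete, here is a minimal sketch of the
two classes. It is only an illustration under stated assumptions, not
code from PEAK: the attribute names, the add() and in_category()
helpers, and the 'mypkg' names are all hypothetical, and an
insertion-ordered dict (Python 3.7+) stands in for the
"sequence/mapping of contained names".

```python
class Category:
    """A hierarchical documentation category.

    No __eq__ is defined, so categories compare by object identity,
    which is why other modules would import the category objects.
    """

    def __init__(self, name, doc="", parent=None):
        self.name = name
        self.doc = doc
        self.parent = parent
        self.children = []            # ordered sub-categories
        if parent is not None:
            parent.children.append(self)


class Symbol:
    """A documented name: true name, docstring, categories, contents."""

    def __init__(self, true_name, doc="", categories=()):
        self.true_name = true_name
        self.doc = doc
        self.categories = list(categories)
        self.children = {}            # local name -> Symbol, in order
        self.annotations = {}         # relationship info and other metadata

    def add(self, local_name, symbol):
        """Register a contained symbol, preserving definition order."""
        self.children[local_name] = symbol
        return symbol

    def in_category(self, category):
        """Identity test: a look-alike category object does not count."""
        return any(c is category for c in self.categories)


# Hypothetical usage: categories defined once, then shared by import.
API = Category("Public API", "Names intended for general use")
HOOKS = Category("Subclass hooks", "Override these in subclasses", parent=API)

mod = Symbol("mypkg.widgets", "Widget support")
mod.add("Widget", Symbol("mypkg.widgets.Widget", categories=[API]))
mod.add("attach", Symbol("mypkg.widgets.attach", categories=[HOOKS]))
```

Because comparison is by identity, a second Category("Public API")
created elsewhere would be a different category, so sharing has to
happen through imports.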
A module's symbol would be stowed in the module under a name like
'__docsymbol' or some such. A documentation formatter would basically
create a symbol instance to represent the document as a whole, and import
each of the module symbols it uses, adding them to the top-level
symbol. It would then ask that top-level symbol to do metadata population
and generate any indexes the formatter needs. The formatter's job at that
point is then just generating some set of web pages (or PDF output, etc.)
for a given set of symbols, by reading directly from metadata.
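
The collection step described above could be sketched roughly as
follows. This is an assumption-laden illustration: the collect()
function and the stripped-down Symbol class are hypothetical, the
attribute name is just the '__docsymbol' placeholder suggested above,
and a throwaway in-memory module stands in for a real decorated
module.

```python
import importlib
import sys
import types

DOC_ATTR = "__docsymbol"   # placeholder name from the text ("or some such")


class Symbol:
    """Stripped-down stand-in for the Symbol class described earlier."""

    def __init__(self, name, doc=""):
        self.name = name
        self.doc = doc
        self.children = []


def collect(title, module_names):
    """Build a top-level symbol from the modules' stowed doc symbols."""
    top = Symbol(title)
    for mod_name in module_names:
        module = importlib.import_module(mod_name)
        sym = getattr(module, DOC_ATTR, None)
        if sym is not None:           # skip modules that are not decorated
            top.children.append(sym)
    return top


# Demo: register a fake module that carries a doc symbol.
demo = types.ModuleType("demo_mod")
setattr(demo, DOC_ATTR, Symbol("demo_mod", "A stand-in module"))
sys.modules["demo_mod"] = demo

# 'json' is a real stdlib module with no doc symbol, so it is skipped.
top = collect("Demo docs", ["demo_mod", "json"])
```

A formatter would then walk top.children to produce its output, after
asking the top-level symbol to populate metadata.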
Metadata population consists of a symbol using an adapter or generic
function to "ask" the object it refers to, to decorate the symbol with any
available information, then recursively applying the same operation to
child symbols. Of course, this should only be done once per target object,
so there should be a memoization parameter to track the objects in, and
some kind of index to be used for tracking relationships. Perhaps we will
need a third kind of object, "documentation set", that will house the
indexes and memoization facilities, as well as perhaps options for what
metadata and category indexes are required, allowing it to control most of
the build process.
Anyway, metadata population would primarily be for relationships like
sub/superclass, implements/is-implemented-by, and so on, but also to e.g.
add in inherited attributes.
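
A rough sketch of metadata population with memoization follows. It
makes several assumptions that are not in the design above: the
standard library's functools.singledispatch is swapped in for the
"adapter or generic function", the DocSet class and its 'seen' and
'subclasses' attributes are hypothetical names for the "documentation
set", and only the sub/superclass relationship and inherited
attributes are handled.

```python
import functools


class Symbol:
    def __init__(self, name, target):
        self.name = name
        self.target = target          # the object this symbol documents
        self.annotations = {}
        self.children = []


class DocSet:
    """Houses the memo table and a relationship index."""

    def __init__(self):
        self.seen = {}                # id(target) -> first Symbol populated
        self.subclasses = {}          # class -> list of direct subclasses

    def populate(self, symbol):
        key = id(symbol.target)
        if key in self.seen:          # only once per target object
            return
        self.seen[key] = symbol
        describe(symbol.target, symbol, self)
        for child in symbol.children:
            self.populate(child)


@functools.singledispatch
def describe(target, symbol, docset):
    """Default: record only the docstring."""
    symbol.annotations["doc"] = getattr(target, "__doc__", None)


@describe.register(type)
def _describe_class(target, symbol, docset):
    """Classes also record bases, subclass links, inherited names."""
    symbol.annotations["doc"] = target.__doc__
    symbol.annotations["bases"] = [b.__name__ for b in target.__bases__]
    for base in target.__bases__:
        docset.subclasses.setdefault(base, []).append(target)
    own = set(vars(target))
    symbol.annotations["inherited"] = sorted(
        n for n in dir(target) if n not in own and not n.startswith("_")
    )


# Demo: two symbols refer to the same class; only the first is populated.
class Base:
    def ping(self):
        pass


class Child(Base):
    def pong(self):
        pass


root = Symbol("root", object())
base_sym = Symbol("Base", Base)
child_a = Symbol("Child", Child)
child_b = Symbol("ChildAlias", Child)
root.children.extend([base_sym, child_a, child_b])

docs = DocSet()
docs.populate(root)
```

Keying the memo on id(target) is safe here only because every symbol
keeps its target alive; a real implementation would need to decide
what the second symbol for the same object should display.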
The main open issues are things having to do with the natural organization
of categories. It seems to me that there may be two kinds of categories:
ones that are local to a particular scope/symbol (such as among a class's
methods), and ones that are global (like categories of all interfaces, all
modules, and so on). I haven't entirely reconciled yet how these should
interact. I suppose I should distinguish perhaps between a "topic" (local)
and a "category" (global).
But do categories have hierarchy? What about topics? Can they
overlap? For example, I might want to note that a particular method is
part of the XYZ functionality, but also want to list it under "Methods that
must be overridden in a subclass", and maybe even "Private Methods" as
well. Maybe topics and categories have no hierarchy, and are used
exclusively to subdivide the contents of some set of symbols, and they just
have a local/global flag, while symbols list the topics that they contain.
Yeah, that makes more sense. So you could reuse the "Private Methods"
category all over the place, but you wouldn't end up with a global index of
private methods unless the formatter explicitly requested it. Instead,
you'd just get a listing of them in each symbol that has some. There would
probably need to be some formatter configuration regarding whether a topic
should just list the names that apply to it, or whether it should be used
as a grouping of the items themselves.
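
As an illustration of flat topics with a local/global flag, here is a
minimal sketch. Everything in it is an assumption made for the
example: the Topic and Symbol dataclasses, the local_listing() and
global_index() helpers, and the 'force' argument that stands in for
"the formatter explicitly requested it".

```python
from dataclasses import dataclass, field


@dataclass(eq=False)                  # eq=False keeps identity comparison
class Topic:
    title: str
    is_global: bool = False


@dataclass(eq=False)
class Symbol:
    name: str
    topics: list = field(default_factory=list)
    children: list = field(default_factory=list)


def local_listing(symbol):
    """Group one symbol's children by topic; topics may overlap."""
    groups = {}
    for child in symbol.children:
        for topic in child.topics:
            groups.setdefault(topic, []).append(child.name)
    return groups


def global_index(symbols, topic, force=False):
    """Cross-symbol index, built only for global or requested topics."""
    if not (topic.is_global or force):
        return None
    return [
        child.name
        for sym in symbols
        for child in sym.children
        if topic in child.topics
    ]


PRIVATE = Topic("Private Methods")
OVERRIDE = Topic("Methods that must be overridden in a subclass")

a = Symbol("A", children=[
    Symbol("_helper", [PRIVATE]),
    Symbol("run", [OVERRIDE]),
    Symbol("_setup", [PRIVATE, OVERRIDE]),
])
b = Symbol("B", children=[Symbol("_cache", [PRIVATE])])

per_symbol = local_listing(a)
no_index = global_index([a, b], PRIVATE)
requested = global_index([a, b], PRIVATE, force=True)
```

Here the same PRIVATE topic is reused in both A and B, each symbol
gets its own listing, and no global index of private methods appears
unless it is asked for.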
I suppose, in fact, that you could group topics into "dimensions" like
interface implemented, intended use, etc., and then indicate the dimension
along which a given symbol was best subdivided. Categorization on other
dimensions could then be in a different form.
Boy, I just really work against myself, don't I? The easiest thing for me
to do is to make a design progressively more sophisticated, until I stop
being interested in implementing it. :)