[PEAK] Organizing documentation of Python code

Wed Sep 22 16:09:21 EDT 2004

At 02:20 PM 9/22/04 -0500, Ian Bicking wrote:
>It also keeps track of all the ordering of the module.  I used it at one 
>point to do separators, like:
>
>class Whatever:
>     ...
>     """These functions are only for use in subclasses"""
>     def ...

Yep, that's certainly handy.

>Unfortunately it can't read comments at all.  But otherwise all the 
>interesting information is there.  Right now it is creating a DOM, which 
>is the docutils DOM plus some source-code-specific nodes.  It needs 
>something to translate that DOM into a standard docutils DOM, which can 
>then be rendered.  It also needs a lot of work and thought to deal with 
>inter-module references, but it would still be useable without that. Oh, 
>and subclassing -- that's a hard one too (being able to display a picture 
>of a class's methods, including all superclass definitions). And, for 
>PEAK, you'll probably want to be able to extend the whole thing for 
>special PEAK constructs.

Right - and that's *really* hard to do with a source processing tool, 
because it can't figure out that 
'protocols.advise(instancesProvide=[IFoo])' actually means something we 
want to document specially.

The advantage of object-extraction tools (in principle) is that this kind 
of information is extractable, as long as you have an extensible way to 
control its extraction.  The current crop of object extractors isn't 
capable of that, and I had been putting off implementing my own because of 
the sequencing/categorization issues.  As I said, this idea was just the 
thought that I could get sequencing and categorization by "decorating" 
modules with calls to a small documentation API.

In thinking further about what that API would look like, I think it's going 
to actually consist of just a couple of classes: Symbol and Category, where 
both are hierarchical.  Symbols will know their "true name", docstring, 
categories, a sequence/mapping of contained names, and a mapping for 
annotations and relationship info.  Categories will know the sequence of 
their contained categories, and have a docstring.  Categories will be 
compared by object identity, so you'll actually want to import them.  For 
example, I might have a 'peak.binding._docs' module that defines the 
category objects for that package, and then the other modules would import 
them to use in symbol definitions.  Or, perhaps more likely, I'd define the 
doc categories in the __init__.py of the package, rather than using a 
separate module just for that.
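
A minimal sketch of how the two classes described above might look. Nothing 
here is existing PEAK code: the attribute names, the 'add' method, and the 
example module name 'peak.binding.example' are all assumptions made for 
illustration.

```python
class Category:
    """A documentation category. Compared by identity, so import and reuse it."""

    def __init__(self, name, doc="", parent=None):
        self.name = name
        self.doc = doc              # the category's own docstring
        self.parent = parent
        self.children = []          # contained categories, in definition order
        if parent is not None:
            parent.children.append(self)

    # __eq__ and __hash__ are deliberately not overridden: the default
    # identity comparison is the behavior described in the text.


class Symbol:
    """A documented name (hypothetical layout)."""

    def __init__(self, true_name, doc="", categories=()):
        self.true_name = true_name
        self.doc = doc
        self.categories = list(categories)
        self.children = {}          # local name -> Symbol; dicts keep order
        self.annotations = {}       # relationship info and other metadata

    def add(self, local_name, symbol):
        self.children[local_name] = symbol
        return symbol


# Categories would normally live in the package's __init__.py (or a _docs
# module) and be imported wherever symbols are defined.
API = Category("Public API", "Names intended for general use.")
HOOKS = Category("Subclass hooks", "For use in subclasses only.", parent=API)

mod = Symbol("peak.binding.example", "Example module (hypothetical).")
mod.add("Whatever", Symbol("peak.binding.example.Whatever", categories=[API]))
```

Because comparison is by identity, a second Category("Public API") created 
elsewhere would be a different category, which is why importing the shared 
objects matters.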

A module's symbol would be stowed in the module under a name like 
'__docsymbol' or some such.  A documentation formatter would basically 
create a symbol instance to represent the document as a whole, and import 
each of the module symbols it uses, adding them to the top-level 
symbol.  It would then ask that top-level symbol to do metadata population 
and generate any indexes the formatter needs.  The formatter's job at that 
point is then just generating some set of web pages (or PDF output, etc.) 
for a given set of symbols, by reading directly from metadata.
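
The collection step could be sketched roughly as follows. Symbols are plain 
dicts here to keep the example short, the attribute name is only the 
"or some such" placeholder from above, and 'build_document_symbol' and the 
'demo.*' module names are hypothetical.

```python
import types

DOC_ATTR = "__docsymbol"  # assumed attribute name, not a settled convention


def build_document_symbol(title, modules):
    """Create a top-level symbol and attach each module's stowed symbol."""
    top = {"name": title, "children": {}}
    for module in modules:
        symbol = getattr(module, DOC_ATTR, None)
        if symbol is not None:      # modules without a symbol are skipped
            top["children"][module.__name__] = symbol
    return top


# Stand-in modules built in memory; a real formatter would import them.
decorated = types.ModuleType("demo.decorated")
setattr(decorated, DOC_ATTR, {"name": "demo.decorated", "children": {}})
plain = types.ModuleType("demo.plain")

doc = build_document_symbol("Demo Reference", [decorated, plain])
```

Metadata population and index generation would then start from 'doc'.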

Metadata population consists of a symbol using an adapter or generic 
function to "ask" the object it refers to, to decorate the symbol with any 
available information, then recursively applying the same operation to 
child symbols.  Of course, this should only be done once per target object, 
so there should be a memoization parameter to track the objects in, and 
some kind of index to be used for tracking relationships.  Perhaps we will 
need a third kind of object, "documentation set", that will house the 
indexes and memoization facilities, as well as perhaps options for what 
metadata and category indexes are required, allowing it to control most of 
the build process.
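
One way to picture the population step, using the standard library's 
functools.singledispatch as the generic function. This is only a sketch 
under assumptions: 'DocSet', 'describe', and 'populate' are hypothetical 
names, and symbols are plain dicts rather than real Symbol objects.

```python
from functools import singledispatch


class DocSet:
    """Hypothetical 'documentation set': memo plus a relationship index."""

    def __init__(self):
        self.seen = {}         # id(target) -> symbol; populate each object once
        self.subclasses = {}   # base class -> list of direct subclasses


@singledispatch
def describe(target, symbol, docset):
    """Default case: the only metadata available is the docstring."""
    symbol["doc"] = (target.__doc__ or "").strip()


@describe.register(type)
def _describe_class(target, symbol, docset):
    symbol["doc"] = (target.__doc__ or "").strip()
    symbol["bases"] = [base.__name__ for base in target.__bases__]
    for base in target.__bases__:
        docset.subclasses.setdefault(base, []).append(target)
    # Add child symbols for names defined directly on the class.
    for name, value in vars(target).items():
        if not name.startswith("__"):
            symbol["children"][name] = {"target": value, "children": {}}


def populate(symbol, docset):
    """Decorate a symbol from its target, then recurse into its children."""
    target = symbol["target"]
    if id(target) in docset.seen:   # already handled under another name
        return
    docset.seen[id(target)] = symbol
    describe(target, symbol, docset)
    for child in symbol["children"].values():
        populate(child, docset)


class Base:
    """Base doc."""

    def run(self):
        """Run it."""


class Derived(Base):
    """Derived doc."""


docset = DocSet()
first = {"target": Base, "children": {}}
alias = {"target": Base, "children": {}}    # same object, second name
derived = {"target": Derived, "children": {}}
for sym in (first, alias, derived):
    populate(sym, docset)
```

The 'alias' symbol is left undecorated because its target was already 
populated, and the sub/superclass index ends up on the documentation set 
rather than on any one symbol.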

Anyway, metadata population would primarily be for relationships like 
sub/superclass, implements/is-implemented-by, and so on, but also to e.g. 
add in inherited attributes.

The main open issues are things having to do with the natural organization 
of categories.  It seems to me that there may be two kinds of categories; 
ones that are local to a particular scope/symbol (such as among a class's 
methods), and ones that are global (like categories of all interfaces, all 
modules, and so on).  I haven't entirely reconciled yet how these should 
interact.  I suppose I should distinguish perhaps between a "topic" (local) 
and a "category" (global).

But do categories have hierarchy?  What about topics?  Can they 
overlap?  For example, I might want to note that a particular method is 
part of the XYZ functionality, but also want to list it under "Methods that 
must be overridden in a subclass", and maybe even "Private Methods" as 
well.  Maybe topics and categories have no hierarchy, and are used 
exclusively to subdivide the contents of some set of symbols, and they just 
have a local/global flag, while symbols list the topics that they contain.

Yeah, that makes more sense.  So you could reuse the "Private Methods" 
category all over the place, but you wouldn't end up with a global index of 
private methods unless the formatter explicitly requested it.  Instead, 
you'd just get a listing of them in each symbol that has some.  There would 
probably need to be some formatter configuration regarding whether a topic 
should just list the names that apply to it, or whether it should be used 
as a grouping of the items themselves.
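
A rough sketch of flat, overlapping topics with a local/global flag. The 
'Topic' class, the two helper functions, and the sample member names are 
assumptions for illustration, not a worked-out design.

```python
class Topic:
    """A flat, reusable label; no hierarchy, compared by identity."""

    def __init__(self, title, is_global=False):
        self.title = title
        self.is_global = is_global


PRIVATE = Topic("Private Methods")
OVERRIDE = Topic("Methods that must be overridden in a subclass")

# symbol name -> {member name -> topics}; a member may carry several topics
symbols = {
    "Whatever": {"_setup": [PRIVATE, OVERRIDE], "run": []},
    "Other": {"_helper": [PRIVATE]},
}


def local_listing(members):
    """Per-symbol listing: topic title -> member names within this symbol."""
    listing = {}
    for name, topics in members.items():
        for topic in topics:
            listing.setdefault(topic.title, []).append(name)
    return listing


def global_index(symbols, requested=()):
    """Cross-symbol index, only for global topics or ones explicitly requested."""
    index = {}
    for sym_name, members in symbols.items():
        for name, topics in members.items():
            for topic in topics:
                if topic.is_global or topic in requested:
                    index.setdefault(topic.title, []).append(f"{sym_name}.{name}")
    return index
```

With this layout, global_index(symbols) is empty, since neither topic is 
global, while global_index(symbols, [PRIVATE]) lists the private methods 
of every symbol.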

I suppose, in fact, that you could group topics into "dimensions" like 
interface implemented, intended use, etc., and then indicate the dimension 
along which a given symbol was best subdivided.  Categorization on other 
dimensions could then be in a different form.
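
The "dimensions" idea might look something like this. It is a sketch only: 
the 'layout' function, the dimension names, and the IReader/IWriter 
interfaces are made up for the example.

```python
from collections import namedtuple

# A topic tagged with the dimension it belongs to (hypothetical structure).
Topic = namedtuple("Topic", "title dimension")

members = {
    "read": [Topic("IReader", "interface"), Topic("Public", "intended use")],
    "_fill": [Topic("IReader", "interface"), Topic("Private", "intended use")],
    "write": [Topic("IWriter", "interface"), Topic("Public", "intended use")],
}


def layout(members, primary):
    """Group members into sections along the primary dimension.

    Topics from every other dimension are kept as tags beside each member,
    which is one possible "different form" for them.
    """
    sections = {}
    for name, topics in members.items():
        heading = next(
            (t.title for t in topics if t.dimension == primary), "Other"
        )
        tags = [t.title for t in topics if t.dimension != primary]
        sections.setdefault(heading, []).append((name, tags))
    return sections
```

Calling layout(members, "interface") produces one section per interface, 
with the intended-use topic shown as a tag on each member; passing 
"intended use" instead flips the two roles.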

Boy, I just really work against myself, don't I?  The easiest thing for me 
to do is to make a design progressively more sophisticated, until I stop 
being interested in implementing it.  :)