[PEAK] Organizing documentation of Python code
Phillip J. Eby
pje at telecommunity.com
Thu Sep 23 14:16:54 EDT 2004
At 11:17 AM 9/23/04 -0500, Ian Bicking wrote:
>Or, you could do a hybrid. Load the objects, then do minimal parsing on
>the module, adding attributes to objects you loaded to indicate their
>order, and perhaps some other small things, like finding interstitial
>docstrings.
Sure, you could. But why? Isn't it almost as easy to do:
    [doc.begin("Abstract Methods (must be overridden in subclasses)")]
or:
    [doc.begin(ABSTRACT_METHODS)]
as it is to use an interstitial docstring?
This technique needs no parsing at all, and is nicely explicit as to what
the docstrings are for.
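
To make the idea concrete, here is a minimal sketch of how a begin() marker
could record sections without parsing any source. It is not PEAK's
implementation: the begin() and sections() functions and the
"__doc_sections__" key are hypothetical names, and the sketch assumes the
marker is only used directly in a class or module body, where the caller's
frame locals are the namespace being built.

```python
import sys

def begin(topic):
    """Record that a new documentation section starts here (hypothetical API)."""
    # In a class or module body, f_locals is the namespace under construction.
    namespace = sys._getframe(1).f_locals
    marks = namespace.setdefault("__doc_sections__", [])
    # Remember which names already existed when the section began.
    marks.append((topic, frozenset(namespace)))

def sections(namespace):
    """Group non-dunder names by the section that was open when they were defined."""
    marks = namespace.get("__doc_sections__", [])
    result = {topic: [] for topic, _ in marks}
    for name in namespace:  # namespaces keep definition order
        if name.startswith("__"):
            continue
        owner = None
        for topic, seen in marks:
            if name in seen:
                break  # defined before this marker, so stop looking
            owner = topic
        if owner is not None:
            result[owner].append(name)
    return result

class Storage:
    def setup(self):
        pass

    [begin("Abstract Methods (must be overridden in subclasses)")]

    def load(self):
        raise NotImplementedError

    def save(self):
        raise NotImplementedError

    [begin("Helpers")]

    def describe(self):
        return "storage"

grouped = sections(vars(Storage))
```

Names defined before the first marker (setup, here) are simply left
unsectioned in this sketch.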
The only reasons I see for wanting to parse source are:
1. Support documenting arbitrary modules with sequence info
2. Document modules that can't be imported on the platform where the docs
are generated
The first is a non-goal for PEAK, but I could accomplish it anyway by
running imports under a trace hook, if I really wanted to. (But I probably
wouldn't, because alphabetic order is better than an arbitrary order if the
author didn't intentionally organize the code that way.)
The second is more interesting, since e.g. peak.storage.DDE uses
Windows-specific features, but our documentation is generated on
Linux. But, the hybrid approach you're suggesting won't work for such
modules anyway, because it's still initially import-based.
But, in truth, PEAK modules that use platform-specific libraries are
usually written to delay those imports: for example, peak.storage.DDE only
imports win32ui and dde when you actually open a DDE connection. Following
this convention for any Windows-specific code in PEAK would allow an
import-based documentation tool to still generate docs on Unix.
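
A rough sketch of that delayed-import convention, for illustration only.
LazyConnection and its required_modules attribute are hypothetical and are
not the actual peak.storage.DDE code; the point is just that importing the
module and instantiating the class never touch the platform-specific
libraries.

```python
import importlib

class LazyConnection:
    """Hypothetical connection class that defers platform-specific imports."""

    # Module names taken from the example above; nothing is imported yet.
    required_modules = ("win32ui", "dde")

    def __init__(self):
        self._libs = None

    def open(self):
        # The platform-specific imports happen only on first use.
        if self._libs is None:
            self._libs = {
                name: importlib.import_module(name)
                for name in self.required_modules
            }
        return self._libs

# Safe on any platform: a documentation tool can import and inspect this.
connection = LazyConnection()
```

Only a call to open() can raise ImportError on a platform that lacks the
libraries, so an import-based tool can still document the class.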
I've been doing a bit more thinking about topics and the like, and
borrowing terminology from XFML ( http://www.xfml.org/ ), I think that I'll
use the idea of "facets" containing hierarchies of "topics", but there will
be two kinds of facets: "TOC"s (tables of contents) and "Index"es. An
Index is a docset-wide listing of symbols grouped by topics in that index.
For example, "Subclasses" could be an Index whose topics are classes. The
classes listed under the topic for a given class would be the classes
registered as subclasses of the class topic. This is a simple way to
implement relationships and links.
Other indexes would be alphabetical, such as "Methods" - its topics would
be names, and all methods named a given name would be collected within
those topics.
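
As an illustration of those two indexes, here is a small sketch that builds
them from a list of classes. The build_indexes() function and the example
classes are hypothetical; the sketch only assumes that an index maps each
topic to the documentation items filed under it.

```python
import inspect
from collections import defaultdict

def build_indexes(classes):
    """Build a "Subclasses" and a "Methods" index (hypothetical helper)."""
    subclasses = defaultdict(list)  # topic: a class -> its direct subclasses
    methods = defaultdict(list)     # topic: a method name -> classes defining it
    for cls in classes:
        for base in cls.__bases__:
            subclasses[base].append(cls)
        for name, obj in vars(cls).items():
            if inspect.isfunction(obj):
                methods[name].append(cls)
    return {"Subclasses": subclasses, "Methods": methods}

class Storage:
    def open(self):
        pass

class FileStorage(Storage):
    def open(self):
        pass

    def flush(self):
        pass

class MemoryStorage(Storage):
    def open(self):
        pass

indexes = build_indexes([Storage, FileStorage, MemoryStorage])
```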
There are many ways to format an index, of course, ranging from grouping a
namespace's contents by the topics in that index, to listing the items in a
topic related to the current item (e.g. listing known subclasses of the
current class).
The difference between an Index and a TOC is that Indexes only list links
to actual documentation items, whereas a TOC is used to order and group the
contents of a namespace. Actually, I guess you could use a
designated Index as the TOC, simply by convention. After all, what if you
wanted to generate docs sorted by something else?
So, the overall process still looks something like:
* Import API modules and create Symbols for them, adding them to a DocSet
and populating the symbols with references to the actual objects they represent
* Scan through all of the objects, running a function on each one to
produce additional index entries
* Generate documentation, using methods on the DocSet to query the indexes
and symbols
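
The three steps above could be sketched roughly as follows. Symbol, DocSet,
add_module(), run_indexer(), query() and kind_indexer() are all hypothetical
names chosen for illustration, and the sketch assumes an indexer is just a
function that yields (index name, topic) pairs for a symbol.

```python
import importlib
import inspect
from dataclasses import dataclass

@dataclass
class Symbol:
    name: str
    obj: object  # reference to the actual object being documented

class DocSet:
    def __init__(self):
        self.symbols = {}
        self.indexes = {}  # index name -> topic -> list of symbol names

    def add_module(self, module_name):
        """Step 1: import a module and create Symbols for its public names."""
        module = importlib.import_module(module_name)
        self.symbols[module_name] = Symbol(module_name, module)
        for attr, obj in vars(module).items():
            if not attr.startswith("_"):
                qualified = f"{module_name}.{attr}"
                self.symbols[qualified] = Symbol(qualified, obj)

    def run_indexer(self, indexer):
        """Step 2: run a function on each symbol to produce index entries."""
        for symbol in list(self.symbols.values()):
            for index_name, topic in indexer(symbol):
                topics = self.indexes.setdefault(index_name, {})
                topics.setdefault(topic, []).append(symbol.name)

    def query(self, index_name, topic):
        """Step 3: let a generator look up the symbols filed under a topic."""
        return sorted(self.indexes.get(index_name, {}).get(topic, []))

def kind_indexer(symbol):
    """Example indexer: file each symbol under a "Kinds" index."""
    if inspect.isclass(symbol.obj):
        yield "Kinds", "class"
    elif inspect.isroutine(symbol.obj):
        yield "Kinds", "function"

docset = DocSet()
docset.add_module("fnmatch")  # a small stdlib module, used only as sample input
docset.run_indexer(kind_indexer)
functions = docset.query("Kinds", "function")
```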
It may also be that somewhere in there, there should be a pass to parse the
docstrings and extract other metadata to put in the indexes, create new
symbols, tagged values, etc. And, there might be a configuration file
being read to insert other metadata and tagged values.
At this point, the main vagueness in the design is formatting hints, like
the sequencing of topics of general applicability. I'm thinking that
the "topics" passed to doc APIs should be able to be strings, so that you
can just say whatever's on your mind when doing something new. But, if you
mix that in with existing general-purpose topics like "Abstract Methods"
that might already be defined by their use in another class or module, what
order do they end up in?
One way to deal with this is to use hierarchy to register all-purpose
topics like "Abstract Methods" as subtopics under various standard topics
with an overall ordering, and then put any new topics generated with
strings under a "Contents" topic. But this still presents the possibility
of topic overlap and change-of-sequence, unless the namespace has
its own sequence stored for the topics.
But keeping the exact sequence isn't always what you want
either! Sometimes you'd rather let the system group the methods sensibly
on its own. It seems you'd have to have an option or something.
And that's really where formatting bugs me at the library level: too many
options with regard to sequencing. (Output formats are another concern,
but apart from the sequencing issue, I don't think they really affect the
structure of the metadata library.)
Maybe the right thing to do is distinguish "sections" and "topics". You
could record an item or items under multiple "topics", but only one
"section", and a namespace's sections are linearly sequenced.
Of course, inheritance leads to some interesting issues. Should you list
all inherited items under a section for inherited methods? I guess if
sections are hierarchical, then we could list inherited methods under the
same sections as the base class used. But it would probably be better to
match up sections, adding any sections that the subclass is missing, and
simply tag the methods as inherited.
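
One possible reading of "match up sections", as a sketch: keep the
subclass's own section sequence, append any base-class sections it lacks,
and tag inherited names instead of listing them separately. The
merge_sections() function and its data shapes are assumptions made for this
example.

```python
def merge_sections(base_sections, own_sections):
    """Merge base-class sections into a subclass's sections (hypothetical helper).

    Both arguments are lists of (section title, [method names]).
    Returns a list of (title, [(name, inherited_flag), ...]).
    """
    own_names = {name for _, names in own_sections for name in names}
    # Start from the subclass's own sections, in its own order.
    merged = {
        title: [(name, False) for name in names]
        for title, names in own_sections
    }
    for title, names in base_sections:
        # Sections the subclass is missing are added at the end.
        entries = merged.setdefault(title, [])
        for name in names:
            if name not in own_names:  # overridden methods are not inherited
                entries.append((name, True))
    return list(merged.items())

base = [("Abstract Methods", ["load", "save"]), ("Helpers", ["log"])]
own = [("Abstract Methods", ["load"]), ("Queries", ["find"])]
merged = merge_sections(base, own)
```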
Sections could probably be implemented as topics; in effect, when you
define sections under a symbol, the symbol will create a private Index to
serve as its table of contents, and use that index to sequence its output.
So, is a topic anything more sophisticated than a string? If index output
is either in order-of-definition or alphabetical, what else do we
need? For hierarchy, it could be a sequence of strings, or perhaps a
nested set of tuples, such that:
    PARENT = "Top-Level Topic"
    SUB1 = PARENT, "Subtopic at Level 1"
    SUB2 = SUB1, "Subtopic at Level 2"
In other words, a topic is either a string, or a tuple of a topic and a
string, recursively. Then, an Index is little more than an ordered
collection of topics, providing methods to walk its tree or add new topics.
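
A minimal sketch of that representation, assuming nothing beyond the
recursive definition just given. topic_path() and the Index class with its
add() and walk() methods are hypothetical names for this example.

```python
def topic_path(topic):
    """Flatten a topic (a string, or a (topic, string) tuple) into a path."""
    if isinstance(topic, str):
        return (topic,)
    parent, name = topic
    return topic_path(parent) + (name,)

class Index:
    """An ordered tree of topics (hypothetical sketch)."""

    def __init__(self):
        self._tree = {}  # nested dicts; insertion order is the sequence

    def add(self, topic):
        node = self._tree
        for part in topic_path(topic):
            node = node.setdefault(part, {})

    def walk(self, node=None, depth=0):
        """Yield (depth, topic name) pairs in order of definition."""
        node = self._tree if node is None else node
        for name, children in node.items():
            yield depth, name
            yield from self.walk(children, depth + 1)

PARENT = "Top-Level Topic"
SUB1 = PARENT, "Subtopic at Level 1"
SUB2 = SUB1, "Subtopic at Level 2"

index = Index()
index.add(SUB2)                    # parents are created as needed
index.add((PARENT, "Another Subtopic"))
outline = list(index.walk())
```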
Not bad. Not bad at all. I think that wraps up sequencing and grouping
issues/choices.
At some point, I need to write up a prioritized feature list for this thing.