[TransWarp] Page Templates

Fri Jun 20 15:54:05 EDT 2003

(Yet another rambling design brief...)

The basic architecture:

* XHTML input (should we integrate w/HTML Tidy?  Is there a Python lib for 
that?)

* Parse w/SOX to object tree

* Visitor+Builder pattern: recursive function walks tree and calls 
functions on builder to emit code

* Generated code will assume it has some kind of input of a 'model' to 
start with, and a 'write()' function to emit text.

* Builder object has method to emit 'write()' calls for constant strings; 
should be written so that consecutive constants are concatenated by Python 
into a single constant in an invocation of write().

* Builder has methods to "push" and "pop" model paths.  So code generator 
says builder.pushModel('foo') when it encounters 'model="foo"' in a tag, 
then builder.popModel() on exit from the tag's nested contents.

* Code generator carries view namespace, and maintains view 
stack.  Referenced view objects are looked up and adapted to a code 
generation interface, then get the current XHTML node object and builder 
passed in.  The code generation interface could be shared by the main code 
generator, since it takes the same inputs (a node, a view namespace stack, 
and a builder).

* Tags should probably provide convenience methods to find "pattern" 
sub-nodes.

* The builder should provide the ability to start and end a nested function 
definition, in case one wants to turn a "pattern" into a callable function.

* The builder needs to provide namespace management for parameters and 
temporaries.
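
A minimal sketch of the constant-handling and push/pop bullets above, to make them concrete.  Every name in it (TemplateBuilder, constant, expression, flush, getSource, currentModel) is an assumption for illustration, not an existing PEAK API; only pushModel and popModel come from the notes.  It also merges adjacent constants inside the builder, rather than relying on Python to concatenate adjacent string literals, which is just one way to get a single write() call.

```python
class TemplateBuilder:
    """Hypothetical builder; every name here is illustrative only."""

    def __init__(self):
        self.lines = []                # generated source lines
        self.pending = []              # constant text not yet emitted
        self.model_stack = ['model']   # expression for the current model

    def constant(self, text):
        # Buffer constant text so adjacent constants share one write().
        self.pending.append(text)

    def flush(self):
        # Emit all buffered constants as a single string literal.
        if self.pending:
            self.lines.append('write(%r)' % ''.join(self.pending))
            self.pending = []

    def expression(self, expr):
        # Emit a write() of a computed value, after any pending constants.
        self.flush()
        self.lines.append('write(%s)' % expr)

    def pushModel(self, name):
        # Extend the current model path by one item lookup.
        self.model_stack.append('%s[%r]' % (self.model_stack[-1], name))

    def popModel(self):
        self.model_stack.pop()

    def currentModel(self):
        return self.model_stack[-1]

    def getSource(self):
        self.flush()
        return '\n'.join(self.lines)


builder = TemplateBuilder()
builder.constant('<p>')
builder.constant('Hello, ')
builder.pushModel('name')              # on entering a tag with model="name"
builder.expression('quote(%s.getValue())' % builder.currentModel())
builder.popModel()                     # on leaving the tag's contents
builder.constant('</p>')
source = builder.getSource()
```

Here the two leading constants end up in one generated write() call, and the model path is back to plain 'model' once the tag has been popped.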

This is beginning to sound as though I should create Module, Class, and 
Function builders, that work atop an IndentedStream for output.  Each would 
maintain a 'symbol' table of currently active variable names, and perhaps 
have the ability to look up names from other scopes.  Classes would always 
consider their containing Module to be their next namespace level, and 
Functions would consider their next-outer Function or Module to be their 
parent namespace.  I suppose we don't actually need Class or Module for 
page templates, but they'd be handy for other PEAK code-generating efforts.

The page template code builder would be based on Function, adding all the 
convenience methods for outputting strings to write, pushing/popping model 
items, etc.
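
A rough sketch of how the scope chain described above might hang together.  The class names follow the notes (Module, Class, Function, IndentedStream), but the method names (declare, lookup, indent, dedent, getvalue) and the parent-skipping logic are my assumptions about one way to read "next namespace level".

```python
class IndentedStream:
    """Collects output lines at the current indentation level."""

    def __init__(self):
        self.lines = []
        self.level = 0

    def write(self, line):
        self.lines.append('    ' * self.level + line)

    def indent(self):
        self.level += 1

    def dedent(self):
        self.level -= 1

    def getvalue(self):
        return '\n'.join(self.lines)


class Scope:
    """Symbol table with an optional parent namespace."""

    def __init__(self, parent=None):
        self.parent = parent
        self.names = set()

    def declare(self, name):
        self.names.add(name)

    def lookup(self, name):
        # Return the scope that defines 'name', searching outward.
        scope = self
        while scope is not None:
            if name in scope.names:
                return scope
            scope = scope.parent
        return None


class ModuleScope(Scope):
    pass


class ClassScope(Scope):
    def __init__(self, parent):
        # A class always treats its containing module as the next level.
        while parent is not None and not isinstance(parent, ModuleScope):
            parent = parent.parent
        super().__init__(parent)


class FunctionScope(Scope):
    def __init__(self, parent):
        # A function's parent is the next-outer function or module,
        # so enclosing class bodies are skipped.
        while isinstance(parent, ClassScope):
            parent = parent.parent
        super().__init__(parent)


module = ModuleScope()
module.declare('quote')
klass = ClassScope(module)
klass.declare('helper')
method = FunctionScope(klass)
method.declare('write')
inner = FunctionScope(method)

out = IndentedStream()
out.write('def render(model, write):')
out.indent()
out.write("write('<ul>')")
out.dedent()
```

With this arrangement, 'inner' can see 'write' (from the enclosing function) and 'quote' (from the module), but not 'helper', which lives only in the class body.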

To do anything at all, we need "text" and "list" views to start 
with.  That gives us the equivalent of early DTML's "var" and 
"in".  Everything else can come later.  It needs to be "dirt simple" to 
write a code generator for a view, so we need to give the builders some 
very high-level methods to manage code generation, beyond just managing 
string constants.  Ideally, something like this:

<ul model="toc" view="list">
    <li pattern="listItem"><span model="title" view="text">Item title goes here</span>
        <ul model="children" view="list">
           <li pattern="listItem">
              <span model="title" view="text">Subtitle goes here</span>
           </li>
        </ul>
    </li>
</ul>

Would translate as something like:

     write('<ul>')
     for tmpVar1 in model['toc']:
         write('<li>')
         write(quote(tmpVar1['title'].getValue()))
         write('<ul>')
         for tmpVar2 in tmpVar1['children']:
             write('<li>')
             write(quote(tmpVar2['title'].getValue()))
             write('</li>')
         del tmpVar2
         write('</ul></li>')
     del tmpVar1
     write('</ul>')

The 'write' function, of course, will probably be some list's "append" 
method, rather than anything written in Python.  And all the string 
constants above would be longer, since they'd contain the original whitespace.
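
To check that code of this shape actually runs, here is a self-contained version with 'write' bound to a list's append method.  The Model wrapper and the use of html.escape as 'quote' are stand-ins I made up for the sketch; the real model protocol is still undecided.  The 'del' statements from the listing are left out here, because 'del tmpVar2' raises NameError when the loop body never runs (an empty 'children' list).

```python
from html import escape as quote   # stand-in for the framework's quote()


class Model:
    """Hypothetical model wrapper: item access and iteration yield models."""

    def __init__(self, value):
        self.value = value

    def __getitem__(self, key):
        return Model(self.value[key])

    def __iter__(self):
        return (Model(item) for item in self.value)

    def getValue(self):
        return str(self.value)


def render(model, write):
    # Same structure as the generated code, minus the 'del' statements.
    write('<ul>')
    for tmpVar1 in model['toc']:
        write('<li>')
        write(quote(tmpVar1['title'].getValue()))
        write('<ul>')
        for tmpVar2 in tmpVar1['children']:
            write('<li>')
            write(quote(tmpVar2['title'].getValue()))
            write('</li>')
        write('</ul></li>')
    write('</ul>')


data = {
    'toc': [
        {'title': 'Intro & Setup', 'children': [{'title': 'Install'}]},
        {'title': 'Usage', 'children': []},
    ],
}

output = []
render(Model(data), output.append)   # 'write' is just list.append
page = ''.join(output)
```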

Doing this example also shows that "text" may need to be smart about 
removing useless '<span>' tags, or else we need "text" and "replace" 
variants.  Heck, maybe we also need a "structure" variant that doesn't 
HTML-quote the content that it puts in.
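
One possible reading of the three variants, as a sketch: "text" keeps the tag and quotes the content, "replace" drops the tag, and "structure" keeps the tag but skips quoting.  That split, and the helper name generate_view, are my guesses for illustration, not a settled design.

```python
def generate_view(view, tag, model_expr):
    """Return generated statements for one element (hypothetical helper)."""
    value = '%s.getValue()' % model_expr
    if view != 'structure':
        # "text" and "replace" HTML-quote the value; "structure" does not.
        value = 'quote(%s)' % value
    lines = ['write(%s)' % value]
    if view != 'replace':
        # "replace" discards the enclosing tag; the others keep it.
        lines.insert(0, 'write(%r)' % ('<%s>' % tag))
        lines.append('write(%r)' % ('</%s>' % tag))
    return lines


text_code = generate_view('text', 'span', "tmpVar1['title']")
replace_code = generate_view('replace', 'span', "tmpVar1['title']")
structure_code = generate_view('structure', 'div', "tmpVar1['body']")
```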

It also illustrates why namespace management by the code builder would be 
critical to easy code generation.  I suppose we could use name prefixes to 
distinguish generated variables (tmpVar1) from framework variables (write, 
quote).  But that seems to lead to severe ugliness in short order, like 
Pyrex's '__pyx_v1' names.

To work around this, it seems useful to conceive of 'Symbol' objects that 
one can create and register with a scope.  For example, a Symbol object 
would be created to represent the 'write' function.  Anyone generating 
code that needs to call 'write' would import that Symbol from 
somewhere, and then use it with the code builder.  The Symbol would know 
how to generate code to initialize itself, and it would know how it 
preferred to be named.  But the namespace would know how it *actually* 
would be named in that space, or how it was named in an enclosing, 
accessible space.
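
A small sketch of the Symbol idea, under the assumption that a Symbol carries a preferred name plus an optional initialization template, and that the namespace picks the actual name by appending a number on collision.  The names Symbol, Namespace, register, nameOf and initializer are all hypothetical.

```python
class Symbol:
    """A name-to-be: knows its preferred name and how to initialize itself."""

    def __init__(self, preferred, init_code=None):
        self.preferred = preferred
        self.init_code = init_code     # e.g. '{name} = output.append'

    def initializer(self, name):
        # Generate the initialization statement for the name actually used.
        return self.init_code.format(name=name)


class Namespace:
    """Maps Symbols to the names they actually received in this scope."""

    def __init__(self, parent=None):
        self.parent = parent
        self.names = {}                # Symbol -> actual name

    def nameOf(self, symbol):
        # Look in this namespace, then in enclosing, accessible ones.
        ns = self
        while ns is not None:
            if symbol in ns.names:
                return ns.names[symbol]
            ns = ns.parent
        return None

    def _taken(self, name):
        ns = self
        while ns is not None:
            if name in ns.names.values():
                return True
            ns = ns.parent
        return False

    def register(self, symbol):
        existing = self.nameOf(symbol)
        if existing is not None:
            return existing
        name, counter = symbol.preferred, 1
        while self._taken(name):
            counter += 1
            name = '%s%d' % (symbol.preferred, counter)
        self.names[symbol] = name
        return name


write = Symbol('write', '{name} = output.append')
tmp_a = Symbol('tmpVar')
tmp_b = Symbol('tmpVar')

outer = Namespace()
inner = Namespace(outer)
write_name = outer.register(write)
first_tmp = outer.register(tmp_a)
second_tmp = inner.register(tmp_b)     # 'tmpVar' is taken in the outer scope
init_line = write.initializer(write_name)
```

The two temporaries both ask for 'tmpVar', but the second one is handed a different name because the first is visible from the inner namespace.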

I'm not sure how to handle shadowing in nested namespaces, however.  A 
nested function with a 'write' parameter cannot access the 'write' of its 
enclosing function, for example.  However, this probably isn't worth 
worrying about, because the write parameter probably wants to be a fresh 
symbol in each function anyway.  Usage-specific scope builders will carry 
these symbols as public attributes.  Thus, if I create a new 
'PageTemplateFunction' scope object, it will have attributes representing 
its parameter symbols.
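
As a sketch of what "attributes representing its parameter symbols" could look like: each scope object creates its own fresh symbols, so the 'write' of one function is never confused with the 'write' of another.  The Symbol class and the signature() method are assumptions made up for this example.

```python
class Symbol:
    """Hypothetical symbol, distinguished by identity rather than by name."""

    def __init__(self, preferred):
        self.preferred = preferred


class PageTemplateFunction:
    """Scope object whose parameter symbols are exposed as attributes."""

    def __init__(self, name):
        self.name = name
        self.model = Symbol('model')   # fresh symbol per function
        self.write = Symbol('write')   # fresh symbol per function

    def signature(self):
        return 'def %s(%s, %s):' % (
            self.name, self.model.preferred, self.write.preferred)


page = PageTemplateFunction('render')
item = PageTemplateFunction('renderItem')
```

A view's code generator would then refer to page.write or item.write, depending on which function it is emitting code into.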

Yes, that would work.  And "above" the Module namespace should be a 
Builtins namespace that contains Python's built-in symbols.  This means 
that the code generator could be prohibited from shadowing builtins at any 
nesting level, which is probably not a bad idea.  Imagine trying to 
generate code and not being able to access builtins like len()!
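
A sketch of the no-shadowing rule, assuming a root scope built from the names in Python's 'builtins' module and a declare() that refuses any name the root reserves.  The class and method names are hypothetical.

```python
import builtins


class BuiltinsScope:
    """Root namespace holding the names of Python's built-ins."""

    parent = None

    def __init__(self):
        self.names = frozenset(dir(builtins))

    def reserves(self, name):
        return name in self.names


class Scope:
    """Any generated-code scope; refuses to shadow a built-in name."""

    def __init__(self, parent):
        self.parent = parent
        self.names = set()

    def reserves(self, name):
        # Delegate upward until the builtins scope answers.
        return self.parent.reserves(name)

    def declare(self, name):
        if self.reserves(name):
            raise ValueError('%r would shadow a builtin' % name)
        self.names.add(name)


module = Scope(BuiltinsScope())
function = Scope(module)
function.declare('tmpVar1')            # fine: not a builtin

try:
    function.declare('len')            # refused at any nesting level
except ValueError:
    shadowing_refused = True
else:
    shadowing_refused = False
```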

Struggling beneath the surface of these ideas, trying to get out, is a way 
to write Python in Python.  That is, a syntax for generating Python code 
that can be cleanly expressed in Python itself, such that you could 
mechanically translate from a snippet of Python to a snippet of "meta 
Python".  Unfortunately, every syntax I start to invent seems hopelessly 
awkward or LISP-ish.  Perhaps I'll think of something better later.

In the meantime, the concepts worked out so far sound doable for 
implementing the basics.  As we create more advanced views like form 
widgets, we'll no doubt see what other issues arise.  Actually, it seems 
like it would be a good idea to look at some form widgets *first*, and see 
what we find out.  For example, one idea that pops to mind is that just 
doing a simple dropdown selection widget seems to require *two* models: one 
for the field value being rendered, and one for the collection of possible 
values.  The latter is metadata relative to the former.  How should we 
handle that?

Actually, the first answer that comes to mind is that it's an issue for the 
model-side protocol.  If models always return other models, and models have 
metadata interfaces, then it's actually pretty simple to do that and many 
other "widgety" things.  Reserving parts of the model namespace for things 
like menus and navigation seems feasible as well.
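
A sketch of the dropdown case under that answer, assuming the field's model exposes its possible values through a reserved sub-model name.  The name 'choices', the FieldModel class, and render_select are all invented for illustration; html.escape stands in for 'quote'.

```python
from html import escape as quote   # stand-in for the framework's quote()


class FieldModel:
    """Hypothetical model: a value plus metadata reachable as a sub-model."""

    def __init__(self, value, choices=()):
        self._value = value
        self._choices = list(choices)

    def getValue(self):
        return self._value

    def __getitem__(self, key):
        # 'choices' is a reserved part of the model namespace (assumption).
        if key == 'choices':
            return [FieldModel(choice) for choice in self._choices]
        raise KeyError(key)


def render_select(name, model, write):
    # Only one model is passed in; the option list comes from its metadata.
    current = model.getValue()
    write('<select name="%s">' % quote(name))
    for option in model['choices']:
        value = option.getValue()
        selected = ' selected="selected"' if value == current else ''
        write('<option%s>%s</option>' % (selected, quote(value)))
    write('</select>')


output = []
render_select('color', FieldModel('red', ['red', 'green']), output.append)
widget = ''.join(output)
```

The view never needs a second 'model' attribute in the template; the field model itself answers the question of what else could have been selected.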