[TransWarp] fmtparse - recursive grammar

Phillip J. Eby pje at telecommunity.com
Fri Jun 13 16:11:26 EDT 2003


At 11:39 PM 6/13/03 +0400, Oleg Broytmann wrote:
>Hello!
>
>    Can I parse a recursive grammar using fmtparse?
>
>    For example, I want to parse queries like this:
>and(1, or(xxx, yyy))
>or(month, and(ttt, uuu))
>    and so on. EBNF grammar is something like
>
>query -> and_or(query, ...)
>and_or -> and | or
>
>    (without all annoying details).
>
>    Now I am trying to write a parser:
>
>from peak.util.fmtparse import *
>
>and_or = Alternatives("and", "or")
>query = Sequence(and_or, '(', query, ')')
>
>    Oops, query referenced before assignment...

Yes, you can do it, but only by assigning the item manually.  I don't 
remember the exact attributes right off.

I'm glad you brought this up, as I need to think about how to deal with 
this for peak.model types.  I am making the assumption that for any 
non-trivial grammar you need an AST, so the idea is that you create the AST 
with peak.model types, and define syntax for the types.  However, this only 
moves splits the recursion into two levels, where the type's syntax depends 
on the feature depends on the type...  Worse, in this case the type doesn't 
even exist at the time you need to know the syntax.

The whole issue could possibly be addressed by having a "reference" type 
that responds to the "withTerminators()" message by looking up the outer 
object.  OTOH, that just results in the version of the outer object that 
existed *before* the terminators were added.  Argh.

It does rather seem that a two-step initialization process is going to be 
the only way to break this.  That is:

query = Sequence()
query.initArgs = and_or, '(', query, ')'
query.rules = query._computeRules()

This should works with the code as it sits today, but I hadn't really 
thought yet about the implicit recursion issue in the AST approach I had in 
mind.  I'm going to have to have a two-stage initialization mechanism, and 
FeatureClass._syntax() will need to use it.  Probably this means that 
fmtparse.Sequence and its derivatives will need to delay adapting their 
initArgs until they actually are used.

Consider this:

class AndOr(model.Type):
     mdl_syntax = fmtparse.Alternatives('and','or')

class Query(model.Type):
     class operator(model.Feature):
         referencedType = AndOr
     class subqueries(model.Sequence):
         referencedType = 'Query'

     mdl_syntax = Sequence(operator, '(', subqueries, ')')

The only reason this doesn't work the way you'd want it to, is that the 
Sequence tries to adapt 'subqueries' to a Rule immediately, which won't 
work because Query doesn't exist yet.  If the adaptation were delayed, I 
think this would actually create the needed circular references automatically.

So, what we really need here is to change 'fmtparse' to do its adapt() 
calls on demand at a later stage, instead of up-front.  Converting to Once 
bindings would be ideal for this, although I originally intended fmtparse 
to be a standalone demo of what kinds of things can be accomplished with 
PEP 246 adapt().  Guess I'll have to give up on that.

Perhaps you could contribute a patch to change fmtparse to use Once 
bindings for all adapt() on input arguments?  E.g. Alternatives() would 
save its args in initArgs or _alternatives, and then have .alternatives be 
a Once binding that adapts, instead of __init__ doing the adapting.  There 
are only about 5-6 places that need changing for this, I think.

Anyway, with this change, it should be possible to define recursive 
grammars using peak.model to create an AST.  To parse the grammar, call 
'TopLevelRuleType.mdl_fromString(theString)' and you will get an instance 
of your top-level AST type.  Then you can of course also add any methods 
you like to your AST classes to do the things you want to do, or define 
adapters from them to a visitor pattern, or whatever else you like.  This 
is a good bit more flexible than just using "plain" fmtparse Rules.




More information about the PEAK mailing list