[TransWarp] fmtparse - recursive grammar
Phillip J. Eby
pje at telecommunity.com
Fri Jun 13 16:11:26 EDT 2003
At 11:39 PM 6/13/03 +0400, Oleg Broytmann wrote:
>Hello!
>
> Can I parse a recursive grammar using fmtparse?
>
> For example, I want to parse queries like this:
>and(1, or(xxx, yyy))
>or(month, and(ttt, uuu))
> and so on. EBNF grammar is something like
>
>query -> and_or(query, ...)
>and_or -> and | or
>
> (without all annoying details).
>
> Now I am trying to write a parser:
>
>from peak.util.fmtparse import *
>
>and_or = Alternatives("and", "or")
>query = Sequence(and_or, '(', query, ')')
>
> Oops, query referenced before assignment...
Yes, you can do it, but only by assigning the item manually. I don't
remember the exact attributes right off.
I'm glad you brought this up, as I need to think about how to deal with
this for peak.model types. I am making the assumption that for any
non-trivial grammar you need an AST, so the idea is that you create the AST
with peak.model types, and define syntax for the types. However, this only
moves splits the recursion into two levels, where the type's syntax depends
on the feature depends on the type... Worse, in this case the type doesn't
even exist at the time you need to know the syntax.
The whole issue could possibly be addressed by having a "reference" type
that responds to the "withTerminators()" message by looking up the outer
object. OTOH, that just results in the version of the outer object that
existed *before* the terminators were added. Argh.
It does rather seem that a two-step initialization process is going to be
the only way to break this. That is:
query = Sequence()
query.initArgs = and_or, '(', query, ')'
query.rules = query._computeRules()
This should works with the code as it sits today, but I hadn't really
thought yet about the implicit recursion issue in the AST approach I had in
mind. I'm going to have to have a two-stage initialization mechanism, and
FeatureClass._syntax() will need to use it. Probably this means that
fmtparse.Sequence and its derivatives will need to delay adapting their
initArgs until they actually are used.
Consider this:
class AndOr(model.Type):
mdl_syntax = fmtparse.Alternatives('and','or')
class Query(model.Type):
class operator(model.Feature):
referencedType = AndOr
class subqueries(model.Sequence):
referencedType = 'Query'
mdl_syntax = Sequence(operator, '(', subqueries, ')')
The only reason this doesn't work the way you'd want it to, is that the
Sequence tries to adapt 'subqueries' to a Rule immediately, which won't
work because Query doesn't exist yet. If the adaptation were delayed, I
think this would actually create the needed circular references automatically.
So, what we really need here is to change 'fmtparse' to do its adapt()
calls on demand at a later stage, instead of up-front. Converting to Once
bindings would be ideal for this, although I originally intended fmtparse
to be a standalone demo of what kinds of things can be accomplished with
PEP 246 adapt(). Guess I'll have to give up on that.
Perhaps you could contribute a patch to change fmtparse to use Once
bindings for all adapt() on input arguments? E.g. Alternatives() would
save its args in initArgs or _alternatives, and then have .alternatives be
a Once binding that adapts, instead of __init__ doing the adapting. There
are only about 5-6 places that need changing for this, I think.
Anyway, with this change, it should be possible to define recursive
grammars using peak.model to create an AST. To parse the grammar, call
'TopLevelRuleType.mdl_fromString(theString)' and you will get an instance
of your top-level AST type. Then you can of course also add any methods
you like to your AST classes to do the things you want to do, or define
adapters from them to a visitor pattern, or whatever else you like. This
is a good bit more flexible than just using "plain" fmtparse Rules.
More information about the PEAK
mailing list