[TransWarp] fmtparse - recursive grammar

Phillip J. Eby pje at telecommunity.com
Fri Jun 20 18:49:30 EDT 2003

At 11:09 PM 6/20/03 +0300, alexander smishlajev wrote:
>Phillip J. Eby wrote:
>>At 03:16 PM 6/16/03 +0400, Oleg Broytmann wrote:
>>>    My first attempt was unsuccsessful.
>Oleg gave up on using fmtparse and i took the baton from him.  i read the 
>papers on packrat parsing, and i found TDPL very attractive.  i am very 
>much interested in working parser engine for TDP grammars.
>>openingChars is used to tell a preceding item at the same rule level what 
>>its terminators are.  So  if I have a rule that's like 
>>Sequence(Named('foo'), ',' Named('bar')),  the openingChars of ',' is 
>>',', and we take the 'withTerminators()' of Named('foo') so it has the 
>>chance to take the ',' terminator into account.
>as far as i uderstand, the terminators are used only to break the 
>greedyness of MatchString.  and the only real provider of openingChars is 
>StringConstant.  am i right?

Yes - but if MatchString is a regular expression, then it should also 
supply openingChars.  It's just left to the user to do this.  Note that 
this also doesn't mean you can't define your own syntax elements that 
supply openingChars as well.  Finally, note that Optional() supplies a 
combination of its contents' opening characters, and those of its 
successor, if any.

>also i see that you have reversed the flow of the parsing: the papers on 
>packrat parsing show right-to-left parsing, starting from the end of the 
>string.  but fmtparse does left-to-right parsing.  i suppose this was done 
>in order to utilize the regexp engine.  i do not mean that there is 
>something wrong; i just want to ensure that i understand things right.

Actually, I'm surprised.  I had absolutely no idea that packrat parsing 
went right to left!  Are you sure?  Honestly, I did not study the code very 
well, just the concept of memoizing partial parse results in a backtracking 
recursive descent parser.

>i propose to use the Input object to keep track of the rules being 
>applied.  since Input.parse is called from each non-trivial Rule.parse, i 
>think there will be no problems in memorizing the rule before calling 
>rule.parse.  then, a rule having need in terminators could request them 
>from it's envelope Sequence via the Input object.  this wold allow the 
>lazy evaluation of the terminators at the time when they are really needed.
>what do you think?

If I understand you correctly, I think you have underestimated the 
problem.  First, terminators can propagate laterally as well as vertically 
within a structure, and second, rules don't know about their containers.  I 
don't think you can transfer the necessary knowledge to the Input without 
making it part of the parsing state, which will make the system much 
slower.  Currently, for all the concrete rules, the parse state is just an 
integer position in the string.  You'd need to change that to a combination 
of a position, and current terminators.  And I'm still not positive that 
this would break the recursion!

More information about the PEAK mailing list