Module: fmtparse

Module: fmtparse ./src/peak/util/fmtparse.py

Parsing and formatting based on production rules

This module provides some special parsing and formatting features not found in other Python parsing libraries. Specifically it:

supports reversing the production process to convert parsed output back to a canonical version of the input
supports "smart tokenizing" based on contextually-available delimiters
does not separate "lexing" or "tokenizing" from "parsing", so lexical analysis can be parse-context dependent
is designed to accommodate parsing things other than strings (e.g. object streams, SAX event lists, ...?)
allows the definition of arbitrary "rule", "input" and "state" objects that can be fitted into the framework to handle specialized input types, context passing, etc.
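
The first three features can be illustrated with a small standalone sketch. The classes and functions below (Const, Field, Seq, toy_parse, toy_format) are hypothetical stand-ins written for this illustration; they are not the fmtparse API and make no claim about how fmtparse is implemented. The sketch assumes a rule is an object that can both parse and format, and that a field's delimiter is whatever literal follows it in the enclosing sequence.

```python
# Toy illustration only: hypothetical classes, NOT the fmtparse API.
# One rule tree is used in both directions (parse and format).

class Const:
    """Literal text: must be present when parsing, is emitted when formatting."""
    def __init__(self, text):
        self.text = text

    def parse(self, s, pos, out, stop):
        if not s.startswith(self.text, pos):
            raise ValueError("expected %r at position %d" % (self.text, pos))
        return pos + len(self.text)

    def format(self, data, write):
        write(self.text)


class Field:
    """Named value: consumes text up to the delimiter supplied by context."""
    def __init__(self, name):
        self.name = name

    def parse(self, s, pos, out, stop):
        # No separate tokenizer: the field ends where the contextual
        # delimiter begins, or at end of input if there is none.
        end = s.find(stop, pos) if stop else -1
        if end < 0:
            end = len(s)
        out[self.name] = s[pos:end]
        return end

    def format(self, data, write):
        write(data[self.name])


class Seq:
    """Apply sub-rules in order."""
    def __init__(self, *rules):
        self.rules = rules

    def parse(self, s, pos, out, stop=""):
        for i, rule in enumerate(self.rules):
            nxt = self.rules[i + 1] if i + 1 < len(self.rules) else None
            # The delimiter for a rule is the literal that follows it, if any.
            local_stop = nxt.text if isinstance(nxt, Const) else stop
            pos = rule.parse(s, pos, out, local_stop)
        return pos

    def format(self, data, write):
        for rule in self.rules:
            rule.format(data, write)


def toy_parse(s, rule):
    out = {}
    end = rule.parse(s, 0, out, "")
    if end != len(s):
        raise ValueError("unparsed input at position %d" % end)
    return out


def toy_format(data, rule):
    parts = []
    rule.format(data, parts.append)  # sub-rules write string pieces
    return "".join(parts)


syntax = Seq(Field("user"), Const("@"), Field("host"), Const(":"), Field("port"))
data = toy_parse("alice@example.org:8080", syntax)
text = toy_format(data, syntax)
```

Here no regular expression is written for any field: each Field learns its delimiter from the literal that follows it, and the same syntax object turns the parsed dictionary back into a string.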

The first three features were critical requirements for PEAK's URL-parsing tools. We wanted to make it super-easy to create robust URL syntaxes that would produce canonical representations of URLs from input data as well as sensibly parse input strings. And part of "super-easy" meant not having to write bazillions of regular expressions to parse every field in a URL.

Limitations:

The framework isn't designed for "formatting" to non-strings. Specifically, most rules assume that their sub-rules will only write() things that can be joined with "".join() when formatting.
Some parts of the framework may not be 100% Unicode-safe, even if a UnicodeInput type were implemented. Code review and patches appreciated.

TODO:

Docstrings, docstrings, docstrings... and a test suite!
ParseError should provide line/column info about the error location, not just offset, and it should be provided by input.error() rather than by the rule signalling the error. Of course, all the rules should be calling input.error() instead of creating ParseError instances...
Perform timing tests and investigate parsing speedups for:
- "Compiling" rules to regular expressions + postprocessors
- "Optimizing" rules (e.g. convert Optional(user, @) to something that forward-scans for @ before trying to match user).
- Moving speed-critical parts to C
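
Two of the TODO items above can be sketched in plain Python. Both functions are hypothetical helpers written for this illustration, not part of fmtparse: line_and_column shows one way an offset could be turned into line/column information for an error message, and split_optional_prefix shows the "forward-scan for the delimiter first" idea behind the Optional(user, @) optimization. Both assume the input is an ordinary string.

```python
def line_and_column(text, offset):
    """Return 1-based (line, column) for a 0-based character offset."""
    if not 0 <= offset <= len(text):
        raise ValueError("offset out of range")
    # Lines are counted by the newlines that occur before the offset.
    line = text.count("\n", 0, offset) + 1
    # rfind returns -1 when the offset is on the first line, which makes
    # the column come out 1-based without a special case.
    last_newline = text.rfind("\n", 0, offset)
    return line, offset - last_newline


def split_optional_prefix(text, delimiter="@"):
    """Scan ahead for the delimiter before trying to match the prefix.

    Returns (prefix, rest); prefix is None when the delimiter is absent,
    so no attempt is made to match the optional part at all.
    """
    index = text.find(delimiter)
    if index < 0:
        return None, text
    return text[:index], text[index + len(delimiter):]
```

Whether a forward scan is actually faster than attempting the match and backtracking is exactly what the proposed timing tests would need to establish.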

Imported modules

from peak.binding.once import Make
from peak.util.symbols import NOT_GIVEN
import re

Functions

format
parse
uniquechars

format

format(aDict, syntax)

parse

parse(input, rule)

Exceptions

ParseError("Expected EOF, found %r at position %d in %r" % (input[state:], state, input))
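
The error message suggests that the parse state is an integer offset into a sliceable input and that leftover input is treated as an error. A minimal sketch of such an end-of-input check follows. It is an assumption-laden illustration: the ParseError class is a local stand-in, and parse_all and match_digits are hypothetical names that do not reflect the module's actual rule protocol.

```python
import re


class ParseError(Exception):
    """Local stand-in so the sketch is self-contained."""


_DIGITS = re.compile(r"\d+")


def match_digits(text, offset):
    """Hypothetical matcher: returns (value, new_offset)."""
    m = _DIGITS.match(text, offset)
    if m is None:
        raise ParseError("Expected digits at position %d in %r" % (offset, text))
    return int(m.group()), m.end()


def parse_all(text, match):
    """Run a matcher from offset 0 and insist that it consumes everything."""
    value, state = match(text, 0)
    if state != len(text):
        # Same message shape as the documented exception.
        raise ParseError(
            "Expected EOF, found %r at position %d in %r"
            % (text[state:], state, text)
        )
    return value
```

With these definitions, parse_all("123", match_digits) returns the integer 123, while parse_all("123abc", match_digits) raises ParseError because "abc" is left unparsed at offset 3.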

uniquechars

uniquechars(s)

Classes

Alternatives	Match one out of many alternatives
Conversion
Epsilon	Simplest possible rule: matches nothing, formats nothing
ExtractString	Return matched subrule as a string
IInput
IRule	Production rule protocol for translation between data and strings
Input
MatchString	Match a regex, or longest string that doesn't include a terminator
MissingData
Named	Named value - converts to/from dictionary/items
Optional	Wrapper that makes a construct optional
ParseError
Repeat
Rule	Production rule protocol for translation between data and strings
Sequence
Set
StringConstant
StringInput
Tuple	Sequence of unnamed values, rendered as a tuple (e.g. key/value)


This document was automatically generated on Mon Jun 2 01:11:03 2025 by HappyDoc version 2.1