Table of Contents

Module: fmtparse ./src/peak/util/fmtparse.py

Parsing and formatting based on production rules

This module provides some special parsing and formatting features not found in other Python parsing libraries. Specifically it:

  • supports reversing the production process to convert parsed output back to a canonical version of the input

  • supports "smart tokenizing" based on contextually-available delimiters

  • does not separate "lexing" or "tokenizing" from "parsing", so lexical analysis can be parse-context dependent

  • is designed to accommodate parsing things other than strings (e.g. object streams, SAX event lists, ...?)

  • allows the definition of arbitrary "rule", "input" and "state" objects that can be fitted into the framework to handle specialized input types, context passing, etc.

The first three features were critical requirements for PEAK's URL-parsing tools. We wanted to make it super-easy to create robust URL syntaxes that would produce canonical representations of URLs from input data as well as sensibly parse input strings. And part of "super-easy" meant not having to write bazillions of regular expressions to parse every field in a URL.
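To make the round-trip idea concrete, here is a minimal, self-contained sketch. It does not use the fmtparse API: the classes Const, Field and Seq, the openers() method, and the signatures below are hypothetical stand-ins, invented for illustration. It shows a single rule tree that both parses a string and formats the parsed data back to a canonical form, with each field ending at whatever delimiter the following rule expects.

```python
class ParseError(Exception):
    """Stand-in error type for this sketch (not the module's ParseError)."""


class Const:
    """Literal text: consumed when parsing, emitted when formatting."""
    def __init__(self, text):
        self.text = text

    def openers(self):
        return [self.text]          # strings this rule can begin with

    def parse(self, s, pos, out, terminators):
        if not s.startswith(self.text, pos):
            raise ParseError("Expected %r at position %d in %r" % (self.text, pos, s))
        return pos + len(self.text)

    def format(self, data, write):
        write(self.text)


class Field:
    """Named value that runs up to the nearest delimiter available in context."""
    def __init__(self, name, convert=str):
        self.name, self.convert = name, convert

    def openers(self):
        return []                   # may begin with any character

    def parse(self, s, pos, out, terminators):
        ends = [i for i in (s.find(t, pos) for t in terminators) if i >= 0]
        end = min(ends) if ends else len(s)
        if end == pos:
            raise ParseError("Missing %s at position %d in %r" % (self.name, pos, s))
        out[self.name] = self.convert(s[pos:end])
        return end

    def format(self, data, write):
        write(str(data[self.name]))


class Seq:
    """Rules applied in order; each rule is told what the next rule starts with."""
    def __init__(self, *rules):
        self.rules = rules

    def openers(self):
        return self.rules[0].openers()

    def parse(self, s, pos, out, terminators=()):
        for i, rule in enumerate(self.rules):
            last = i + 1 == len(self.rules)
            following = list(terminators) if last else self.rules[i + 1].openers()
            pos = rule.parse(s, pos, out, following)
        return pos

    def format(self, data, write):
        for rule in self.rules:
            rule.format(data, write)


def parse(text, rule):
    out = {}
    pos = rule.parse(text, 0, out, ())
    if pos < len(text):
        raise ParseError("Expected EOF, found %r at position %d in %r"
                         % (text[pos:], pos, text))
    return out


def format(data, rule):             # shadows the builtin, as the module's own name does
    parts = []
    rule.format(data, parts.append)
    return "".join(parts)


syntax = Seq(Field("user"), Const("@"), Field("host", str.lower),
             Const(":"), Field("port", int))
parsed = parse("alice@Example.COM:0080", syntax)
canonical = format(parsed, syntax)  # "alice@example.com:80"
```

No regular expression is written for any field here: the "user" field stops at "@" only because the next rule in the sequence is Const("@"). Parsing the canonical string again yields the same dictionary, which is the property a canonical form is meant to have.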

Limitations:

  • The framework isn't designed for "formatting" to non-strings. Specifically, most rules assume that their sub-rules will only write() things that can be joined with "".join() when formatting.

  • Some parts of the framework may not be 100% Unicode-safe, even if a UnicodeInput type were implemented. Code review and patches appreciated.

TODO:

  • Docstrings, docstrings, docstrings... and a test suite!

  • ParseError should provide line/column info about the error location, not just offset, and it should be provided by input.error() rather than by the rule signalling the error. Of course, all the rules should be calling input.error() instead of creating ParseError instances...

  • Perform timing tests and investigate parsing speedups for:

    • "Compiling" rules to regular expressions + postprocessors

    • "Optimizing" rules (e.g. convert Optional(user, @) to something that forward-scans for @ before trying to match user).

    • Moving speed-critical parts to C
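For the ParseError item in the TODO list above, one possible way to derive line/column information from a plain offset is sketched below. The helper names line_and_column and describe_error are hypothetical and are not part of fmtparse; the sketch assumes string input with "\n" line endings.

```python
def line_and_column(text, offset):
    """Return 1-based (line, column) for a 0-based character offset."""
    if not 0 <= offset <= len(text):
        raise ValueError("offset %d is outside the input" % offset)
    # Every newline before the offset starts a new line.
    line = text.count("\n", 0, offset) + 1
    # rfind() returns -1 on the first line, which makes the column offset + 1.
    column = offset - text.rfind("\n", 0, offset)
    return line, column


def describe_error(text, offset, message):
    """Build an error message that carries line/column as well as offset."""
    line, column = line_and_column(text, offset)
    return "%s at line %d, column %d (offset %d)" % (message, line, column, offset)
```

Because the computation needs only the full input text and an offset, it would fit naturally inside an input object's error() method, which is where the TODO item suggests the information should come from.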

Imported modules   
from peak.binding.once import Make
from peak.util.symbols import NOT_GIVEN
import re
Functions
format
parse
uniquechars

  format
format(aDict, syntax)

  parse
parse(input, rule)

Exceptions
ParseError("Expected EOF, found %r at position %d in %r" % (input[state:], state, input))
state

  uniquechars
uniquechars(s)

Classes   

Alternatives

Match one out of many alternatives

Conversion

Epsilon

Simplest possible rule: matches nothing, formats nothing

ExtractString

Return matched subrule as a string

IInput

IRule

Production rule protocol for translation between data and strings

Input

MatchString

Match a regex, or longest string that doesn't include a terminator

MissingData

Named

Named value - converts to/from dictionary/items

Optional

Wrapper that makes a construct optional

ParseError

Repeat

Rule

Production rule protocol for translation between data and strings

Sequence

Set

StringConstant

StringInput

Tuple

Sequence of unnamed values, rendered as a tuple (e.g. key/value)
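
The one-line descriptions of Alternatives, Optional and Sequence above can be illustrated with a small combinator sketch. This is not the module's implementation: the functions below are hypothetical, handle matching only (no formatting), and represent a rule as a function that takes a string and a position and returns the new position, or None on failure.

```python
def literal(text):
    """Rule matching exactly `text` at the current position."""
    def rule(s, pos):
        return pos + len(text) if s.startswith(text, pos) else None
    return rule


def alternatives(*rules):
    """Match one out of many alternatives; in this sketch the first match wins."""
    def rule(s, pos):
        for r in rules:
            end = r(s, pos)
            if end is not None:
                return end
        return None
    return rule


def optional(inner):
    """Wrapper that makes a construct optional: no match consumes nothing."""
    def rule(s, pos):
        end = inner(s, pos)
        return pos if end is None else end
    return rule


def sequence(*rules):
    """Match each rule in turn, failing as soon as one fails."""
    def rule(s, pos):
        for r in rules:
            pos = r(s, pos)
            if pos is None:
                return None
        return pos
    return rule


# "https" is listed before "http" because this sketch does not backtrack
# into an alternative once it has matched.
scheme = alternatives(literal("https"), literal("http"))
url_start = sequence(scheme, literal("://"), optional(literal("www.")))

end_with_www = url_start("https://www.example.com", 0)   # 12
end_without_www = url_start("http://example.com", 0)     # 7
no_match = url_start("ftp://example.com", 0)             # None
```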


This document was automatically generated on Mon Nov 11 01:11:04 2024 by HappyDoc version 2.1