Table of Contents

Module: FileParsing ./src/peak/util/FileParsing.py

Line-Oriented File Parsing Tools

This module supplies functions for creating and using "lineInfo" streams, which are iterables of (source,lineNo,line) tuples. The source is an indicator of the line's origin (e.g. a filename), while lineNo is its line number within that source. line is the actual text of the line. This data structure is simple, fast, and easy to use.

This module was created because the standard library ConfigParser makes assumptions about syntax that do not hold for every config file. For example, it treats directives as order-insensitive key-value pairs, which is a poor fit for systems like PEAK that may need to process a stream of directives in their original sequence.

So, this module uses a stream-processing approach that should provide a reusable foundation for other types of line-oriented text processing. The basic tools are the fromStream(), fromFile(), and fromString() functions, which respectively take an iterable, a filename, or a string, and return an iterator that yields (source,lineNo,line) tuples.

Once you have a stream to play with, you can then use some of the processors such as iterConfigSections() and iterConfigSettings(), which return iterators yielding various kinds of configuration data (based on ConfigParser-like syntax rules). Or, you can use the AbstractConfigParser class as a base class to create your own specialized parsers.

Imported modules   
from __future__ import generators
import re
Functions   
fromFile
fromStream
fromString
iterConfigSections
iterConfigSettings
  fromFile 
fromFile ( filename,  mode='r' )

Produce (source,lineNo,line) tuple stream from input file

This is the equivalent of fromStream(open(filename,mode), filename). That is, it returns a line-info iterator with a source of filename and the lines from open(filename,mode).readlines().

  fromStream 
fromStream ( stream,  source=None )

Produce (source,lineNo,line) tuple stream from input lines

Calling fromStream(stream,source) returns an iterator which yields (source,lineNo,line) tuples for each line in stream.

stream must be a sequence, iterator, or iterable file-like object that yields text lines. (Line-ending characters are stripped from the end of each line.)

source should be a short string (e.g. filename) or other useful identifier of where the lines came from.
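
The behavior described above can be illustrated with a small stand-in. This is a minimal sketch, not the module's implementation: from_stream_sketch is a hypothetical name, and starting the numbering at 1 is an assumption, since the text does not say where lineNo begins.

```python
def from_stream_sketch(stream, source=None):
    """Yield (source, lineNo, line) tuples for each line in stream."""
    # Assumption: line numbers start at 1; the documentation does not say.
    for line_no, line in enumerate(stream, start=1):
        # Strip only trailing line-ending characters; keep other whitespace,
        # since leading whitespace is significant for continuation lines.
        yield source, line_no, line.rstrip("\r\n")


# Any iterable of text lines works: a list, a generator, or an open file.
raw_lines = ["[server]\n", "host = example.org\n", "  port = 80\n"]
result = list(from_stream_sketch(raw_lines, "demo.ini"))
```

Here result holds one tuple per input line, each tagged with "demo.ini" and its position, with the indentation of the third line preserved.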

  fromString 
fromString ( text,  source='<string>' )

Produce (source,lineNo,line) tuple stream from input string

This is the equivalent of fromStream(StringIO(text), source). That is, it returns a line-info iterator with the supplied source name (default is "<string>") and the lines from StringIO(text).readlines().
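
The StringIO equivalence can be sketched as follows. This is an illustration only: from_string_sketch is a hypothetical stand-in, it uses io.StringIO from the current standard library, and numbering from 1 is an assumption.

```python
from io import StringIO


def from_string_sketch(text, source="<string>"):
    """Yield (source, lineNo, line) tuples for each line of text."""
    # Iterating a StringIO yields the same lines as readlines() would.
    # Assumption: line numbers start at 1.
    for line_no, line in enumerate(StringIO(text), start=1):
        yield source, line_no, line.rstrip("\r\n")


info = list(from_string_sketch("a = 1\nb = 2\n"))
named = list(from_string_sketch("x = 9", source="inline-config"))
```

When no source is given, every tuple carries the default "<string>" label, which is what an error message would show as the origin of the line.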

  iterConfigSections 
iterConfigSections ( lineSource )

(section,lines,info) tuples per .ini-like section in lineSource

This function is used to break up a configuration file (.ini or ConfigParser-style) into sections based on []-enclosed section names. It returns an iterator which yields (section,lines,info) tuples for each section in the file. The first yielded section will be None if any lines appear before the first section heading; all others will be the string that was between the [].

lines is always a list of (source,lineNo,line) tuples, suitable for use by iterConfigSettings() or other lineInfo stream processors.

info is a (source,lineNo,line) tuple representing the line where the section header (if any) appeared.
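
The grouping described above might look roughly like this. It is a sketch under stated assumptions, not the module's code: iter_sections_sketch and HEADER are hypothetical names, the header pattern is a guess at what counts as a section heading, and using None for info when there is no header line is an assumption.

```python
import re

# Assumption: a header is a line consisting only of "[name]".
HEADER = re.compile(r"^\s*\[([^\]]+)\]\s*$")


def iter_sections_sketch(line_source):
    """Yield (section, lines, info) for each section in a lineInfo stream."""
    # Assumption: info is None for lines that precede any header.
    section, lines, info = None, [], None
    for line_info in line_source:
        match = HEADER.match(line_info[2])
        if match:
            # Emit the finished section; the implicit leading section is
            # only emitted if some lines actually appeared before a header.
            if section is not None or lines:
                yield section, lines, info
            section, lines, info = match.group(1), [], line_info
        else:
            lines.append(line_info)
    if section is not None or lines:
        yield section, lines, info


stream = [
    ("demo.ini", 1, "debug = 1"),
    ("demo.ini", 2, "[server]"),
    ("demo.ini", 3, "host = example.org"),
    ("demo.ini", 4, "[client]"),
]
sections = list(iter_sections_sketch(stream))
```

In this run the first section name is None because line 1 precedes any heading, and each later entry keeps the header's own lineInfo tuple so that errors can point back at it.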

  iterConfigSettings 
iterConfigSettings ( lineSource )

(name,value,lineInfo) tuples per .ini-like setting in lineSource

name and value will be None for any non-blank, non-comment line which does not appear to be a valid option. Otherwise, they are the setting's name and value, respectively.

lineInfo is a standard lineInfo-tuple of (source,lineNo,line) data, with the difference that continuation lines are concatenated to line. This is so that if one needs to display an error message that shows the source of the parsed value, the full logical line is included, even though the first physical line number would be used to identify the error line.

Continuation Lines

RFC822-style line continuations are supported, with leading whitespace stripped from continuation lines, and "\n" separating the lines in the returned value. Unlike ConfigParser, no other interpretation of name or value is done, so it's up to you to do any case-folding, conversions, etc.

Comment and Whitespace (blank line) Processing

Comment lines are lines which begin with ';', '#', or the word rem (case-insensitive). No leading whitespace is allowed, to prevent confusion with continuation lines. Because setting values are not interpreted, comments embedded on the same line with a setting or indented in a continuation line are returned as part of the value text. If you want to support embedded comments, it is up to you to parse them out of the value.

Comment lines are completely ignored, so you can have a comment line inside a series of continuation lines, as long as it has no leading whitespace on the line. Blank (empty or whitespace-only) lines within a series of continuation lines are considered part of the setting value, and are rendered as empty lines in the value. Blank lines which appear at the end of a setting value, or before the first setting in the input stream, are ignored.
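
The continuation, comment, and blank-line rules above can be combined into one small sketch. This is not the module's implementation: iter_settings_sketch, COMMENT, and SETTING are hypothetical names, the "name = value" / "name: value" pattern is an assumption about what a valid option looks like, and joining the physical lines of lineInfo with "\n" is also an assumption.

```python
import re

# Comment markers must start in column 0 (no leading whitespace).
COMMENT = re.compile(r"^(?:[;#]|rem\b)", re.IGNORECASE)
# Assumption: a setting looks like "name = value" or "name: value".
SETTING = re.compile(r"^([^\s:=][^:=]*?)\s*[:=]\s*(.*)$")


def iter_settings_sketch(line_source):
    """Yield (name, value, lineInfo) for each setting in a lineInfo stream."""
    current = None  # [name, value_parts, source, line_no, raw_lines]
    blanks = 0      # blank lines seen since the last non-blank line

    def finish():
        name, parts, source, line_no, raw = current
        return name, "\n".join(parts), (source, line_no, "\n".join(raw))

    for source, line_no, line in line_source:
        if COMMENT.match(line):
            continue  # comments are ignored, even between continuations
        if not line.strip():
            blanks += 1  # kept only if another continuation line follows
            continue
        if line[0].isspace() and current is not None:
            # Continuation: restore interior blank lines, strip the indent.
            current[1].extend([""] * blanks + [line.lstrip()])
            current[4].extend([""] * blanks + [line])
            blanks = 0
            continue
        if current is not None:
            yield finish()  # trailing blank lines are dropped here
            current = None
        blanks = 0
        match = SETTING.match(line)
        if match:
            current = [match.group(1), [match.group(2)], source, line_no, [line]]
        else:
            yield None, None, (source, line_no, line)  # not a valid option
    if current is not None:
        yield finish()


stream = [
    ("demo.ini", 1, ""),
    ("demo.ini", 2, "# leading comment"),
    ("demo.ini", 3, "motd = Welcome ; not stripped"),
    ("demo.ini", 4, "REM ignored inside the continuation run"),
    ("demo.ini", 5, "   to the server"),
    ("demo.ini", 6, ""),
    ("demo.ini", 7, "   Enjoy"),
    ("demo.ini", 8, ""),
    ("demo.ini", 9, "not a setting"),
]
settings = list(iter_settings_sketch(stream))
```

Note that the embedded "; not stripped" text stays in the value, the interior blank line survives as an empty line, and the reported line number is that of the first physical line of the setting.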

Classes   

AbstractConfigParser

Abstract configuration file parser based on sections and settings



This document was automatically generated on Mon May 6 01:11:03 2024 by HappyDoc version 2.1