Line-Oriented File Parsing Tools
This module supplies functions for creating and using "lineInfo"
streams, which are iterables of (source,lineNo,line) tuples. The
source is an indicator of the line's origin (e.g. a filename),
while lineNo is its line number within that source. line is
the actual text of the line. This data structure is simple, fast,
and easy to use.
This module was created because the standard library ConfigParser
makes a lot of assumptions about syntax that don't necessarily
work with every config file. For example, it assumes directives
are order-insensitive key-value pairs, which doesn't work well for
systems like PEAK, which might prefer to process a stream of
directives in the original sequence.
So, this module uses a stream-processing approach that should provide
a reusable foundation for other types of line-oriented text processing.
The basic tools are the fromStream(), fromFile(), and fromString()
functions, which respectively take an iterable, a filename, or a string,
and return an iterator that yields (source,lineNo,line) tuples.
Once you have a stream to play with, you can then use some of the
processors such as iterConfigSections() and iterConfigSettings(),
which return iterators yielding various kinds of configuration data
(based on ConfigParser-like syntax rules). Or, you can use the
AbstractConfigParser class as a base class to create your own
specialized parsers.
Imported modules
from __future__ import generators
import re
Functions
fromFile
fromStream
fromString
iterConfigSections
iterConfigSettings
fromFile
fromFile(filename, mode='r')
Produce (source,lineNo,line) tuple stream from input file
This is the equivalent of fromStream(open(filename,mode), filename).
That is, it returns a line-info iterator with a source of filename
and the lines from open(filename,mode).readlines().
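The equivalence above can be illustrated with a minimal sketch. The names from_stream and from_file are hypothetical stand-ins written for this example, not the module's own code, and starting line numbers at 1 is an assumption.

```python
import os
import tempfile


def from_stream(stream, source=None):
    """Hypothetical stand-in for fromStream(): yield (source, lineNo, line)."""
    # Assumption: line numbers start at 1.
    for line_no, line in enumerate(stream, 1):
        yield source, line_no, line.rstrip("\r\n")


def from_file(filename, mode="r"):
    """Hypothetical stand-in for fromFile(): the source is the filename."""
    with open(filename, mode) as f:
        yield from from_stream(f, filename)


# Write a small config file, then read it back as a lineInfo stream.
with tempfile.NamedTemporaryFile("w", suffix=".ini", delete=False) as tmp:
    tmp.write("[server]\nport = 8080\n")
    path = tmp.name

try:
    file_infos = list(from_file(path))
finally:
    os.remove(path)

# Each tuple carries enough context for a "file:line: text" style message.
for source, line_no, line in file_infos:
    print("%s:%d: %s" % (source, line_no, line))
```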
fromStream
fromStream(stream, source=None)
Produce (source,lineNo,line) tuple stream from input lines
Calling fromStream(stream,source) returns an iterator which yields
(source,lineNo,line) tuples for each line in stream.
stream must be a sequence, iterator, or iterable file-like object
that yields text lines. (Line-ending characters are stripped from
the end of each line.)
source should be a short string (e.g. filename) or other useful
identifier of where the lines came from.
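As a rough sketch of the behavior described above, the generator below numbers the lines of any iterable and strips their line endings. The name from_stream is a hypothetical stand-in rather than the module's implementation, and the 1-based numbering is an assumption.

```python
def from_stream(stream, source=None):
    """Hypothetical stand-in for fromStream(): yield (source, lineNo, line)."""
    # Assumption: line numbers start at 1.
    for line_no, line in enumerate(stream, 1):
        # Strip only line-ending characters, not other trailing whitespace.
        yield source, line_no, line.rstrip("\r\n")


# Any iterable of text lines works: a list, a generator, an open file.
raw_lines = ["[paths]\n", "root = /srv/app\r\n", "; trailing comment"]
stream_infos = list(from_stream(raw_lines, "example.ini"))

for info in stream_infos:
    print(info)
```

Because the result is a plain iterator of tuples, it can be passed to any other function that accepts a lineInfo stream.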
fromString
fromString(text, source='<string>')
Produce (source,lineNo,line) tuple stream from input string
This is the equivalent of fromStream(StringIO(text), source).
That is, it returns a line-info iterator with the supplied
source name (default is "<string>") and the lines from
StringIO(text).readlines().
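A minimal sketch of the same idea, using io.StringIO from the current standard library. The name from_string is a hypothetical stand-in, and the 1-based line numbers are an assumption.

```python
from io import StringIO


def from_string(text, source="<string>"):
    """Hypothetical stand-in for fromString(): wrap text in StringIO."""
    # Assumption: line numbers start at 1.
    for line_no, line in enumerate(StringIO(text), 1):
        yield source, line_no, line.rstrip("\r\n")


# The empty line in the middle is preserved as an empty string.
string_infos = list(from_string("name = demo\n\nmode = test"))

for info in string_infos:
    print(info)
```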
iterConfigSections
iterConfigSections(lineSource)
Yield (section,lines,info) tuples per .ini-like section in lineSource
This function is used to break up a configuration file (.ini or
ConfigParser-style) into sections based on []-enclosed section
names. It returns an iterator which yields (section,lines,info)
tuples for each section in the file. The first yielded section
will be None if any lines appear before the first section heading;
all others will be the string that was between the [].
lines is always a list of (source,lineNo,line) tuples, suitable
for use by iterConfigSettings() or other lineInfo stream processors.
info is a (source,lineNo,line) tuple representing the line where
the section header (if any) appeared.
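The grouping described above might look roughly like the following sketch. iter_sections is a hypothetical stand-in, not the module's code. Two details are assumptions: a header is recognized only when the line starts with "[" and ends with "]", and info is None for the leading section that has no header.

```python
def iter_sections(line_source):
    """Hypothetical stand-in for iterConfigSections()."""
    section, lines, info = None, [], None
    started = False  # True once a header or any leading line has been seen
    for line_info in line_source:
        text = line_info[2].rstrip()
        # Assumption: headers have no leading whitespace.
        if text.startswith("[") and text.endswith("]"):
            if started:
                yield section, lines, info
            section, lines, info = text[1:-1].strip(), [], line_info
        else:
            lines.append(line_info)
        started = True
    if started:
        yield section, lines, info


infos = [
    ("app.ini", 1, "debug = on"),
    ("app.ini", 2, "[db]"),
    ("app.ini", 3, "host = localhost"),
    ("app.ini", 4, "[cache]"),
]
sections = list(iter_sections(infos))

for name, body, header in sections:
    print(name, len(body), header)
```

Each yielded lines list is itself a lineInfo stream, so it can be handed to a settings parser without losing the original source and line numbers.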
iterConfigSettings
iterConfigSettings(lineSource)
Yield (name,value,lineInfo) tuples per .ini-like setting in lineSource
name and value will be None for any non-blank, non-comment line
which does not appear to be a valid option. Otherwise, they are the
setting's name and value, respectively.
lineInfo is a standard lineInfo-tuple of (source,lineNo,line) data,
with the difference that continuation lines are concatenated to line.
This is so that if one needs to display an error message that shows the
source of the parsed value, the full logical line is included, even though
the first physical line number would be used to identify the error line.
Continuation Lines
RFC822-style line continuations are supported, with leading whitespace
stripped from continuation lines, and "\n" separating the lines in
the returned value. Unlike ConfigParser, no other interpretation of
name or value is done, so it's up to you to do any case-folding,
conversions, etc.
Comment and Whitespace (blank line) Processing
Comment lines are lines which begin with ';', '#', or the word
'rem' (case-insensitive). No leading whitespace is allowed, to
prevent confusion with continuation lines. Because setting values
are not interpreted, comments embedded on the same line with a setting
or indented in a continuation line are returned as part of the value
text. If you want to support embedded comments, it is up to you to
parse them out of the value.
Comment lines are completely ignored, so you can have a comment
line inside a series of continuation lines, as long as it has no
leading whitespace on the line. Blank (empty or whitespace-only)
lines within a series of continuation lines are considered part of
the setting value, and are rendered as empty lines in the value.
Blank lines which appear at the end of a setting value, or before
the first setting in the input stream, are ignored.
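The continuation, comment, and blank-line rules above can be sketched as follows. iter_settings is a hypothetical stand-in, not the module's implementation. The SETTING pattern (a name followed by "=" or ":") and the use of "\n" to join physical lines in the returned lineInfo are assumptions made for this example.

```python
import re

# Comment lines start in column 0 with ';', '#', or the word 'rem'.
COMMENT = re.compile(r"(;|#|rem\b)", re.IGNORECASE)
# Assumption: a valid option looks like "name = value" or "name: value".
SETTING = re.compile(r"([^\s:=][^:=]*?)\s*[:=]\s*(.*)$")


def iter_settings(line_source):
    """Hypothetical stand-in for iterConfigSettings()."""
    current = None  # [name, value_parts, source, line_no, physical_lines]

    def finish(entry):
        name, parts, source, line_no, physical = entry
        # Blank lines at the end of a setting value are dropped.
        while len(parts) > 1 and not parts[-1]:
            parts.pop()
            physical.pop()
        value = None if name is None else "\n".join(parts)
        return name, value, (source, line_no, "\n".join(physical))

    for source, line_no, line in line_source:
        if COMMENT.match(line):
            continue  # comment lines are ignored, even inside a value
        stripped = line.strip()
        if not stripped or line[0].isspace():
            if current is not None:
                # Blank or indented line: continuation of the current entry.
                current[1].append(stripped)
                current[4].append(line)
                continue
            if not stripped:
                continue  # blank line before the first setting
        if current is not None:
            yield finish(current)
        m = SETTING.match(line)
        name, first = (m.group(1), m.group(2).rstrip()) if m else (None, "")
        current = [name, [first], source, line_no, [line]]
    if current is not None:
        yield finish(current)


infos = [
    ("app.ini", 1, ""),
    ("app.ini", 2, "# global options"),
    ("app.ini", 3, "path = /usr/bin"),
    ("app.ini", 4, "    /usr/local/bin"),
    ("app.ini", 5, "; ignored, even mid-value"),
    ("app.ini", 6, "    /opt/bin ; kept as text"),
    ("app.ini", 7, ""),
    ("app.ini", 8, "not a setting"),
]
settings = list(iter_settings(infos))

for name, value, info in settings:
    print(repr(name), repr(value), info[:2])
```

In this sample the indented "; kept as text" fragment stays in the value because it is not in column 0, while the unindented comment on line 5 is skipped without ending the continuation.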
Classes