
generated by the standard library ``tokenize`` module. You can use that module
directly, or you can use these convenience functions to do the tokenizing:
 
tokenize_string(`text`)
    Yield the tokens of `text`
 
tokenize_stream(`file`)
    Yield the tokens found in the open iterable stream `file`
 
tokenize_file(`filename`)
    Open `filename` for text reading, and yield its tokens
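
For example, getting a token list from a string is a one-liner. This is just a
sketch, assuming the ``dsl`` module is imported from a ``scale`` package::

    from scale import dsl

    # each token is a 5-tuple in the stdlib ``tokenize`` format:
    # (type, value, (start_row, start_col), (end_row, end_col), line)
    tokens = list(dsl.tokenize_string("a = 1"))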
 
All of these functions support source encoding comments and BOM markers as

whitespace from individual statements, not from the input to ``parse_block()``.
 
 
Splitting Token Sequences
=========================
 
Many parsing operations need to split a sequence of tokens into subsequences
based on the presence of a particular token. For this, the ``dsl`` module
provides two tokenlist-splitting functions:
 
partition(`tokens`, `sep`)
    Return a 3-tuple (`before`, `sep`, `after`), such that `before` is a list
    of the tokens occurring before the first occurrence of `sep` in `tokens`,
    and `after` is an **iterator** that will yield the portion of `tokens` that
    is after the separator. (This means that you can keep partitioning the
    "after" portion without incurring an O(N^2) performance penalty for copying
    the same list items over and over.)
 
    If the separator is found, the returned `sep` is a 1-element list
    containing the actual token that matched `sep`. If the separator is not
    found, the returned `sep` is an empty list, and `before` will contain all
    of `tokens`.
 
rpartition(`tokens`, `sep`)
    This is just like ``partition()``, except that the split is done at the
    **last** occurrence of `sep` in `tokens` instead of the first, and all
    three return values are lists, rather than two lists and an iterator.
    (Note that this means repeated right-partitioning is an O(N^2)
    operation, so you should try to use ``partition()`` if you need to keep
    repartitioning the `before` value.)
 
The `sep` value for these functions can be a string, in which case the token
value must exactly match that string, or else it can be one of the ``tokenize``
module constants like ``tokenize.OP`` or ``tokenize.NAME``, in which case it
will match any token of that type.
 
Examples::
 
    >>> before, sep, after = dsl.partition(dsl.tokenize_string("1+2"), "+")
    >>> print dsl.detokenize(before)
    1
    >>> print dsl.detokenize(sep)
    +
    >>> after
    <generator object at ...>
 
    >>> print dsl.detokenize(after)
    2
 
    >>> before, sep, after = dsl.partition(dsl.tokenize_string("1*2"), "+")
    >>> print dsl.detokenize(before)
    1*2
    >>> sep # empty if not found
    []
    >>> list(after) # remember, it's an iterator
    []
 
    >>> from tokenize import OP # now let's split on any operator
    >>> before, sep, after = dsl.partition(dsl.tokenize_string("1*2"), OP)
    >>> print dsl.detokenize(before)
    1
    >>> print dsl.detokenize(sep)
    *
    >>> print dsl.detokenize(after)
    2
 
    >>> before, sep, after = dsl.rpartition(
    ...     dsl.tokenize_string("class when class foo"), "class"
    ... )
    >>> before # all three are lists
    [(...'class'...), (...'when'...)]
    >>> sep
    [(...'class'...)]
    >>> after
    [(...'foo'...)]
 
    >>> before, sep, after = dsl.rpartition(
    ...     dsl.tokenize_string("class when class foo"), "spam"
    ... ) # now let's try a not-found separator
 
    >>> before
    []
    >>> sep
    []
    >>> dsl.detokenize(after)
    'class when class foo'
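
Because the `after` value from ``partition()`` is an iterator, repeatedly
splitting on the same separator stays linear. As a sketch, a hypothetical
``split_all()`` helper built on the API above might look like::

    def split_all(tokens, sep):
        # yield the token lists found between occurrences of `sep`
        while True:
            before, found, after = dsl.partition(tokens, sep)
            yield before
            if not found:        # no separator left; `before` held the rest
                break
            tokens = after       # keep partitioning the iterator: O(N) total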
 
 
Block Parsing
=============
 
The ``parse_block(tokens)`` function turns an iterable of tokens into a
**block**, which is a list of statements and the blocks that appear indented
under those statements. More specifically, it is a list of two-item
"(`statement`,`block`)" tuples, where `statement` is a list of the tokens
"(`statement`, `block`)" tuples, where `statement` is a list of the tokens
representing a single statement, and `block` is a (possibly-empty) nested list
of "(`statement`,`block`)" pairs::
of "(`statement`, `block`)" pairs::
 
    >>> dsl.parse_block(dsl.tokenize_string("1+2"))
    [([(...'1'...), (...'+'...), (...'2'...)], [])]

a high-level language that then includes blocks of Python code that must be
incorporated into their output.
 
 
Using Statements
----------------
 
In addition to creating a tree of blocks and statements, ``parse_block()``
also turns each statement into a tree of nested subexpressions, so that
parentheses, brackets, and braces are paired and their contents nested into
a single token::
 
    >>> block = dsl.parse_block(dsl.tokenize_string("foo(bar[baz])"))
    >>> stmt,block = block[0] # get the first statement
 
    >>> from token import tok_name
    >>> [(tok_name[t],v) for t,v,s,e,line in stmt]
    [('NAME', 'foo'), ('SUBEXPR', [...])]
 
    >>> stmt[1][0] == dsl.SUBEXPR
    True
 
As you can see, the statement consists of a ``NAME`` token and a ``SUBEXPR``
pseudo-token (whose numeric value is ``dsl.SUBEXPR``). Instead of a string
in the normal position for the token's value, there's a list. Let's expand
it::
 
    >>> subexpr = stmt[1][1] # get the nested token tree
    >>> print dsl.detokenize(subexpr)
    (bar[baz])
 
    >>> [(tok_name[t],v) for t,v,s,e,line in subexpr]
    [('OP', '('), ('NAME', 'bar'), ('SUBEXPR', [...]), ('OP', ')')]
 
As you can see, the parentheses surrounding the subexpression are contained
in the nested token list. And there's still another nested subexpression::
 
    >>> subexpr = subexpr[2][1]
    >>> print dsl.detokenize(subexpr)
    [baz]
 
    >>> [(tok_name[t],v) for t,v,s,e,line in subexpr]
    [('OP', '['), ('NAME', 'baz'), ('OP', ']')]
 
This process of nesting subexpressions makes it easier to parse statements by
looking for keywords or punctuation that may have different meaning when found
in a subexpression. For example, the ":" operator in Python has different
meaning in a dictionary display than it does at the top level of a statement,
where it might introduce a block or lambda expression. However, if for some
reason you want to flatten out the nesting, you can use ``flatten_stmt()``::
 
    >>> [v for (t,v,s,e,line) in dsl.flatten_stmt(stmt)]
    ['foo', '(', 'bar', '[', 'baz', ']', ')']
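
Note, too, that because a ``SUBEXPR`` pseudo-token's value is a list rather
than a string, the splitting functions described earlier can only match
top-level tokens when given a parsed statement (an inference from the behavior
described above, not a documented guarantee). So a hypothetical helper that
splits a statement at its top-level ":" might be sketched as simply::

    def split_at_colon(stmt):
        # any ":" nested in brackets or braces is hidden inside a SUBEXPR
        # value (a list), so it cannot match the separator string here
        before, sep, after = dsl.partition(stmt, ":")
        if not sep:
            raise ValueError("no top-level ':' in statement")
        return before, list(after)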
 
 
Using Blocks
------------
 
Now that we've seen what a statement looks like inside, let's look at blocks
in more detail::
 
    >>> block = dsl.parse_block(dsl.tokenize_string("""\
    ... def foo():

    ... whee()
    ... """))
 
The block we just parsed has two statements in it::
 
    >>> len(block)
    2

 
Now let's print the blocks nested beneath the two statements, indenting them to
match their original positions::
 
    >>> for stmt,blk in block:
    ...     print dsl.detokenize(dsl.flatten_block(blk), indent=4)

        whee()
    ...
 
We can also print the whole block by flattening and detokenizing it::
 
    >>> print dsl.detokenize(dsl.flatten_block(block))
    def foo():

       def bar(baz,spam):
              whee()
    ...
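
Since a block is just a list of (`statement`, `block`) pairs, walking the
whole tree takes only a short recursion. For instance, a hypothetical
``iter_statements()`` helper might be sketched as::

    def iter_statements(block, depth=0):
        # yield (depth, statement) pairs in source order
        for stmt, children in block:
            yield depth, stmt
            for pair in iter_statements(children, depth + 1):
                yield pair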
 
 
Parsing Declarations
====================
 
The SCALE language consists entirely of "declaration" statements. A
declaration is an expression with optional assignment prefixes, an optional
"context" clause, and an optional nested block. Here are some examples of
valid SCALE declarations::
 
    >>> samples = dsl.parse_block(dsl.tokenize_string("""
    ...
    ... foo(gee):
    ...     zing from Zang
    ...     bar = baz = squidge() from spammity>=1.2:
    ...         answer = 42
    ...
    ... "whiz" = "oh boy" = 123 * 456
    ...
    ... something = me from:
    ...     me = 789
    ...
    ... """))
 
The ``parse_declarations()`` function yields 4-element tuples containing the
assigned names, value expression, context clause, and nested block for each
statement in a given block::
 
    >>> def print_decl(names, expr, context, block):
    ...     if names:
    ...         print "Assigned to:", names
    ...     print "Expression:", `dsl.detokenize(expr)`
    ...     if context is not None:
    ...         print "Context:", `dsl.detokenize(context)`
    ...     if block:
    ...         print "Block:"
    ...         print dsl.detokenize(dsl.flatten_block(block),8).rstrip()
    ...     print
 
    >>> for names, expr, context, block in dsl.parse_declarations(samples):
    ...     print_decl(names, expr, context, block)
    ...
    Expression: 'foo(gee)'
    Block:
            zing from Zang
            bar = baz = squidge() from spammity>=1.2:
                answer = 42
    ...
    Assigned to: ['whiz', 'oh boy']
    Expression: '123 * 456'
    ...
    Assigned to: ['something']
    Expression: 'me'
    Context: ''
    Block:
            me = 789
    ...
 
    >>> for names,expr,context,block in dsl.parse_declarations(samples[0][1]):
    ...     print_decl(names, expr, context, block)
    ...
    Expression: 'zing'
    Context: 'Zang'
    ...
    Assigned to: ['bar', 'baz']
    Expression: 'squidge()'
    Context: 'spammity>=1.2'
    Block:
            answer = 42
    ...
 
As you can see, the `context` of a declaration is either a token list or None,
depending on whether a "from" clause appeared in the declaration. `names` is
a (possibly empty) list of the names to which the expression is being assigned,
with quoted strings being treated as if they were normal identifiers. `expr`,
meanwhile, is the token list for the declaration's value expression, and
`block` is the nested block beneath the statement, if any. The `expr` and
`context` token lists are stripped of whitespace tokens at the top level, and
do not contain the ``:`` if one was present at the end of the declaration.
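
Since each declaration's `block` is itself a block, processing an entire SCALE
source is usually a short recursion over ``parse_declarations()``. A sketch,
where the ``handle()`` callback is hypothetical::

    def process(block, handle, depth=0):
        # invoke `handle` for every declaration, recursing into nested blocks
        for names, expr, context, child in dsl.parse_declarations(block):
            handle(names, expr, context, depth)
            if child:
                process(child, handle, depth + 1)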
 
If a parsed line is not a valid declaration, a ``TokenError`` is raised. Valid
declarations must end with a ``:`` if and only if an indented block follows::
 
    >>> def try_parsing(s):
    ...     list(
    ...         dsl.parse_declarations(dsl.parse_block(dsl.tokenize_string(s)))
    ...     )
 
    >>> try_parsing("xyz:")
    Traceback (most recent call last):
      ...
    TokenError: ("Expected indented block following ':'", (1, 4))
 
    >>> try_parsing("xyz\n foo")
    Traceback (most recent call last):
      ...
    TokenError: ("Expected ':' before indented block", (1, 3))
 
And the "from" clause of a declaration can only be empty if it introduces a
block::
 
    >>> try_parsing("xyz from:\n foo") # okay
    >>> try_parsing("xyz from 123") # also okay
    >>> try_parsing("xyz from") # not okay
    Traceback (most recent call last):
      ...
    TokenError: ("Expected context or ':' after 'from'", (1, 8))
 
It's also not valid to omit the expression if the declaration assigns to any
names::
 
    >>> try_parsing("from foo") # okay
    >>> try_parsing("foo = from bar") # not okay
    Traceback (most recent call last):
      ...
    TokenError: ('Expected expression', (1, 5))
    >>> try_parsing("foo =") # also not okay
    Traceback (most recent call last):
      ...
    TokenError: ('Expected expression', (1, 5))
 
It's also an error to assign to something that's not an identifier or quoted
string::
 
    >>> try_parsing("abc = '123' = 456") # okay, assigns 'abc' and '123'
    >>> try_parsing("123 = 456") # can't assign to number!
    Traceback (most recent call last):
      ...
    TokenError: ("Expected name or string before '='", (1, 4))
 
It's important to note, however, that ``parse_declarations()`` does not
prescribe or guarantee any particular syntax for the `expr` and `context`
clauses. So, they are not guaranteed to be valid Python expressions. This
means that you can create SCALE dialects with different expression or context
syntax, but it also means that you should be prepared for the possibility of
syntax errors within the expression or context clause.
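
In practice, then, a tool that accepts SCALE input should be ready to catch
``TokenError`` around both tokenizing and declaration parsing. A sketch,
assuming the ``TokenError`` raised above is the one from the standard
library's ``tokenize`` module::

    from tokenize import TokenError   # assumption: same exception as above

    def load_declarations(filename):
        try:
            block = dsl.parse_block(dsl.tokenize_file(filename))
            return list(dsl.parse_declarations(block))
        except TokenError, e:
            # error args are (message, (row, col)), as in the tracebacks above
            message, (row, col) = e.args
            print "%s:%d:%d: %s" % (filename, row, col, message)
            return None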
