The PEAK Developers' Center   VisitorRevisited UserPreferences
 
HelpContents Search Diffs Info Edit Subscribe XML Print View

The "Visitor Pattern", Revisited

Sometimes, you need to perform an operation over a data structure that includes many different types of data. But, you don't want to build the operation into the objects themselves, either because you can't, or because you don't want to modify every class every time you have a new kind of operation. For example, you might want to have operations to:

But, if you're working with existing classes, and adding a new operation, you don't necessarily want to have to insert the code into every class. And if someone else wrote the code, you may not even have the option of doing that.

The "Classic" Visitor Pattern

The "Classic" Visitor pattern, from the Gang of Four Book, tries to solve this problem by putting the algorithm into a "Visitor" class, and using an accept() method in the target classes, that selects the correct behavior from the Visitor class. For example, if we were writing a "pretty printer" Visitor, we might make a PrettyPrinter class with methods like visit_list(), visit_string(), and so on, that have the code for printing that kind of object. Then, we would add a simple accept() method like this:

def accept(self, visitor):
    return visitor.visit_list(self)

to each of our target classes. As you can see, the method above calls the visitor's visit_list() method, so that the visitor knows what kind of object it's working on. Then, when we need a new operation, like exporting to XML, we just write an XMLExporter class with a different visit_list() method.

Of course, there are two problems with this approach, at least in Python. First, we can't modify built-in types like list, so if we take this approach, we won't be able to pretty-print lists or export them to XML. We could only use this approach with new classes that we create, or classes that somebody already put an accept() method in. Second, it seems really silly to write a bunch of methods just to tell the visitor what type something is. Isn't there an easier way to do this?

Extrinsic Visitor

A variation on the Visitor pattern, called "Extrinsic Visitor", is more commonly used in Python. As Martin Nordberg writes in "Variations on the Visitor Pattern":

"An Extrinsic Visitor implements double dispatch with run time type information instead of Accept() methods. With the same machinery it is possible to test the feasibility of a particular visit before performing its operation. The extrinsic visitor pattern provides several benefits in ease of software development in trade for poorer run time performance."

So, the Extrinsic Visitor pattern does away with accept(), replacing it with code in the Visitor class that inspects the type of the target, and selects the appropriate method. For example, the compiler.visitor module in the Python standard library has an ASTVisitor class with the following method:

def dispatch(self, node, *args):
    self.node = node
    klass = node.__class__
    meth = self._cache.get(klass, None)
    if meth is None:
        className = klass.__name__
        meth = getattr(self.visitor, 'visit' + className, self.default)
        self._cache[klass] = meth
    return meth(node, *args)

This method attempts to look up methods with names like visitFunction, visitGlobal, and so on, using the name of the class of the target node. This approach is fairly simple to code in Python, so it's often seen in Python code that implements these kinds of algorithms. In fact, if the code above didn't do caching to improve the performance, it would be about half its current size, something like:

def dispatch(self, node, *args):
    self.node = node
    className = node.__class__.__name__
    meth = getattr(self.visitor, 'visit' + className, self.default)
    return meth(node, *args)

So, it's definitely easy to code. (But if you need raw performance, you will want to use a cache: the extra attribute access and string concatenation are 2-3 times slower than looking up the class in a dictionary, and if you have very large data structures the difference will add up very quickly.)

There is one drawback to this approach, however. As commonly coded in Python, the Extrinsic Visitor pattern does not have any way to deal with inheritance.

For example, let's say your PrettyPrinter class has a visit_list() method. And somebody subclasses list to make MyList. You're going to have to add a visit_MyList() to your PrettyPrinter class, even though in all probability the visit_list() method would work just fine.

Or, maybe your XML exporter treats a wide variety of classes exactly alike, for all descendants of some Node class. You still have to write visit_FooNode(), visit_BarNode(), and so on. Also, what if there are two MyList classes in use in a program? How will visit_MyList tell them apart?

Clearly, using class names to identify methods has certain limitations. But beyond that, what if you're writing something that's part of a framework, that you expect others to extend and add on to? If they want to create a new visitor, they can certainly do so. But what if they add a new class, and want to have an existing visitor -- such as our hypothetical PrettyPrinter -- work with it?

Generic Functions

Lisp developers have been occasionally known to sneer at the Visitor pattern as a non-solution to a non-problem. Aside from any general tendency for Lisp programmers to sneer at things (ahem!), this is probably because with the Common Lisp Object System (CLOS), they have a delightful thing called a "generic function".

No, a "generic function" isn't a cheap knock-off of an original, "brand name" function. It's a not-so-great name for something that might be better called a "polymorphic function", an "extensible function", or maybe even a "composable function", as these are all better descriptions of what a generic function is.

But before we dive into explanations, let's start with an example using PyProtocols' dispatch package for generic functions:

import dispatch
import sys

@dispatch.on('target')
def pretty_print(target,stream=sys.stdout):
    """Generic function to pretty-print 'target' onto 'stream'"""

@pretty_print.when(list)
def print_list(target,stream):
    # pretty-print a list (or subclass thereof)
    stream.write("[")
    for item in target:
         pretty_print(item,stream)
         stream.write(",")
    stream.write("]")

@pretty_print.when(object)
def print_object(target,stream):
    # Fallback if nothing else matches
    stream.write(repr(target))

pretty_print([1,"2",3.0])

The above is an overwhelmingly trivial pretty-printer, that simply prints the normal repr() of any object that isn't a list (or subclass thereof). When you call pretty_print([1,"2",3.0]), the print_list function will be invoked, then the print_object function will be called on each item in the list. The functions are looked up in a dictionary, following the inheritance tree of the target parameter. The first matching function is used, so the speed is roughly equivalent to a normal attribute lookup. (At least if you installed PyProtocols with its C accelerator routines.)

Unlike the typical "extrinsic visitor" used in Python code, this "generic function" (pretty_print()) works without needing to define separate routines for integers, strings, and floats. We didn't have to write print_str, print_int, and print_float, if one routine (print_object) sufficed for them all. Also, although I didn't show this above, the routines don't have to have individual names. I could have written, for example:

@pretty_print.when(object)
def pretty_print(target,stream):
    # Fallback if nothing else matches
    stream.write(repr(target))

But, it's often good practice to give the individual routines (or "methods") of a generic function separate names anyway, as it makes tracebacks easier to read when you're debugging.

Another thing you might notice is that generic functions don't need to be in a class, although there's nothing stopping you from putting them in one. For example, a real pretty printer would probably need some kind of indentation management that would best go in a pretty-printer instance, e.g.:

class PrettyPrinter:

    def __init__(self,stream=sys.stdout,indentwidth=4):
        self.stream = stream
        self.indentwidth = indentwidth
        self.current_indent = 0

    @dispatch.on('target')
    def pretty_print(self, target):
         """Generic function to pretty-print 'target'"""

    @pretty_print.when(object)
    def print_object(self,target):
        # Fallback if nothing else matches
        self.stream.write(repr(target))

    # etc.

From the point of view of Python, PyProtocols' generic functions are just ordinary functions; they can be used as methods, as standalone functions, or even as class methods (by using @classmethod before the @dispatch.on()). You can also use the inspect or pydoc modules to get information about them.

Extension and Reuse

But the big difference between generic functions and regular functions is the fact that they are extensible. If you need them to work with a new type of object, you can simply add a new "method" to them. If my new MyList class needs to be handled differently than a regular list, I can just write:

from pretty_printing import pretty_print

@pretty_print.when(MyList)
def print_MyList(target,stream):
    # code to handle 'MyList' target

And voila! The global pretty_print() function now knows how to handle this new type. Because we're keying off of an actual class, not a name, there's no chance of collision with another MyList class. And, if somebody later subclasses my MyList class, my special method will still be used, unless they define a new method to be used "when" their class is a target.

But what if I don't want to change the global pretty_print() function? Or, maybe I'd like to have a function that's like pretty_print(), but that formats lists differently? Well, I can clone the generic function, like this:

my_print = pretty_print.clone()

@my_print.when(list)
def print_list(target,stream):
    # print 'list' the way I want it for my_print

Now, I have a my_print() generic function, identical to the old one, except that it prints lists differently (and maybe their subclasses, if they don't have another method defined).

Interestingly, clone() is more like subclassing than copying, because if someone later adds new methods to pretty_print(), the my_print() function will "inherit" them, as long as they aren't overridden by a more-specific definition in my_print(). For example, if my_print has a method for type object, and pretty_print has a method for type int, then my_print(23) will use the pretty_print method for type int, instead of its own method for type object. But if both have an int method, then the one in my_print will take precedence.

These relationships hold through multiple clonings, so if you clone my_print to make foo_print, then foo_print will also inherit any methods added to my_print or pretty_print.

One important use of the .clone() method is when subclassing a class that contains generic functions. For example:

class NewPrinter(PrettyPrinter):
    
    pretty_print = PrettyPrinter.pretty_print.clone()

    @pretty_print.when(int)
    def print_int(self, target):
        # NewPrinter handles 'int' differently than PrettyPrinter

If you want to add new methods to a generic function in a subclass, and you want those new methods to take effect only in the subclass (and subclasses thereof), you can do so by cloning the base class' generic function, and then adding methods to it.

What We Didn't Cover

We've only touched the very tip of the generic function iceberg here. For example, we've dealt only with "single-dispatch" generic functions, using @dispatch.on() to indicate what parameter we want to select a method for. And, we've only shown dispatching on single types or classes, not on interfaces or class lists. The PyProtocols dispatch package also offers complete "predicate dispatch" generic functions that can test arbitrary Python expressions on multiple parameters simultaneously, and automatically figure out which conditions are "more specific" than others.

However, the basic principles are the same: generic functions are a kind of "open-ended modular function" that let you add new "methods" as you need them, even from within different modules written by different people. Generic functions select and execute the method with the most-specific matching condition, given the function's input parameters at runtime. In addition to being useful for implementing "visitors", generic functions are a boon to any module or framework that needs to provide a facility that's open to extension, without requiring modifications to the original code.


PythonPowered
EditText of this page (last modified 2005-01-08 03:13:09)
FindPage by browsing, title search , text search or an index
Or try one of these actions: AttachFile, DeletePage, LikePages, LocalSiteMap, SpellCheck