The PEAK Developers' Center   IntroToPeak/LessonThree
Version as of 2004-06-17 11:11:50


Lesson Three: Subcommands and Storing Data

We're stretching pretty hard here to find ways to use "Hello, world!" to demonstrate PEAK concepts. Just keep suppressing your impulse to laugh at the triviality of the example, and stay focused on the concepts it's demonstrating.

At this point our hello program can greet anything whose name is recorded in the table. But suppose we have a new thing we want to greet? What we need is a way to update the database table to record a new greeting. Our hello program will now have two different functions to perform: greeting, and recording new greetings. (Of course, we could also implement these as two separate commands, but then we wouldn't have an excuse to talk about AbstractInterpreter and demonstrate Bootstrap.)

So in this lesson we'll expand the hello command to have subcommands:

 
    % ./hello for Jeff: Hi, guy! 
    % ./hello to Jeff 
    Hi, guy! 
 
This will require revising our storage implementation to allow writing to our database. We'll stick to using a file for now, to keep the distraction of SQL at bay for a little while longer.

Contents

  1. Lesson Three: Subcommands and Storing Data
    1. AbstractInterpreter and Bootstrap
    2. peak.running.shortcuts
    3. Storing a New Message: the "for" Subcommand
    4. Storing a New Message: Modifying the Data Manager
    5. Questioning Existence, and Tuning Performance
    6. Points to Remember

AbstractInterpreter and Bootstrap

Another abstract class provided by PEAK is AbstractInterpreter. This class represents something PEAK can run that will execute subcommands based on the first argument word. PEAK also provides a subclass of AbstractInterpreter called Bootstrap, which looks up a URL or command shortcut and runs it.

If you think that sounds a lot like what the peak script does, you're quite correct. If you take a look at the actual peak script, it looks something like:

    #!/usr/bin/env python2.2

    from peak.running import commands
    commands.runMain( commands.Bootstrap )

Which means that we've actually been using Bootstrap all along, to run our programs. Since we'd like to be able to use commands like hello to and hello for in the same way that we can use peak help or peak runIni, we'll make our new main program a subclass of commands.Bootstrap:

    from peak.api import *
    from helloworld.model import Message


    class HelloWorld(commands.Bootstrap):

        usage = """
    Usage: hello command arguments

    Available commands:

        for -- sets a greeting
        to  -- displays a greeting
    """

        # Create the DM on demand and offer it to child components
        Messages = binding.Make(
            'helloworld.storage.MessageDM',  offerAs=[storage.DMFor(Message)]
        )


    class toCmd(commands.AbstractCommand):

        usage = """
    Usage: hello to <name>

    Displays the greeting for "name".
    """

        # Find the DM offered by the parent HelloWorld component
        Messages = binding.Obtain(storage.DMFor(Message))

        def _run(self):
            storage.beginTransaction(self)
            print >>self.stdout, self.Messages[self.argv[1]].text
            storage.commitTransaction(self)

So we've got a new HelloWorld main program, this time a subclass of commands.Bootstrap. HelloWorld is still the holder for the MessageDM binding. The Bootstrap class will automatically make the subcommands into children of our HelloWorld component, so the subcommands will be able to Obtain the DM from their context, as discussed in the last chapter.

The only other thing it's got is a usage class variable. To see how this is used, try typing ./hello at your command prompt:

 
% ./hello 
 
Usage: hello command arguments 
 
Available commands: 
 
    for -- sets a greeting 
    to  -- displays a greeting 
 
 
./hello: missing argument(s) 
 
As you can see, PEAK is taking care of a lot of the routine tasks associated with writing a script.

Our original AbstractCommand is still there, but now we've named it toCmd. And, since it is a different class from HelloWorld, where we defined the binding to our MessageDM, it needs to Obtain that binding. Remember, it is thereby getting access to the same MessageDM instance as the one in the associated HelloWorld instance. So, by using Make(...,offerAs=[something]) in a parent component, and Obtain(something) in child components, a parent component can share one instance of a service with any child component that needs it.
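The sharing pattern can be pictured without PEAK at all. The following is only a minimal sketch in plain Python 3: the Component class and its offer() and obtain() methods are hypothetical names invented for illustration, and PEAK's real binding machinery is considerably richer.

```python
# Illustrative only: a hand-rolled lookup chain, not PEAK's binding machinery.

class Component:
    def __init__(self, parent=None):
        self.parent = parent
        self.offered = {}   # key -> service instance offered to children

    def offer(self, key, service):
        self.offered[key] = service

    def obtain(self, key):
        # Walk up the parent chain until some component offers the key.
        node = self
        while node is not None:
            if key in node.offered:
                return node.offered[key]
            node = node.parent
        raise LookupError(key)


app = Component()
app.offer("MessageDM", object())   # created once, in the parent
to_cmd = Component(parent=app)
for_cmd = Component(parent=app)

# Both children end up with the very same instance.
print(to_cmd.obtain("MessageDM") is for_cmd.obtain("MessageDM"))
```

The point of the design is that children never create the service themselves; they only name what they need, and whichever ancestor offers it decides which instance they get.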

peak.running.shortcuts

Next, we need to hook up the toCmd class so it can be invoked as a subcommand. How can we do that? Remember when we looked at the help for the peak script? The first paragraph said:

  
The 'peak' script bootstraps and runs a specified command object or command  
class.  The NAME_OR_URL argument may be a shortcut name defined in the  
'peak.running.shortcuts' property namespace, or a URL of a type  
supported by 'peak.naming'. 
 

Hmm. So, if something defined in a particular "property namespace" affects the way the peak command behaves, that must mean that the peak command has some place to get those properties, which means it probably has an ini file. Sure enough, a little poking around the peak directories will reveal a peak.ini file. In that file we can find a section called [peak.running.shortcuts], containing a bunch of properties called runIni, help, and many other commands.

Does this mean that if we add a similar section to our hello file, we can create subcommands of our own? Let's try adding this new section to hello:

 
[peak.running.shortcuts] 
to = importString('helloworld.commands.toCmd') 
 

and now try it:

 
    % ./hello to Fred 
    Greetings, good sir. 
 

Excellent! By the way, you may have noticed that when we turned our command into a subcommand, we did not need to change our argv index number. The argument array stored in self.argv of the subcommand has the subcommand name in argv[0], and the rest of the arguments starting in argv[1]. That's because AbstractInterpreter classes like Bootstrap automatically shift the arguments over for us when they create the subcommand object.
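To make the argument shifting concrete, here is a rough sketch in plain Python 3. It is not how Bootstrap is implemented; SHORTCUTS, dispatch() and to_cmd() are made-up names that only mimic the behavior described above.

```python
# Illustrative only: a plain-Python stand-in for the dispatch behavior
# described above. None of these names are PEAK APIs.

def to_cmd(argv):
    # argv[0] is the subcommand name, argv[1] the first real argument
    return "%s was asked to greet %s" % (argv[0], argv[1])


SHORTCUTS = {"to": to_cmd}   # hypothetical shortcut table


def dispatch(argv):
    """Look up argv[1] as a subcommand and hand it the shifted argv."""
    if len(argv) < 2:
        raise SystemExit("%s: missing argument(s)" % argv[0])
    subcommand = SHORTCUTS[argv[1]]
    return subcommand(argv[1:])   # drop the program name: this is the shift


print(dispatch(["hello", "to", "Fred"]))
```

Because the subcommand sees its own name in argv[0], the same class works unchanged whether it is run directly or as a subcommand.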

Also by the way, we should mention that it wasn't strictly necessary to edit the configuration file to do what we just did. We also could have defined a binding in our HelloWorld class to "offer" the right configuration value, like this:

        __toCmd = binding.Obtain(
            'import:helloworld.commands.toCmd',
            offerAs=['peak.running.shortcuts.to']
        )

But now that you've seen how, you can also see why we didn't do it. It's rather ugly to do this sort of configuration in code, compared to using an .ini file. But it's nice to know you can do it if you need to.

Of course, the configuration file is also more flexible: notice, for example, that we could make multiple configuration files for the same code, each file specifying a different set of subcommands, perhaps for different users of the app. You could almost say that PEAK's motto is, "code reuse, through flexibility".

Now for something completely different. Let's try this:

 
    % ./hello to 
 
Given how ./hello magically generated a usage string, you might think this would do so as well. After all, we provided one in the code above, right? Well, an AbstractCommand doesn't automatically display a usage message when no arguments are supplied because, after all, no arguments might be required. It will automatically display the usage if we raise a commands.InvocationError in our _run method, though:

        def _run(self):
            # Complain (and show the usage text) if no name was given
            if len(self.argv)<2:
                raise commands.InvocationError("Missing name")
            storage.beginTransaction(self)
            print >>self.stdout, self.Messages[self.argv[1]].text
            storage.commitTransaction(self)

Now we'll get:

 
% ./hello to 
 
Usage: hello to <name> 
 
Displays the greeting for "name". 
 
to: Missing name 
 

There's just one problem left with the hello command. Try running hello runIni, and see what we get:

 
% ./hello runIni 
 
Usage: peak runIni CONFIG_FILE arguments... 
 
CONFIG_FILE should be a file in the format used by 'peak.ini'.  (Note that 
it does not have to be named with an '.ini' extension.)  The file should 
define a 'running.IExecutable' for the value of its 'peak.running.app' 
property.  The specified 'IExecutable' will then be run with the remaining 
command-line arguments. 
 
 
runIni: missing argument(s) 
 

Whoops! Just because our configuration file contains its own [peak.running.shortcuts] section, doesn't mean that the settings in peak.ini don't apply. We need to do something about this, so that hello doesn't reuse all the peak subcommands.

Looking at peak.ini, we see that some property names end with a *. What happens if we define a * rule in the shortcuts section?

 
[peak.running.shortcuts] 
*  = commands.NoSuchSubcommand 
to = importString('helloworld.commands.toCmd') 
 

Let's try it now:

 
% ./hello runIni 
 
Usage: hello command arguments 
Available commands: 
 
    for -- sets a greeting 
    to  -- displays a greeting 
 
 
runIni: No such subcommand 'runIni' 
 

Good. To recap, we used commands.NoSuchSubcommand, which raises an InvocationError for us, and we used a * rule to define a default value for properties whose names are within a particular "property namespace". That is, any name we look up in peak.running.shortcuts from our configuration file, that isn't explicitly defined there or in our app, will return the commands.NoSuchSubcommand class. That's just what we want for now.
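The effect of the * rule can be modeled with a few dictionaries. This is a simplified sketch in plain Python 3, under the assumption (taken from the behavior shown above) that the nearest configuration is consulted first and that its * rule answers for any name it does not define explicitly. The lookup() function and the dictionaries are hypothetical, not PEAK's property API.

```python
# Illustrative only: layered dictionaries with a "*" fallback. PEAK's real
# property lookup is more general than this.

def lookup(layers, name):
    """Search configuration layers, nearest first, honoring '*' defaults."""
    for rules in layers:
        if name in rules:
            return rules[name]
        if "*" in rules:
            return rules["*"]   # default for the whole namespace
    raise KeyError(name)


peak_ini = {"runIni": "runIni command", "help": "help command"}
hello_without_star = {"to": "toCmd"}
hello_with_star = {"*": "NoSuchSubcommand", "to": "toCmd"}

print(lookup([hello_without_star, peak_ini], "runIni"))  # falls through
print(lookup([hello_with_star, peak_ini], "runIni"))     # stopped by "*"
print(lookup([hello_with_star, peak_ini], "to"))         # explicit rule wins
```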

Actually... there is still one more problem. commands.Bootstrap also accepts URLs on the command line by default. commands.Bootstrap provides a way to turn that behavior off, though. We just need to override a flag in our HelloWorld class:

    class HelloWorld(commands.Bootstrap):

        acceptURLs = False

        # rest of HelloWorld class goes here...

With these changes, our Bootstrap derivative will now do the right thing. Let's move on to the for command now.

Storing a New Message: the "for" Subcommand

Now, we know we're going to have to rewrite our storage.py to allow us to write to the database, but let's start this part of the task by writing the subcommand first. As you'll quickly see, any consideration of how we implement the saving of the data is virtually independent of how we go about initiating the save in the application program.

So, we need another rule in our hello configuration file:

 
[peak.running.shortcuts] 
*  = commands.NoSuchSubcommand 
to = importString('helloworld.commands.toCmd') 
for = importString('helloworld.commands.forCmd') 
 
and another AbstractCommand subclass in commands.py

    class forCmd(commands.AbstractCommand):

        usage = """
    Usage: hello for <name>: <greeting>

    Stores "greeting" as the greeting message for "name".
    """

        Messages = binding.Obtain(storage.DMFor(Message))

        def _run(self):

            if len(self.argv)<2:
                raise commands.InvocationError("Missing arguments")

            # Rejoin the words, then split once on the first colon
            parts = ' '.join(self.argv[1:]).split(':',1)
            if len(parts)!=2:
                raise commands.InvocationError("Bad argument format")

            forname, message = parts

            storage.beginTransaction(self)

            newmsg = self.Messages.newItem()
            newmsg.forname = forname.strip()
            newmsg.text = message.strip()

            storage.commitTransaction(self)

To put a new object in our database, we ask the Data Manager for a new "empty" object, using newItem(). (Actually, it can have a preloaded default state, but we'll ignore that for now.) Then we modify it just like we would any other writable object we got from the Data Manager, and the transaction machinery takes care of getting the data written to the backing store at transaction commit time.
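The argument handling in forCmd._run() can be tried in isolation. The sketch below lifts the same join-then-split logic into a standalone Python 3 function; parse_for_args() is a hypothetical helper, and it raises ValueError where the real command raises commands.InvocationError.

```python
def parse_for_args(argv):
    """Turn ['for', 'Jeff:', 'Hi,', 'guy!'] into ('Jeff', 'Hi, guy!')."""
    # The shell has already split the text into words, so rejoin them
    # and split only once, on the first colon.
    parts = " ".join(argv[1:]).split(":", 1)
    if len(parts) != 2:
        raise ValueError("Bad argument format")
    forname, message = parts
    return forname.strip(), message.strip()


print(parse_for_args(["for", "Jeff:", "Hi,", "guy!"]))
```

Splitting with a maximum of one split means that a greeting may itself contain colons without confusing the parser.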

At this point the for subcommand of our hello command is runnable:

 
% ./hello for 
 
Usage: hello for <name>: <greeting> 
 
Stores "greeting" as the greeting message for "name". 
 
for: Missing arguments 
 
% ./hello for foobar 
 
Usage: hello for <name>: <greeting> 
 
Stores "greeting" as the greeting message for "name". 
 
for: Bad argument format 
 
% ./hello for Jeff: Hi, guy! 
Traceback (most recent call last): 
  File "/usr/local/bin/peak", line 4, in ? 
    commands.runMain( commands.Bootstrap ) 
  File "/usr/local/lib/python2.3/site-packages/peak/running/commands.py", line 70, in runMain 
    result = factory().run() 
  File "/usr/local/lib/python2.3/site-packages/peak/running/commands.py", line 211, in run 
    return self._run() or 0 
  File "/var/home/rdmurray/proj/peak/helloworld/07writabledb/helloworld.py", line 53, in _run 
    newmsg = self.Messages.newItem() 
AttributeError: 'MessageDM' object has no attribute 'newItem' 
 
Ah, yes. As you'll recall, we used a read-only Data Manager base class when we developed our database. So we can't store anything until we fix that.

Storing a New Message: Modifying the Data Manager

OK, it's time to do some serious surgery on our Data Manager. First, we need to swap our QueryDM base class for one that supports updating the database. That would be storage.EntityDM.

EntityDM requires two additional methods to be defined by the concrete class: _new and _save. _new is called when a new object is added to the DM, and needs to store the data for that object in the external database. _save is called when an object's state is changed, and a transaction boundary has been reached where that state needs to be synchronized with the external database.
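The division of labor between the two hooks can be pictured with a toy data manager. This is a sketch in plain Python 3 and not storage.EntityDM itself: ToyDM, Record and mark_dirty() are invented for illustration, and real DM objects track their own changes rather than needing an explicit mark_dirty() call.

```python
# Illustrative only: a toy data manager, not storage.EntityDM.

class Record:
    """Hypothetical stand-in for a model object such as Message."""
    def __init__(self, forname=None, text=None):
        self.forname = forname
        self.text = text


class ToyDM:
    def __init__(self):
        self.store = {}          # stands in for the external database
        self.new_objects = []
        self.dirty_objects = []
        self.log = []            # records which hooks ran

    def newItem(self):
        ob = Record()
        self.new_objects.append(ob)
        return ob

    def mark_dirty(self, ob):
        self.dirty_objects.append(ob)

    def _new(self, ob):
        self.log.append("_new")
        self._save(ob)
        return ob.forname        # _new also assigns the object ID

    def _save(self, ob):
        self.log.append("_save")
        self.store[ob.forname] = {"forname": ob.forname, "text": ob.text}

    def flush(self):
        # Called at a transaction boundary: write out everything pending.
        for ob in self.new_objects:
            self._new(ob)
        for ob in self.dirty_objects:
            self._save(ob)
        self.new_objects, self.dirty_objects = [], []


dm = ToyDM()
msg = dm.newItem()
msg.forname, msg.text = "Jeff", "Hi, guy!"
print(dm.store)   # still empty: nothing is written before the flush
dm.flush()
print(dm.store)
```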

Let's write the new storage.py:

    from peak.api import *
    from helloworld.model import Message

    class MessageDM(storage.EntityDM):

        defaultClass = Message
        filename = binding.Obtain(PropertyName('helloworld.messagefile'))

        def data(self):
            # Read the whole "name | greeting" file into a dictionary
            data = {}
            file = open(self.filename)
            for line in file:
                fields = [field.strip() for field in line.split('|',1)]
                forname, text = fields
                data[forname] = {'forname': forname, 'text': text}
            file.close()
            return data

        data = binding.Make(data)

        def _load(self, oid, ob):
            return self.data[oid]

        def _new(self, ob):
            self._save(ob)
            return ob.forname

        def _save(self,ob):
            self.data[ob.forname] = {'forname':ob.forname, 'text':ob.text}

That was easy. The _new() method is responsible for both saving state and returning the object ID of the new object. This is because _new is responsible for assigning object IDs. In this case, we simply return ob.forname (since that's what we're using as an object ID), after calling self._save(ob). Let's run the script, and try it out:

 
% ./hello for Fred: Hi, guy! 
% ./hello to Fred 
Greetings, good sir. 
 

Oops. All we did was update our in-memory data dictionary. We didn't save it to disk, so the change didn't stay in place for long. How can we fix that?

If we look at the storage.IWritableDM interface (see peak help storage.IWritableDM), we'll see that it includes a flush() method. flush() is called as part of the transaction commit process, and the default implementation of this method in EntityDM is what calls our _save() and _new() methods for the appropriate objects. If we define our own version of flush() that first calls the standard flush(), and then writes our data dictionary to disk, we'll be all set:

        def flush(self,ob=None):
            super(MessageDM,self).flush(ob)
            # Rewrite the whole file from the in-memory dictionary
            file = open(self.filename,'w')
            for forname, data in self.data.items():
                print >>file, "%s|%s" % (forname,data['text'])
            file.close()

But wait. What if there's an error while writing the file? What is going to happen to the original file? Since we're opening the existing file for output, we'll have already erased our original data. That's not good.

We need a mechanism for writing files that can roll back or commit, just like the transaction as a whole. PEAK has a peak.storage.files module with two classes we can use for this: TxnFile and EditableFile. Because we're dealing with such a small file, and can load it all in memory at once, we'll use EditableFile, which offers a more convenient interface for such files. Let's take a look at the part of the output from peak help peak.storage.files that covers EditableFile:

    class EditableFile(TxnFile) 
     |  File whose text can be manipulated, transactionally 
     | 
     |  Example:: 
     | 
     |      myfile = EditableFile(self, filename="something") 
     |      print myfile.text   # prints current contents of file 
     | 
     |      # Edit the file 
     |      storage.beginTransaction(self) 
     |      myfile.text = myfile.text.replace('foo','bar') 
     |      storage.commitTransaction(self) 
     | 
     |  Values assigned to 'text' will be converted to strings.  Setting 'text' 
     |  to an empty string truncates the file; deleting 'text' (i.e. 
     |  'del myfile.text') deletes the file.  'text' will be 'None' whenever the 
     |  file is nonexistent, but do not set it to 'None' unless you want to replace 
     |  the file's contents with the string '"None"'! 
     |  By default, files are read and written in "text" mode; be sure to supply 
     |  a 'fileType="b"' keyword argument if you are editing a binary file.  Note 
     |  that under Python 2.3 you can also specify 'fileType="U"' to use "universal 
     |  newline" mode. 
     | 
     |  'EditableFile' subclasses 'TxnFile', but does not use 'autocommit' mode, 
     |  because it wants to support "safe" alterations to existing files. 

Yep, that looks like what we need. We should be able to easily load and save our data by reading or writing to the EditableFile object's text attribute, especially since we will already be inside a transaction whenever we use the data manager.

Okay, so let's fix up storage.py to use EditableFile:

    from peak.api import *
    from peak.storage.files import EditableFile
    from helloworld.model import Message

    class MessageDM(storage.EntityDM):

        defaultClass = Message
        filename = binding.Obtain(PropertyName('helloworld.messagefile'))

        file = binding.Make(
            lambda self: EditableFile(filename=self.filename)
        )

        def data(self):
            data = {}
            for line in self.file.text.strip().split('\n'):
                fields = [field.strip() for field in line.split('|',1)]
                forname, text = fields
                data[forname] = {'forname': forname, 'text': text}
            return data

        data = binding.Make(data)

        def _load(self, oid, ob):
            return self.data[oid]

        def _new(self, ob):
            self._save(ob)
            return ob.forname

        def _save(self, ob):
            self.data[ob.forname] = {'forname':ob.forname, 'text':ob.text}

        def flush(self,ob=None):
            super(MessageDM,self).flush(ob)
            # Assigning to 'text' replaces the file's contents transactionally
            self.file.text = ''.join(
                ["%s|%s\n" % (forname,data['text'])
                    for forname, data in self.data.items()
                ]
            )

We hardly changed a thing. Instead of opening self.filename to read and write the data, now we simply split or join self.file.text. The EditableFile will automatically handle writing the new data to a different filename, then renaming it and replacing the old file. It'll also automatically discard the new file if the transaction is aborted for any reason.
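The write-then-rename idea is worth seeing on its own. The following is a minimal sketch in plain Python 3 using only the standard library; replace_file_text() is a hypothetical helper, and EditableFile's actual implementation, which also ties into PEAK's transactions, may well differ.

```python
import os
import tempfile


def replace_file_text(filename, text):
    """Write text to a temporary file, then rename it over filename."""
    directory = os.path.dirname(os.path.abspath(filename))
    # Create the temporary file next to the target so the final
    # rename stays on the same filesystem.
    fd, tmpname = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(text)
        os.replace(tmpname, filename)   # swap in the finished file
    except BaseException:
        # Something failed: discard the partial file, keep the original.
        if os.path.exists(tmpname):
            os.remove(tmpname)
        raise
```

With this approach the original file is never opened for writing, so a failure part-way through leaves the old contents untouched.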

Speaking of aborting, there's actually still a bug in this DM. If a transaction is aborted, the DM may or may not have called _new(), _save() and/or flush(), one or more times already. The EditableFile will take care of resetting itself if a transaction is aborted, but our data dictionary could wind up out-of-sync with the file.

An easy way to fix this would be to override the abortTransaction() method, similar to what we did for flush(), and delete the data dictionary if the transaction is aborted:

        def abortTransaction(self, txnSvc):
            # Forget the cached dictionary so it is rebuilt from the file
            del self.data
            super(MessageDM,self).abortTransaction(txnSvc)

Now, if the transaction is aborted, the data attribute gets deleted, and the next time we try to use it, our binding.Make() wrapper will re-run the function that creates the dictionary from the EditableFile. EditableFile does something similar to this, so when we access its text again, it will have reverted to whatever was last stored on disk, not what we changed it to.

There are a few other things to notice about our revised DM. We're still getting the filename from that same configuration variable. Now, however, we are turning that into an EditableFile. Again we use binding.Make to create a descriptor that will return a real value (and cache it) when the class attribute is actually accessed. We used a lambda expression here instead of a function, as this is more readable when there's only a single expression being executed.
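The compute-once, cache, and recompute-after-delete behavior has a close analogue in the Python 3 standard library. The sketch below uses functools.cached_property purely as an analogy for what is described above; it is not how binding.Make is implemented, and the Holder class is hypothetical.

```python
from functools import cached_property


class Holder:
    loads = 0   # counts how many times the expensive load ran

    @cached_property
    def data(self):
        # Runs on first access only; the result is cached on the instance.
        Holder.loads += 1
        return {"Fred": "Greetings, good sir."}


h = Holder()
h.data
h.data
print(Holder.loads)   # the loader ran once for two accesses

del h.data            # throw the cached value away
h.data
print(Holder.loads)   # the next access ran the loader again
```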

Anyway, with these changes in place, our for method should now be working:

 
% ./hello for Jeff: Hi, guy! 
% ./hello to Jeff 
Hi, guy! 
 

Questioning Existence, and Tuning Performance

At this point certain readers may be getting antsy because there's a flaw in the forCmd implementation. As we wrote it, the for command assumes that it's always creating a new Message, even though the forname may already exist in our primitive "database".

For our current example, this doesn't actually cause any problems: because of the way we're updating the "database", it doesn't matter if the item is new or an update. But, we don't want to rely on this implementation quirk, and when we move to an SQL database later on, it will matter quite a bit whether we're adding or updating.

To fix this, we need to change our for command to check whether the name exists, and then either update the existing Message object, or create a new one, as appropriate. In order to do that, we need to be able to ask the DM whether or not a given key exists. Since we're using the forname as the object id, we can handily provide a way to do it by adding a __contains__ method to the DM:

        def __contains__(self,oid):
            return oid in self.data

Now we can update our forCmd._run() method in commands.py:

        def _run(self):

            if len(self.argv)<2:
                raise commands.InvocationError("Missing arguments")

            parts = ' '.join(self.argv[1:]).split(':',1)
            if len(parts)!=2:
                raise commands.InvocationError("Bad argument format")

            forname, message = [part.strip() for part in parts]

            storage.beginTransaction(self)

            # Update the existing message if there is one, else create it
            if forname in self.Messages:
                msg = self.Messages[forname]
            else:
                msg = self.Messages.newItem()
                msg.forname = forname

            msg.text = message
            storage.commitTransaction(self)

With this change, updating the database should still work:

 
% ./hello for Jeff: Hey, Dude! 
% ./hello to Jeff 
Hey, Dude! 
 

Sharp-eyed readers will notice that the __contains__ method we wrote does double the normal work for retrieving an item, because it actually "loads" data, by accessing the "database". Then, if the item exists in the database, the _load() method will access the database again. For our in-memory database, this is no big deal, but it will be more important when we start using SQL. Let's change our approach. We'll replace the __contains__ method with a get method:

        def get(self,oid,default=None):

            if oid in self.data:
                return self.preloadState(oid, self.data[oid])

            return default

This method will either retrieve the object, or return the default, which matches the standard Python signature for get. To support retrieving the object only once (as well as various other situations), DMs have a preloadState(oid,state) method. This method creates a pre-loaded object, using state, instead of calling _load to get the state. (Remember, what we have stored in data is a dictionary containing the values for the various object attributes, which is the state from the DM's point of view.)

So, our get() method can load the state from our "database", and then preload it into the object it returns.

There is actually still a minor inefficiency here: we're always checking the "database", even if the object we want is already loaded into memory. We can make this slightly more efficient by changing it to:

        def get(self,oid,default=None):

            # Already loaded? Then skip the "database" entirely
            if oid in self.cache:
                return self.cache[oid]

            elif oid in self.data:
                return self.preloadState(oid, self.data[oid])

            return default

DMs have a cache attribute that holds on to currently loaded objects, so that multiple requests for a given object ID will always return the same object. So, by checking it here first, we can avoid doing the lookup in self.data if the requested object is already loaded.

These minor changes are of little or no consequence to our current app, but will have more impact when we move to using SQL, as every self.data lookup is going to end up as an SQL query.
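To see what the cache check saves, here is a small sketch in plain Python 3 that counts trips to the backing store. CachingDM and its _fetch() method are hypothetical stand-ins; a real DM's cache and preloadState() do more than this.

```python
# Illustrative only: counts backing-store lookups for a cache-first get().

class CachingDM:
    def __init__(self, data):
        self.data = data     # stands in for the (possibly slow) database
        self.cache = {}      # oid -> already loaded object
        self.lookups = 0

    def _fetch(self, oid):
        self.lookups += 1    # each call here would be an SQL query later
        return self.data.get(oid)

    def get(self, oid, default=None):
        if oid in self.cache:
            return self.cache[oid]
        state = self._fetch(oid)
        if state is None:
            return default
        ob = dict(state)     # the "preloaded" object
        self.cache[oid] = ob
        return ob


dm = CachingDM({"Jeff": {"forname": "Jeff", "text": "Hi, guy!"}})
first = dm.get("Jeff")
second = dm.get("Jeff")
print(first is second, dm.lookups)   # same object, one lookup
```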

Let's finish out our refactoring by updating forCmd to use our new get() method:

        def _run(self):

            if len(self.argv)<2:
                raise commands.InvocationError("Missing arguments")

            parts = ' '.join(self.argv[1:]).split(':',1)
            if len(parts)!=2:
                raise commands.InvocationError("Bad argument format")

            forname, message = [part.strip() for part in parts]

            storage.beginTransaction(self)

            msg = self.Messages.get(forname)

            # Only create a new item when nothing was found
            if msg is None:
                msg = self.Messages.newItem()
                msg.forname = forname

            msg.text = message

            storage.commitTransaction(self)

There. That even simplifies the logic a little. Note, by the way, that we do not pass Messages.newItem() as the default argument to get(), because that would do two wrong things: 1) it'd create a new object that would be added to the database at transaction commit, even if we didn't need it, and 2) it wouldn't set forname on the new message. We could work around problem #2, but not problem #1. Using the newItem() method of a DM always creates an object that the DM will attempt to save when the transaction commits, even if you don't keep the object around that long. So: never call newItem() unless you want the object to be added to the database. (Note: it's possible to write a DM that doesn't behave this way, and only saves an object if it's referenced from other objects or some kind of "root" object. We're just not going to show you how in this tutorial!)
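The first problem comes from ordinary Python evaluation order: arguments are evaluated before the call is made. The sketch below shows this with a plain dictionary in Python 3; new_item() and the created list are hypothetical stand-ins for newItem() and the DM's list of objects to save.

```python
created = []   # stands in for objects the DM would save at commit


def new_item():
    ob = {"forname": None}
    created.append(ob)
    return ob


existing = {"Jeff": {"forname": "Jeff"}}

# Tempting but wrong: new_item() runs before get() is even called,
# so an object is created although "Jeff" already exists.
msg = existing.get("Jeff", new_item())
print(msg["forname"], len(created))

# Safer: only create the object once we know it is needed.
msg = existing.get("Jeff")
if msg is None:
    msg = new_item()
print(msg["forname"], len(created))
```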

Anyway, that about wraps it up for creating a practical EntityDM subclass.

Points to Remember

Here's the recap for what we've learned in this lesson (once again, it's quite a lot!):
