Up: IntroToPeak Previous: IntroToPeak/LessonTwo Next: IntroToPeak/LessonFour

We're stretching pretty hard here to find ways to use "Hello, world!" to demonstrate PEAK concepts. Just keep suppressing your impulse to laugh at the triviality of the example, and keep focused on the concepts it's demonstrating.

At this point our hello program can greet anything whose name is recorded in the table. Suppose we have a new thing we want to greet? What we need is a way to update the database table to record a new greeting. Our hello program will now have two different functions to perform: greeting, and recording new greetings. (Of course, we could also implement these as two separate commands, but then we wouldn't have an excuse to talk about AbstractInterpreter and demonstrate Bootstrap.)

So in this lesson we'll expand the hello command to have subcommands:

This will require revising our storage implementation to allow writing to our database. We'll stick to using a file for now, to keep the distraction of SQL at bay for a little while longer.

Contents

Lesson Three: Subcommands and Storing Data

AbstractInterpreter and Bootstrap
peak.running.shortcuts
Storing a New Message: the "for" Subcommand
Storing a New Message: Modifying the Data Manager
Questioning Existence, and Tuning Performance
Points to Remember

AbstractInterpreter and Bootstrap

Another abstract class provided by PEAK is AbstractInterpreter. This class represents something PEAK can run that executes subcommands based on the first argument word. PEAK also provides a subclass of AbstractInterpreter called Bootstrap, which looks up a URL or command shortcut and runs it.

If you think that sounds a lot like what the peak script does, you're quite correct. If you take a look at the actual peak script, it looks something like:

```python
#!/usr/bin/env python2.2

from peak.running import commands
commands.runMain( commands.Bootstrap )
```

Which means that we've actually been using Bootstrap all along, to run our programs. Since we'd like to be able to use commands like hello to and hello for in the same way that we can use peak help or peak runIni, we'll make our new main program a subclass of commands.Bootstrap:

```python
from peak.api import *
from helloworld.model import Message


class HelloWorld(commands.Bootstrap):

    usage = """
Usage: hello command arguments

Available commands:

    for -- sets a greeting
    to -- displays a greeting
"""

    Messages = binding.Make(
        'helloworld.storage.MessageDM', offerAs=[storage.DMFor(Message)]
    )


class toCmd(commands.AbstractCommand):

    usage = """
Usage: hello to <name>

Displays the greeting for "name".
"""

    Messages = binding.Obtain(storage.DMFor(Message))

    def _run(self):
        storage.beginTransaction(self)
        print >>self.stdout, self.Messages[self.argv[1]].text
        storage.commitTransaction(self)
```

So we've got a new HelloWorld main program, this time a subclass of commands.Bootstrap. HelloWorld is still the holder for the MessageDM binding. The Bootstrap class will automatically make the subcommands into children of our HelloWorld component, so the subcommands will be able to Obtain the DM from their context, as discussed in the last chapter.

The only other thing it's got is a usage class variable. To see how this is used, try typing ./hello at your command prompt:

]]> As you can see, PEAK is taking care of a lot of the routine tasks associated with writing a script.

Our original AbstractCommand is still there, but now we've named it toCmd. Because it is a different class from HelloWorld, where we defined the binding to our MessageDM, it needs to Obtain that binding. Remember that by doing so it gets access to the same MessageDM instance as the one in the associated HelloWorld instance. So, by using Make(...,offerAs=[something]) in a parent component and Obtain(something) in child components, a parent component can share one instance of a service with any child component that needs it.
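The following minimal sketch, written in plain Python, makes the Make/Obtain relationship concrete. It does not use PEAK. The Component class and its offer() and obtain() methods are hypothetical names, and the service here is created eagerly, whereas binding.Make creates it lazily. The sketch models only one idea: a child searches its parent chain and finds the single instance that the parent offered.

```python
class Component:
    """Hypothetical stand-in for a PEAK component; not PEAK's binding API."""

    def __init__(self, parent=None):
        self.parent = parent
        self._offered = {}  # key -> service instance offered to descendants

    def offer(self, key, service):
        # Plays the part of binding.Make(..., offerAs=[key]) on the parent.
        self._offered[key] = service
        return service

    def obtain(self, key):
        # Plays the part of binding.Obtain(key): search up the parent chain.
        component = self
        while component is not None:
            if key in component._offered:
                return component._offered[key]
            component = component.parent
        raise LookupError(key)


class MessageStore:
    """Hypothetical placeholder for the MessageDM service."""


app = Component()                      # like HelloWorld
shared = app.offer("DMFor(Message)", MessageStore())
to_cmd = Component(parent=app)         # like toCmd
for_cmd = Component(parent=app)        # like a second subcommand

print(to_cmd.obtain("DMFor(Message)") is shared)   # True
print(for_cmd.obtain("DMFor(Message)") is shared)  # True
```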

peak.running.shortcuts

Next, we need to hook up the toCmd class so it can be invoked as a subcommand. How can we do that? Remember when we looked at the help for the peak script? The first paragraph said:


Hmm. So, if something defined in a particular "property namespace" affects the way the peak command behaves, that must mean that the peak command has some place to get those properties, which means it probably has an ini file. Sure enough, a little poking around the peak directories will reveal a peak.ini file. In that file we can find a section called [peak.running.shortcuts], containing a bunch of properties called runIni, help, and many other commands.

Does this mean that if we add a similar section to our hello file, we can create subcommands of our own? Let's try adding this new section to hello:


and now try it:


Excellent! By the way, you may have noticed that when we turned our command into a subcommand, we did not need to change our argv index number. The argument array stored in self.argv of the subcommand has the subcommand name in argv[0], and the rest of the arguments starting in argv[1]. That's because AbstractInterpreter classes like Bootstrap automatically shift the arguments over for us when they create the subcommand object.
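The dispatching and argument shifting described above can be sketched in plain Python. This is not PEAK's AbstractInterpreter. The SUBCOMMANDS dictionary, interpret(), and InvocationError below are hypothetical stand-ins, and the dictionary plays the role of the peak.running.shortcuts lookup.

```python
class InvocationError(Exception):
    """Hypothetical stand-in for commands.InvocationError."""


def show_argv(argv):
    # A toy subcommand that just reports the argv it was given.
    return argv


# Hypothetical registry standing in for the peak.running.shortcuts lookup.
SUBCOMMANDS = {"to": show_argv}


def interpret(argv):
    """Run the subcommand named by argv[1], passing it argv shifted by one."""
    if len(argv) < 2:
        raise InvocationError("missing subcommand")
    name = argv[1]
    if name not in SUBCOMMANDS:
        raise InvocationError("%s: no such subcommand" % name)
    # Drop the interpreter's own name, so the subcommand's argv[0] is "to".
    return SUBCOMMANDS[name](argv[1:])


print(interpret(["hello", "to", "world"]))  # ['to', 'world']
```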

Also by the way, we should mention that it wasn't strictly necessary to edit the configuration file to do what we just did. We also could have defined a binding in our HelloWorld class to "offer" the right configuration value, like this:

```python
    __toCmd = binding.Obtain(
        'import:helloworld.commands.toCmd',
        offerAs=['peak.running.shortcuts.to']
    )
```

But now that you've seen how, you can also see why we didn't do it. It's rather ugly to do this sort of configuration in code, compared to using an .ini file. But it's nice to know you can do it if you need to.

Of course, the configuration file is also more flexible: notice, for example, that we could make multiple configuration files for the same code, each file specifying a different set of subcommands, perhaps for different users of the app. You could almost say that PEAK's motto is, "code reuse, through flexibility".

Now for something completely different. Let's try this:

Given how ./hello magically generated a usage string, you might think this would do so as well. After all, we provided one in the code above, right? Well, an AbstractCommand doesn't automatically display its usage when no arguments are supplied because, after all, a command might not require any arguments. It will automatically display the usage if we raise a commands.InvocationError in our _run method, though:

```python
    def _run(self):
        if len(self.argv)<2:
            raise commands.InvocationError("Missing name")
        storage.beginTransaction(self)
        print >>self.stdout, self.Messages[self.argv[1]].text
        storage.commitTransaction(self)
```

Now we'll get:

```
% ./hello to
Usage: hello to <name>

Displays the greeting for "name".

to: Missing name
```
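The behaviour shown above (the usage text followed by "to: Missing name") can be modelled with a small wrapper around _run(). This is a sketch, not PEAK's AbstractCommand. The Command and ToCmd classes, the choice of stderr, and the exit status of 1 are all assumptions. Only the `self._run() or 0` idiom comes from this lesson, where it appears in the traceback for the for subcommand.

```python
import io
import sys


class InvocationError(Exception):
    """Hypothetical stand-in for commands.InvocationError."""


class Command:
    name = "command"
    usage = ""

    def __init__(self, argv, stdout=None, stderr=None):
        self.argv = argv
        self.stdout = sys.stdout if stdout is None else stdout
        self.stderr = sys.stderr if stderr is None else stderr

    def _run(self):
        raise NotImplementedError

    def run(self):
        try:
            return self._run() or 0
        except InvocationError as err:
            # Show the usage text, then "<command name>: <error message>".
            print(self.usage.strip() + "\n", file=self.stderr)
            print("%s: %s" % (self.name, err), file=self.stderr)
            return 1


class ToCmd(Command):
    name = "to"
    usage = """
Usage: hello to <name>

Displays the greeting for "name".
"""

    def _run(self):
        if len(self.argv) < 2:
            raise InvocationError("Missing name")
        print("(greeting for %s)" % self.argv[1], file=self.stdout)


err = io.StringIO()
status = ToCmd(["to"], stderr=err).run()
print(status)  # 1
print(err.getvalue())
```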


There's just one problem left with the hello command. Try running hello runIni, and see what we get:


Whoops! Just because our configuration file contains its own [peak.running.shortcuts] section, doesn't mean that the settings in peak.ini don't apply. We need to do something about this, so that hello doesn't reuse all the peak subcommands.

Looking at peak.ini, we see that some property names end with a *. What happens if we define a * rule in the shortcuts section?


Let's try it now:


Good. To recap, we used commands.NoSuchSubcommand, which raises an InvocationError for us, and we used a * rule to define a default value for properties whose names are within a particular "property namespace". That is, any name we look up in peak.running.shortcuts from our configuration file, that isn't explicitly defined there or in our app, will return the commands.NoSuchSubcommand class. That's just what we want for now.
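The lookup order this implies can be expressed as a simplified model. Each configuration layer checks for an explicit setting first. It then tries wildcard rules, from the most specific namespace outward. Only after that does it defer to the layer beneath it. This model is hypothetical and was written to match the behaviour described above; it is not PEAK's configuration code, and ConfigLayer and the string values are made up.

```python
class ConfigLayer:
    """Simplified, hypothetical model of layered property lookup (not PEAK code)."""

    def __init__(self, rules, parent=None):
        self.rules = rules    # property name or "namespace.*" rule -> value
        self.parent = parent  # the layer this one was loaded on top of

    def lookup(self, name):
        # 1. An explicit setting for the exact name wins.
        if name in self.rules:
            return self.rules[name]
        # 2. Then wildcard rules, most specific namespace first:
        #    "a.b.c.d" tries "a.b.c.*", "a.b.*", "a.*", and finally "*".
        parts = name.split(".")
        for i in range(len(parts) - 1, -1, -1):
            rule = ".".join(parts[:i] + ["*"])
            if rule in self.rules:
                return self.rules[rule]
        # 3. Only when this layer has no answer, ask the layer underneath.
        if self.parent is not None:
            return self.parent.lookup(name)
        raise KeyError(name)


peak_ini = ConfigLayer({
    "peak.running.shortcuts.runIni": "runIni command",
    "peak.running.shortcuts.help": "help command",
})
before = ConfigLayer({"peak.running.shortcuts.to": "toCmd"}, parent=peak_ini)
after = ConfigLayer({
    "peak.running.shortcuts.to": "toCmd",
    "peak.running.shortcuts.*": "NoSuchSubcommand",
}, parent=peak_ini)

print(before.lookup("peak.running.shortcuts.runIni"))  # runIni command
print(after.lookup("peak.running.shortcuts.runIni"))   # NoSuchSubcommand
print(after.lookup("peak.running.shortcuts.to"))       # toCmd
```

Because the rules are matched on whole dot-separated parts, "peak.running.shortcuts.*" covers "peak.running.shortcuts.anything" but not a name like "peak.running.shortcutsX".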

Actually... there is still one more problem. commands.Bootstrap also accepts URLs on the command line by default. commands.Bootstrap provides a way to turn that behavior off, though. We just need to override a flag in our HelloWorld class:

```python
class HelloWorld(commands.Bootstrap):

    acceptURLs = False

    # rest of HelloWorld class goes here...
```

With these changes, our Bootstrap derivative will now do the right thing. Let's move on to the for command now.

Storing a New Message: the "for" Subcommand

Now, we know we're going to have to rewrite our storage.py to allow us to write to the database, but let's start this part of the task by writing the subcommand first. As you'll quickly see, any consideration of how we implement the saving of the data is virtually independent of how we go about initiating the save in the application program.

So, we need another rule in our hello configuration file:

and another AbstractCommand subclass in commands.py:

```python
class forCmd(commands.AbstractCommand):

    usage = """
Usage: hello for <name>: <greeting>

Stores "greeting" as the greeting message for "name".
"""

    Messages = binding.Obtain(storage.DMFor(Message))

    def _run(self):

        if len(self.argv)<2:
            raise commands.InvocationError("Missing arguments")

        parts = ' '.join(self.argv[1:]).split(':',1)
        if len(parts)!=2:
            raise commands.InvocationError("Bad argument format")

        forname, message = parts

        storage.beginTransaction(self)

        newmsg = self.Messages.newItem()
        newmsg.forname = forname.strip()
        newmsg.text = message.strip()

        storage.commitTransaction(self)
```

To put a new object in our database, we ask the Data Manager for a new "empty" object, using newItem(). (Actually, it can have a preloaded default state, but we'll ignore that for now). Then we modify it just like we would any other writable object we got from the Data Manager, and the transaction machinery takes care of getting the data written to the backing store at transaction commit time.
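The shell splits the command line on whitespace, so _run() rejoins everything after the subcommand name and splits the result once, at the first colon. The sketch below pulls that parsing out into a standalone helper. parse_for_args() is hypothetical and not part of the tutorial's code, and it raises ValueError where _run() raises InvocationError.

```python
def parse_for_args(argv):
    """Split ["for", "Jeff:", "Hi,", "guy!"] into ("Jeff", "Hi, guy!")."""
    if len(argv) < 2:
        raise ValueError("Missing arguments")
    # Rejoin the words after the subcommand name, then split at the FIRST colon
    # only, so the greeting itself may contain colons but the name may not.
    parts = " ".join(argv[1:]).split(":", 1)
    if len(parts) != 2:
        raise ValueError("Bad argument format")
    forname, message = [part.strip() for part in parts]
    return forname, message


print(parse_for_args(["for", "Jeff:", "Hi,", "guy!"]))      # ('Jeff', 'Hi, guy!')
print(parse_for_args(["for", "clock:", "It's", "12:00"]))   # ('clock', "It's 12:00")
```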

At this point the for subcommand of our hello command is runnable:

```
% ./hello for
Usage: hello for <name>: <greeting>

Stores "greeting" as the greeting message for "name".

for: Missing arguments
% ./hello for foobar
Usage: hello for <name>: <greeting>

Stores "greeting" as the greeting message for "name".

for: Bad argument format
% ./hello for Jeff: Hi, guy!
Traceback (most recent call last):
  File "/usr/local/bin/peak", line 4, in ?
    commands.runMain( commands.Bootstrap )
  File "/usr/local/lib/python2.3/site-packages/peak/running/commands.py", line 70, in runMain
    result = factory().run()
  File "/usr/local/lib/python2.3/site-packages/peak/running/commands.py", line 211, in run
    return self._run() or 0
  File "/var/home/rdmurray/proj/peak/helloworld/07writabledb/helloworld.py", line 53, in _run
    newmsg = self.Messages.newItem()
AttributeError: 'MessageDM' object has no attribute 'newItem'
```

Ah, yes. As you'll recall, we used a read-only Data Manager base class when we developed our database. So we can't store anything until we fix that.

Storing a New Message: Modifying the Data Manager

OK, it's time to do some serious surgery on our Data Manager. First, we need to exchange our QueryDM base class for a base class that supports updating the database. That would be storage.EntityDM.

EntityDM requires the concrete class to define two additional methods: _new and _save. _new is called when a new object has been added to the DM and its data needs to be stored in the external database. _save is called when an object's state has changed and a transaction boundary has been reached at which that state needs to be synchronized with the external database.
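The toy model below, written in plain Python, shows how the work is split between _new() and _save(). It is hypothetical, much simplified, and not how PEAK's EntityDM is implemented. ToyEntityDM, new_item(), changed(), this Message class, and the use of a flush() call to stand for the transaction boundary are all assumptions made for the sketch. A real DM notices modified objects on its own and does not need to be told about them.

```python
class Message:
    """Hypothetical stand-in for helloworld.model.Message."""

    def __init__(self, forname=None, text=None):
        self.forname = forname
        self.text = text


class ToyEntityDM:
    """Hypothetical model of the _new()/_save() contract (not PEAK's EntityDM)."""

    def __init__(self):
        self.data = {}          # stands in for the external database
        self.cache = {}         # object ID -> object, once an ID is known
        self._new_objects = []  # created via new_item(), not yet stored
        self._changed = []      # existing objects with unsaved changes

    def new_item(self):
        # Like newItem(): the object WILL be stored at the next flush.
        ob = Message()
        self._new_objects.append(ob)
        return ob

    def changed(self, ob):
        # A real DM notices modifications itself; here we report them by hand.
        self._changed.append(ob)

    def _new(self, ob):
        # Store the state, then hand back the object ID for the new object.
        self._save(ob)
        return ob.forname

    def _save(self, ob):
        self.data[ob.forname] = {"forname": ob.forname, "text": ob.text}

    def flush(self):
        # At a "transaction boundary": _new() for new objects, _save() for the rest.
        for ob in self._new_objects:
            self.cache[self._new(ob)] = ob
        for ob in self._changed:
            self._save(ob)
        self._new_objects, self._changed = [], []


dm = ToyEntityDM()
msg = dm.new_item()
msg.forname, msg.text = "Jeff", "Hi, guy!"
dm.flush()
print(dm.data)  # {'Jeff': {'forname': 'Jeff', 'text': 'Hi, guy!'}}
```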

Let's write the new storage.py:

```python
from peak.api import *
from helloworld.model import Message

class MessageDM(storage.EntityDM):

    defaultClass = Message
    filename = binding.Obtain(PropertyName('helloworld.messagefile'))

    def data(self):
        data = {}
        file = open(self.filename)
        for line in file:
            fields = [field.strip() for field in line.split('|',1)]
            forname, text = fields
            data[forname] = {'forname': forname, 'text': text}
        file.close()
        return data

    data = binding.Make(data)

    def _load(self, oid, ob):
        return self.data[oid]

    def _new(self, ob):
        self._save(ob)
        return ob.forname

    def _save(self,ob):
        self.data[ob.forname] = {'forname':ob.forname, 'text':ob.text}
```

That was easy. The _new() method is responsible for both saving state and returning the object ID of the new object. This is because _new is responsible for assigning object IDs. In this case, we simply return ob.forname, since that's what we're using as an object ID, after calling self._save(ob). Let's run the script, and try it out:


Oops. All we did was update our in-memory data dictionary. We didn't save it to disk, so the change didn't stay in place for long. How can we fix that?

If we look at the storage.IWritableDM interface (see peak help storage.IWritableDM), we'll see that it includes a flush() method. flush() is called as part of the transaction commit process, and the default implementation of this method in EntityDM is what calls our _save() and _new() methods for the appropriate objects. If we define our own version of flush() that first calls the standard flush() and then writes our data dictionary to disk, we'll be all set:

```python
    def flush(self,ob=None):
        super(MessageDM,self).flush(ob)
        file = open(self.filename,'w')
        for forname, data in self.data.items():
            print >>file, "%s|%s" % (forname,data['text'])
        file.close()
```
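The on-disk format that data() reads and flush() writes is one name|greeting pair per line. The standalone round-trip sketch below shows both directions in plain Python. parse_messages() and format_messages() are hypothetical helpers and are not part of the tutorial's DM. Unlike data() above, parse_messages() skips blank lines. Because the split has a limit of 1, a greeting may contain |, but a name may not. A line with no | at all still raises ValueError during unpacking.

```python
def parse_messages(text):
    """Turn "name|greeting" lines into the DM's per-object state dictionaries."""
    data = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines, e.g. at the end of the file
        forname, greeting = [field.strip() for field in line.split("|", 1)]
        data[forname] = {"forname": forname, "text": greeting}
    return data


def format_messages(data):
    """The inverse of parse_messages(): one "name|greeting" line per entry."""
    return "".join("%s|%s\n" % (forname, state["text"])
                   for forname, state in data.items())


text = "world|Hello, world!\nJeff|Hi, guy!\n"
print(format_messages(parse_messages(text)) == text)  # True
```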

But wait. What if there's an error while writing the file? What is going to happen to the original file? Since we're opening the existing file for output, we'll have already erased our original data. That's not good.

We need a mechanism for writing files that can roll back or commit, just like the transaction as a whole. PEAK has a peak.storage.files module with two classes we can use for this: TxnFile and EditableFile. Because we're dealing with such a small file, and can load it all in memory at once, we'll use EditableFile, which offers a more convenient interface for such files. Let's take a look at the part of the output from peak help peak.storage.files that covers EditableFile:


Yep, that looks like what we need. We should be able to easily load and save our data by reading or writing to the EditableFile object's text attribute, especially since we will already be inside a transaction whenever we use the data manager.

Okay, so let's fix up storage.py to use EditableFile:

```python
from peak.api import *
from peak.storage.files import EditableFile
from helloworld.model import Message

class MessageDM(storage.EntityDM):

    defaultClass = Message
    filename = binding.Obtain(PropertyName('helloworld.messagefile'))

    # Created on first use, from the configured filename
    file = binding.Make(
        lambda self: EditableFile(filename=self.filename)
    )

    def data(self):
        data = {}
        for line in self.file.text.strip().split('\n'):
            fields = [field.strip() for field in line.split('|',1)]
            forname, text = fields
            data[forname] = {'forname': forname, 'text': text}
        return data

    data = binding.Make(data)

    def _load(self, oid, ob):
        return self.data[oid]

    def _new(self, ob):
        self._save(ob)
        return ob.forname

    def _save(self, ob):
        self.data[ob.forname] = {'forname':ob.forname, 'text':ob.text}

    def flush(self,ob=None):
        # Let EntityDM call _new()/_save() first, then hand the whole
        # file's new contents to the EditableFile
        super(MessageDM,self).flush(ob)
        self.file.text = ''.join(
            ["%s|%s\n" % (forname,data['text'])
                for forname, data in self.data.items()
            ]
        )
```

We hardly changed a thing. Instead of opening self.filename to read and write the data, now we simply split or join self.file.text. The EditableFile will automatically handle writing the new data to a different filename, then renaming it and replacing the old file. It'll also automatically discard the new file if the transaction is aborted for any reason.
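The write-elsewhere-then-rename idea can be sketched with the standard library. This sketch is hypothetical and is not peak.storage.files.EditableFile. There is no transaction service here, so commit() and abort() must be called explicitly. AtomicTextFile and its method names are made up for the illustration.

```python
import os
import tempfile


class AtomicTextFile:
    """Hypothetical sketch of write-to-a-new-file-then-rename (not EditableFile)."""

    def __init__(self, filename):
        self.filename = filename
        self._tmpname = None  # path of the uncommitted new contents, if any

    def read(self):
        with open(self.filename) as f:
            return f.read()

    def write(self, text):
        self.abort()  # discard any earlier uncommitted write
        directory = os.path.dirname(os.path.abspath(self.filename))
        # Same directory, so the final rename stays on one filesystem.
        fd, tmpname = tempfile.mkstemp(dir=directory, suffix=".tmp")
        try:
            with os.fdopen(fd, "w") as f:
                f.write(text)
        except BaseException:
            os.remove(tmpname)
            raise
        self._tmpname = tmpname

    def commit(self):
        # os.replace() swaps in the new file; on POSIX this rename is atomic.
        if self._tmpname is not None:
            os.replace(self._tmpname, self.filename)
            self._tmpname = None

    def abort(self):
        # The original file was never opened for writing; just drop the copy.
        if self._tmpname is not None:
            os.remove(self._tmpname)
            self._tmpname = None
```

One design note: tempfile.mkstemp() creates a file that only its owner can read and write, so after commit() the data file has those permissions. A fuller version would copy the original file's mode before renaming.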

Speaking of aborting, there's actually still a bug in this DM. If a transaction is aborted, the DM may or may not have called _new(), _save() and/or flush(), one or more times already. The EditableFile will take care of resetting itself if a transaction is aborted, but our data dictionary could wind up out-of-sync with the file.

An easy way to fix this is to override the abortTransaction() method, similar to what we did for flush(), and delete the data dictionary if the transaction is aborted:

```python
    def abortTransaction(self, ob):
        self._delBinding("data")
        super(MessageDM,self).abortTransaction(ob)
```

Now, if the transaction is aborted, the data attribute gets deleted, and the next time we try to use it, our binding.Make() wrapper will re-run the function that creates the dictionary from the EditableFile. EditableFile does something similar to this, so when we access its text again, it will have reverted to whatever was last stored on disk, not what we changed it to.
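The reset-on-abort trick depends on the attribute being computed lazily and then cached until something deletes it. In modern Python, functools.cached_property (Python 3.8+) behaves in a similar way. The sketch below is a hypothetical stand-in, not PEAK code. In it, cached_property plays the part of binding.Make(data), and popping the cached value from the instance __dict__ plays the part of self._delBinding("data").

```python
from functools import cached_property


class MessageDMSketch:
    """Hypothetical sketch: a lazily built `data` dictionary that can be reset."""

    def __init__(self, read_text):
        self.read_text = read_text  # callable returning the file's current text
        self.loads = 0              # how many times `data` has been rebuilt

    @cached_property
    def data(self):
        self.loads += 1
        data = {}
        for line in self.read_text().splitlines():
            if line.strip():
                forname, text = [field.strip() for field in line.split("|", 1)]
                data[forname] = {"forname": forname, "text": text}
        return data

    def abort_transaction(self):
        # Forget the in-memory copy; the next access rebuilds it from the file.
        self.__dict__.pop("data", None)


on_disk = "world|Hello, world!\n"
dm = MessageDMSketch(lambda: on_disk)
dm.data["Jeff"] = {"forname": "Jeff", "text": "Hi, guy!"}  # uncommitted change
dm.abort_transaction()
print(sorted(dm.data))  # ['world']
print(dm.loads)         # 2
```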

There are a few other things to notice about our revised DM. We're still getting the filename from that same configuration variable. Now, however, we are turning that into an EditableFile. Again we use binding.Make to create a descriptor that will return a real value (and cache it) when the attribute is first accessed. We used a lambda expression here instead of a function, as this is more readable when there's only a single expression being executed.

Anyway, with these changes in place, our for command should now be working:


Questioning Existence, and Tuning Performance

At this point certain readers may be getting antsy because there's a flaw in the forCmd implementation. As we wrote it, the for command assumes that it's always creating a new Message, even though the forname may already exist in our primitive "database".

For our current example, this doesn't actually cause any problems: because of the way we're updating the "database", it doesn't matter if the item is new or an update. But, we don't want to rely on this implementation quirk, and when we move to an SQL database later on, it will matter quite a bit whether we're adding or updating.

To fix this, we need to change our for command to check whether the name exists, and then either update the existing Message object, or create a new one, as appropriate. In order to do that, we need to be able to ask the DM whether or not a given key exists. Since we're using the forname as the object id, we can handily provide a way to do it by adding a __contains__ method to the DM:

```python
    def __contains__(self,oid):
        return oid in self.data
```

Now we can update our forCmd._run() method in commands.py:

```python
    def _run(self):

        if len(self.argv)<2:
            raise commands.InvocationError("Missing arguments")

        parts = ' '.join(self.argv[1:]).split(':',1)
        if len(parts)!=2:
            raise commands.InvocationError("Bad argument format")

        forname, message = [part.strip() for part in parts]

        storage.beginTransaction(self)

        if forname in self.Messages:
            msg = self.Messages[forname]
        else:
            msg = self.Messages.newItem()
            msg.forname = forname

        msg.text = message
        storage.commitTransaction(self)
```

With this change, updating the database should still work:


Sharp-eyed readers will notice that the __contains__ method we wrote doubles the normal work of retrieving an item, because it actually "loads" data by accessing the "database". Then, if the item exists in the database, the _load() method will access the database again. For our in-memory database, this is no big deal, but it will matter more when we start using SQL. Let's change our approach. We'll replace the __contains__ method with a get method:

```python
    def get(self,oid,default=None):

        if oid in self.data:
            return self.preloadState(oid, self.data[oid])

        return default
```

This method either retrieves the object or returns the default, which is the standard Python signature for get(). To support retrieving an object only once (as well as various other situations), DMs have a preloadState(oid,state) method. This method creates a preloaded object from state, instead of calling _load to get the state. (Remember, what we have stored in data is a dictionary containing the values for the various object attributes, which is the state from the DM's point of view.)

So, our get() method can load the state from our "database", and then preload it into the object it returns.

There is actually still a minor inefficiency here: we're always checking the "database", even if the object we want is already loaded into memory. We can make this slightly more efficient by changing it to:

```python
    def get(self,oid,default=None):

        if oid in self.cache:
            return self.cache[oid]

        elif oid in self.data:
            return self.preloadState(oid, self.data[oid])

        return default
```

DMs have a cache attribute that holds onto currently loaded objects, so that multiple requests for a given object ID always return the same object. By checking the cache first, we avoid the lookup in self.data whenever the requested object is already loaded.
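Here is a hypothetical model of that get() logic outside PEAK. SketchDM, preload_state(), and the load_calls counter are invented for illustration. The model shows two things. A cache hit skips the "database" entirely. A database hit builds the object from state that is already in hand, so _load() never runs.

```python
class SketchDM:
    """Hypothetical model of get(): check the cache, then the "database"."""

    def __init__(self, data):
        self.data = data      # stands in for the external database
        self.cache = {}       # object ID -> already-loaded object
        self.load_calls = 0   # how often _load() had to hit the "database"

    def _load(self, oid):
        self.load_calls += 1
        return self.data[oid]

    def preload_state(self, oid, state):
        # Mimics the idea of preloadState(): use state we already have.
        ob = dict(state)      # a plain dict stands in for a Message object
        self.cache[oid] = ob
        return ob

    def __getitem__(self, oid):
        if oid not in self.cache:
            self.cache[oid] = dict(self._load(oid))
        return self.cache[oid]

    def get(self, oid, default=None):
        if oid in self.cache:
            return self.cache[oid]
        elif oid in self.data:
            return self.preload_state(oid, self.data[oid])
        return default


dm = SketchDM({"world": {"forname": "world", "text": "Hello, world!"}})
first = dm.get("world")
print(dm.get("world") is first, dm["world"] is first)  # True True
print(dm.load_calls)                                   # 0
print(dm.get("nobody"))                                # None
```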

These minor changes are of little or no consequence to our current app, but will have more impact when we move to using SQL, as every self.data lookup is going to end up as an SQL query.

Let's finish out our refactoring by updating forCmd to use our new get() method:

```python
    def _run(self):

        if len(self.argv)<2:
            raise commands.InvocationError("Missing arguments")

        parts = ' '.join(self.argv[1:]).split(':',1)
        if len(parts)!=2:
            raise commands.InvocationError("Bad argument format")

        forname, message = [part.strip() for part in parts]

        storage.beginTransaction(self)

        msg = self.Messages.get(forname)

        if msg is None:
            msg = self.Messages.newItem()
            msg.forname = forname

        msg.text = message

        storage.commitTransaction(self)
```

There. That even simplifies the logic a little. Note, by the way, that we do not pass Messages.newItem() as the default argument to get(), because that would do two wrong things: 1) it'd create a new object that would be added to the database at transaction commit, even if we didn't need it, and 2) it wouldn't set forname on the new message. We could work around problem #2, but not problem #1. Using the newItem() method of a DM always creates an object that the DM will attempt to save when the transaction commits, even if you don't keep the object around that long. So: never call newItem() unless you want the object to be added to the database. (Note: it's possible to write a DM that doesn't behave this way, and only saves an object if it's referenced from other objects or some kind of "root" object. We're just not going to show you how in this tutorial!)
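Problem #1 above follows from ordinary Python semantics. Arguments are evaluated before get() is called, so a default expression runs even when its value is never used. A plain dictionary and a hypothetical new_item() counter are enough to show this; PEAK is not involved.

```python
created = []


def new_item():
    """Hypothetical stand-in for Messages.newItem(): records every call."""
    ob = {"forname": None, "text": None}
    created.append(ob)
    return ob


messages = {"world": {"forname": "world", "text": "Hello, world!"}}

# Python evaluates arguments before calling get(), so new_item() runs
# even though "world" already exists and the default is never used.
msg = messages.get("world", new_item())
print(len(created))  # 1

# The tutorial's pattern only creates an object when the lookup fails.
msg = messages.get("world")
if msg is None:
    msg = new_item()
print(len(created))  # still 1
```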

Anyway, that about wraps it up for creating a practical EntityDM subclass.

Points to Remember

Here's the recap for what we've learned in this lesson (once again, it's quite a lot!):

Subcommands

The peak script is based on commands.Bootstrap, a commands.AbstractInterpreter subclass that runs a "subcommand" specified as its first argument.
commands.Bootstrap looks up non-URL commands in the peak.running.shortcuts property namespace.
Raising commands.InvocationError in a command's _run() method causes the command's usage message to be displayed, along with the text of the InvocationError instance.

Configuration

Application configuration files load after peak.ini, and only explicit settings in an application config file override those in peak.ini.
Using an asterisk (*) as the last part of a property name defines a rule for all undefined properties in that property namespace. So, defining a rule for peak.running.shortcuts.* provides a value for any undefined property whose name begins with "peak.running.shortcuts." (Important: note the trailing dot.)

Writable DM Basics

The newItem() method of a writable data manager (storage.IWritableDM) returns a new, "empty" object which will be saved when the transaction is committed.
The storage.QueryDM class does not support adding or modifying items, but storage.EntityDM does.
The _new() method of an EntityDM must add the new object to the underlying database, and return the object ID to be used for the object.
The object ID returned by _new() can be a newly-generated ID, or it can be a primary key field.
The _save() method of an EntityDM must update an existing object in the underlying database.

Extending EntityDM

The flush() method of a writable data manager is called to write in-memory state to external databases. It can be overridden so that _new() and _save() can batch the data to be written, and then flush() can perform the batch in bulk.
If you are already accessing an underlying database (e.g. to check if an object exists, or when performing a mass query), you can use the preloadState() method to retrieve an object from the DM. The DM will not call its _load() method, but instead use the state that you supply to the preloadState() call.
DMs have a cache attribute that caches "ghosts" and active objects used in the current transaction, to ensure that multiple requests for the same object ID return the same object.
If you need to be able to tell whether an item exists in a DM, it can be handy to implement a get() method that checks the DM's cache attribute first, then checks the underlying database, and finally uses preloadState() to build the object from the state it just loaded.

Miscellaneous

peak.storage.files.EditableFile provides an easy way to transactionally alter the contents of a file small enough to be held in memory. It works with the PEAK transaction framework to ensure that existing data isn't lost in the case of a write failure or other error, and that rolled-back changes are in fact rolled back.
