(Note: This lesson has not yet been edited by PJE)
OK, the big moment has come (hold your chuckles!). Management is impressed with our demo program, and they want us to take it into production, for use in greeting customers. There's a wrinkle, though: they aren't interested in having a custom greeting for every customer. Instead, they want us to access the corporate PostgreSQL database, check the group to which each customer has been assigned, and use a greeting associated with the customer's group to greet the customer.
So in this lesson we're going to learn how to use PEAK facilities to work with an SQL database, how to combine data from more than one database but keep them all in sync, and we'll also get to see just how flexible a PEAK program can be in the face of change. Along the way we'll introduce PEAK's "n2" utility, and give you a small glimpse of how powerful PEAK's naming system is. Oh, yeah, and we'll exercise the configuration system a bit more, too.
Then we'll finish it all off by throwing away our version 1 model and storage implementation and writing version 2 to match the new reality.
To do our testing, we'll use an SQLite database. If you don't have SQLite on your system, you can get it from the SQLite web site.
Remember back at the beginning when we took a look at the usage information for the peak command? The first argument to the peak command is described as "NAME_OR_URL", where a name is something defined in the peak.running.shortcuts section of an ini file, and a URL is "of a type supported by peak.naming".
You should already be familiar with 'peak.running.shortcuts', including how to define your own subcommands using it, from Lesson 3. Now we're interested in that second bit, the one about the URL.
PEAK provides a general way to refer to "resources" via "names". Since "URL" stands for "Uniform Resource Locator", it's a natural syntax for PEAK to use. A URL can be quite complex syntactically, but we're only worried about the highest-level view for now. We can divide the URL into two parts at the first ":": the part before the ":" is called the "scheme", and the part after the ":" is called the body. When the peak naming system interprets the body in the context of the scheme, we get an object.
The peak command further requires that the referenced object support one of the peak.running interfaces or be callable. That's important: the naming system will look up anything it understands, but something that uses the naming system to look things up may have additional constraints on which referenced objects will actually work when specified in such a URL.
PEAK of course makes schemes and production rules completely configurable. The peak.naming.schemes section of the peak.ini file looks like this:
Note that while we are calling these URLs, and they are URLs, they aren't URLs in the sense most people think of them: things you can type into your web browser and have it do something useful. If you try typing shellcmd:"ls -l" into your browser navigation bar, it's going to ask you if you are out of your mind. Well, OK, it won't do that. But it will give you an error message that indicates it doesn't know how to handle a "shellcmd" URL. Note that it isn't telling you that the URL is invalid, it's telling you it doesn't know how to handle it. So, a lot of the URL schemes used by PEAK are what we might call "private" URL schemes: they aren't really usable outside of PEAK, which does know how to handle them.
So, what does this have to do with our program? Well, observe that a number of the schemes supported by peak have to do with SQL. In particular we can clearly see sqlite in there. psycopg, it turns out, is for PostgreSQL. So, PEAK has a way of referring to SQL databases. The general form is this: scheme://body, but the slashes after the colon are optional. This is true in general for PEAK URLs, and in fact the canonical representation of PEAK URLs omits them. That is, if PEAK echoes back a URL it has parsed, it will omit the slashes following the colon.
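As a plain-Python illustration (a sketch only, not PEAK's actual parser), splitting a URL at the first colon and treating the slashes as optional might look like this:

```python
# Split a URL into scheme and body at the first colon, as described
# above.  Illustrative sketch; the "//" stripping mirrors the
# canonical form, which omits the slashes after the colon.
def split_url(url):
    scheme, _, body = url.partition(":")
    if body.startswith("//"):
        body = body[2:]  # the slashes after the colon are optional
    return scheme, body
```

So sqlite://test.db and the canonical sqlite:test.db both yield the scheme sqlite and the body test.db.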
OK, so now we know how to point PEAK at our databases. Let's try it:
Well, that doesn't seem very useful. But it did something. It opened a test.db database and returned an object, as promised. In this case, an object representing a "database cursor" pointing into the database.
But wait, we haven't actually created that database yet, so how could peak have returned a pointer to it?
Ah, so sqlite created an empty database for us to access. That's fine, but it doesn't seem to be helping us actually work with the SQL databases. To do that, we need to learn about another PEAK tool: n2.
"N2" stands for "Namespace Navigator", and it is a general-purpose tool for browsing and manipulating namespaces; it will also allow you to interact with any resource for which someone has written a navigator adapter. It is available as a subcommand of the peak command. ("peak n2" is also a pun on "peek into".)
N2 uses Python's readline library, which means that at the n2 command line you have available all of the facilities of Python's readline (i.e., GNU readline, where available for your platform), such as command-line editing and command history.
Why is n2 of interest to us now? One of the n2 adapters available is one that will allow us to talk to an SQL database. Not that that's the way our program will talk to the corporate database, mind. Rather, we want to use this tool to make it easy for us to set up our test SQL database, so we can test our modified program against it before we try it on the corporate database.
The first thing we need to do is figure out the tables, columns, and data types from the corporate database with which we'll need to interact, so we can set up our test database to emulate them. So let's use the SQL interaction features of n2 to take a peak into the corporate database:
You'll note that the prompt is different this time. This is because we are talking to a different "Interactor". The default interactor is the one for exploring PEAK namespaces, but because we referenced an object that supports an SQL interface, n2 picked up the SQL interactor and is using it instead. This interactor lets us enter SQL commands as the normal thing, so n2's own commands are preceded by a "\". We can enter SQL commands, terminated by semicolons, and they will be applied against the database. We're looking for the tables and columns relevant to our application.
There ought to be a way to list things in the database. In the python interactor there was that attractive-looking ls command, but we don't have that here. describe sounds like it might be interesting. Let's try that:
OK, so we can see that the customer table has a column for customer name, and GRP sounds like it is the group they are assigned to. So now we know the columns we need to refer to, and how they are defined. Now we just need to replicate this much of the corporate table structure in our test database.
OK, now let's connect to our test database:
What we need is a test customer table that maps customer names to customer groups, since that's what we'll be accessing in the corporate database. In production our application will be using the strings from the corporate database, so we can use any old strings we like for our tests.
Note that the "commit" step, which is an n2 command and not an SQL command, is critical. N2 starts a transaction when it connects to the database. The commit command causes n2 to tell the database to commit the current transaction. If you don't do that, none of the changes made to create the table and insert the data will actually take effect.
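The same setup can be sketched with Python's stdlib sqlite3 module (an in-memory database here so the sketch is self-contained; the column names follow what we saw in the corporate schema, and the test strings are arbitrary):

```python
import sqlite3

# The equivalent of the n2 session: create the test customer table,
# insert some throwaway rows, and commit.
db = sqlite3.connect(":memory:")  # in-memory, for a self-contained sketch
db.execute("CREATE TABLE customer (NAME TEXT, GRP TEXT)")
db.executemany("INSERT INTO customer VALUES (?, ?)",
               [("Fred", "vip"), ("Barney", "peon")])
db.commit()  # without a commit, the changes never take effect
```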
One of the commands listed in response to help at the n2 prompt for our SQL database was \python. Let's give that a try:
c is our SQL connection object. Now, what can we do with that? Well, as we've implied, an SQL connection provides the ISQLConnection interface. (Here I've abbreviated the help display to the part we're interested in.) So, our connection object is callable, and it returns a database cursor. And it says positional arguments are passed to an execute function. So, what happens if we pass it an SQL query string?
Right, so now we know how to work with our SQL database from python, and we've confirmed that our test database is defined and populated.
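Here's a stdlib-sqlite3 sketch of the same idea. PEAK's connection object is callable, while sqlite3 spells it db.execute(...), but either way a query string gives back an iterable cursor whose rows expose the selected columns:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.row_factory = sqlite3.Row  # rows become addressable by column name
db.execute("CREATE TABLE customer (NAME TEXT, GRP TEXT)")
db.execute("INSERT INTO customer VALUES ('Fred', 'vip')")

# Passing a query string yields an iterable cursor of result rows.
for row in db.execute("SELECT NAME, GRP FROM customer"):
    print(row["NAME"], row["GRP"])
```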
In introducing the n2 command I said it could be used with "anything for which someone has written a navigator adapter". This is a topic worth expanding on, since it is a concept that is central to the way PEAK works.
For the full skinny on Adapters and Adaptation, you should read the PyProtocols documentation.
The short version is that adaptation is about taking the object you've got, and getting the object you want. The PEAK system, through PyProtocols, provides the infrastructure for doing this, making it very simple to use in practice. In fact, you end up using adaptation all the time in PEAK without really being aware of it.
So, what does it mean to "get the object you want"?
Suppose you have something, say a connection to an SQL database. But what you (in this case, "you" is the n2 subcommand) really want is an object that has a bunch of command hooks on it for interacting with the object. That set of "hooks" (method calls and attributes) constitutes a "protocol" for talking to the object in a well defined way and getting well defined results. The SQL connection object, by virtue of providing an interface (another protocol) for talking with an SQL database, clearly provides the ability to interact in the way n2 is looking for. But it doesn't have the specific hooks that would make it really easy for n2 to talk to it.
So what n2 does is to hand the PyProtocols subsystem the SQL connection object it has, and ask for an object that supports the protocol it wants (IN2Interactor) in return. PyProtocols looks in the table that has been built through the declarations made in various modules, and finds a pointer to a piece of code that can do just what n2 wants: wrap up the SQL connection object in another object that can act as an intermediary (an "adapter") between the protocol n2 wants and the one the SQL connection object actually has.
PEAK uses this concept throughout its codebase. Instead of subcommands and other code having a bunch of if-then-else statements to check what kind of object it has been handed and then branching off to code to handle those cases, PEAK simply "adapts" the object in question to the kind of Interface (protocol) it needs. Elsewhere in the codebase will be pieces of code ("adapters") that intermediate between the actual objects and the desired interface. These pieces of code will be declared to adapt from some type of object (defined most often by some Interface that that object supports) and to "provide" a specific protocol (again defined most often by a specific Interface).
So somewhere in PEAK, there's a bit of code that knows how to take an object that provides an ISQLConnection, wrap it up, and provide an IN2Interactor to the code that asked for it. The module containing this bit of code declares that this is so. And the PyProtocols adapt function takes care of making it happen when the n2 code asks for it.
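A toy version of that machinery can make the flow concrete. This is illustrative only: PyProtocols' real declaration API is much richer, and every name below is invented for the sketch.

```python
# A miniature adaptation registry: declarations map a (source type,
# target protocol) pair to an adapter factory, and adapt() consults
# the table instead of using if/else chains.
_adapters = {}

def declare_adapter(factory, provides, for_type):
    _adapters[(for_type, provides)] = factory

def adapt(obj, protocol):
    if isinstance(obj, protocol):  # already provides what we want
        return obj
    factory = _adapters.get((type(obj), protocol))
    if factory is None:
        raise TypeError("can't adapt %r to %r" % (obj, protocol))
    return factory(obj)

class SQLConnection:               # stands in for an ISQLConnection object
    pass

class SQLNavigator:                # stands in for the n2 interactor adapter
    def __init__(self, conn):
        self.conn = conn           # wrap the object being adapted

# Declared once, in any module; found later whenever adapt() is asked.
declare_adapter(SQLNavigator, provides=SQLNavigator, for_type=SQLConnection)
```

With this in place, adapt(SQLConnection(), SQLNavigator) returns a wrapper that speaks the protocol the caller wanted.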
The fantastic thing about this arrangement is that someone can come along later and write an adapter that takes, say, an IIMAPConnection and provides an IN2Interactor interface. They can declare this adapter in some module completely apart from either n2 or the IMAP module. And then, supposing there is already a naming scheme for addressing an IMAP connection (say imap:username:password@host), we'd be able to say something like peak n2 imap:username:password@host and browse the IMAP server, with no changes to n2 itself.
So, how do we access the database from our python program, as opposed to the n2 tool? Well, first of all, peak wouldn't have a "naming system" if it didn't use it pretty much everywhere. So you probably won't find it too surprising that, just like the peak command and the peak n2 subcommand, the naming.Obtain function also accepts anything handled by peak.naming, as long as it supports the IComponentKey interface or can be adapted to it.
Yes, this is another example of PEAK using adaptation, so you can see why I took some time to cover it. In this case, the Obtain function, when handed an argument, simply tries to adapt it to the IComponentKey protocol. Many things are adaptable to IComponentKey, including, in this case, strings such as sqlite:test.db.
But before we start accessing the test database from our code, let's think a little about the fact that this is a test database we're working with. We know we're going to have to switch which database we access eventually. And that's something that could change in the future; after all, they might move the database to some new machine someday.
So really we should put the pointer to the database into the configuration file instead of burying it in the code. The naming system provides a little utility for facilitating this called LinkRef, which we can use to define a helloworld.db configuration variable.
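A sketch of what such a configuration entry might look like (the section name, property name, and URL here are illustrative assumptions; PEAK ini values are Python expressions, so check the real syntax against the PEAK docs):

```ini
[helloworld]
# Hypothetical entry: "db" and the sqlite URL are illustrative.
db = naming.LinkRef('sqlite:test.db')
```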
Hmm. But that isn't really quite what we want, either. After all, our project group is working on other projects using PEAK, and some of those will need access to this same corporate database. It would be a shame to have to go to the config file for each app and adjust it if the database gets moved.
PEAK of course provides a way to deal with that. We can create a special section in our ini file named Load Settings From, and in that define a file variable pointing to another ini file. (Footnote: actually, you might want to use ZConfig here rather than an ini file, since ZConfig is probably better for "end user" configuration tasks, but that's another topic I'm not ready to delve into.)
So we'll create a new file, global.ini, and point to it from our hello file. In our deployed application we'll actually want to use either a hardcoded path name, or a reference to a module related to the shared parts of our corporate PEAK applications.
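For concreteness, the pair of files might look roughly like the following sketch. The section names follow the lesson, but the file paths, property names, and the config.fileNearModule spelling are assumptions to verify against the real PEAK documentation:

```ini
# global.ini -- shared settings for all of our corporate PEAK apps
[corporate]
customerdb = naming.LinkRef('psycopg://someuser@dbhost/customers')

# hello -- per-application config, pulling in the shared file
[Load Settings From]
file = config.fileNearModule('helloworld', 'global.ini')
```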
This leads to another interesting point, though. If this global.ini file were already deployed, and we needed some of the settings in it, but we wanted to use our test database rather than the real customerdb, what could we do? Naturally enough, all it takes is to add a [corporate] section to our hello config file, and define in there the config variables we want to override.
We're finally ready to use the database.
Before we do this the right way, though, let's conduct a little exercise. PEAK is designed to allow you to develop applications that are flexible and easy to modify. Suppose we were trying to make this database change on an emergency basis, and wanted to touch as little of our existing system as possible. Let's see what we can do.
Our domain model pairs names with greetings. As far as the user of the to subcommand is concerned, that's all that matters. So let's see if we can maintain that pairing while using the data from our new database.
First, we'll need a reference to the database as a class variable in our Data Manager.
Next, we'll need to change our _load method to get the right message, using auxiliary info from the customer database. Remember, our database connection object is callable; if we pass it an SQL command, we'll get back an iterable cursor containing the results:
What's that funny-looking ~ doing in front of our database call? The ICursor interface, which is supported by the object our database connection call returns, defines the Python unary inversion operator (~) to be the function oneOf. oneOf will raise an error if there is anything other than one row accessible from the cursor. If there is only one row, it returns it.
Since we're sure we're only going to get one record back, we use oneOf ("~") to pick it out. Using this operator also has the advantage that if we don't get one and only one row back, it will throw an error. This follows the PEAK philosophy of failing as soon as possible when something that shouldn't happen, does.
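The ~/oneOf behavior is easy to picture with a small stand-in class. This is a sketch only; PEAK's ICursor and oneOf are the real thing, and the class and error message here are invented:

```python
# A stand-in cursor whose unary ~ returns the single row, or raises
# if there is not exactly one -- mimicking ICursor's oneOf behavior.
class OneRowCursor:
    def __init__(self, rows):
        self._rows = list(rows)

    def __iter__(self):
        return iter(self._rows)

    def __invert__(self):  # invoked by the ~ operator
        if len(self._rows) != 1:
            raise ValueError("expected exactly 1 row, got %d"
                             % len(self._rows))
        return self._rows[0]
```

With exactly one row, ~cursor hands it back; with zero rows or several, it fails immediately, which is exactly the fail-fast behavior described above.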
As we saw above, the "row" object we get back has our selected columns as attributes. So we get the data from the GRP column for this user, and use that to look up our message. You'll note that I'm back to doing "%" substitution on the resulting message, so that we are in fact personalizing our message for each customer. If corporate doesn't like that, it's easy enough to back out.
Nothing else in our application has to change. Although our usage message for the for subcommand is now wrong (it sets the message for the group, not the thing being greeted), it works just fine for storing messages into our database under the group names:
The get function finds that this forname doesn't exist, so our toCmd creates a new object, thereby avoiding the error we'd otherwise get when it tried to load the non-existent forname. This is logical, but the quirk that makes the app still work is that our storage implementation for the message database will update the message even though we did a newItem to get the object. This is not something that a good app design should depend on, so we'll fix it properly in a moment.
The moral of this story is that while PEAK's excellent separation of concerns often makes it possible to finesse a fix, one should not, in general, try to bend an incorrect domain model to a purpose at variance with its design. Instead, the domain model should be fixed.
Before we do that, though, let me point out another nice feature of PEAK, even though it isn't useful in our simple example. Observe that the SQL database object participates in the transaction. (In fact, there has to be an active transaction for the SQL database function to work.) If our subcommands were in fact updating the SQL database, both the changes to it and the changes to our EditableFile would be kept consistent across PEAK transaction boundaries. If our PEAK transaction aborts, all database changes, to whichever database, are rolled back (assuming the database supports transaction abort, of course).
Also, to go to production all we need to do is change that customerdb declaration in the configuration file. The PEAK database object hides all the details of the different databases from us. (Well, as long as we stick to standard SQL it does, anyway.)
If you think about what you just learned about how PEAK can be used to tie together disparate, possibly legacy databases in the corporate setting, and maintain enterprise-level control over the integrity of the data, you can see why PEAK is an "Enterprise Application Toolkit".
All right, enough of taking shortcuts. Let's fix this application the right way.
Our domain model has actually changed: instead of the domain consisting of "Message" objects that are associated with particular things to greet, we now have "Customers" that belong to "Groups", and it is the groups that have messages associated with them.
So, if we are going to bring our app in tune with the new reality, we'll have to start by revising our domain model:
The most interesting thing to notice about this new model is that the reference type for the group attribute of our Customer is actually the Group element we define. When we use this model, we're going to actually store references to Group objects in the group attribute of the Customer objects. But we'll do it the PEAK way.
As long as we're rewriting anyway, we might as well get rid of that clunky file and move to a real database for our messages. Perhaps someday they'll move into a table on the corporate database, but for now we'll stick with our SQLite database, since it has proven its worth so far.
We'll replace the messagefile configuration item in our hello file with one for the new database:
We also need to create the database and the messages table. As when we created the customers database, n2 is handy here:
Our storage.py file will see the greatest amount of change. Let's start with the simpler class, the Data Manager for the Customers. This data is read-only, so we can use QueryDM here:
As you can see, there's some interesting stuff happening here.
First, we're using the binding system to pick up a reference to the Data Manager for the group. Even though we don't define that until later in the file, we don't have a reference problem, because the dereference doesn't happen until runtime.
Second, we're returning an instance retrieved from the GroupDM as the value of one of our attributes. This kind of arrangement is why Data Managers return ghosts: when we access a Customer object, even though it contains a reference to a Group object, no system activity (i.e., reading from databases) happens because of this until we actually try to reference an attribute of the Group. Thus in PEAK inter-object links are not expensive, even when the other objects are backed by potentially slow external databases and/or occupy large amounts of memory when active.
OK, here's the GroupDM, which is not much less complicated than our old MessageDM from before we added the customers, but which shows how to use SQL to do the same stuff:
We no longer need the Flush or abortTransaction methods, because we no longer have any local data structures we need to keep synced. The SQL connection will take care of flushing and aborting the database.
We do, however, now actually need different methods for the _new versus _save case, because the SQL commands to update a record are very different from those used to add a record.
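In plain SQL terms, that distinction is INSERT versus UPDATE. A stdlib-sqlite3 sketch (the messages table layout and the MSG column name are illustrative):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE messages (GRP TEXT PRIMARY KEY, MSG TEXT)")

def new_record(grp, msg):
    # What a _new method issues: the row does not exist yet.
    db.execute("INSERT INTO messages VALUES (?, ?)", (grp, msg))

def save_record(grp, msg):
    # What a _save method issues: the row already exists.
    db.execute("UPDATE messages SET MSG = ? WHERE GRP = ?", (msg, grp))

new_record("vip", "Greetings, %s!")
save_record("vip", "Welcome back, %s!")
```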
To implement a get method similar in efficiency to the one we had for our file-based database, we introduce a local helper method _getstate. This is the method that actually reads from the database and picks up the data. get uses this to see if a record exists for the oid in question, and if it does, uses preloadState as before. _load uses this helper method to get the state and return it. Note that now _load needs to raise an error if the record doesn't exist; before, we were letting oneOf do that for us. Doing it this way has the happy consequence that the error message presented to the end user will be more intuitive. Note also that if oneOf detects too many rows, that error will not be caught, and the resulting error message will tell the user exactly what they need to know (there are too many records for this oid in the database).
Now, of course, our application needs to use the new domain objects:
The changes here are relatively small, almost trivial. The biggest change is that we are getting the message from the greetingtemplate attribute of the group attribute of a Customer object. The changes in for are mostly to the usage string, plus some variable name changes, just to be consistent.
If we test this program, though, we'll quickly find a design error:
Whoops, we forgot about our tailored messages. How to solve this? Well, clearly a tailored message is more properly a property of the Customer itself. So we can add something to our domain model: a greeting function. And we can take advantage of this to deal with the case where our bosses are serious about not wanting tailored messages. We'll allow for messages that don't have a substitution point:
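A sketch of such a greeting function (in the real model it would be a method on the Customer element; the standalone spelling here is illustrative):

```python
# Build the greeting from a template, substituting the name only
# when the template actually has a substitution point.
def greeting(name, template):
    if "%s" in template:
        return template % name
    return template
```

So greeting("Fred", "Hello, %s, how are you?") personalizes the message, while a plain "Welcome to our store." is used verbatim.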
Now our print line in commands.py becomes:
And now everyone is happy: