[TransWarp] First attempt to use the storage package in PEAK

Tue Dec 24 14:11:08 EST 2002

At 11:19 AM 12/24/02 +0200, Roché Compaan wrote:
>Hi Philip
>
>Here is my first quick attempt to use the storage package in PEAK.
>Creating new instances work fine but loading and saving doesn't work
>yet.  My use of the binding package is still a shot in the dark. Can you
>maybe comment on the way I used the various parts of PEAK below:
>
>from peak.storage.SQL import MySQLConnection
>from peak.api import storage
>from peak.api import binding, model
>from peak.model.datatypes import String

Sure.  These classes:

>class Address:
>
>     server = binding.requireBinding('server')
>     db = binding.requireBinding('db')
>     user = binding.requireBinding('user')
>     passwd = binding.requireBinding('passwd')
>
>
>class ContactsDB(MySQLConnection):
>
>     __implements__ = storage.ISQLConnection
>
>     address = Address()
>     address.server = 'localhost'
>     address.db = 'Contacts'
>     address.user = 'roche'
>     address.passwd = 'mypasswd'

are unnecessary and can be deleted.  I'll show you how, later on below.

>class Contact(model.Element):
>
>     Name = binding.New(String)
>     Surname = binding.New(String)
>     HomePhone = binding.New(String)
>     WorkPhone = binding.New(String)
>     Email = binding.New(String)

class Contact(model.Element):

      class Name(model.Field):
          referencedType = String

      class Surname(model.Field):
          referencedType = String

      #etc...

But it isn't actually necessary for you to do all that; if you don't care 
about any of the special capabilities provided by 'peak.model' bases and 
metaclassses, you can just subclass Persistence.Persistent and be done with it.

Also, 'peak.model' has a significant refactoring pending, and it's not well 
documented.  For the time being, you may not want to bother with it.  But 
if you want examples of its use, look at:

peak.metamodels.MOF131,
peak.metamodels.uml.MetaModel, and
peak.metamodels.uml.Model

More or less in that order, with the caveat that MOF131 is untested and 
might contain errors, or that it might have code that depends on 
refactorings in 'peak.model' that haven't actually been done yet.  :)

>class ContactDM(storage.EntityDM):
>
>     defaultClass = Contact
>
>     attrs = ['Name', 'Surname', 'HomePhone', 'WorkPhone', 'Email']
>
>     DBConn = binding.bindTo(storage.ISQLConnection)

Replace the line above with:

      DBConn = binding.bindTo("mysql://roche:mypasswd@localhost/Contacts")

This suffices to instantiate a MySQLConnection.  You also don't need to 
import MySQLConnection.  This is one of the things that the naming system 
is for.  :)

>     def defaultState(self, ob):
>         state = {}
>         for attr in self.attrs:
>             state[attr] = ''
>         return state
>
>     def load(self, oid, ob):
>         print "load"
>         result = self.DBConn('SELECT * FROM Contact WHERE oid=%s'%oid)
>         state = {}
>         for r in result:
>             for attr in self.attrs:
>                 value = getattr(r, attr)
>                 state[attr] = value
>         return state

Another way to specify the above is:

def load(self, oid, ob):
     return dict(
         (~self.DBConn('SELECT * FROM Contact WHERE oid=%s' % oid)).items()
     )

Two things are happening here...  first the '~' means you want exactly one 
row from the result.  If there are more or fewer, an exception will be 
raised.  Second, database result rows support a mapping interface, 
including items(), keys(), etc.  So you could also write:

def load(self, oid, ob):
     state = {}
     for r in self.DBConn('SELECT * FROM Contact WHERE oid=%s'%oid):
         for attr in self.attrs:
             state[attr] = r[attr]
     return state

Anyway, there's several ways to skin this cat.  If you knew that the 
columns were in the same order as the 'attrs' list, you could do:

def load(self, oid, ob):
     row = ~ self.DBConn('SELECT * FROM Contact WHERE oid=%s'%oid)
     return dict(zip(self.attrs, row))

Which would pair each self.attrs value with its corresponding column value.

>     def new(self, ob):
>         print "new"
>         sql = "INSERT INTO Contact (%s) VALUES (" % ','.join(self.attrs)
>         values = []
>         for attr in self.attrs:
>             values.append('"%s"' % getattr(ob, attr))

I'd recommend '%r' instead of '%s', or that you otherwise do the necessary 
SQL quoting on the attribute value, in case it contains quotes.

>         sql += ','.join(values) + ")"
>         self.DBConn(sql)
>
>         # get new oid
>         sql = """SELECT MAX(oid) FROM Contact"""
>         for r in self.DBConn(sql):
>             return r[0]

I should remind you that doing a SELECT MAX() is subject to race conditions 
in a multi-process environment.

>     def save(self, ob):
>         sql = "UPDATE Contact SET "
>         sql_where = "oid = %s" % ob._p_oid
>         values = []
>         for attr in self.attrs:
>             value = getattr(ob, attr)
>             values.append('%s = "%s"' % (attr, value))
>         sql += ','.join(values)
>         sql += sql_where
>         self.DBConn(sql)

Looks fine.

Replace these next few lines...

>db = ContactsDB()
>db._open()
>
>myDM = ContactDM()
>myDM.DBConn = db

with just:

myDM = ContactDM()

The rest will happen automatically.

># Create a new instance
>storage.begin(ContactDM)
>ni = myDM.newItem()
>ni.Name = 'John'
>ni.Surname = 'Smith'
>ni.HomePhone = '888 0000'
>ni.WorkPhone = '999 0000'
>ni.Email = 'john at smith.com'
>storage.commit(ContactDM)

Looks good, but that should be 'storage.begin(myDM)', not 'ContactDM'.  You 
want the *instance* to join the transaction, not the class.

Since your code is using only a single global transaction, you can also 
simplify to just 'storage.begin()' and 'storage.commit()', since passing 
the DM in isn't going to cause it to find a different transaction service.

># Retrieve and modify existing instance with oid=1
>storage.begin(ContactDM)
>ni = myDM.preloadState(1)

storage.begin(mYDM)
ni = myDM[1]

The "preloadState()" method is not for this purpose.  Use __getitem__ to 
retrieve objects.  'preloadState()' is for collaboration with query 
DMs.  Consider a report that is a contact list.  You want to do a select to 
get a lot of contacts, but if you work with the individual contact objects, 
you have to select again.  'preloadState()' allows you to bypass this, by 
having the query DM pass preloaded rows to the entity DM.  So if you had a 
ContactList query DM, when it selects multiple rows, it can call 
ContactDM.preloadState() with an 'oid' and a usable 'state'.  If the object 
is in cache, its state is updated, or if not, a fresh ghost is created and 
loaded with the state.  In this way, you can mass load objects into the 
cache, or load them one at a time as you iterate through the rows of a 
reporting query.

># I would have thought that the object will get loaded here but it
># doesn't.  If I call print ni.HomePhone before assigning to it the load
># method on the DM does get called.

Right.  That's because you didn't pass a state to 'preloadState()'.  In any 
case, __getitem__ doesn't preload the state, and that's what you should use.

Lazy loading in this fashion is a big advantage for relational 
models.  Let's say that Contacts were associated with a Department.  When I 
load the state for the Contact, I could say something like:

state['Department'] = self.DepartmentDM[row.departmentID]

The department object will not actually be loaded yet, until you need to 
touch it.  This is good because a department might have a parent department 
and so on, and maybe a document repository, or various other things.  You 
don't want your entire database to get loaded in order to look at just one 
object.  :)

Anyway, this lazy-load model is not restricted to PEAK.  It's standard for 
ZODB persistence and has been in Zope since the beginning.  And ZPatterns 
implements it too, but on a per-attribute basis.  PEAK mostly follows the 
ZODB pattern of loading the object's state all at once, but it also 
supports LazyLoader objects which can be used to load selected attributes 
even later than "object state load" time.

>ni.HomePhone = '777 0000'
>storage.commit(ContactDM)

storage.commit(myDM)