[TransWarp] making progress - examples in cvs

Fri Feb 21 22:06:49 EST 2003

At 11:38 PM 2/21/03 +0100, Ulrich Eck wrote:
>hi out there,
>
>for any interested people an update on our work:
>
>in our mission to build a scriptable user-management
>(ldap-accounts + imap-mailboxes) we make good progress using
>the latest and greatest peak-capabilities.
>
>you can have a look at our code at:
>http://cvs.net-labs.de -> apps -> acmgr

Wow.  Fantastic!  I'm enthused at what you've accomplished, but I have a 
few comments about some things that I'd like to "un-recommend" to anyone 
who is looking at the code for examples, because they should not be 
emulated.  I also have some things I want to single out for praise as good 
examples; these are mixed here in no real order...

* model.Package is deprecated; I'd suggest getting rid of the IMAPModel and 
just putting all its classes into the module, and using the module itself 
in place of the IMAPModel class.

* The acmgr.connection.imapmodel.MessageHeaderField class has some 
issues.  First, you're breaking persistence notification by modifying a 
stored mutable, unless the mutable itself is Persistent.  Second, the 
proper place to perform format conversions of this sort is in the DM 
_load() method, either by performing the computation or inserting a 
LazyLoader.  Third, you're breaking validation/observation capabilities by 
bypassing _link()/_unlink() - this might be okay if you know that the 
'data' attribute is implementing all the validation and observation you 
need.  Fourth, if you *really* need to redefine how an object stores its 
bindings, you can always override '_setBinding()' in the element 
class.  All in all, the MessageHeaderField class is very bad for separation 
of implementation issues from the domain model.

* Most of your features don't declare a 'referencedType'.  *This will break 
soon*.  I expect to add a validation hook to the StructuralFeature._link() 
method soon, that will expect to call a 'mdl_normalize()' class method on 
the referenced type, to normalize the value of a supplied input 
object.  This will let you do things like:

class OddNumber(Integer):

     def mdl_normalize(klass, value):
         if (value % 2):
             return value
         raise ValueError("%d isn't odd" % value)

     mdl_normalize = classmethod(mdl_normalize)

and then declare an attribute to have a type of 'OddNumber'.  The catch is 
that each feature *must* have a type object, or else define its own 
normalize() method.  So just an early warning.
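To make the idea concrete, here is a rough sketch in plain Python with no PEAK imports.  The 'Feature' class and its 'link()' method are stand-ins invented for illustration, not the real StructuralFeature API; only the 'mdl_normalize()' and 'referencedType' names come from the text above.

```python
class Integer:
    """Stand-in for a model type; accepts any int unchanged."""

    @classmethod
    def mdl_normalize(cls, value):
        if not isinstance(value, int):
            raise TypeError("%r is not an integer" % (value,))
        return value


class OddNumber(Integer):
    """Accepts only odd integers."""

    @classmethod
    def mdl_normalize(cls, value):
        value = super().mdl_normalize(value)
        if value % 2:
            return value
        raise ValueError("%d isn't odd" % value)


class Feature:
    """Hypothetical feature that normalizes values through its type."""

    def __init__(self, name, referencedType):
        self.name = name
        self.referencedType = referencedType

    def link(self, element, value):
        # Normalize first, so invalid input never reaches the element.
        value = self.referencedType.mdl_normalize(value)
        element.__dict__[self.name] = value


class Thing:
    pass


count = Feature("count", OddNumber)
thing = Thing()
count.link(thing, 7)        # accepted
try:
    count.link(thing, 8)    # rejected; thing.count is left at 7
except ValueError as exc:
    error = str(exc)
```

A feature without a type object has nothing to call here, which is why each feature needs either a type or its own normalize() method.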

* The 'parts' collection attribute is pretty cool; a nice use of verb 
exporting.  I would note, however, that it is not necessary to use the 
'%(initCap)s' format for the verbs if you don't want to.  In fact, since 
here you are only using the verbs for one feature, you could do:

             newVerbs = Items(newText='addTextPart',
                              newImage='addImagePart',
                              newAudio='addAudioPart',
                              newBase='addBasePart',
                              )

Or, if you wanted to reuse this for other MIME-part collections, you might 
use '%(singular.initCap)s', and then define 'singular = "part"' in 
the feature.  This would result in you getting the same method names as 
shown above.  (Using '%(singular.upper)s' would result in 'addTextPART', 
etc.  See peak.model.method_exporter.nameMapping for details.)

Anyway, all of this is really only useful if you need to reuse the method 
templates for another collection.  In your app, the effort is wasted except 
as a learning experiment in using MethodExporter techniques.  Note that you 
could have, with less effort, simply written 'addTextPart()', 
'addImagePart()', etc. methods in the body of the element class, using 
plain ol' Python and having them call 'self.addParts(part)' to do the 
actual addition, or 'self.__class__.parts.add(part)' if you wanted to 
bypass the exported 'addParts()' method for some reason.
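For anyone curious how such name templates can expand, this is a minimal sketch using ordinary '%' formatting.  The 'NameMapping' class and 'method_names()' helper are made up for this example and are not the actual peak.model.method_exporter.nameMapping code; they only mimic the 'initCap' and 'upper' behavior described above.

```python
class NameMapping:
    """Stand-in mapping that resolves keys like 'singular.initCap'."""

    def __init__(self, **names):
        self.names = names

    def __getitem__(self, key):
        name, _, style = key.partition(".")
        value = self.names[name]
        if style == "initCap":
            return value[:1].upper() + value[1:]
        if style == "upper":
            return value.upper()
        if style == "":
            return value
        raise KeyError(key)


def method_names(verbs, mapping):
    """Expand each verb's name template against the mapping."""
    return {verb: template % mapping for verb, template in verbs.items()}


part = NameMapping(singular="part")
names = method_names(
    {
        "newText": "addText%(singular.initCap)s",
        "newImage": "addImage%(singular.initCap)s",
    },
    part,
)
shouty = method_names({"newText": "addText%(singular.upper)s"}, part)
```

Here 'names' holds 'addTextPart' and 'addImagePart', while 'shouty' holds 'addTextPART'.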

* I recommend the idiom 'isChangeable = False' for clarity, in place of 
using '0' for false.  I'm also curious whether the status fields should 
really be represented as derived attributes or methods instead of as 
read-only attributes, but I don't know/remember enough about IMAP to give 
an informed opinion.

* Cool example of the use of the EntityDM.preloadState() in 
acmgr.connection.imap.IMAPFolderACLQuery!    For those of you who don't 
know what that is/what it's for, 'preloadState()' allows you to speed 
loading of objects when a QueryDM is loading data that could be used as the 
object states.  In this case, it's being used to ensure that the data an 
ACL query loads is used to create and cache the ACL objects in the ACL DM.
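The effect can be sketched in plain Python.  The two classes below are simplified stand-ins, not the PEAK storage classes; the only thing borrowed is the 'preloadState(oid, state)' call shape, and the 'loads' counter exists purely to show that no per-object load happens afterwards.

```python
class EntityDM:
    """Stand-in entity DM with an oid-keyed cache."""

    def __init__(self, backend):
        self.backend = backend      # dict: oid -> state
        self.cache = {}
        self.loads = 0              # counts per-object backend trips

    def __getitem__(self, oid):
        if oid not in self.cache:
            self.loads += 1
            self.cache[oid] = dict(self.backend[oid])
        return self.cache[oid]

    def preloadState(self, oid, state):
        # Reuse state the caller already has instead of loading again.
        if oid not in self.cache:
            self.cache[oid] = dict(state)
        return self.cache[oid]


class QueryDM:
    """Stand-in query DM that fetches many states in one pass."""

    def __init__(self, entityDM):
        self.entityDM = entityDM

    def all(self):
        rows = self.entityDM.backend.items()    # one bulk fetch
        return [self.entityDM.preloadState(oid, state) for oid, state in rows]


backend = {"anne": {"rights": "lrs"}, "bob": {"rights": "lrswipcda"}}
acls = EntityDM(backend)
results = QueryDM(acls).all()
bob = acls["bob"]           # served from the cache the query filled
```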

* I would like to caution against the excessive use of '../foo/bar' type 
name paths.  I prefer to keep these infrequent for components that are 
potentially more separable and reusable, as most of the connection.imap 
module is.  I'd suggest that instead you use 'connection = 
binding.bindTo(IIMAPConnection)', for example, so that you simply connect 
to the nearest available IMAPConnection.  In this way, somebody who wants 
to reuse your component can simply drop an instance of it into their 
context, without having to hard-wire/override the 'connection' attribute of 
your component.  The object that owns the connection simply does (for example):

somethingIdontCareWhatItsCalled = binding.bindTo(
    'somethingThatGetsIMAP', provides=IIMAPConnection)

Notice that the contained component doesn't have to care whether the 
connection is named 'connection' or not.  This principle can and probably 
should be applied to most of the other components you are linking to with 
'bindToParent()' and/or 'bindTo("../something")'.  If interfaces are too 
much trouble to define because you don't need them for anything else, you 
can always use property names instead, e.g.:

# client
dm = binding.bindToProperty('acmgr.IMAP.FolderDM')

# provider
myFolderDM = binding.New(
    IMAPFolderDM, provides=PropertyName('acmgr.IMAP.FolderDM'))

And finally, you can always use implicit acquisition 
(bindTo('someNameWithoutDotsAtTheFront')), as long as you choose relatively 
unique names.  Hardwired relative paths are really only appropriate for 
very tightly coupled components.  Because if you use them, your components 
*will* be coupled rather tightly whether you meant them to be or not.

Also, using interfaces or property names is much clearer as to intention 
than most other binding types.  If you bind to an interface, a reader then 
*knows* that you intend the attribute should implement the interface.  He 
also knows that he need only have something declared as that interface 
within the context of the component, in order to use it.

You may wonder how to deal with something like your 
'../folderACLDM/listACLs' binding; I would do this:

class IFolderACLDM(storage.IDataManager):
     #...other stuff
     listACLs = Interface.Attribute("an IIMAPFolderACLQuery instance")

ACLs = binding.bindTo(IFolderACLDM)

and then simply refer to 'self.ACLs.listACLs[folder]', which is close 
enough to the Law of Demeter for me.  If you want to be particular, or 
really can't afford that extra attribute lookup on every call, you can just 
add:

listFolderACLs = binding.bindTo('ACLs/listACLs')

and then do as you have done from there.  Notice here that the path is 
local to the object, so a reader can understand this code 
locally.  Probably it should really be './ACLs/listACLs' to make it crystal 
clear that it's a local reference.  I'm usually just too lazy to type those 
extra two keystrokes when the reference is within a few lines of the 
binding for the named attribute...
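The "nearest available provider" idea behind binding to an interface can be approximated in a few lines of plain Python.  This is only a toy: 'Component', 'find()' and the string used as an interface key are invented for the sketch and say nothing about how peak.binding really resolves things.

```python
class Component:
    """Stand-in component with a parent and declared provisions."""

    def __init__(self, parent=None, provides=None):
        self.parent = parent
        self.provides = provides or {}  # key -> object offered under it

    def find(self, key):
        # Walk outward until some enclosing context offers the key.
        node = self
        while node is not None:
            if key in node.provides:
                return node.provides[key]
            node = node.parent
        raise LookupError(key)


IIMAPConnection = "IIMAPConnection"     # placeholder for a real interface

app = Component(provides={IIMAPConnection: "connection owned by app"})
folders = Component(parent=app)
messages = Component(parent=folders)

# A different host can supply its own connection without touching 'messages'.
other_host = Component(provides={IIMAPConnection: "connection owned by other host"})
reused = Component(parent=other_host)
```

Neither 'messages' nor 'reused' names a path to its provider, so the same component works under either parent.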

* Nice style on the 'stateForMessage()' method; by that I mean giving the 
DM a common way to extract an object's state, so that the method can be 
shared by query DM's that collaborate with it.  I'll have to remember that 
trick to use myself!  Maybe even write it up as a standard technique.

* You might want to create a "TxnUnsafeDM" base class (or maybe I should 
add one to PEAK!) to use as a base for your EntityDM's.  The idea would be 
that it would raise an error on transaction abort if 'saved' were 
non-empty, as we discussed on the list earlier this week.  Right now, if a 
transaction rollback is attempted following a flush() or commit attempt, 
there will be no indication that the transaction partially committed.
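A bare-bones sketch of what such a base class might do, written without PEAK.  Apart from the 'saved' attribute mentioned above, every name here ('PartialCommitError', 'dirty', the method names) is an assumption made for the example rather than an existing API.

```python
class PartialCommitError(Exception):
    """Raised when an abort cannot undo writes already sent out."""


class TxnUnsafeDM:
    """Stand-in DM for a backend with no rollback of its own."""

    def __init__(self):
        self.dirty = []     # objects changed but not yet written
        self.saved = []     # objects already written to the backend

    def flush(self):
        # Writes go out immediately and cannot be taken back.
        self.saved.extend(self.dirty)
        self.dirty = []

    def abortTransaction(self):
        written, self.saved, self.dirty = self.saved, [], []
        if written:
            raise PartialCommitError(
                "%d object(s) were written before the abort" % len(written)
            )

    def finishTransaction(self):
        self.saved = []


dm = TxnUnsafeDM()
dm.dirty.append("new folder")
dm.flush()
try:
    dm.abortTransaction()
    message = None
except PartialCommitError as exc:
    message = str(exc)
```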

* Speaking of flush(), I don't recall seeing any of your IMAP DM's calling 
flush() on the EntityDM they get their records for during _load().  This 
means that if a user (for example) adds a folder during a transaction, and 
then looks at the parent folder's child list, they will not see the new 
folder unless the parent's folder list had been viewed before the add 
occurred.  In general, it's best to have QueryDM's flush their 
corresponding EntityDM's before executing a query against external state.
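The flush-before-query rule looks roughly like this in a stripped-down form.  'FolderDM' and 'ChildListQuery' are hypothetical stand-ins, and the "server" is just a Python set playing the part of external state.

```python
class FolderDM:
    """Stand-in entity DM that buffers new folders until flushed."""

    def __init__(self, server):
        self.server = server        # set of folder names held externally
        self.pending = []

    def add(self, name):
        self.pending.append(name)

    def flush(self):
        self.server.update(self.pending)
        self.pending = []


class ChildListQuery:
    """Stand-in query DM that reads child folders from the server."""

    def __init__(self, folderDM):
        self.folderDM = folderDM

    def children(self, parent):
        # Push pending writes out first so the query can see them.
        self.folderDM.flush()
        prefix = parent + "/"
        return sorted(n for n in self.folderDM.server if n.startswith(prefix))


folders = FolderDM({"INBOX/old"})
folders.add("INBOX/new")
listing = ChildListQuery(folders).children("INBOX")
```

Without the flush() call in children(), 'listing' would contain only 'INBOX/old'.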

* I notice that overall you've emulated the style of PEAK itself, with 
regards to naming and structuring conventions.  Please make sure that this 
is appropriate for your application; I'm not saying it's not, mind you, but 
I've done little application development with PEAK as yet and don't know 
how *I* feel about using the same style.  I expect that when I do 
application code, for example, I will be more likely to use MixedCaseNames 
for modules, because brevity will be less important than clarity for an app 
versus a framework.  I also expect to use a flatter package hierarchy than in 
PEAK, because most apps aren't as big as PEAK!  And I expect there to be 
many other such style differences, either more or less subtle than the 
preceding.  So, I encourage everyone to be diverse in their styles, so that 
people don't get the impression that if you use PEAK, your code's got to 
look like PEAK.  :)

>message-listing of a folder is a QueryLink to a MessageListQuery
>that retrieves the ids of the messages in this folder.
>
>i tried 2 implementations:
>
>  1. list only the ids and initialize the PersistentQuery with
>     [ dm[id] for id in message_ids ]
>
>  2. list the ids and fetch all headers and then initialize the
>     Persistent Query with
>     [dm.preloadState(<oid>, <somestate>) for xx in message_ids ]
>
>  the first solution gives back a list of message-objects fast
>  and allows using slices of the list (without knowing the content)
>  but it is fairly slow because it needs to contact the imap-server
>  for every single mail to fetch the header-info (dm._load(xxx))
>
>  the second solution needs some time to get loaded but has
>  preloaded objects with message-headers ready for listing.
>  the second solution is good for mailboxes with up to 1000 messages,
>  the first solution is better for mailboxes with more messages.
>
>  possible solution: create a policy that switches the implementation
>  based on the message-count.

That's perfectly acceptable, as is the use of a configuration property to 
set the threshold count, so that users can easily change it very high or 
very low if it doesn't match their expectations...
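Such a policy could be as small as the following sketch.  The function and parameter names are invented here, and the default of 1000 is simply the figure quoted above; in a real app the threshold would come from a configuration property.

```python
def list_messages(message_ids, fetch_one, fetch_many, threshold=1000):
    """Return one zero-argument header loader per message id."""
    if len(message_ids) <= threshold:
        headers = fetch_many(message_ids)   # single bulk request, paid up front
        return [lambda h=headers[mid]: h for mid in message_ids]
    # Too many messages: defer each fetch until that message is needed.
    return [lambda mid=mid: fetch_one(mid) for mid in message_ids]


calls = []


def fetch_one(mid):
    calls.append(("one", mid))
    return {"id": mid}


def fetch_many(mids):
    calls.append(("many", len(mids)))
    return {mid: {"id": mid} for mid in mids}


small = list_messages([1, 2, 3], fetch_one, fetch_many, threshold=5)
large = list_messages(list(range(10)), fetch_one, fetch_many, threshold=5)
first = large[0]()      # only now does the large folder touch the server
```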

>----------------------------------------------------------------
>SIEVE (a mail-filtering-language for cyrus-imap):
>- NamingProvider for SIEVEConnections
>- SIEVE Elements (SieveScript) -> Model
>- Datamanagers (EntityDM's for access, QueryDM for scriptlisting
>               LazyLoader for Script-Download)
>
>----------------------------------------------------------------
>LDAP-Account (work has just started and is not yet ready):
>- Basic Model (User)
>- UserDM for load/save/new of User-Objects.
>
>open questions:
>(a lot) but the most important:
>
>what is the oid for the ldap-stored objects -> the DN ??
>the DN is the only unique identifier for objects in ldap.

The only time I'd use anything *other* than the DN as an LDAP oid, is if I 
had a restricted set of types of objects I was retrieving, and I didn't 
need to reference anything polymorphically (i.e., without knowing its type, 
and therefore what keys to retrieve it by).

Note that although DN is the only thing guaranteed by LDAP itself to be 
unique, it's perfectly acceptable to declare restrictions on the schema 
required by a particular application.  For example, I have an app that uses 
an attribute in LDAP that is required to be unique.  (The LDAP server has 
an extension that enforces it, but it could also be enforced by the 
applications using the server.)

>if we use the DN, we need a way to build the DN when creating
>objects. we thought of using a similar way as in our IMAP solution:
>e.g. an User-Object should be stored in a OrganizationalUnit,
>then we would for example do this:
><begin> ...
>ou = ouDM['ou=people,dc=net-labs,dc=dev']
>user = userDM.newItem()
>user.cn = 'Some User'
>user.sn = 'User'
>user.givenName = 'Some'
>ou.addChild(user)
><commit> ...
>
>the newly created user-object would be stored using the following _p_oid:
>def _new(self, ob)
>
>    oid = 'cn=%s, %s' % (ob.cn, self.ouDM.oidFor(ob.parent))
>
>where ou.child is a model.Collection
>and user.parent is a model.Attribute with proper referencedEnd declarations.
>
>and there comes another question what should "def _thunk(self, ob)" do ??
>
>if i would say in the above example:
>
>    oid = 'cn=%s, %s' % (ob.cn, self.oidFor(ob.parent))
>                                ^^^^^^^^^^
>then i need to implement the userDM._thunk(self, ob) method.
>
>any hints for this one ??

Your _thunk() should check whether the object's DM is of the right type 
(i.e. supports a compatible interface) and that it is using the same LDAP 
connection, and if so, return the object's _p_oid.  If not, it should 
complain of a cross-database reference error.  (Unless you want to 
implement referrals, in which case _thunk() should search the LDAP db for a 
referral to the target object, and return the referral's DN, or if not 
present, create a referral entry in the local LDAP db, and point it to the 
other object.)  Does that make sense?  _thunk() is a hook for doing 
cross-database references.  In this case, you don't really want or need a 
cross database reference, so all you want to do is verify that this isn't 
really a cross-database reference, and that it's therefore acceptable to 
use the same oid.
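In outline, that check might look like the sketch below.  It is plain Python, not PEAK: the 'LDAPDM' class, its 'newItem(oid)' helper and 'CrossDatabaseError' are invented for the example, and '_p_jar' is assumed to be the attribute that points at the DM owning an object.

```python
class CrossDatabaseError(Exception):
    """Raised when an object lives in an incompatible DM."""


class Record:
    pass


class LDAPDM:
    """Stand-in DM; '_p_jar' and '_p_oid' mimic persistence attributes."""

    def __init__(self, connection):
        self.connection = connection

    def newItem(self, oid):
        ob = Record()
        ob._p_jar, ob._p_oid = self, oid
        return ob

    def oidFor(self, ob):
        # Our own objects need no translation; anything else is thunked.
        if ob._p_jar is self:
            return ob._p_oid
        return self._thunk(ob)

    def _thunk(self, ob):
        other = ob._p_jar
        if isinstance(other, LDAPDM) and other.connection is self.connection:
            return ob._p_oid    # same directory, so the DN is still valid
        raise CrossDatabaseError("%r is stored in another database" % ob._p_oid)


conn = object()
ouDM, userDM = LDAPDM(conn), LDAPDM(conn)
ou = ouDM.newItem("ou=people,dc=net-labs,dc=dev")
parent_dn = userDM.oidFor(ou)       # accepted: same connection
```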

Note that you could also get away from this issue by using a single LDAP_DM 
which looked at the 'objectclass' field of records it retrieved, in order 
to determine what class to assign them.  This is the route I plan to go 
with my own apps, because I need the ability to reference objects by DN 
regardless of type, in order to implement certain LDAP features (e.g. the 
'seeAlso' field, which doesn't tell you what kind of object you're supposed 
to "see also") and also to support cross-db references from relational 
databases pointing to LDAP objects.

Anyway, to take that approach, just override the '_ghost()' method, and if 
the 'state' argument isn't supplied, load the 'objectclass' 
attribute.  Then return an instance of the appropriate type, without 
loading its state unless it was supplied by the caller.  Voila.
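Roughly, and with the caveat that this is a standalone sketch rather than a real EntityDM subclass, the override could look like this.  The 'classMap' table, the dict standing in for the directory, and the model classes are all assumptions for illustration.

```python
class User:
    pass


class OrgUnit:
    pass


class LDAP_DM:
    """Stand-in DM that picks a class from the entry's objectclass."""

    classMap = {"person": User, "organizationalUnit": OrgUnit}

    def __init__(self, directory):
        self.directory = directory  # dict: DN -> {attribute: value}

    def _ghost(self, oid, state=None):
        if state is None:
            # Fetch only the one attribute needed to choose the class.
            objectclass = self.directory[oid]["objectclass"]
        else:
            objectclass = state["objectclass"]
        ob = self.classMap[objectclass]()
        ob._p_oid = oid
        if state is not None:
            ob.__dict__.update(state)   # caller supplied the full state
        return ob


dn = "cn=Some User,ou=people,dc=net-labs,dc=dev"
entry = {"objectclass": "person", "cn": "Some User", "sn": "User"}
dm = LDAP_DM({dn: entry})
ghost = dm._ghost(dn)               # typed, but not yet loaded
loaded = dm._ghost(dn, entry)       # typed and populated
```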

If you want to have DM's that specialize in a particular application-domain 
type, just use a FacadeDM that adapts its keys to the target.  For example, 
you could have a FacadeDM for users that looks them up by login name, by 
doing a query and returning the LDAP_DM.preloadState() of the found 
data.  The object's oid is still its DN, of course, and so can be used to 
key into the LDAP_DM in future.
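A toy version of that arrangement, again in plain Python with invented classes ('UserByLogin' plays the facade, and a linear scan stands in for a real LDAP search on 'uid'):

```python
class LDAP_DM:
    """Stand-in DM keyed by DN, with a simple object cache."""

    def __init__(self, directory):
        self.directory = directory  # dict: DN -> {attribute: value}
        self.cache = {}

    def preloadState(self, oid, state):
        if oid not in self.cache:
            self.cache[oid] = dict(state, dn=oid)
        return self.cache[oid]

    def __getitem__(self, oid):
        return self.preloadState(oid, self.directory[oid])


class UserByLogin:
    """Stand-in facade: login-name keys, objects owned by the LDAP_DM."""

    def __init__(self, ldapDM):
        self.ldapDM = ldapDM

    def __getitem__(self, login):
        for dn, state in self.ldapDM.directory.items():
            if state.get("uid") == login:
                # Hand the found data to the real DM; the oid stays the DN.
                return self.ldapDM.preloadState(dn, state)
        raise KeyError(login)


dn = "cn=Some User,ou=people,dc=net-labs,dc=dev"
ldap = LDAP_DM({dn: {"uid": "suser", "cn": "Some User"}})
user = UserByLogin(ldap)["suser"]
same = ldap[user["dn"]]             # the DN keys straight back into LDAP_DM
```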

Anyway, in this scenario there's no need for _thunk() because all the 
'oidFor()' calls go to the same DM that's asking for the oid.

>i'm pretty impressed that a non trivial thing like accessing
>IMAP in a nice python-object-tree is fairly simple using
>peak.storage and it seems to have a really good performance
>due to its load-on-demand design.
>
>i like it very much :)))

Thank you for having the courage to "eat my dogfood", so to speak, 
especially on the rapidly-changing peak.model stuff.  (At least I am almost 
done with the refactoring!)

I hope to start work on a detailed 'peak.model' tutorial soon, using 
'graphviz' as a basis for a long-running example, starting with a simple 
model of 'Node', 'Edge' and 'Graph' classes, building up to having 
'Styles', using model.Enumerations for things like the colors, shapes, 
arrowheads, etc.  The ultimate goal will be to have an example app that 
reads a UML model and generates an inheritance diagram or dependency 
diagram or something like that.  Intermediate results will be things like a 
similar diagram generated from Python classes, and maybe even from the 
application's own domain model!

I think I'll be able to find lots of motivating examples to show framework 
extension techniques like adding domain-specific metadata to classes, 
defining specialized types (like my OddNumber example way up earlier in 
this e-mail), and so on.  And, I'd like to be able to use the resulting 
library for some things myself.  :)

I can already see a lot of how it will end up in the complex version, but 
if I build it up from very small parts, I'll get to share my design thought 
process in the tutorial, showing why you might do things different ways 
with different PEAK capabilities.  In the process, I expect the tutorial 
will also showcase selective use of binding, naming, config, and even 
'util' features like 'IndentedStream', but the main focus will be on 
'peak.model'.  The discussions we've had lately on the mailing list, and 
the work I've been doing with the metamodels package and the model package 
refactoring, have made it clear that there's some very good stuff here but 
if I don't do a *lot* of good quality documentation, people will barely 
scratch the surface of its capabilities.