[TransWarp] Handling names, addresses, and rename operations in peak.naming

Wed Dec 4 00:22:59 EST 2002

Just some late-night thoughts sorting out some remaining conceptual issues 
in peak.naming...  If you have trouble following me, don't worry, I'm 
probably having trouble too; that's why I'm writing this.  :)

A "naming authority" is a system that knows how to resolve a name.  In an 
HTTP URL, the "http://xyz/" part of the URL is the naming authority.  The 
rest of the URL before the query string or fragment ID is the "name" 
portion of the URL.

SMTP URLs are addresses.  They denote only a naming authority, but no 
actual name within that authority.

LDAP URLs are names; they contain both a naming authority and a path.

A path doesn't have to be multi-part, even conceptually, to be considered a 
name - a flat namespace is still a namespace.

The grey area lies with things like our database URLs, which often refer to 
a database server as a naming authority, and then reference a 
database.  Should these be considered addresses, because they name a 
specific database, or names, because they are resolved by the naming authority?

Intuitively, I want to say that a DB URL is an address, because there is no 
indirection taking place.  Yet a "file:" URL seems to be a name, and there 
isn't necessarily any indirection there.  Of course, there *could* be 
indirection in a file system.

Perhaps the real test of whether something is a name or an address is 
whether the naming authority by itself could be usefully considered a 
naming context.  Could you retrieve items from it?  Put new items in 
it?  But then that seems to make DB servers naming contexts, because it's 
actually pretty reasonable to think that you could add or remove databases 
from the database server, or even list the current databases, if you had 
appropriate access.

It seems that all names can be represented with a simple two-level 
structure.  The outer structure is a CompositeName, whose first element is 
an optional naming authority, specified by a URL up through any naming 
authority portions.  Subsequent elements are paths within each respective 
naming system.  Thus it's possible to describe a name such as:

ldap://somewhere.com/cn=thingy,ou=documents,o=somewhere.com/imagesDir/anImage.gif

Which would be represented as:

naming.CompositeName(
     [ ldapURL('ldap', '//somewhere.com/'),
       naming.CompoundName(
           [ 'o=somewhere.com', 'ou=documents', 'cn=thingy']
       ),
       naming.CompoundName(
           [ 'imagesDir', 'anImage.gif' ]
       ),
     ]
)

(Of course, this translation should be internal to the naming system; 
you'll never see this as a user, only possibly as an implementer of a 
namespace or maybe even only as a PEAK internals developer.)

Anyway, the idea of the name shown is that it's a composite name that goes 
through two naming systems: first LDAP, and then something else.  That 
something else might be HTTP, FTP, or maybe even the local 
filesystem.  Perhaps it's something else we haven't thought of 
yet.  There's a reference in the LDAP directory that tells us where to 
go.  No problem.

It seems to me that every context should know its full CompositeName.  For 
a lot of our trivial contexts, this will be something like 
naming.CompositeName(['somescheme:']), and that's it.  That's 
okay.  Anyway, if every context knows its full name, then we can do some 
important things for "rename()" operations.  A rename needs to find a 
context whose full name is a prefix of both the origin and destination 
names.  The most reliable way to do this is to perform a "resolve()" on 
both names, and then look for the common prefix of the names of the 
contexts returned.  If it happens to match the name of one of those contexts, 
so much the better; otherwise we have to resolve the newly computed prefix 
again.  One constraint: the outer composite names must match in every 
element before the last, otherwise the source and target names are in 
different naming systems.  But we can't tell this until *after* we resolve 
them, because we don't know ahead of time where symlinks, referrals, 
cross-system thunks, and other goofy things might come into play.  What if 
our naming system supports traversing into zipfiles, for example?  Sheesh.

Okay, so it might be a little inefficient.  If you want fast renames, 
you'll just have to look up a common context first, and not do something like:

naming.rename('a:veryverylongURL','a:veryveryveryverylongURL')

This of course assumes that we are going to add context-less functions for 
all the standard naming operations.  It certainly could be handy for things 
like:

naming.bind("file:///somefile", documentToSave)

and better, letting you implement a "Save as:" function in your editor that 
supports arbitrary ways to do so:

naming.bind("ftp://somehost/somefile", documentToSave)  # FTP upload
naming.bind("http://somehost/somefile", documentToSave) # WebDAV PUT

Or the pièce de résistance...

naming.bind("file:///aZipFile.zip/someInternalPath/somefile", documentToSave)
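
A minimal sketch of how such context-less functions might dispatch, assuming a registry that maps URL schemes to context objects.  SCHEME_CONTEXTS, DictContext, and this bind() are hypothetical names made up for illustration, not the peak.naming API, and the toy context stores bindings in memory rather than doing any real I/O:

```python
# Hypothetical sketch: none of these names are the real peak.naming API.

class DictContext:
    """Toy context that keeps bindings in memory instead of doing real I/O."""

    def __init__(self):
        self.bindings = {}

    def bind(self, name, obj):
        self.bindings[name] = obj


# Assumed registry: one context object per URL scheme.
SCHEME_CONTEXTS = {"file": DictContext(), "ftp": DictContext()}


def bind(url, obj):
    """Context-less bind: pick a context from the URL scheme, then delegate."""
    scheme, sep, rest = url.partition(":")
    if not sep or scheme not in SCHEME_CONTEXTS:
        raise LookupError("no context registered for %r" % (url,))
    SCHEME_CONTEXTS[scheme].bind(rest, obj)


bind("file:///somefile", "document text")
bind("ftp://somehost/somefile", "document text")
```

The caller never sees a context; the scheme alone decides which one handles the operation.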

Okay.  So what do we need to add to IBasicContext?  A method to get the 
"full name".  I guess we could call it the same thing they use in JNDI: 
getNameInNamespace().  So far so good.  We need a default implementation in 
AbstractContext and GenericURLContext.  Don't know how to do that 
yet.  None of our current context subclasses deal with hierarchy; none of 
them really represent a "place" as such; if you create different instances, 
they're really all the same "place".  So the default implementation I guess 
could just say it's at "myscheme:" as the first part of the composite name, 
and the rest would be empty.  I guess any new hierarchical context we 
create will either need to tell its children what their names are, or the 
children will need to look up to their parents (or both).  But the 
child would have to know what to add to the parent, so it's better for the 
parent to name the child.  This is generally how it works with humans, so 
the algorithm is known to work.  :)

All the object factory methods (e.g. address.retrieve(), IObjectFactory, 
etc.) receive both a name and the parent context, so they could actually 
assemble this information.  We even have a field already reserved for a 
component name to be passed to a child constructor, so we could use 
that.  It would need to be guaranteed to be a CompoundName, though, if we 
don't want a getComponentPath() operation to be confused.  Yecch.  Better 
to have a keyword argument - perhaps 'nameInContext'.  Yeah.  Now if every 
context carries a 'namingAuthority' as well, we can in principle compose a 
local name simply by walking backwards up our parents with the same naming 
authority, concatenating compoundNames as we go.  We want to use compound 
names at each level rather than a single element at each level, so that 
it's not necessary for a context to create a bunch of intermediate contexts 
when somebody looks up say, 'foo/bar/baz' in a file:// context.  We'd 
rather it simply create the 'foo/bar' context within itself.
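
As a rough illustration of the parent naming the child with a compound name, here is a sketch in which looking up 'foo/bar/baz' yields a single 'foo/bar' subcontext rather than a chain of intermediate ones.  The Context class and its resolve() method are hypothetical; only the 'nameInContext' keyword comes from the discussion above:

```python
# Hypothetical sketch; Context and resolve() are illustrative names only.

class Context:
    def __init__(self, parent=None, nameInContext=()):
        self.parent = parent
        # Compound name (tuple of parts) relative to the parent.  The parent
        # supplies it; the child never works it out for itself.
        self.nameInContext = tuple(nameInContext)

    def resolve(self, path):
        """Split 'a/b/c' into (context for 'a/b', 'c') in a single step."""
        parts = tuple(p for p in path.split("/") if p)
        if not parts:
            raise ValueError("empty name")
        *dirs, leaf = parts
        if not dirs:
            return self, leaf  # the name lives directly in this context
        # One subcontext carrying the whole compound name, no intermediates.
        return Context(parent=self, nameInContext=dirs), leaf


root = Context()
ctx, leaf = root.resolve("foo/bar/baz")
```

Here ctx.nameInContext is the two-part compound name for 'foo/bar', and no separate 'foo' context is ever created.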

Hm.  What's funny about this is, it suddenly seems as though there's no 
longer a reason to have multiple parts to an "outer" composite name, apart 
from a naming authority and a single compoundName path.  That's because if 
you cross over from one naming system to another, there's no point in 
retaining the path that carried you over.  A nice simplification.

So what features are needed to do this?  We need bindings for 
namingAuthority and nameInContext.  namingAuthority should be based off of 
the default URL scheme for the context class, or the applicable URL scheme 
for the GenericURLContext instance.  For "nested" contexts, such as a 
zipfile context that lives in a file system, the context should use its 
container's namingAuthority.  nameInContext should default to an empty 
compound name.  When a context factory such as an IObjectFactory returns a 
subcontext, or if the context itself creates or returns a subcontext, it 
should pass in a 'nameInContext' parameter to the new context to tell it 
where it "lives" in relation to its parent.  getNameInNamespace() is then a 
simple upward walk, gathering compoundName objects as long as the 
namingAuthority remains the same.
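
The upward walk just described might look roughly like this.  It is only a sketch under assumptions: the Context class is hypothetical, authorities are plain strings, and returning the authority and the gathered parts as a pair is a convenience of the example rather than a settled interface:

```python
# Hypothetical sketch; class layout and return shape are assumptions.

class Context:
    def __init__(self, parent=None, nameInContext=(), namingAuthority=None):
        self.parent = parent
        self.nameInContext = tuple(nameInContext)
        # Default to the container's authority; a URL-based lookup that
        # starts a new naming system would pass its own authority instead.
        if namingAuthority is None and parent is not None:
            namingAuthority = parent.namingAuthority
        self.namingAuthority = namingAuthority

    def getNameInNamespace(self):
        """Gather compound names upward while the authority stays the same."""
        parts = self.nameInContext
        ctx = self
        while (ctx.parent is not None
               and ctx.parent.namingAuthority == self.namingAuthority):
            ctx = ctx.parent
            parts = ctx.nameInContext + parts
        return self.namingAuthority, parts


fs = Context(namingAuthority="file:")
docs = Context(parent=fs, nameInContext=("home", "docs"))
zipctx = Context(parent=docs, nameInContext=("aZipFile.zip",))
# A referral into LDAP resets the authority, so the walk stops there.
ldap = Context(parent=docs, namingAuthority="ldap://somewhere.com/")
entry = Context(parent=ldap, nameInContext=("o=somewhere.com",))
```

The nested zipfile context reports a name under "file:", while the LDAP entry's name does not include the file system path that led to it.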

The namingAuthority for hierarchical contexts in general should default to 
that of the parent.  It's only when a URL or other address mechanism is 
used to jumpstart a new naming system, that namingAuthority should be 
reset.  This suggests that URL classes used to retrieve a naming context 
should set namingAuthority when they retrieve the context.  This could be 
implemented in the _resolveURL() method of such a URL context.

Okay, new twist - what happens with "symlinks" or LinkRef()s within a 
naming system?  Simply concatenating names traversed could get quite 
silly.  Likewise if somebody goes to "foo/../bar/../baz/../spam".  We'd 
rather see the resulting context call itself "spam" instead of all the 
other garbage.

Alright, here's how to fix that.  We make namingAuthority "None" by 
default, and it's only set by URL retrieval or URL context factories -- or 
when we want to give an object an "absolute" name.  Now we implement 
getNameInNamespace() as an upward walk until a non-None namingAuthority is 
found.  Good; that makes a lot of cases simpler.  Now children never have 
to mess with their own names in any way; either their parents or 
"godparents" (URL.retrieve()/object factory methods) give them their names.

Hm.  So what's left to figure out?  We need something to get a common 
prefix of two names.  But it's only actually needed for compound names, 
since we now know that our hypothetical "absolute composite name" only ever 
has a naming authority and a compound name.  If the authorities don't match 
between two names, you can't rename one to the other.  So we don't even 
need getNameInNamespace() to return a true "name"; really it should just 
return an '(authority,compoundName)' tuple.
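
A small sketch of that common-prefix step, assuming context names are (authority, compoundName) pairs with the compound name held as a tuple of parts.  The function names common_prefix() and rename_root() are hypothetical:

```python
# Hypothetical helpers; names and the tuple representation are assumptions.

def common_prefix(a, b):
    """Longest shared leading run of two compound names (tuples of parts)."""
    prefix = []
    for x, y in zip(a, b):
        if x != y:
            break
        prefix.append(x)
    return tuple(prefix)


def rename_root(source_ctx_name, target_ctx_name):
    """Name of the deepest context that contains both resolved contexts.

    Each argument is an (authority, compoundName) pair.  A rename across
    different authorities is refused.
    """
    src_auth, src_name = source_ctx_name
    dst_auth, dst_name = target_ctx_name
    if src_auth != dst_auth:
        raise ValueError("cannot rename across naming authorities")
    return src_auth, common_prefix(src_name, dst_name)


root = rename_root(
    ("file:", ("home", "docs", "drafts")),
    ("file:", ("home", "docs", "final")),
)
```

If the returned name equals one of the two inputs, that context can perform the rename itself; otherwise the prefix has to be resolved to get a context.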

URL objects should support extracting a naming authority.  Naming 
authorities should be hashable and comparable, and equivalent authorities 
should hash and compare equally.  (This implies that they must be in 
canonical form; e.g. default port numbers filled in, etc.)  Ideally, there 
should also be a way to go from a naming authority and a compound name, 
back to a URL.  The precise mechanics of this require further thought.
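
One possible shape for a canonical authority, sketched with the standard library's urllib.parse.  The naming_authority() function, the tuple layout, and the DEFAULT_PORTS table are assumptions made for this example, and going back from an authority to a URL is left out:

```python
# Hypothetical sketch: naming_authority() and its tuple layout are assumptions.
from urllib.parse import urlsplit

# Well-known default ports for a few schemes.
DEFAULT_PORTS = {"http": 80, "ftp": 21, "ldap": 389}


def naming_authority(url):
    """Reduce a URL to a canonical, hashable (scheme, host, port) tuple."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    # Fill in the default port so equivalent spellings compare equal.
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


a = naming_authority("HTTP://XYZ:80/some/path")
b = naming_authority("http://xyz/other/path?query")
```

Because the result is a plain tuple, equality and hashing come for free, so equivalent authorities can be compared or used as dictionary keys.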

Tomorrow.  :)