[TransWarp] Input wanted: Streams, Factories, sessions, etc.
Phillip J. Eby
pje at telecommunity.com
Sun Nov 17 15:53:41 EST 2002
Ty and I have been batting around the idea of representing files and
network connections as "stream factories". Specifically, a stream factory
is an object with an 'open()' method (and possibly others) for manipulating
the referenced stream/file/connection. Examples of other methods might
include exists(), stat(), etc. The open() method would return a normal
"file-like" object, possibly an actual file, for reading and writing.
What good is this? Well, it allows for some interesting things like being
able to manipulate zipfile contents as if they were real files, or maybe
having transactional streams which actually write to a (locked) temporary
file and then rename themselves to overwrite the original file at
transaction commit, and so on. It could also be used to represent
os.popen() targets.
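To make the transactional-stream idea concrete, here is a rough sketch. The class name TransactionalFile and its commit()/abort() methods are invented for illustration and are not part of any existing PEAK API. Locking is left out, and the code is written for a current Python 3 interpreter.

```python
import os
import tempfile


class TransactionalFile:
    """Hypothetical write stream that replaces its target only on commit."""

    def __init__(self, path):
        self.path = path
        directory = os.path.dirname(os.path.abspath(path))
        # Create the temporary file next to the target so the final
        # rename stays on a single filesystem.
        fd, self.tmp_path = tempfile.mkstemp(dir=directory)
        self.stream = os.fdopen(fd, "w")

    def write(self, data):
        return self.stream.write(data)

    def commit(self):
        # Flush and close the temporary file, then move it over the original.
        self.stream.close()
        os.replace(self.tmp_path, self.path)

    def abort(self):
        # Throw the temporary file away; the original is never touched.
        self.stream.close()
        os.remove(self.tmp_path)
```

In a real implementation, commit() and abort() would presumably be driven by the transaction rather than called by hand.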
A similar pattern is needed for certain types of network connections. For
example, although right now our SMTP URL factory returns an open SMTP
connection, it really should return some kind of factory with a method to
open a connection or session. This also makes sense for things like HTTP
connections (after all, suppose you want to do a POST instead of a GET, or
want to control the headers?).
In addition, it seems to make sense for almost any sort of messaging API -
have a more or less static reference (via naming.lookup()) to an object
that lets you "tear off" sessions, similar to the way ManagedConnection
objects let you "tear off" cursors to perform queries.
So, what should the actual interface be? For file-like objects, it seems
it should include:

    open(mode='r',bufsize=0)
    exists()
    isfile()
    islink()
    isdir()
    stat()
    mimeType(), guessType()...?

It doesn't seem to make sense to include the ability to delete the file, or
rename it, since those are properly functions of the file's container (e.g.
a naming context). We'll probably need a helper class or functions to
parse a file mode string, so that file-like objects that aren't really
files will have a consistent interpretation of mode strings.
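A mode-string helper along these lines might look like the sketch below. The names parse_mode and Mode are made up for this example, and it only accepts the classic 'r', 'w' and 'a' modes with optional '+', 'b' and 't' flags. It is written for a current Python 3 interpreter.

```python
from collections import namedtuple

# Flags that a stream factory for a non-file resource could act on.
Mode = namedtuple("Mode", "read write append binary")


def parse_mode(mode="r"):
    """Split a mode string such as 'r', 'wb' or 'a+' into boolean flags."""
    if not mode or mode[0] not in ("r", "w", "a"):
        raise ValueError("invalid mode: %r" % (mode,))
    kind, flags = mode[0], mode[1:]
    # Each flag may appear at most once, and only '+', 'b' and 't' are known.
    if len(set(flags)) != len(flags) or not set(flags) <= set("+bt"):
        raise ValueError("invalid mode: %r" % (mode,))
    if "b" in flags and "t" in flags:
        raise ValueError("mode cannot be both binary and text: %r" % (mode,))
    update = "+" in flags
    return Mode(
        read=(kind == "r" or update),
        write=(kind != "r" or update),
        append=(kind == "a"),
        binary=("b" in flags),
    )
```

A factory that isn't backed by a real file could then branch on, say, parse_mode(mode).write instead of inspecting the raw string itself.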
For file:// URLs, we could implement this interface directly on the URL
class, since instances have all the information needed (i.e. the
filename!). HTTP URLs could implement an 'r' open mode as a simple GET,
optionally using other modes to have more control over headers sent,
etc. In theory, FTP URLs could interpret 'w' as returning the data
connection over which the data upload should be sent. Anyway, it seems
that URLs in general can and should be their own stream/connection
factories if there's a need to have one. This would allow us to have some
default object/state factories that would use this stream factory interface
to load or save objects in a naming context.
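As a sketch of a file:// URL acting as its own stream factory, the class below simply delegates to os and os.path. FileURL here is a stand-in written for this example, not the real PEAK URL class. Since it targets a current Python 3 interpreter, bufsize defaults to -1 rather than the 0 listed above, because Python 3 rejects unbuffered text-mode files.

```python
import os


class FileURL:
    """Hypothetical file:// URL object that doubles as a stream factory."""

    def __init__(self, filename):
        self.filename = filename

    def open(self, mode="r", bufsize=-1):
        # Assumption: -1 means "use the platform's default buffering".
        return open(self.filename, mode, bufsize)

    def exists(self):
        return os.path.exists(self.filename)

    def isfile(self):
        return os.path.isfile(self.filename)

    def islink(self):
        return os.path.islink(self.filename)

    def isdir(self):
        return os.path.isdir(self.filename)

    def stat(self):
        return os.stat(self.filename)
```

Note that there is deliberately no delete() or rename() here, in keeping with the idea that those belong to the file's container.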
One of the big questions is where to put the interface definitions
themselves, though. They don't quite fit under any of the ideas of
binding, naming, config, or even really storage! Perhaps Ty's idea of
'peak.networking' might make more sense, although even there it's an odd
one out when it comes to files. The interfaces also don't seem ubiquitous
enough to deserve placement in peak.api.
For things like SMTP and other messaging interfaces, open() seems wrong,
since you don't really want a stream. (Or do you?) Perhaps it would make
more sense to call a 'session()' method for such kinds of objects, which
returns a session object that supports 'open()', but that open() would take
a different set of parameters than the usual. For example, an SMTP
session's open() could take all the parameters of smtplib's 'sendmail()'
method, except for the actual message.
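One possible shape for such a session, as a sketch only: SMTPSession and MessageStream are invented names, and the session is handed a callable with the same (from_addr, to_addrs, msg) arguments as smtplib's SMTP.sendmail() instead of opening a connection itself, which is an assumption made to keep the example self-contained. The code is written for a current Python 3 interpreter.

```python
import io


class MessageStream:
    """Write-only stream that buffers a message and sends it on close()."""

    def __init__(self, sendmail, from_addr, to_addrs):
        self._sendmail = sendmail
        self._from_addr = from_addr
        self._to_addrs = to_addrs
        self._buffer = io.StringIO()
        self.closed = False

    def write(self, text):
        if self.closed:
            raise ValueError("write to closed message stream")
        return self._buffer.write(text)

    def close(self):
        # Closing twice is harmless; the message is sent only once.
        if self.closed:
            return
        self.closed = True
        self._sendmail(self._from_addr, self._to_addrs, self._buffer.getvalue())


class SMTPSession:
    """Hypothetical session whose open() takes sendmail()'s envelope arguments."""

    def __init__(self, sendmail):
        # 'sendmail' is any callable taking (from_addr, to_addrs, msg).
        self._sendmail = sendmail

    def open(self, from_addr, to_addrs):
        return MessageStream(self._sendmail, from_addr, to_addrs)
```

With a live connection, the callable could be the bound sendmail method of an smtplib.SMTP instance.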
Okay, let's take a use case and see how it works in a short script:

    from peak.api import *

    storage.beginTransaction()

    s = naming.lookup('smtp://some.where').session().open(
        'me@nowhere',['you@somewhere']
    )

    print >>s, "From: me@nowhere"
    print >>s, "To: you@somewhere"
    print >>s, "Subject: test"
    print >>s
    print >>s, "Here's my test e-mail"

    storage.commitTransaction()

    # s is closed by the transaction, so writing to it
    # past this point causes an error

Interestingly, a session is rather like a ManagedConnection, in that it
needs to keep track of its cursors (streams). Unlike a managed connection,
it needs to do so to keep you from trying to send two e-mails on the
session at the same time, or else to keep a connection pool and
automatically handle it behind the scenes (which would definitely be YAGNI
for us right now).
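The simpler of those two options, refusing a second stream while one is still open, could be sketched as follows. Session, its close_all() method and the _Stream helper are hypothetical names; close_all() stands in for whatever hook the transaction would call at commit. The code is written for a current Python 3 interpreter.

```python
class _Stream:
    """Minimal stand-in for a message stream owned by a session."""

    def __init__(self, session):
        self._session = session
        self.closed = False
        self.parts = []

    def write(self, text):
        if self.closed:
            raise ValueError("write to closed stream")
        self.parts.append(text)

    def close(self):
        if not self.closed:
            self.closed = True
            # Tell the session it is free to hand out another stream.
            self._session._released(self)


class Session:
    """Hypothetical session that allows only one open stream at a time."""

    def __init__(self):
        self._active = None

    def open(self):
        if self._active is not None:
            raise RuntimeError("session already has an open stream")
        self._active = _Stream(self)
        return self._active

    def _released(self, stream):
        if self._active is stream:
            self._active = None

    def close_all(self):
        # Roughly what a transaction commit or abort would trigger.
        if self._active is not None:
            self._active.close()
```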
So, here are some of the open questions...
* Where should the interfaces for these ideas live? (If all else fails, I
suppose peak.naming.interfaces would be okay, since you'll mainly *get*
instances of these factories via the naming system.)
* Should all messaging services (such as e-mail) be transactional? (My
inclination is yes, for data integrity; Ty's inclination is no, for
simplicity.)
* Should an explicit close() operation be required for a stream's content
to be valid? (My inclination is yes, if the resource isn't controlled by a
transaction, but no if it is controlled.)
* Are there any parameters in common for session() methods across different
kinds of systems (e.g. e-mail, spread, ...)? Are there other things this
could be used for?
Your thoughts and suggestions, on these questions or anything else, are
appreciated.