Version as of 2005-05-24 10:44:06

Python Eggs

Overview

"Eggs are to Pythons as Jars are to Java..."

Python Eggs are zipfiles with the .egg extension that can include data and C extensions as well as Python code. They can be used with Python 2.3 and up, and can be built using the setuptools package (see http://cvs.sf.net/viewcvs.py/python/python/nondist/sandbox/setuptools/ for source code). Once the implementation is far enough along, we plan to propose it for inclusion with the standard library beginning in Python 2.5.

The primary benefits of Python Eggs are:

  • They are a "zero installation" format for a Python package; no build or install step is required, just put them on PYTHONPATH or sys.path and use them
  • They can include package metadata, such as the other eggs they depend on
  • They allow "namespace packages" (packages that just contain other packages) to be split into separate distributions. For example, zope.*, twisted.*, and peak.* packages can be distributed as separate eggs, unlike normal packages, which must always be placed under the same parent directory. This allows what are now huge monolithic packages to be distributed as separate components.
  • They allow applications or libraries to specify the needed version of a library, so that you can e.g. require("Twisted-Internet>=2.0") before doing an import twisted.internet.

There are also other benefits that may come from having a standardized format, similar to the benefits of Java's "jar" format.
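
The namespace-package idea in the list above can be tried out without eggs at all. The sketch below is not the egg runtime's mechanism; it swaps in the standard library's pkgutil.extend_path() on plain directories, and the names eggdemo_ns, dist_a, and dist_b are made up for illustration. Two separate "distributions" each contribute one module to the same parent package:

```python
import importlib
import os
import sys
import tempfile

# Each fake distribution ships its own copy of this __init__.py, which asks
# pkgutil to merge every "eggdemo_ns" directory found on sys.path.
NAMESPACE_INIT = (
    "from pkgutil import extend_path\n"
    "__path__ = extend_path(__path__, __name__)\n"
)


def make_distribution(root, submodule, value):
    """Create root/eggdemo_ns/<submodule>.py (hypothetical layout)."""
    package_dir = os.path.join(root, "eggdemo_ns")
    os.makedirs(package_dir)
    with open(os.path.join(package_dir, "__init__.py"), "w") as f:
        f.write(NAMESPACE_INIT)
    with open(os.path.join(package_dir, submodule + ".py"), "w") as f:
        f.write("VALUE = %r\n" % (value,))


base = tempfile.mkdtemp()
for dist_name, submodule, value in [("dist_a", "alpha", 1), ("dist_b", "beta", 2)]:
    root = os.path.join(base, dist_name)
    make_distribution(root, submodule, value)
    sys.path.insert(0, root)
importlib.invalidate_caches()

# Both submodules import, although they live under different parent directories.
from eggdemo_ns import alpha, beta

print(alpha.VALUE, beta.VALUE)
```

With eggs, the intent is that the runtime does this path merging for you, based on the egg's metadata.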

Using Eggs

If you have a pure-Python egg that doesn't use any in-package data files, and you don't mind manually placing it on sys.path or PYTHONPATH, you can use the egg without installing setuptools. For eggs containing C extensions, however, or those that need access to non-Python data files contained in the egg, you'll need the pkg_resources module from setuptools installed. You can do this either by installing setuptools, or by copying the pkg_resources module to an appropriate directory.
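
To make the "no setuptools needed" case concrete, here is a small sketch that builds a throwaway pure-Python egg and imports from it using nothing but the interpreter's built-in zipfile import support. The egg name EggDemo-0.1.egg and the module eggdemo_hello are invented for the example:

```python
import importlib
import os
import sys
import tempfile
import zipfile

# Build a tiny "egg": just a zipfile with a .egg extension and one module in it.
workdir = tempfile.mkdtemp()
egg_path = os.path.join(workdir, "EggDemo-0.1.egg")
with zipfile.ZipFile(egg_path, "w") as egg:
    egg.writestr(
        "eggdemo_hello.py",
        "def greet():\n    return 'hello from an egg'\n",
    )

# "Installing" it is nothing more than putting the file itself on sys.path.
sys.path.insert(0, egg_path)
importlib.invalidate_caches()

import eggdemo_hello

print(eggdemo_hello.greet())
print(eggdemo_hello.__file__)  # a path that points inside the .egg file
```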

In addition to providing runtime support for using eggs containing C extensions or data files, the pkg_resources module also provides an API for automatically locating eggs and their dependencies and adding them to sys.path at runtime. With this support, you can install and keep multiple versions of the same package on your system, with the right version automatically being selected at runtime. Plus, if an egg has a dependency that can't be met, the runtime will raise a DistributionNotFound error that says what package and version is needed.

By the way, in case you're wondering how you can tell a "pure" (all-Python) egg from one with C extensions, the difference is that eggs containing C extensions will have their target platform's name at the end of the filename, just before the .egg. "Pure" eggs are (in principle) platform-independent, and have no platform name. If you're using the pkg_resources runtime to find eggs for you, it will ignore any eggs that it can tell are not usable on your platform or Python version. If you're not using the runtime, you'll have to make sure that you use only compatible eggs.
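
If you're checking eggs by hand, a helper along these lines can pull the platform name out of a filename. It's only a sketch: it assumes filenames shaped like Name-Version-pyX.Y[-platform].egg (that exact layout is an assumption, not something spelled out above), and egg_platform() is a hypothetical helper, not part of pkg_resources:

```python
import os
import re


def egg_platform(filename):
    """Return the platform part of an egg filename, or None for a "pure" egg.

    Assumes the layout Name-Version-pyX.Y[-platform].egg.
    """
    stem, extension = os.path.splitext(os.path.basename(filename))
    if extension.lower() != ".egg":
        raise ValueError("not an egg filename: %r" % (filename,))
    parts = stem.split("-")
    for index, part in enumerate(parts):
        if re.fullmatch(r"py\d+\.\d+", part):
            # Whatever follows the Python version tag is the platform name;
            # it may itself contain dashes (e.g. "linux-i686").
            return "-".join(parts[index + 1:]) or None
    return None


for name in ["FooBar-1.2-py2.3.egg", "FooBar-1.2-py2.3-win32.egg"]:
    print(name, "->", egg_platform(name))
```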

Once you have the runtime installed, you need to get your desired egg(s) onto sys.path. You can do this manually, by placing them in the PYTHONPATH environment variable, or you can add them directly to sys.path in code. This approach doesn't scale well, however, because as you need additional eggs, you'll be managing a longer and longer PYTHONPATH or sys.path by hand. Not only that, but you'll have to manually keep track of all the eggs needed by the eggs you're using! Luckily, there is a better way to do it.

Automatic Discovery

The better way to manage your eggs is to place them in a directory that's already on sys.path, such as site-packages, or the directory that your application's main script is in, or a directory that you'll be adding to PYTHONPATH or sys.path. Then, before attempting to import from any eggs, use a snippet of code like this:

from pkg_resources import require
require("FooBar>=1.2")

This will search all sys.path directories for an egg named "FooBar" whose release version is 1.2 or higher, and it will automatically add the newest matching version to sys.path for you, along with any eggs that the FooBar egg needs. (A note about versions: the egg runtime system understands typical version numbering schemes, so it knows that versions like "1.2a1" and "1.2rc5" are actually older than the plain version "1.2", but it also knows that versions like "1.2p1" or "1.2-1" are newer than "1.2".)
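
To give a feel for the ordering rules in that note, here is a rough sketch of a sort key that reproduces the four examples. It is not the runtime's own version parser; version_key() and the set of tags it treats as pre-release markers are assumptions made for illustration:

```python
import re

# Assumption: these tags mark a pre-release; any other tag (such as "p")
# is treated as coming after the plain release.
PRE_RELEASE_TAGS = {"a", "alpha", "b", "beta", "c", "rc", "pre"}


def version_key(version):
    """Turn a version string into a list that sorts in release order."""
    key = []
    for number, tag in re.findall(r"(\d+)|([a-z]+)", version.lower()):
        if number:
            key.append((2, int(number)))   # numeric part, compared as a number
        elif tag in PRE_RELEASE_TAGS:
            key.append((0, tag))           # sorts before "end of version"
        else:
            key.append((3, tag))           # sorts after everything else
    key.append((1,))                       # "end of version" marker
    return key


versions = ["1.2", "1.2p1", "1.2a1", "1.2-1", "1.2rc5"]
print(sorted(versions, key=version_key))
```

The trick is the end-of-version marker: a pre-release tag sorts below it, while an extra number or a post-release tag sorts above it. The sketch makes no attempt to treat "1.2" and "1.2.0" as equal.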

You can specify more than one requirement when calling require(), and you can also specify more complex version requirements, like require("FooBar>=1.2", "Thingy>1.0,!=1.5,<2.0a3,==2.1,>=2.3"). A requirement string consists of a distribution name, an optional list of "options" (more on this in a moment), and a comma-separated list of zero or more version conditions. Version conditions specify ranges of valid versions, using comparison operators. The version conditions you supply are sorted into ascending version order, and then scanned left to right until the package's version falls between a pair of > or >= and < or <= conditions, or exactly matches a == or != condition.

Note, by the way, that it's perfectly valid to have no version conditions; if you can use any version of "FooBar", for example, you can just require("FooBar"). Distribution names are also case-insensitive, so require("foobar") would also work, but for clarity's sake we recommend using the same spelling as the package's author.

Some eggs may also offer "options" - optional features that, if used, will need other eggs to be located and added to sys.path. You can specify zero or more options that you wish to use, by placing a comma-separated list in square brackets just after the requested distribution name. For example, the "FooBarWeb" web framework might offer optional FastCGI support. When you require("FooBarWeb[FastCGI]>=1.0"), the additional eggs needed to support the FastCGI option will also be added to sys.path. (Or, if one of them isn't found, a pkg_resources.DistributionNotFound error will be raised, identifying what dependency couldn't be satisfied.)
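
The three parts of a requirement string (name, bracketed options, version conditions) can be picked apart with a couple of regular expressions. The sketch below only splits the string; it doesn't evaluate the conditions, and parse_requirement() is a hypothetical helper rather than the parser pkg_resources uses:

```python
import re

REQUIREMENT = re.compile(
    r"\s*(?P<name>[A-Za-z0-9_.\-]+)\s*"        # distribution name
    r"(?:\[(?P<options>[^\]]*)\])?\s*"         # optional [opt1,opt2]
    r"(?P<conditions>.*)$"                     # the rest: version conditions
)
CONDITION = re.compile(r"\s*(<=|>=|==|!=|<|>)\s*([A-Za-z0-9_.\-]+)\s*$")


def parse_requirement(text):
    """Split a requirement string into (name, options, conditions)."""
    match = REQUIREMENT.match(text)
    if match is None:
        raise ValueError("bad requirement: %r" % (text,))
    options = [
        option.strip()
        for option in (match.group("options") or "").split(",")
        if option.strip()
    ]
    conditions = []
    condition_text = match.group("conditions").strip()
    if condition_text:
        for chunk in condition_text.split(","):
            parsed = CONDITION.match(chunk)
            if parsed is None:
                raise ValueError("bad version condition: %r" % (chunk,))
            conditions.append((parsed.group(1), parsed.group(2)))
    return match.group("name"), options, conditions


print(parse_requirement("FooBarWeb[FastCGI]>=1.0"))
print(parse_requirement("Thingy>1.0,!=1.5,<2.0a3,==2.1,>=2.3"))
```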

To find out what options an egg offers, you should consult its documentation, or unpack and read its EGG-INFO/depends.txt file, which lists an egg's required and optional dependencies. For more on the format of depends.txt, see the Declaring Dependencies section below.

(Note: the pkg_resources module does not automatically look for eggs on PyPI or download them from anywhere; any needed eggs must already be available in a directory on sys.path, or require() will raise a DistributionNotFound error. You can of course trap this error in your code and attempt to find the needed eggs on PyPI or elsewhere. But, if you want your application to support automated downloads, a better approach is to create a subclass of the AvailableDistributions class in pkg_resources and override its obtain() method to do the desired searching and downloading. See the source code of the require() function for how to use an AvailableDistributions object to resolve dependencies.)
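
The obtain() override described in the note is a hook: the collection of known eggs gets one last chance to fetch a missing distribution before an error is raised. The sketch below shows only that shape. LocalEggs, DownloadingEggs, resolve(), and the fetch callable are hypothetical stand-ins, and the DistributionNotFound class is defined locally rather than imported; none of this reproduces the actual AvailableDistributions signatures, so read the pkg_resources source before relying on it:

```python
class DistributionNotFound(Exception):
    """Local stand-in for the error named in the text (not the real class)."""


class LocalEggs:
    """Hypothetical collection of eggs that are already available locally."""

    def __init__(self, available):
        self.available = dict(available)   # distribution name -> egg path

    def obtain(self, name):
        """Hook called when `name` is missing. The default is to give up."""
        return None

    def resolve(self, name):
        path = self.available.get(name)
        if path is None:
            path = self.obtain(name)       # last chance before failing
        if path is None:
            raise DistributionNotFound(name)
        self.available[name] = path
        return path


class DownloadingEggs(LocalEggs):
    """Subclass that overrides obtain() to search somewhere else."""

    def __init__(self, available, fetch):
        LocalEggs.__init__(self, available)
        self.fetch = fetch                 # callable: name -> path or None

    def obtain(self, name):
        return self.fetch(name)


def fake_fetch(name):
    # Pretend download: a real application would search and download here.
    return "/tmp/downloads/%s.egg" % name if name == "Thingy" else None


eggs = DownloadingEggs({"FooBar": "/eggs/FooBar-1.2.egg"}, fake_fetch)
print(eggs.resolve("FooBar"))
print(eggs.resolve("Thingy"))
```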

Building Eggs

To build an egg from a package's setup.py, you'll need to have setuptools installed. Just check it out of Python's CVS sandbox and run setup.py install to install it (assuming you haven't already installed it in order to use the pkg_resources runtime). Now you're ready to build eggs.

Edit the target package's setup.py and add from setuptools import setup such that it replaces the existing import of the setup function. Then run setup.py bdist_egg.

That's it. A .egg file will be deposited in the dist directory, ready for use. If you want to add any special metadata files, you can do so in the SomePackage.egg-info directory that bdist_egg creates. ("SomePackage" will of course be replaced by the name of the package you're building.) Any files placed in this directory are copied to an EGG-INFO directory within the egg file, for use at runtime. There are a handful of special filenames that the egg runtime system understands, like eager_resources.txt and depends.txt, both of which we'll cover in a moment. Other metadata files are automatically generated for you, such as native_libs.txt (a list of C extensions, if any) and PKG-INFO (descriptive information about the package). Do not edit these files, as the next time you run bdist_egg they will be overwritten with the automatically generated versions.

If you want to build eggs from other people's packages (ones whose setup.py doesn't import from setuptools), then in Python 2.4 and higher you can do:

python setup.py --command-packages=setuptools.command bdist_egg

If you're using Python 2.3, however (and eggs don't work with versions less than 2.3), you have to copy setuptools/command/bdist_egg.py into the distutils/command/ directory of your Python installation (e.g. in lib/python2.3 or \Python23\Lib).

Note: packages that expect to find data files in their package directories, but that use neither the PEP 302 API nor the pkg_resources API to find them, will not work when packaged as .egg files. One way to check for this is to see whether the .egg file contains data files and the package uses __file__ to find them. If so, you'll need to patch the package so it uses pkg_resources.resource_filename() or one of the other resource_* APIs instead of __file__. (TODO: forward reference here to API intro section under Developing Eggs)
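
Here's a small demonstration of why the __file__ approach breaks inside a zipfile, and of the PEP 302 route working. It's a sketch that uses the standard library's pkgutil.get_data() (which goes through the loader's PEP 302 get_data() method) instead of the pkg_resources APIs; the egg name and the eggdemo_data package are invented for the example:

```python
import importlib
import os
import pkgutil
import sys
import tempfile
import zipfile

# Build a throwaway egg containing a package plus one data file.
egg_path = os.path.join(tempfile.mkdtemp(), "EggData-0.1.egg")
with zipfile.ZipFile(egg_path, "w") as egg:
    egg.writestr("eggdemo_data/__init__.py", "")
    egg.writestr("eggdemo_data/greeting.txt", "hello\n")

sys.path.insert(0, egg_path)
importlib.invalidate_caches()
import eggdemo_data

# The __file__ habit: build a filesystem path next to the module and open it.
# Inside an egg that path points *into* the zipfile, so open() fails.
naive_path = os.path.join(os.path.dirname(eggdemo_data.__file__), "greeting.txt")
try:
    with open(naive_path) as f:
        f.read()
    file_lookup_worked = True
except OSError:
    file_lookup_worked = False

# The PEP 302 route: ask the package's loader for the data instead.
data = pkgutil.get_data("eggdemo_data", "greeting.txt")

print("open() via __file__ worked:", file_lookup_worked)
print("loader returned:", data)
```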

Also, some packages (e.g. wxPython) may include dynamic link libraries other than Python extensions. If this is the case, you'll need to create an eager_resources.txt file in the .egg-info directory that lists the in-zip paths to these libraries, one per line. This will let the runtime system know that those files need to be unpacked if any of the extensions are used. Thus, when attempting to import an extension, the runtime will also unpack all the dynamic link libraries that go with it.

(Note: if you still can't get the library to work as an .egg file after trying the above tactics, please report your problem on the Distutils-SIG mailing list. Thanks.)

Declaring Dependencies

Some eggs need other eggs to function. However, there isn't always a meaningful place for a library to call require(), and in any case a library's source code is rarely the place to declare its version dependencies. So setuptools allows you to create a depends.txt file that can be bundled inside the .egg file's metadata directory, and which will be used by the egg runtime to automatically locate the egg's dependencies and add them to sys.path whenever the egg is needed by a require() call.

To create this file, you'll need to place it in the SomePackage.egg-info directory that the bdist_egg command creates. The format is fairly simple; here's a heavily-commented example:

# A sample "MyPackageName.egg-info/depends.txt" file
# Blank lines and lines beginning with "#" are ignored

# Lines at the beginning of the file specify the package's minimum
# requirements, and line-end comments are allowed:

FooBar >= 1.2   # a 'require()'-style dependency


# Here, we specify a more restricted set of versions for another
# package -- just the ones we've tested with this package.  Notice 
# that if a requirement is too long to fit on a line, you can use "\"
# to continue it, as long as the split is between version conditions.
# Notice also that arbitrary whitespace is allowed between tokens:

BazSpam ==1.1, ==1.2, ==1.3, ==1.4, ==1.5, \
        ==1.6, ==1.7


[FastCGI]
# A line with a name in square brackets defines an "option"
# The lines that follow, specify what other eggs are needed
# if this optional feature is requested.

fcgiapp>=0.1
FastCGITools>=2.1


[reST]
# Here's another optional feature: reStructuredText processing.  We
# need docutils to support that.
docutils >= 0.3

Of course, you don't have to comment your dependencies and features so thoroughly or use so much whitespace if you don't want to. Here's a minimal rendering of the same example, with exactly the same semantics:

FooBar>=1.2
BazSpam==1.1,==1.2,==1.3,==1.4,==1.5,==1.6,==1.7
[FastCGI]
fcgiapp>=0.1
FastCGITools>=2.1
[reST]
docutils>=0.3

But, as you can see, a little whitespace and commenting goes a long way towards other people understanding what your dependencies are. At the least, a blank line before option definitions makes them easier to find and read.

Currently, the depends.txt file is not syntax-checked by bdist_egg, so if you make a mistake you won't find out about it until you try to use the egg. (This will be fixed in a later version.)
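
Until that check exists, a few lines of Python can at least confirm that a depends.txt file splits into the sections you expect. The reader below is a sketch written from the format description above, not setuptools' own parser; parse_depends() is a hypothetical helper, and it stores the minimum requirements under the key None:

```python
def parse_depends(text):
    """Return {option name or None: [requirement strings]}."""
    sections = {None: []}      # None holds the minimum requirements
    current = None
    pending = ""               # text carried over from a "\" continuation
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()   # drop comments
        if not line:
            continue
        if line.endswith("\\"):
            pending += line[:-1].strip() + " "
            continue
        line, pending = pending + line, ""
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip()
            sections.setdefault(current, [])
        else:
            # Whitespace between tokens is insignificant, so remove it.
            sections[current].append("".join(line.split()))
    return sections


VERBOSE = r"""
# minimum requirements
FooBar >= 1.2   # a 'require()'-style dependency

BazSpam ==1.1, ==1.2, ==1.3, ==1.4, ==1.5, \
        ==1.6, ==1.7

[FastCGI]
fcgiapp>=0.1
FastCGITools>=2.1

[reST]
docutils >= 0.3
"""

MINIMAL = """\
FooBar>=1.2
BazSpam==1.1,==1.2,==1.3,==1.4,==1.5,==1.6,==1.7
[FastCGI]
fcgiapp>=0.1
FastCGITools>=2.1
[reST]
docutils>=0.3
"""

print(parse_depends(VERBOSE) == parse_depends(MINIMAL))
```

Running the commented and the minimal renderings through the same reader is a quick way to see that they really do carry the same information.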

Developing Eggs

TODO: explain how to use SomePackage.egg-info dirs to work with packages that are already on sys.path/PYTHONPATH

TODO: introduce pkg_resources APIs for accessing data files

Implementation Status

Egg Builder (bdist_egg command)
  • (DONE) format suitable for use with zipimport if no resource access (including extensions) is needed
  • (DONE) add required EGG-INFO metadata directory for storing info about the egg itself
  • (DONE) include PKG-INFO metadata file in the EGG-INFO
  • (DONE) Hand-made EGG-INFO files can be placed in a PackageName.egg-info directory of the distribution
  • (DONE) By default, all .so/.dll/.pyd files are "eager" (listed in EGG-INFO/native_libs.txt)
  • (DONE) build process generates .py/.pyc/.pyo stubs to load extensions via pkg_resources.resource_filename() calls
  • Syntax-check manually-supplied metadata (depends.txt, eager_resources.txt, and namespace_packages.txt) to avoid creating "bad eggs".
  • Need a way to specify features to be excluded from egg distribution
  • needs option to not include source code in .egg (default is to include source, for IDE's, debugging, etc.)
Resource Manager Open Issues
  • (DONE) Egg-info can include list of "eager resources" which must be extracted all-at-once if any of them are extracted (listed in EGG-INFO/eager_resources.txt)
  • Cleanup on Windows doesn't work, because the .pyd's remain in use as long as Python is still running. An application that really wants to clean up on exit can presumably spawn another process to do something about it, but that kind of sucks. (Michael Dubner suggested using HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\PendingFileRenameOperations to fix the Windows problem, but unfortunately didn't give any details or sample code.)
  • Default cache location: os.path.expanduser() is broken on Windows, as it can result in a directory that isn't actually writable. May need to use registry info to fix this, or default to a temporary directory on Windows. At least an application can choose a fixed location if it wants to ensure future executions will be fast. (Unix home directories aren't necessarily writable either, so maybe a temporary directory really is the right default, or perhaps try to create the home first and fall back to a temporary if the directory isn't writable.)
  • The extraction process needs to be protected by a thread lock and a process lock (i.e., prevent collisions both inside and outside the current process)
  • The extract process treats the file's timestamp in the zipfile as "local" time with "unknown" DST. It's theoretically possible that a DST change could cause the system to think that the file timestamp no longer matches the zip timestamp. Also, the resulting Unix-style timestamp for the extracted file may differ between systems with different timezones. This is an unfortunate side effect of the fact that the zip file format does not include timezone information or a UTC timestamp.
  • Fix home directory location on Windows
  • Allow listing or globbing of resources, or unpacking resource directories (rather than just individual files, which are all that's currently supported).
Namespace Packages
  • Need to implement hook in Distribution.install_on() to fix up existing NS packages, and to flag namespace packages identified from distribution metadata (e.g. EGG-INFO/namespace_packages.txt).
  • Need namespace_package() API in pkg_resources
Dependency/Discovery
  • (DONE) Given a sys.path entry, iterate over available distributions, either .egg directories, .egg files, or distributions identified by .egg-info directories.
  • (DONE) Version spec syntax (needs to support specifying "extras" in order to implement optional dependencies)
  • (DONE) require() API call to automatically resolve dependencies and add them to sys.path
  • (DONE) EGG-INFO.in/depends.txt (feature->requirements mapping specifying distribution+version)
  • Add support for extending the distribution finder for non-filesystem sys.path entries (by adapting PEP 302 importer objects)

Older Stuff

The stuff from here on down is fairly out of date; it's just here to make sure we didn't forget anything once the current runtime refactoring is complete.

Package Resource API

The following API routines will be available in the pkg_resources module as module-level functions:

declare_namespace(name)
Declare that the dotted package name name is a "namespace package" whose contained packages and modules may be spread across multiple distributions. The named package's __path__ will be extended to include the corresponding package in all active distributions that contain a package of that name. (More precisely, if an importer's find_module(name) returns a loader, then it will also be searched for the package's contents.) Whenever a Distribution's require() is invoked, it checks for the presence of namespace packages and updates their __path__ contents accordingly. A distribution is "active" if its require() has been invoked, or if it is present on sys.path.
iter_distributions(name=None,path=None)
Searching the list of locations specified by path, yield Distribution instances whose names match name. If name is None, all recognized distributions are yielded. If path is None, the resource manager's default path is searched. If the resource manager has no default path, sys.path is searched. Distribution objects yielded by this routine may be added to sys.meta_path in order to make them accessible for importing, as they are PEP 302-compatible "importer" objects.
require(name,version_info=None,path=None)
Ensure that the named distribution (matching version_info if specified) is present on sys.meta_path. The path argument is the same as for iter_distributions(). XXX define version-info format!
resource_string(package_name,resource_name)
Return the named resource as a binary string.
resource_stream(package_name,resource_name,mode='b')
Open the named resource as a file-like object, using the specified mode ('t', 'b', or 'U'). (Note that this does not necessarily return an actual file; if you need a fileno() or an actual operating system file, you should use resource_filename() instead.)
resource_filename(package_name,resource_name)
Return a platform file or directory name for the named resource. If the package is in an egg distribution, the resource will be unpacked before the filename is returned. If the named resource is a directory, the entire directory's contents will be extracted before the directory name is returned. Also, if the named resource is an "eager" resource such as a Python extension or shared library, then all "eager" resources will be extracted before the resource's filename is returned. (This is to ensure that shared libraries that link to other included libraries will have their dependencies available before loading.)
set_extraction_path(path)
Set the base path where resources will be extracted to. If not set, this defaults to os.path.expanduser("~/.python-eggs"). Resources are extracted to subdirectories of this path, named for the corresponding .egg file. You may set this to a temporary directory, but then you must call cleanup_resources() to delete the extracted files when done. (Note: you may not change the extraction path for a given resource manager once resources have been extracted, unless you first call cleanup_resources().)
cleanup_resources(force=False)
Delete all extracted resource files and directories, returning a list of the file and directory names that could not be successfully removed. This function does not have any concurrency protection, so it should generally only be called when the extraction path is a temporary directory exclusive to a single process. This method is not automatically called; you must call it explicitly or register it as an atexit function if you wish to ensure cleanup of a temporary directory used for extractions.
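
The extraction behaviour described for resource_filename() and set_extraction_path() can be approximated with the zipfile module. This is a rough sketch under the assumption that an egg is an ordinary zipfile; extract_resource() is a hypothetical helper, not a pkg_resources function, and it ignores eager resources, locking, and timestamps:

```python
import os
import shutil
import tempfile
import zipfile


def extract_resource(egg_path, resource_name, extraction_path):
    """Unpack one resource and return a real filesystem path to it.

    Files land in a subdirectory of extraction_path named for the egg file.
    """
    target_dir = os.path.join(extraction_path, os.path.basename(egg_path))
    with zipfile.ZipFile(egg_path) as egg:
        return egg.extract(resource_name, target_dir)


# Build a throwaway egg with one data file in it.
egg_file = os.path.join(tempfile.mkdtemp(), "EggCache-0.1.egg")
with zipfile.ZipFile(egg_file, "w") as egg:
    egg.writestr("somepkg/data/table.txt", "1,2,3\n")

# Use a temporary extraction path, then clean it up explicitly when done.
cache_dir = tempfile.mkdtemp()
extracted = extract_resource(egg_file, "somepkg/data/table.txt", cache_dir)
with open(extracted) as f:
    contents = f.read()
print(extracted)

shutil.rmtree(cache_dir, ignore_errors=True)
```

Because the extraction path here is a temporary directory, the sketch deletes it explicitly at the end, mirroring the note that cleanup is never automatic.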

Distribution Objects

These need name, version, python version, and a metadata API, as well as PEP 302 "importer" methods.

require()
Add an appropriate entry to the beginning of sys.meta_path for this Distribution if not present.
get_platform_info()
Return the platform information for this Distribution as a str or None if not present.
get_version_object()
Return the version information for this Distribution as a distutils.version StrictVersion or LooseVersion or None if not present.
get_python_version()
Return the Python version for this Distribution as a str containing the major Python version (i.e. "2.3") or None if not present.
get_name()
Return the name of this Distribution without any platform or version information.
get_archive_name()
Return the name of this Distribution with all platform and version information.
get_distdata(filename, default=NotGiven)
Like PEP 302's get_data, this returns the data for the specified path. This may be used to retrieve distribution metadata (such as EGG-INFO/native_libs.txt). Unlike PEP 302, this method permits you to specify a default value to be returned if the resource is not present in the distribution archive. This is for convenience in retrieving optional metadata files, such as are contained in archives' EGG-INFO files.
get_resource_path(module, package_name, resource_name)

For a module that this Distribution controls, return the path for the given resource. Typically implemented as:

return os.path.join(os.path.dirname(module.__file__), resource_name)

get_resource_stream(module, package_name, resource_name, mode='rb')

Open the path as a file-like object, using the specified mode ('t', 'b', or 'U').

(Note that this does not necessarily return an actual file; if you need a fileno() or an actual operating system file, you should use get_resource_filename() instead.)

get_resource_filename(module, package_name, resource_name)

Return an openable file or directory path for the given filename. If the filename is not already on the filesystem, it will be extracted to a file allocated by the ResourceManager. If the filename represents a directory, the ResourceManager will allocate files for the entire tree.

The Distribution may allocate more files than requested during this call. A trivial implementation could extract the entire archive's contents, and a complex implementation may decide to extract all object code at the same time (e.g. dll, pyd, so, etc.). Resources that are extracted before they are requested are termed "eager" resources.

get_resource_string(module, package_name, resource_name)
Return the named resource as a binary string.
find_module(fullname, path=None)

This method will be called with the fully qualified name of the module. If the importer is installed on sys.meta_path, it will receive a second argument, which is None for a top-level module, or package.__path__ for submodules or subpackages. It should return a loader object if the module was found, or None if it wasn't. If find_module() raises an exception, it will be propagated to the caller, aborting the import.

See: PEP 302 Importer Protocol

load_module(fullname)

This method returns the loaded module.

See: PEP 302 Loader Protocol

is_package(fullname)
See: PEP 302 Optional Extensions to the Importer Protocol
get_code(fullname)
See: PEP 302 Optional Extensions to the Importer Protocol
get_source(fullname)
See: PEP 302 Optional Extensions to the Importer Protocol
get_data(filename)
See: PEP 302 Optional Extensions to the Importer Protocol
