NOTE: If all you want to do is install a project distributed as an .egg file, head straight to the Easy Install page. EasyInstall makes installing Python code as easy as typing easy_install SomeProjectName. You don't need to read this page unless you want to know more about how eggs themselves work.
"Eggs are to Pythons as Jars are to Java..."
Python eggs are a way of bundling additional information with a Python project that allows the project's dependencies to be checked and satisfied at runtime, and that lets projects provide plugins for other projects. Several binary formats embody eggs, but the most common is the '.egg' zipfile format, because it's a convenient one for distributing projects. All of the formats support including package-specific data, project-wide metadata, C extensions, and Python code.
The easiest way to install and use Python eggs is to use the "Easy Install" Python package manager, which will find, download, build, and install eggs for you; all you do is tell it the name (and optionally, version) of the Python project(s) you want to use.
Python eggs can be used with Python 2.3 and up, and can be built using the setuptools package (see the Python Subversion sandbox for source code, or the EasyInstall page for current installation instructions).
The primary benefits of Python Eggs are:
- They enable tools like the "Easy Install" Python package manager.
- .egg files are a "zero installation" format for a Python package: no build or install step is required; just put them on PYTHONPATH or sys.path and use them. (The runtime may need to be installed if the egg uses C extensions or data files.)
- They can include package metadata, such as the other eggs they depend on.
- They allow "namespace packages" (packages that just contain other packages) to be split into separate distributions. For example, zope.*, twisted.*, and peak.* packages can be distributed as separate eggs, unlike normal packages, which must always be placed under the same parent directory. This allows what are now huge monolithic packages to be distributed as separate components.
- They allow applications or libraries to specify the needed version of a library, so that you can, e.g., require("Twisted-Internet>=2.0") before doing an import twisted.internet.
- They're a great format for distributing extensions or plugins to extensible applications and frameworks (such as Trac, which uses eggs for plugins as of 0.9b1), because the egg runtime provides simple APIs to locate eggs and find their advertised entry points (similar to Eclipse's "extension point" concept).
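The "zero installation" point can be seen with nothing but the standard library. The sketch below builds a tiny pure-Python egg in a temporary directory and imports from it just by putting the file on sys.path. The names "DemoEgg" and "demoegg" are hypothetical, and the egg is a bare zipfile without the EGG-INFO metadata a real egg would carry, since a plain import doesn't need it.

```python
import os
import sys
import tempfile
import zipfile

# Build a tiny pure-Python "egg". It is just a zipfile; a real egg would also
# carry an EGG-INFO/ metadata directory, which plain imports don't need.
# "DemoEgg" and "demoegg" are hypothetical names.
tmpdir = tempfile.mkdtemp()
egg = os.path.join(tmpdir, "DemoEgg-0.1-py%d.%d.egg" % sys.version_info[:2])
with zipfile.ZipFile(egg, "w") as zf:
    zf.writestr("demoegg/__init__.py", "GREETING = 'hello from inside a zipfile'\n")

# No build or install step: put the .egg file itself on sys.path and
# Python's built-in zipimport support does the rest.
sys.path.insert(0, egg)

import demoegg

print(demoegg.GREETING)   # hello from inside a zipfile
print(demoegg.__file__)   # a path that points inside the .egg file
```

Note that demoegg.__file__ names a location inside the zipfile, which is why code that builds filenames from __file__ breaks for zipped eggs.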
There are also other benefits that may come from having a standardized format, similar to the benefits of Java's "jar" format.
If you have a pure-Python .egg file that doesn't use any in-package data files, and you don't mind manually placing it on sys.path or PYTHONPATH, you can use the egg without installing setuptools. For eggs containing C extensions, however, or those that need access to non-Python data files contained in the egg, you'll need the pkg_resources module from setuptools installed. For installation instructions, see the EasyInstall page.
In addition to providing runtime support for using eggs containing C extensions or data files, the pkg_resources module also provides an API for automatically locating eggs and their dependencies and adding them to sys.path at runtime. (See the API documentation and setuptools documentation for details.)
With this support, you can install and keep multiple versions of the same package on your system, with the right version automatically being selected at runtime. Plus, if an egg has a dependency that can't be met, the runtime will raise a DistributionNotFound error that says what package and version is needed.
By the way, in case you're wondering how you can tell a "pure" (all-Python) egg from one with C extensions: eggs containing C extensions have their target platform's name at the end of the filename, just before the .egg. "Pure" eggs are (in principle) platform-independent and have no platform name. If you're using the pkg_resources runtime to find eggs for you, it will ignore any eggs that it can tell are not usable on your platform or Python version. If you're not using the runtime, you'll have to make sure that you use only compatible eggs.
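If you're checking eggs by hand, a filename parser makes the pure/platform distinction mechanical. This is a sketch under an assumed filename layout of Name-Version-pyX.Y[-platform].egg, with a project name that contains no dashes; describe_egg and is_pure are hypothetical helpers, not part of pkg_resources.

```python
import re

# Assumed layout: Name-Version-pyX.Y[-platform].egg
EGG_FILENAME = re.compile(
    r"""
    ^(?P<name>[^-]+)                  # project name (assumed to contain no '-')
    (?:-(?P<version>[^-]+)            # release version
       (?:-py(?P<pyver>[^-]+)         # Python version, e.g. 2.4
          (?:-(?P<platform>.+))?      # present only for eggs with C extensions
       )?
    )?
    \.egg$
    """,
    re.VERBOSE | re.IGNORECASE,
)

def describe_egg(filename):
    """Return (name, version, python_version, platform) for an egg filename."""
    m = EGG_FILENAME.match(filename)
    if m is None:
        raise ValueError(f"not an egg filename: {filename!r}")
    return m.group("name"), m.group("version"), m.group("pyver"), m.group("platform")

def is_pure(filename):
    """A "pure" egg has no platform name before the .egg suffix."""
    return describe_egg(filename)[3] is None

print(describe_egg("FooBar-1.2-py2.4.egg"))        # ('FooBar', '1.2', '2.4', None)
print(describe_egg("FooBar-1.2-py2.4-win32.egg"))  # ('FooBar', '1.2', '2.4', 'win32')
```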
Once you have the runtime installed, you need to get your desired egg(s) onto sys.path. You can do this manually, by listing them in the PYTHONPATH environment variable, or you can add them directly to sys.path in code. This approach doesn't scale well, however: as you need additional eggs, you'll be managing a longer and longer PYTHONPATH or sys.path by hand. Not only that, but you'll have to manually keep track of all the eggs needed by the eggs you're using! Luckily, there is a better way to do it.
The better way to manage your eggs is to place them in a directory that's already on sys.path, such as site-packages, or the directory that your application's main script is in, or a directory that you'll be adding to PYTHONPATH or sys.path. Then, before attempting to import from any eggs, use a snippet of code like this:
from pkg_resources import require
require("FooBar>=1.2")
This will search all sys.path directories for an egg named "FooBar" whose release version is 1.2 or higher, and it will automatically add the newest matching version to sys.path for you, along with any eggs that the FooBar egg needs. (A note about versions: the egg runtime system understands typical version numbering schemes, so it knows that versions like "1.2a1" and "1.2rc5" are actually older than the plain version "1.2", but it also knows that versions like "1.2p1" or "1.2-1" are newer than "1.2".)
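To make the version ordering concrete, here is a standalone sketch patterned on how early pkg_resources compared version strings: each version becomes a tuple of zero-padded numbers and "*"-prefixed tags, with a trailing "final" marker so that pre-release tags sort below it and post-release tags above it. The function name legacy_version_key is hypothetical, and current setuptools releases compare versions according to PEP 440 instead, so treat this as an illustration of the rules described above rather than of today's behavior.

```python
import re

# Split into runs of digits, runs of letters, dots and dashes.
_COMPONENT = re.compile(r"(\d+|[a-z]+|\.|-)")
# Synonymous spellings; a bare "-" is treated as a post-release marker.
_REPLACE = {"pre": "c", "preview": "c", "rc": "c", "-": "final-", "dev": "@"}

def legacy_version_key(version):
    """Return a tuple that sorts version strings in legacy egg-style order."""
    parts = []
    # A trailing "final" marker makes pre-release tags ("*a", "*c", ...)
    # sort before the plain release and post-release tags ("*p") after it.
    for tok in _COMPONENT.split(version.lower()) + ["final"]:
        tok = _REPLACE.get(tok, tok)
        if not tok or tok == ".":
            continue
        if tok[0].isdigit():
            parts.append(tok.zfill(8))   # zero-pad so string order is numeric order
            continue
        tok = "*" + tok                  # "*" sorts before any digit
        if tok < "*final":
            # a "-" that merely introduces a pre-release tag is dropped
            while parts and parts[-1] == "*final-":
                parts.pop()
        # trailing zeros are dropped, so "1.0" compares equal to "1"
        while parts and parts[-1] == "00000000":
            parts.pop()
        parts.append(tok)
    return tuple(parts)

print(sorted(["1.2", "1.2p1", "1.2a1", "1.2-1", "1.2rc5"], key=legacy_version_key))
# ['1.2a1', '1.2rc5', '1.2', '1.2-1', '1.2p1']
```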
You can specify more than one requirement when calling require(), and you can also specify more complex version requirements, like require("FooBar>=1.2", "Thingy>1.0,!=1.5,<2.0a3,==2.1,>=2.3"). A requirement string consists of a distribution name, an optional list of "options" (more on this in a moment), and a comma-separated list of zero or more version conditions. Version conditions specify ranges of valid versions using comparison operators. The version conditions you supply are sorted into ascending version order and then scanned left to right until the package's version falls between a pair of > or >= and < or <= conditions, or exactly matches a == or != condition.
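The scan described above can be sketched as a small table-driven loop. This is an illustration under stated assumptions, not the pkg_resources source: version_key and version_matches are hypothetical names, version_key is a compact legacy-style ordering (pre-release tags before the final release, post-release tags after it), and the conditions arrive already parsed as (operator, version) pairs. For each operator, the table says what to do when the candidate version is equal to, greater than, or less than the condition's version: accept now (T), reject now (F), tentatively accept (+) or reject (-) and keep scanning, or express no opinion (.).

```python
import re

_COMPONENT = re.compile(r"(\d+|[a-z]+|\.|-)")
_REPLACE = {"pre": "c", "preview": "c", "rc": "c", "-": "final-", "dev": "@"}

def version_key(version):
    """Hypothetical legacy-style key: 1.2a1 < 1.2rc5 < 1.2 < 1.2-1 < 1.2p1."""
    parts = []
    for tok in _COMPONENT.split(version.lower()) + ["final"]:
        tok = _REPLACE.get(tok, tok)
        if not tok or tok == ".":
            continue
        if tok[0].isdigit():
            parts.append(tok.zfill(8))
            continue
        tok = "*" + tok
        if tok < "*final":
            while parts and parts[-1] == "*final-":
                parts.pop()
        while parts and parts[-1] == "00000000":
            parts.pop()
        parts.append(tok)
    return tuple(parts)

# Actions indexed by comparison result: [0] equal, [1] greater, [-1] less.
_ACTIONS = {
    "<":  ("-", "-", "T"),
    "<=": ("T", "-", "T"),
    ">":  ("F", "+", "F"),
    ">=": ("T", "+", "F"),
    "==": ("T", ".", "."),
    "!=": ("F", "+", "+"),
}

def _cmp(a, b):
    return (a > b) - (a < b)

def version_matches(version, conditions):
    """Scan conditions in ascending version order, as described above."""
    candidate = version_key(version)
    result = None
    for op, bound in sorted(conditions, key=lambda c: version_key(c[1])):
        action = _ACTIONS[op][_cmp(candidate, version_key(bound))]
        if action == "T":
            return True
        if action == "F":
            return False
        if action == "+":
            result = True
        elif action == "-" or result is None:
            result = False
    return True if result is None else result   # no conditions: any version

conditions = [(">", "1.0"), ("!=", "1.5"), ("<", "2.0a3"), ("==", "2.1"), (">=", "2.3")]
for v in ["0.9", "1.2", "1.5", "2.0", "2.1", "2.2", "2.5"]:
    print(v, version_matches(v, conditions))
```

With the "Thingy" conditions from the example, this accepts 1.2, 2.1, and 2.5 and rejects 0.9, 1.5, 2.0, and 2.2.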
Note, by the way, that it's perfectly valid to have no version conditions; if you can use any version of "FooBar", for example, you can just require("FooBar"). Distribution names are also case-insensitive, so require("foobar") would also work, but for clarity's sake we recommend using the same spelling as the package's author.
Some eggs may also offer "extras" - optional features that, if used, will need other eggs to be located and added to sys.path. You can specify zero or more options that you wish to use, by placing a comma-separated list in square brackets just after the requested distribution name. For example, the "FooBarWeb" web framework might offer optional FastCGI support. When you require("FooBarWeb[FastCGI]>=1.0"), the additional eggs needed to support the FastCGI option will also be added to sys.path. (Or, if one of them isn't found, a pkg_resources.DistributionNotFound error will be raised, identifying what dependency couldn't be satisfied.)
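A requirement string with options can be split into its parts with a couple of regular expressions. The sketch below is hypothetical (ParsedRequirement and parse_requirement are not pkg_resources names) and accepts only the simple forms shown on this page: a name, an optional bracketed comma-separated list of options, and comma-separated version conditions. It also records a lowercased key, since distribution names are case-insensitive.

```python
import re
from typing import List, NamedTuple, Tuple

class ParsedRequirement(NamedTuple):
    name: str                        # as written, e.g. "FooBarWeb"
    key: str                         # lowercased, for case-insensitive matching
    extras: Tuple[str, ...]          # e.g. ("FastCGI",)
    specs: List[Tuple[str, str]]     # e.g. [(">=", "1.0")]

_REQ = re.compile(
    r"^\s*(?P<name>[A-Za-z0-9][A-Za-z0-9._-]*)\s*"
    r"(?:\[(?P<extras>[^\]]*)\])?\s*"
    r"(?P<specs>.*?)\s*$"
)
_SPEC = re.compile(r"^\s*(<=|>=|==|!=|<|>)\s*([A-Za-z0-9._+-]+)\s*$")

def parse_requirement(text):
    m = _REQ.match(text)
    if m is None:
        raise ValueError(f"not a requirement string: {text!r}")
    extras = tuple(e.strip() for e in (m.group("extras") or "").split(",") if e.strip())
    specs = []
    if m.group("specs"):
        for chunk in m.group("specs").split(","):
            s = _SPEC.match(chunk)
            if s is None:
                raise ValueError(f"bad version condition: {chunk!r}")
            specs.append((s.group(1), s.group(2)))
    name = m.group("name")
    return ParsedRequirement(name, name.lower(), extras, specs)

print(parse_requirement("FooBarWeb[FastCGI]>=1.0"))
```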
To find out what options an egg offers, you should consult its documentation, or unpack and read its EGG-INFO/depends.txt file, which lists an egg's required and optional dependencies.
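Since a zipped egg is an ordinary zipfile, you can inspect its EGG-INFO directory without unpacking it. In this sketch list_egg_info and read_egg_info are hypothetical helpers, the demo egg is made up, and the depends.txt layout it uses (one requirement per line, with an [option] section header for optional dependencies) is an assumption for illustration.

```python
import os
import tempfile
import zipfile

def list_egg_info(egg_path):
    """Names of the metadata files stored under EGG-INFO/ in a zipped egg."""
    with zipfile.ZipFile(egg_path) as zf:
        return sorted(n[len("EGG-INFO/"):] for n in zf.namelist()
                      if n.startswith("EGG-INFO/") and not n.endswith("/"))

def read_egg_info(egg_path, filename):
    """Return the text of EGG-INFO/<filename>, or None if the egg lacks it."""
    with zipfile.ZipFile(egg_path) as zf:
        try:
            return zf.read("EGG-INFO/" + filename).decode("utf-8")
        except KeyError:          # ZipFile raises KeyError for missing members
            return None

# Demo with a made-up egg; the depends.txt contents are illustrative only.
demo_dir = tempfile.mkdtemp()
demo_egg = os.path.join(demo_dir, "FooBarWeb-1.0-py2.4.egg")
with zipfile.ZipFile(demo_egg, "w") as zf:
    zf.writestr("foobarweb/__init__.py", "")
    zf.writestr("EGG-INFO/PKG-INFO", "Metadata-Version: 1.0\nName: FooBarWeb\nVersion: 1.0\n")
    zf.writestr("EGG-INFO/depends.txt", "FooBar>=1.2\n\n[FastCGI]\nFooCGI>=0.5\n")

print(list_egg_info(demo_egg))    # ['PKG-INFO', 'depends.txt']
print(read_egg_info(demo_egg, "depends.txt"))
```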
(Note: the pkg_resources module does not automatically look for eggs on PyPI or download them from anywhere; any needed eggs must already be available in a directory on sys.path, or require() will raise a DistributionNotFound error. You can, of course, trap this error in your code and attempt to find the needed eggs on PyPI or elsewhere. If you want dependencies to be installed automatically for a project you're working on, you should probably build it using setuptools, which lets you declare dependencies where tools like EasyInstall can find them. Setuptools is also needed in order to build eggs.)
To build an egg from a package's setup.py, you'll need to have setuptools installed. If you haven't already installed it in order to use the pkg_resources runtime, just check it out of Python's Subversion sandbox and run setup.py install, or follow the installation instructions on the EasyInstall page. Now you're ready to build eggs.
Edit the target package's setup.py so that it imports the setup function from setuptools (from setuptools import setup) instead of from the distutils (typically from distutils.core import setup). Then run setup.py bdist_egg.
That's it. A .egg file will be deposited in the dist directory, ready for use. If you want to add any special metadata files, you can do so in the SomePackage.egg-info directory that bdist_egg creates. ("SomePackage" will, of course, be replaced by the name of the package you're building.) Any files placed in this directory are copied to an EGG-INFO directory within the egg file, for use at runtime. Other metadata files are generated automatically; don't edit them, because the next time you run a setup command they may be overwritten with freshly generated versions.
Note: packages that expect to find data files in their package directories, but which use neither the PEP 302 API nor the pkg_resources API to find them, will not work when packaged as .egg files. A telltale sign is an .egg file that contains data files while the package uses __file__ to find them. In that case you'll need to patch the package so that it uses pkg_resources.resource_filename() or one of the other resource_* APIs instead of __file__. See the section on Accessing Package Resources, below, for more information about updating packages to use the resource management API instead of __file__ manipulation.
Some eggs need other eggs to function. However, there isn't always a meaningful place for a library to call require(), and in any case a library's source code is rarely the place to declare its version dependencies. So setuptools lets you declare dependencies in your project's setup script; they are bundled inside the egg's metadata directory, and both the runtime and EasyInstall can then automatically find the additional eggs needed, adding them to sys.path when your project is installed or requested at runtime via require(). (Note: EasyInstall will find and download dependencies from the internet automatically, but for security reasons simply using require() in Python code does not. require() only locates eggs that are in directories on the local machine that are listed in sys.path.)
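As a rough sketch, a setup script that declares dependencies might look like the following. The install_requires and extras_require keywords are real setuptools arguments, but "SomePackage", "FooBar", "FooCGI", and all version numbers here are placeholders; this is a configuration fragment, not a tested recipe.

```python
# setup.py (sketch): "SomePackage", "FooBar" and "FooCGI" are placeholder
# names, and the version numbers are made up.
from setuptools import find_packages, setup

setup(
    name="SomePackage",
    version="0.1",
    packages=find_packages(),
    # Dependencies that are always needed; they are recorded in the egg's
    # metadata, where the runtime and EasyInstall can find them.
    install_requires=["FooBar>=1.2"],
    # Optional features; a user asks for them with a requirement
    # such as "SomePackage[FastCGI]".
    extras_require={"FastCGI": ["FooCGI>=0.5"]},
)
```

Running setup.py bdist_egg with a script along these lines should write these requirements into the egg's EGG-INFO directory.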
For more information on declaring your project's dependencies, see the setuptools documentation.
Here are a few quick tips and techniques about developing software using eggs' features. These are just overviews, though, and you should dig into the complete setuptools documentation and API documentation manuals if there's not enough information here.
So far, we've only covered how to use eggs that have actually been installed, by building them with the distutils and then putting them in a directory on sys.path. (Note: EasyInstall can download source distributions, automatically build eggs from them, and install the eggs for you, with just a single command -- even if the package's author did nothing special to support Python Eggs. Check it out.)
But what if you are developing a package and working from source code? You don't want to rebuild the egg every time you make a change to the source code, but you have code in your script or application that calls require() and expects the egg you're developing to be available. For example, see this question from Ian Bicking about working with packages checked out from Subversion but not built as eggs.
If you're using setuptools, the answer is simple: run "setup.py develop" to create a "source egg", a special link to your project's source directory, combined with wrappers for your source scripts. See the setuptools documentation under "Development Mode" and also the "develop" command reference for more details.
Many modern Python packages depend on "resources" (data files) that are included with the package, typically placed within the package's subdirectory in a normal installation. Usually, such packages manipulate their modules' __file__ or __path__ attributes in order to locate and read these resources. For example, suppose that a module needs to access a "foo.conf" file that's in its package directory. It might do something like:
import os
foo_config = open(os.path.join(os.path.dirname(__file__), 'foo.conf')).read()
However, when code like this is packed inside a zipfile, it can no longer assume that __file__ or __path__ contain filenames or directory names, and so it will fail.
Packages that access resource files and want to be usable inside a zipfile (such as a .egg file) must therefore use the PEP 302 get_data() extension (see "Optional Extensions to the Importer Protocol" in PEP 302) before falling back to direct __file__ access.
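Written out by hand, "try get_data() first, then fall back to __file__" looks roughly like the sketch below. read_package_data is a hypothetical helper, and the demo package "confdemo" lives in a throwaway zipfile so that the difference between the two approaches is visible.

```python
import os
import sys
import tempfile
import zipfile

def read_package_data(module, name):
    """Read a data file that sits next to `module`, zipped or not.

    Tries the PEP 302 get_data() extension on the module's loader first,
    then falls back to plain file access.
    """
    path = os.path.join(os.path.dirname(module.__file__), name)
    loader = getattr(module, "__loader__", None)
    if loader is not None and hasattr(loader, "get_data"):
        return loader.get_data(path)      # works for zipimport and normal files
    with open(path, "rb") as f:
        return f.read()

# Demo: a throwaway zipped package ("confdemo" is a hypothetical name).
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "ConfDemo-0.1.egg")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("confdemo/__init__.py", "")
    zf.writestr("confdemo/foo.conf", "[main]\ndebug = yes\n")
sys.path.insert(0, archive)

import confdemo

foo_config = read_package_data(confdemo, "foo.conf")
print(foo_config.decode("utf-8"))

# The plain __file__ approach fails: the "directory" is really a zipfile.
try:
    open(os.path.join(os.path.dirname(confdemo.__file__), "foo.conf")).read()
except OSError as exc:
    print("direct file access failed:", type(exc).__name__)
```

Later versions of the standard library package this same loader-based lookup as pkgutil.get_data(package, resource).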
Using this protocol can be complex, however, so the egg runtime system offers a convenient resource management API as an alternative. Here's our "foo_config" example, rewritten to use the pkg_resources API:
from pkg_resources import resource_string
foo_config = resource_string(__name__, 'foo.conf')
Instead of manipulating __file__, you simply pass a module name or package name to resource_string, resource_stream, or resource_filename, along with the name of the resource. Normally, you should try to use resource_string or resource_stream, unless you are interfacing with code you don't control (especially C code) that absolutely must have a filename. The reason is that if you ask for a filename, and your package is packed into a zipfile, then the resource must be extracted to a temporary directory, which is a more costly operation than just returning a string or file-like object.
Note, by the way, that if your resources include subdirectories of their own, you must specify resource names using '/' as a path separator. The resource API will convert the slashes to the platform's path separator if filenames are in fact being used (as opposed to, e.g., zipfile contents). For more examples and information, see the setuptools documentation and the API documentation for pkg_resources.
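The slash convention means a resource name only needs converting at the moment a real filename is required. A minimal sketch of that conversion, using a hypothetical helper resource_to_path (inside a zipfile the name can be used unchanged, because zip member names already use '/'):

```python
import os

def resource_to_path(base_dir, resource_name):
    """Map a '/'-separated resource name onto a filesystem path under base_dir.

    Hypothetical helper: the name always uses '/', whatever the platform,
    and is split and rejoined with os.path.join only when a filename is needed.
    """
    parts = resource_name.split("/")
    if resource_name.startswith("/") or ".." in parts:
        raise ValueError(f"resource names must be relative: {resource_name!r}")
    return os.path.join(base_dir, *parts)

print(resource_to_path("mypackage", "templates/page.html"))
# mypackage/templates/page.html on POSIX, mypackage\templates\page.html on Windows
```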
The runtime implementation has been stable for some time now, and the EasyInstall package manager is now close to beta quality. The runtime still has a couple of minor issues, which should probably be in the official documentation:
- The extract process treats the file's timestamp in the zipfile as "local" time with "unknown" DST. It's theoretically possible that a DST change could cause the system to think that the file timestamp no longer matches the zip timestamp. Also, the resulting Unix-style timestamp for the extracted file may differ between systems with different timezones. This is an unfortunate side effect of the fact that the zip file format does not include timezone information or a UTC timestamp.
- Cleanup on Windows doesn't work, because the .pyd files remain in use as long as some Python process using them is still running. An application that really wants to clean up on exit can presumably spawn another process to do something about it, but that kind of sucks, and doesn't account for the fact that another process might still be using the file anyway. (Michael Dubner suggested using the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\PendingFileRenameOperations registry value to fix the Windows problem, but unfortunately didn't give any details or sample code.) Sample code is at http://aspn.activestate.com/ASPN/docs/ActivePython/2.2/PyWin32/win32api__MoveFileEx_meth.html, but in practice we can't really do this without requiring the PyWin32 extensions, which isn't such a great idea. So the simple solution is to avoid doing cleanups and instead stick with a persistent cache directory.
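Both issues can be illustrated with a small sketch of extracting a zipped resource into a persistent cache. extract_to_cache is a hypothetical helper, not the pkg_resources implementation. It reads the zip entry's date_time, which is a bare (year, month, day, hour, minute, second) tuple with no timezone, so the only option is to interpret it as local time with DST flagged as unknown (-1). Instead of cleaning up afterwards, it reuses an existing extracted copy whenever the size and timestamp still match.

```python
import os
import tempfile
import time
import zipfile

def extract_to_cache(zip_path, member, cache_dir):
    """Extract one zip member into a persistent cache dir; return its filename."""
    with zipfile.ZipFile(zip_path) as zf:
        info = zf.getinfo(member)
        # No timezone in the zip format: treat date_time as local time,
        # with the DST flag set to -1 ("unknown").
        mtime = time.mktime(info.date_time + (0, 0, -1))
        target = os.path.join(cache_dir, *member.split("/"))
        if os.path.isfile(target):
            st = os.stat(target)
            if st.st_size == info.file_size and int(st.st_mtime) == int(mtime):
                return target             # cached copy still looks current
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with zf.open(info) as src, open(target, "wb") as dst:
            dst.write(src.read())
        os.utime(target, (mtime, mtime))  # stamp the file with the zip's time
    return target

# Demo with a made-up egg; nothing is deleted afterwards (persistent cache).
cache = tempfile.mkdtemp()
zip_path = os.path.join(tempfile.mkdtemp(), "Demo-0.1.egg")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("demo/data/table.txt", "1 2 3\n")

first = extract_to_cache(zip_path, "demo/data/table.txt", cache)
second = extract_to_cache(zip_path, "demo/data/table.txt", cache)
print(first == second)   # True: the second call reuses the cached file
```

Because mktime() depends on the local timezone and DST rules, the same zip entry can map to different Unix timestamps on different machines, which is exactly the caveat in the first item above.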