The PEAK Developers' Center   PythonEggs UserPreferences
 
Version as of 2005-06-04 17:21:17



Python Eggs


Overview

"Eggs are to Pythons as Jars are to Java..."

Python Eggs are zipfiles using the .egg extension, that support including data and C extensions as well as Python code. They can be used with Python 2.3 and up, and can be built using the setuptools package (see the Python CVS sandbox for source code, or the EasyInstall page for current installation instructions). Once the implementation is far enough along, we plan to propose it for inclusion with the standard library beginning in Python 2.5.

The primary benefits of Python Eggs are:

There are also other benefits that may come from having a standardized format, similar to the benefits of Java's "jar" format.

Using Eggs

If you have a pure-Python egg that doesn't use any in-package data files, and you don't mind manually placing it on sys.path or PYTHONPATH, you can use the egg without installing setuptools. For eggs containing C extensions, however, or those that need access to non-Python data files contained in the egg, you'll need the pkg_resources module from setuptools installed. You can do this either by installing setuptools (see the EasyInstall page for instructions), or by copying the pkg_resources module to an appropriate directory.

In addition to providing runtime support for using eggs containing C extensions or data files, the pkg_resources module also provides an API for automatically locating eggs and their dependencies and adding them to sys.path at runtime. With this support, you can install and keep multiple versions of the same package on your system, with the right version automatically being selected at runtime. Plus, if an egg has a dependency that can't be met, the runtime will raise a DistributionNotFound error that says what package and version is needed.

By the way, in case you're wondering how you can tell a "pure" (all-Python) egg from one with C extensions, the difference is that eggs containing C extensions will have their target platform's name at the end of the filename, just before the .egg. "Pure" eggs are (in principle) platform-independent, and have no platform name. If you're using the pkg_resources runtime to find eggs for you, it will ignore any eggs that it can tell are not usable on your platform or Python version. If you're not using the runtime, you'll have to make sure that you use only compatible eggs.
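As a rough illustration of this naming convention, the sketch below splits an egg filename into its parts and reports whether a platform tag is present. The helper names (split_egg_filename, looks_pure), the regular expression, and the assumed Name-Version-pyX.Y-platform layout are made up for this example; they are not part of pkg_resources, which has its own filename handling.

```python
import re

# Assumed layout for this sketch: Name-Version[-pyX.Y[-platform]].egg
EGG_FILENAME = re.compile(
    r"^(?P<name>[^-]+)-(?P<version>[^-]+)"
    r"(?:-py(?P<py_version>[^-]+)(?:-(?P<platform>.+))?)?\.egg$"
)


def split_egg_filename(filename):
    """Return a dict of filename parts, or None if the name doesn't fit."""
    match = EGG_FILENAME.match(filename)
    return match.groupdict() if match else None


def looks_pure(filename):
    """True when the filename parses and carries no platform tag."""
    parts = split_egg_filename(filename)
    return parts is not None and parts["platform"] is None


# Hypothetical filenames, used only to exercise the helpers.
binary = split_egg_filename("SQLObject-0.6.1-py2.3-win32.egg")
pure = split_egg_filename("FooBar-1.2-py2.3.egg")
```

Here binary["platform"] holds the platform tag, while pure["platform"] is None because nothing follows the Python version.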

Once you have the runtime installed, you need to get your desired egg(s) on to sys.path. You can do this manually, by placing them in the PYTHONPATH environment variable, or you can add them directly to sys.path in code. This approach doesn't scale well, however, because as you need additional eggs, you'll be managing a longer and longer PYTHONPATH or sys.path by hand. Not only that, but you'll have to manually keep track of all the eggs needed by the eggs you're using! Luckily, there is a better way to do it.
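The manual approach can be sketched with nothing but the standard library, since a pure-Python egg is an ordinary zipfile that Python's zipimport machinery can read. The egg name, module name, and temporary directory below are hypothetical and exist only for the demonstration; a real egg would be built with bdist_egg rather than written by hand.

```python
import importlib
import os
import sys
import tempfile
import zipfile

# Build a throwaway pure-Python "egg" (hypothetical name) in a temp directory.
egg_dir = tempfile.mkdtemp()
egg_path = os.path.join(egg_dir, "HelloEgg-0.1.egg")
with zipfile.ZipFile(egg_path, "w") as egg:
    egg.writestr("hello_egg_demo.py", "GREETING = 'hello from an egg'\n")

# Put the egg itself (not its directory) on sys.path, then import from it.
sys.path.insert(0, egg_path)
importlib.invalidate_caches()
import hello_egg_demo

print(hello_egg_demo.GREETING)
```

Every additional egg needs its own sys.path entry, which is why this approach becomes tedious as the number of eggs grows.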

Automatic Discovery

The better way to manage your eggs is to place them in a directory that's already on sys.path, such as site-packages, or the directory that your application's main script is in, or a directory that you'll be adding to PYTHONPATH or sys.path. Then, before attempting to import from any eggs, use a snippet of code like this:

from pkg_resources import require
require("FooBar>=1.2")

This will search all sys.path directories for an egg named "FooBar" whose release version is 1.2 or higher, and it will automatically add the newest matching version to sys.path for you, along with any eggs that the FooBar egg needs. (A note about versions: the egg runtime system understands typical version numbering schemes, so it knows that versions like "1.2a1" and "1.2rc5" are actually older than the plain version "1.2", but it also knows that versions like "1.2p1" or "1.2-1" are newer than "1.2".)
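To make the ordering in the note above concrete, here is a simplified sketch of one way such a comparison key could be computed. It is an assumption-laden illustration, not the pkg_resources implementation: the function name version_key, the alias table, and the zero-padding width are all invented for this example, and edge cases such as trailing zeros are ignored.

```python
import re

_COMPONENT = re.compile(r"(\d+|[a-z]+|\.|-)")

# Assumed aliases: "rc" sorts like "c" (before "final"), and a dash
# becomes "final-", which sorts just after the implicit "final" marker.
_ALIASES = {"rc": "c", "-": "final-"}


def version_key(version):
    """Return a tuple of strings that sorts in release order (sketch only)."""
    parts = []
    for part in _COMPONENT.split(version.lower()):
        part = _ALIASES.get(part, part)
        if not part or part == ".":
            continue
        if part[0].isdigit():
            parts.append(part.zfill(8))   # pad so "10" sorts after "9"
        else:
            parts.append("*" + part)      # "*" sorts before any digit
    parts.append("*final")                # plain releases end here
    return tuple(parts)


ordered = sorted(["1.2", "1.2p1", "1.2rc5", "1.2a1", "1.2-1"], key=version_key)
```

With this key, the pre-release tags "a" and "c" compare lower than the "final" marker, while "p" and "final-" compare higher, which matches the ordering the note describes.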

You can specify more than one requirement when calling require(), and you can also specify more complex version requirements, like require("FooBar>=1.2", "Thingy>1.0,!=1.5,<2.0a3,==2.1,>=2.3"). Requirement strings basically consist of a distribution name, an optional list of "options" (more on this in a moment), and a comma-separated list of zero or more version conditions. Version conditions basically specify ranges of valid versions, using comparison operators. The version conditions you supply are sorted into ascending version order, and then scanned left to right until the package's version falls between a pair of > or >= and < or <= conditions, or exactly matches a == or != condition.

Note, by the way, that it's perfectly valid to have no version conditions; if you can use any version of "FooBar", for example, you can just require("FooBar"). Distribution names are also case-insensitive, so require("foobar") would also work, but for clarity's sake we recommend using the same spelling as the package's author.

Some eggs may also offer "options" - optional features that, if used, will need other eggs to be located and added to sys.path. You can specify zero or more options that you wish to use, by placing a comma-separated list in square brackets just after the requested distribution name. For example, the "FooBarWeb" web framework might offer optional FastCGI support. When you require("FooBarWeb[FastCGI]>=1.0"), the additional eggs needed to support the FastCGI option will also be added to sys.path. (Or, if one of them isn't found, a pkg_resources.DistributionNotFound error will be raised, identifying what dependency couldn't be satisfied.)
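The three parts of a requirement string (distribution name, optional bracketed options, and version conditions) can be pulled apart with a short parser. The sketch below is only meant to show the structure of the syntax; parse_requirement and its regular expressions are hypothetical and may accept or reject strings differently from the real pkg_resources parser.

```python
import re

_REQUIREMENT = re.compile(
    r"^\s*([A-Za-z0-9_.-]+)\s*(?:\[([^\]]*)\])?\s*(.*?)\s*$"
)
_CONDITION = re.compile(r"^(<=|>=|==|!=|<|>)\s*([A-Za-z0-9_.-]+)$")


def parse_requirement(text):
    """Split a requirement string into (name, options, conditions)."""
    match = _REQUIREMENT.match(text)
    if match is None:
        raise ValueError("not a requirement string: %r" % (text,))
    name, raw_options, raw_conditions = match.groups()
    options = tuple(
        opt.strip() for opt in (raw_options or "").split(",") if opt.strip()
    )
    conditions = []
    for chunk in raw_conditions.split(","):
        chunk = chunk.strip()
        if not chunk:
            continue
        cond = _CONDITION.match(chunk)
        if cond is None:
            raise ValueError("bad version condition: %r" % (chunk,))
        conditions.append(cond.groups())   # (operator, version) pair
    return name, options, conditions


web = parse_requirement("FooBarWeb[FastCGI]>=1.0")
thingy = parse_requirement("Thingy>1.0,!=1.5,<2.0a3,==2.1,>=2.3")
```

This sketch only parses; deciding whether a given version satisfies the conditions is a separate step that it does not attempt.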

To find out what options an egg offers, you should consult its documentation, or unpack and read its EGG-INFO/depends.txt file, which lists an egg's required and optional dependencies. For more on the format of depends.txt, see the Declaring Dependencies section below.

(Note: the pkg_resources module does not automatically look for eggs on PyPI or download them from anywhere; any needed eggs must already be available in a directory on sys.path, or require() will raise a DistributionNotFound error. You can of course trap this error in your code and attempt to find the needed eggs on PyPI or elsewhere. But, if you want your application to support automated downloads, a better approach is to create a subclass of the AvailableDistributions class in pkg_resources and override its obtain() method to do the desired searching and downloading. See the source code of the require() function for how to use an AvailableDistributions object to resolve dependencies.)

Building Eggs

To build an egg from a package's setup.py, you'll need to have setuptools installed. If you haven't already installed it in order to use the pkg_resources runtime, just check it out of Python's CVS sandbox and run setup.py install to install it, or see the EasyInstall page's installation instructions. Now you're ready to build eggs.

Edit the target package's setup.py and add from setuptools import setup such that it replaces the existing import of the setup function. Then run setup.py bdist_egg.

That's it. A .egg file will be deposited in the dist directory, ready for use. If you want to add any special metadata files, you can do so in the SomePackage.egg-info directory that bdist_egg creates. ("SomePackage" will of course be replaced by the name of the package you're building.) Any files placed in this directory are copied to an EGG-INFO directory within the egg file, for use at runtime. There are a handful of special filenames that the egg runtime system understands, like eager_resources.txt and depends.txt, both of which we'll cover in a moment. Other metadata files are automatically generated for you, such as native_libs.txt (a list of C extensions, if any) and PKG-INFO (descriptive information about the package). Do not edit these files, as the next time you run bdist_egg they will be overwritten with the automatically generated versions.

If you want to build eggs from other people's packages (who don't import from setuptools), then in Python 2.4 and higher you can do:

python setup.py --command-packages=setuptools.command bdist_egg

If you're using Python 2.3, however (and eggs don't work with versions less than 2.3), you have to copy setuptools/command/bdist_egg.py into the distutils/command/ directory of your Python installation (e.g. in lib/python2.3 or \Python23\Lib).

Note: packages that expect to find data files in their package directories, but that use neither the PEP 302 API nor the pkg_resources API to find them, will not work when packaged as .egg files. One way to check for this is to see whether the .egg file contains data files and the package uses __file__ to find them. If so, you'll need to patch the package so it uses pkg_resources.resource_filename() or one of the other resource_* APIs instead of __file__. See the section on Accessing Package Resources, below, for more information about updating packages to use the Resource Management API instead of __file__ manipulation.

Also, some packages (e.g. wxPython) may include dynamic link libraries other than Python extensions. If this is the case, you'll need to create an eager_resources.txt file in the .egg-info directory that lists the in-zip paths to these libraries, one per line. This will let the runtime system know that those files need to be unpacked if any of the extensions are used. Thus, when attempting to import an extension, the runtime will also unpack all the dynamic link libraries that go with it.

(Note: if you still can't get the library to work as an .egg file after trying the above tactics, please report your problem on the Distutils-SIG mailing list. Thanks.)

Declaring Dependencies

Some eggs need other eggs to function. However, there isn't always a meaningful place for a library to call require(), and in any case a library's source code is rarely the place to declare its version dependencies. So setuptools allows you to create a depends.txt file that can be bundled inside the .egg file's metadata directory, and which will be used by the egg runtime to automatically locate the egg's dependencies and add them to sys.path whenever the egg is needed by a require() call.

To create this file, you'll need to place it in the SomePackage.egg-info directory that the bdist_egg command creates. The format is fairly simple; here's a heavily-commented example:

# A sample "MyPackageName.egg-info/depends.txt" file
# Blank lines and lines beginning with "#" are ignored

# Lines at the beginning of the file specify the package's minimum
# requirements, and line-end comments are allowed:

FooBar >= 1.2   # a 'require()'-style dependency


# Here, we specify a more restricted set of versions for another
# package -- just the ones we've tested with this package.  Notice 
# that if a requirement is too long to fit on a line, you can use "\"
# to continue it, as long as the split is between version conditions.
# Notice also that arbitrary whitespace is allowed between tokens:

BazSpam ==1.1, ==1.2, ==1.3, ==1.4, ==1.5, \
        ==1.6, ==1.7


[FastCGI]
# A line with a name in square brackets defines an "option"
# The lines that follow, specify what other eggs are needed
# if this optional feature is requested.

fcgiapp>=0.1
FastCGITools>=2.1


[reST]
# Here's another optional feature: reStructuredText processing.  We
# need docutils to support that.
docutils >= 0.3

Of course, you don't have to comment your dependencies and features so thoroughly or use so much whitespace if you don't want to. Here's a minimal rendering of the same example, with exactly the same semantics:

FooBar>=1.2
BazSpam==1.1,==1.2,==1.3,==1.4,==1.5,==1.6,==1.7
[FastCGI]
fcgiapp>=0.1
FastCGITools>=2.1
[reST]
docutils>=0.3

But, as you can see, a little whitespace and commenting goes a long way towards other people understanding what your dependencies are. At the least, a blank line before option definitions makes them easier to find and read.

Currently, the depends.txt file is not syntax-checked by bdist_egg, so if you make a mistake you won't find out about it until you try to use the egg. (This will be fixed in a later version.)
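Because the file format is simple, it is easy to read it back yourself while experimenting. The following sketch parses text in the format shown above into a dictionary keyed by option name, with None standing for the unconditional requirements. The function parse_depends is hypothetical and written only from the description in this section; the egg runtime's own parser may treat edge cases differently.

```python
def parse_depends(text):
    """Map option name (None for required deps) to requirement strings."""
    sections = {None: []}
    current = None
    pending = ""                                  # holds "\"-continued text
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()       # drop comments, whitespace
        if not line:
            continue
        if line.endswith("\\"):                   # requirement continues
            pending += line[:-1].strip() + " "
            continue
        line = pending + line
        pending = ""
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip()          # start of an option block
            sections.setdefault(current, [])
        else:
            # Whitespace between tokens is insignificant, so remove it.
            sections[current].append("".join(line.split()))
    return sections


SAMPLE = r"""
FooBar >= 1.2   # a 'require()'-style dependency
BazSpam ==1.1, ==1.2, \
        ==1.3

[FastCGI]
fcgiapp>=0.1
"""

parsed = parse_depends(SAMPLE)
```

Printing parsed for a file you have just written is a quick way to confirm that options and continuation lines were grouped the way you intended.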

Developing with Eggs

The next few sections provide tips and techniques for developing packages that work well as eggs, and/or take advantage of the egg runtime's special features.

Running Eggs from Source

So far, we've only covered how to use eggs that have actually been installed, by building them with the distutils and then putting them in a directory on sys.path. (Note: EasyInstall can download source distributions, automatically build eggs from them, and install the eggs for you, with just a single command -- even if the package's author did nothing special to support Python Eggs. Check it out.)

But what if you are developing a package and working from source code? You don't want to have to rebuild the egg every time you make a change to the source code. But, you have code in your script or application that calls require() and expects the egg you're developing to be available. For example, see this question from Ian Bicking about working with packages checked out from subversion, but not built as eggs.

The answer is fairly simple. When you run the bdist_egg command on a source distribution, it creates a SomePackage.egg-info directory in the distribution's root package directory. That is, it adds it to the directory that you would add to sys.path if you wanted to do development work with the source distribution. If you do in fact place this source directory on sys.path, the pkg_resources runtime will automatically recognize the directory as containing a "development egg" or "source egg".

That is, the presence of one or more SomePackage.egg-info directories inside a directory on sys.path tells the egg runtime that those eggs are installed in the containing directory. Any metadata files you've placed in the SomePackage.egg-info directory will be read by the runtime, just as if you were using a .egg built by the bdist_egg command. Thus, if you have code that calls require("SomePackage"), and the runtime finds a SomePackage.egg-info in a directory on sys.path, the require() call will succeed, and no new entries will be added to sys.path. (Assuming the require() doesn't request a conflicting version, of course.)

So, if you're working on developing a package from its source checkout, you'll need to run bdist_egg at least once in order to create the egg-info directory. However, if you're adding metadata to the egg-info directory, you should add it to your revision control system anyway, and commit any changes that bdist_egg makes to the directory when you add or remove extensions or change metadata in setup.py. That way, anybody checking out the source code to do development will already be set up to have the egg-info on their sys.path.

Of course, you still have to get the package root on sys.path. One easy way to do this is to use .pth files; see this Bob Ippolito article on using .pth files, for example. You can also just use PYTHONPATH, or start Python with a script that's located in the desired directory. In other words, just do whatever you would've done before, and it'll work fine as long as the .egg-info directory is inside the target directory.

Accessing Package Resources

Many modern Python packages depend on "resources" (data files) that are included with the package, typically placed within the package's subdirectory in a normal installation. Usually, such packages manipulate their modules' __file__ or __path__ attributes in order to locate and read these resources. For example, suppose that a module needs to access a "foo.config" file that's in its package directory. It might do something like:

import os
foo_config = open(os.path.join(os.path.dirname(__file__), 'foo.config')).read()

However, when code like this is packed inside a zipfile, it can no longer assume that __file__ or __path__ contain filenames or directory names, and so it will fail.

Packages that access resource files, and want to be usable inside a zipfile (such as a .egg file), then, must use the PEP 302 get_data() extension (see under "Optional Extensions to the Importer Protocol") before falling back to direct __file__ access.
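A minimal sketch of that pattern, using only the standard library, might look like the following. The helper read_package_data, the package name configdemo_pkg, and the hand-built archive are all hypothetical; the only real API relied on is the optional get_data() method that PEP 302 loaders such as zipimporter provide.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def read_package_data(module, resource_name):
    """Read a data file stored next to `module`, zip-safe when possible."""
    path = os.path.join(os.path.dirname(module.__file__), resource_name)
    loader = getattr(module, "__loader__", None)
    if loader is None and getattr(module, "__spec__", None) is not None:
        loader = module.__spec__.loader
    if loader is not None and hasattr(loader, "get_data"):
        return loader.get_data(path)      # PEP 302 optional extension
    with open(path, "rb") as stream:      # fall back to direct file access
        return stream.read()


# Demonstration: a throwaway zipped package containing a data file.
archive = os.path.join(tempfile.mkdtemp(), "ConfigDemo-0.1.egg")
with zipfile.ZipFile(archive, "w") as egg:
    egg.writestr("configdemo_pkg/__init__.py", "")
    egg.writestr("configdemo_pkg/foo.config", "answer = 42\n")

sys.path.insert(0, archive)
importlib.invalidate_caches()
import configdemo_pkg

foo_config = read_package_data(configdemo_pkg, "foo.config")
```

Note that get_data() returns the raw bytes of the file, so any decoding is left to the caller.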

Using this protocol can be complex, however, so the egg runtime system offers a convenient alternative: the Resource Management API (described in greater detail below). Here's our "foo_config" example, rewritten to use the pkg_resources API:

from pkg_resources import resource_string
foo_config = resource_string(__name__, 'foo.config')

Instead of manipulating __file__, you simply pass a module name or package name to resource_string, resource_stream, or resource_filename, along with the name of the resource. Normally, you should try to use resource_string or resource_stream, unless you are interfacing with code you don't control (especially C code) that absolutely must have a filename. The reason is that if you ask for a filename, and your package is packed into a zipfile, then the resource must be extracted to a temporary directory, which is a more costly operation than just returning a string or file-like object.

Note, by the way, that if your resources include subdirectories of their own, you must specify resource names using '/' as a path separator. The resource API will convert the slashes to the platform-appropriate path separator, if in fact filenames are being used (as opposed to e.g. zipfile contents). There are more detailed examples of resource path usage in the Resource Management section below.

Namespace Packages

Sometimes, a large package is more useful if distributed as a collection of smaller eggs. However, Python does not normally allow the contents of a package to be retrieved from more than one location. "Namespace packages" are a solution for this problem. When you declare a package to be a namespace package, it means that the package has no meaningful contents in its __init__.py, and that it is merely a container for modules and subpackages. The pkg_resources runtime will automatically ensure that the contents of namespace packages that are spread over multiple eggs or directories are combined into a single virtual package.

The best way to declare a namespace package is to create a namespace_packages.txt file in your MyPackage.egg-info directory before building the egg. Then, the runtime will automatically detect this when it adds the distribution to sys.path, and ensure that the packages are properly merged. The file should list the namespace packages that the egg participates in, one per line. Blank lines, and lines beginning with # are ignored, but line-end comments are not allowed. For example:

# Example namespace_packages.txt file

# This egg contains packages under the 'zope' namespace package,
# and the 'zope.app' namespace package
zope
zope.app

Note that to create an egg that is part of a namespace package, you must construct the egg such that it includes the namespace package's __init__.py, and the __init__.py of any parent packages, in a normal Python package layout. These __init__.py files should not contain any code or data, because only one egg's __init__.py files will be used to construct the parent packages in memory, and there is no guarantee which egg will be used. (This is one reason the concept is called a "namespace package": it is a package that exists only to provide a namespace under which other modules or packages are gathered. In Java, for example, namespace packages are often used just to avoid naming collisions between different projects, using packages like org.apache as a namespace for packages that are part of apache.org projects.)
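The effect of merging can be illustrated by hand with plain directories: once a package's __path__ lists more than one location, submodules from all of them become importable under the same parent. This is only a manual demonstration of the idea, not how pkg_resources performs the merge; the package name nsdemo, the helper make_portion, and the temporary directories are hypothetical.

```python
import importlib
import os
import sys
import tempfile


def make_portion(root, package, module, source):
    """Create root/package/ with an empty __init__.py and one module."""
    pkg_dir = os.path.join(root, package)
    os.makedirs(pkg_dir)
    open(os.path.join(pkg_dir, "__init__.py"), "w").close()
    with open(os.path.join(pkg_dir, module + ".py"), "w") as stream:
        stream.write(source)
    return pkg_dir


# Two separate "distributions", each holding part of the same package.
root_a = tempfile.mkdtemp()
root_b = tempfile.mkdtemp()
make_portion(root_a, "nsdemo", "alpha", "NAME = 'alpha'\n")
dir_b = make_portion(root_b, "nsdemo", "beta", "NAME = 'beta'\n")

sys.path.insert(0, root_a)
importlib.invalidate_caches()
import nsdemo                      # found in root_a only

nsdemo.__path__.append(dir_b)      # manually merge in the second portion
from nsdemo import alpha, beta     # both portions are now importable
```

Because only one __init__.py is ever executed (here, the one under root_a), keeping those files empty avoids any dependence on which portion happens to be found first.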

Although you'll normally just declare your namespace packages in a namespace_packages.txt file, there is a Namespace Package Support API (described below) that will let you manipulate namespace packages at runtime if for some reason you need to do so. Normally, however, you should just place a namespace_packages.txt file in the appropriate MyPackage.egg-info directory before building or using the appropriate egg(s).

Runtime API

The following APIs are available in the pkg_resources module:

Dependency Management

require(*requirement_strings)
Ensure that distributions matching the supplied requirement strings are active on sys.path. Raises pkg_resources.DistributionNotFound if the requirements (or their dependencies) cannot be met with already-installed distributions, or pkg_resources.VersionConflict if the version requirements (or their dependencies) conflict with a distribution already activated on sys.path.
find_distributions(path_item)
Given a string, yield Distribution instances found in the designated location. path_item is processed via the PEP 302 sys.path_hooks mechanism to obtain an importer. If no importer is available, path_item is assumed to be a filesystem directory. By default, pkg_resources only supports locating distributions via zipimporter paths and filesystem directories, but you can add support for other importer types using the register_finder API.

Resource Management

Note: you should not use a namespace package as the modulename argument to any of these APIs, because a namespace package may comprise multiple distributions, and is thus ambiguous as to which distribution would contain the requested resource. Also, resource_name arguments should use / to separate directory components, and should not include any part of the module's __file__. For example, suppose that the fruitfly package contains a resource called sql.mar alongside grammar.py, and it needs to access it from within grammar.py:

fruitfly/
    __init__.py
    grammar.py
    sql.mar

The parameters you would use for any of the routines below would be a modulename of "fruitfly.grammar" and a resource_name of "sql.mar". So, the fruitfly.grammar module could obtain the contents of sql.mar as a string like this:

import pkg_resources
sql_grammar = pkg_resources.resource_string(__name__, 'sql.mar')

(Using __name__ automatically uses the name of the current module, which in this case is "fruitfly.grammar".) Notice that there is no need to perform any path manipulation, and in fact it might interfere with the proper operation of the resource management API.

As a more complex example, suppose that the doctools package has a directory with configuration data for various plugins:

doctools/
    plugin_finder.py
    plugins/
        plugin1.conf
        plugin2.conf
        ...

Then if we wanted to have a loop in plugin_finder.py to read each config file and process it, we might do:

from pkg_resources import resource_listdir, resource_string

for confname in resource_listdir(__name__, 'plugins'):
    config = resource_string(__name__, 'plugins/' + confname)
    parse(config)  # parse() stands in for the application's own config handling

Notice that we use a "/" to join path components, not os.path.join. The resource API expects all paths to be Unix-style, and will convert them to a local OS convention if appropriate and necessary. Paths should not contain names of "." or "..", and must not begin with a "/".
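The path rules above can be expressed as a small helper that converts a resource name into a local filesystem path. This is a sketch of the convention only: resource_to_local_path is a hypothetical function, not part of pkg_resources, and raising ValueError for rule violations is an assumption made for the example.

```python
import os


def resource_to_local_path(base_dir, resource_name):
    """Join a '/'-separated resource name onto base_dir, OS-style."""
    parts = resource_name.split("/")
    if resource_name.startswith("/") or any(p in (".", "..") for p in parts):
        raise ValueError("invalid resource name: %r" % (resource_name,))
    # os.path.join applies the local separator between components.
    return os.path.join(base_dir, *parts)


local = resource_to_local_path("doctools", "plugins/plugin1.conf")
```

Splitting on "/" and rejoining with os.path.join is what lets the same resource name work unchanged on every platform.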

resource_exists(modulename, resource_name)
Does the named resource exist?
resource_string(modulename, resource_name)
Return the named resource as a binary string.
resource_stream(modulename, resource_name, mode='b')
Open the named resource as a file-like object, using the specified mode ('t', 'b', or 'U'). (Note that this does not necessarily return an actual file; if you need a fileno() or an actual operating system file, you should use resource_filename() instead.)
resource_isdir(modulename, resource_name)
Returns true if the specified resource exists for the package, and is a directory. Returns false otherwise.
resource_listdir(modulename, resource_name)
Returns a list of resources contained immediately within the specified resource subdirectory. Requesting the contents of a non-existent resource directory may result in an error.
resource_filename(modulename, resource_name)
Return a platform file or directory name for the named resource. If the package is in an egg distribution, the resource will be unpacked before the filename is returned. If the named resource is a directory, the entire directory's contents will be extracted before the directory name is returned. Also, if the named resource is an "eager" resource such as a Python extension or shared library, then all "eager" resources will be extracted before the resource's filename is returned. (This is to ensure that shared libraries that link to other included libraries will have their dependencies available before loading.)
set_extraction_path(path)
Set the base path where resources will be extracted to. If not set, this defaults to os.path.expanduser("~/.python-eggs"). (XXX this will probably change; see Implementation Status under "Resource Manager Open Issues".) Resources are extracted to subdirectories of this path, each named for the corresponding .egg file. You may set the extraction path to a temporary directory, but then you must call cleanup_resources() to delete the extracted files when done. (Note: you may not change the extraction path for a given resource manager once resources have been extracted, unless you first call cleanup_resources().)
cleanup_resources(force=False)
Delete all extracted resource files and directories, returning a list of the file and directory names that could not be successfully removed. This function does not have any concurrency protection, so it should generally only be called when the extraction path is a temporary directory exclusive to a single process. This method is not automatically called; you must call it explicitly or register it as an atexit function if you wish to ensure cleanup of a temporary directory used for extractions.

Namespace Package Support

A namespace package is a package that only contains other packages and modules, with no direct contents of its own. Such packages can be split across multiple, separately-packaged distributions. Normally, you do not need to use the namespace package APIs directly; instead you should create a namespace_packages.txt as described above in Namespace Packages. However, if for some reason you need to manipulate namespace packages or directly alter sys.path at runtime, you may find these APIs useful:

declare_namespace(name)
Declare that the dotted package name name is a "namespace package" whose contained packages and modules may be spread across multiple distributions. The named package's __path__ will be extended to include the corresponding package in all distributions on sys.path that contain a package of that name. (More precisely, if an importer's find_module(name) returns a loader, then it will also be searched for the package's contents.) Whenever a Distribution's install_on() method is invoked, it checks for the presence of namespace packages and updates their __path__ contents accordingly.
fixup_namespace_packages(path_item)
Declare that path_item is a newly added item on sys.path that may need to be used to update existing namespace packages. Ordinarily, this is called for you when an egg is automatically added to sys.path, but if your application modifies sys.path to include locations that may contain portions of a namespace package, you will need to call this function to ensure they are added to the existing namespace packages.

Distribution Objects

Distribution objects are returned or yielded by many pkg_resources APIs. Here is the public API for Distribution objects:

Distribution(path_str, metadata=None, name=None, version=None, py_version=PY_MAJOR, platform=None)
Create a new distribution object. Usually, one does not use this constructor directly. Instead, use Distribution.from_filename (which can parse most of the constructor arguments from a filename/path).
Distribution.from_filename(filename,metadata=None)
classmethod: Construct a new Distribution from the given filename, extracting the name, version, Python version, and platform from the filename if possible. metadata, if supplied, must be an object providing the methods described in IMetadataProvider, below.
installed_on(path=sys.path)
Is this distribution installed on the path list?
depends(options=())

Returns a list of the Requirement objects that must be satisfied if the specified options are requested for this distribution. options is a sequence of (case-insensitive) feature names, as might be defined in the distribution's depends.txt. (See Declaring Dependencies, above for more information on optional dependencies.)

Note that whether any options are provided or not, the returned list will always include the distribution's required dependencies.

install_on(path=sys.path)
Ensure that the distribution is importable on the path list. The distribution is not added a second time if it is already on the path list. If path is sys.path and the distribution was not already on it, then two additional operations are performed: any existing applicable namespace packages are updated to ensure they include the distribution, and any namespace packages listed in the distribution's namespace_packages.txt are declared.

In addition to the above methods, the following (read-only) attributes are available on Distribution instances:

path
A distribution object's path attribute is the string that will be added to sys.path when it is installed.
name
The name attribute lists the distribution's name, as found in its filename. For example, a distribution whose filename is SQLObject-0.6.1-py2.3-win32.egg would have a name of "SQLObject".
version
The version attribute lists the distribution's version string, as found in its filename or PKG-INFO file. For example, a distribution whose filename is SQLObject-0.6.1-py2.3-win32.egg would have a version of "0.6.1".
parsed_version
The parsed_version is a tuple representing a "parsed" form of the distribution's version. It can be used to compare or sort distributions by version, as parsed_version objects reflect the actual ordering of version strings.
py_version
The major/minor Python version the distribution supports, as a string. For example, a distribution whose filename is SQLObject-0.6.1-py2.3-win32.egg would have a py_version of "2.3". If the distribution filename doesn't contain a Python version, it defaults to the currently-running version of Python, so that e.g. "source eggs" will work correctly.
platform
A string representing the platform the distribution is intended for, or None if the distribution is "pure Python" and therefore cross-platform.
key
A lower-case version of the name attribute, used to compare distributions against requirements and to index them in AvailableDistributions objects.

Supporting Custom Importers

By default, the egg runtime system supports normal filesystem imports, and zipimport importers. If you wish to use the runtime with other (PEP 302-compatible) importers or module loaders, you may need to register various handlers and support functions using these APIs:

register_finder(importer_type, distribution_finder)
Register distribution_finder to find distributions in sys.path items. importer_type is the type or class of a PEP 302 "Importer" (sys.path item handler), and distribution_finder is a callable that, when passed a path item and the importer instance, yields Distribution instances found under that path item. See pkg_resources.find_on_path for an example.
register_loader_type(loader_type, provider_factory)
Register provider_factory to make providers for loader_type. loader_type is the type or class of a PEP 302 module.__loader__, and provider_factory is a function that, when passed a module object, returns an IResourceProvider for that module, allowing it to be used with the Resource Management API.
register_namespace_handler(importer_type, namespace_handler)

Register namespace_handler to declare namespace packages. importer_type is the type or class of a PEP 302 "Importer" (sys.path item handler), and namespace_handler is a callable like this:

def namespace_handler(importer, path_entry, moduleName, module):
    # return a path_entry to use for child packages
    ...

Namespace handlers are only called if the importer object has already agreed that it can handle the relevant path item, and they should only return a subpath if the module __path__ does not already contain an equivalent subpath. For an example namespace handler, see pkg_resources.file_ns_handler.
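
For plain directories, a handler along these lines might look like the sketch below. The names dir_namespace_handler and _normalize are made up for illustration, and returning None to mean "nothing to add" is an assumption rather than something this page specifies; pkg_resources.file_ns_handler remains the authoritative example.

```python
import os
import types


def _normalize(path):
    # Canonical form so that equivalent spellings of a path compare equal.
    return os.path.normcase(os.path.realpath(path))


def dir_namespace_handler(importer, path_entry, moduleName, module):
    """Hypothetical handler for path entries that are plain directories."""
    # The child directory is named after the last component of the package.
    subpath = os.path.join(path_entry, moduleName.split(".")[-1])
    wanted = _normalize(subpath)
    for item in module.__path__:
        if _normalize(item) == wanted:
            return None  # an equivalent subpath is already present
    return subpath


# Usage with a stand-in module object; the importer argument is unused here.
pkg = types.ModuleType("peak.util")
pkg.__path__ = []
first = dir_namespace_handler(None, "site-a", "peak.util", pkg)
pkg.__path__.append(first)
second = dir_namespace_handler(None, "site-a", "peak.util", pkg)
```

The second call returns nothing to add, because the module's __path__ already contains the subpath computed by the first call.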

IMetadataProvider

IMetadataProvider is an abstract class that documents what methods are required by Distribution objects for their metadata parameter. An IMetadataProvider must support the following methods, each of which takes the name of an egg metadata file, such as "depends.txt" or "PKG-INFO":

has_metadata(name)
Does the package's distribution contain the named metadata?
get_metadata(name)
Return the named metadata resource as a string.
get_metadata_lines(name)

Yield the named metadata resource as a sequence of non-blank, non-comment lines. Leading and trailing whitespace must be stripped from each line, and lines with # as the first non-blank character must be omitted. The recommended way to implement this method is by using the pkg_resources.yield_lines() utility function, like this:

def get_metadata_lines(self, name):
    return pkg_resources.yield_lines(self.get_metadata(name))

There are a number of IMetadataProvider implementations provided by pkg_resources, including the PathMetadata and EggMetadata classes. Also, one of the setuptools test modules includes a mock metadata class that provides whatever metadata it's told to.

(Note: the above method list may be expanded later with methods for listing or globbing metadata resources.)
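
To make the three required methods concrete, here is a minimal in-memory provider, in the spirit of a mock that returns whatever metadata it is given. The DictMetadata class is hypothetical, the local yield_lines function is only a stand-in for pkg_resources.yield_lines() so that the sketch runs on its own, and the sample depends.txt content is invented.

```python
def yield_lines(text):
    """Stand-in for pkg_resources.yield_lines(): stripped, non-blank,
    non-comment lines of a string."""
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            yield line


class DictMetadata:
    """Hypothetical provider that maps metadata names to strings."""

    def __init__(self, files):
        self._files = dict(files)

    def has_metadata(self, name):
        return name in self._files

    def get_metadata(self, name):
        return self._files[name]

    def get_metadata_lines(self, name):
        return yield_lines(self.get_metadata(name))


# The file content below is made up purely to exercise the line filtering.
metadata = DictMetadata(
    {"depends.txt": "# core requirements\n\n  SQLObject>=0.6.1  \nFooBar\n"}
)
lines = list(metadata.get_metadata_lines("depends.txt"))
print(lines)
```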

IResourceProvider

IResourceProvider is an abstract class that documents what methods are required on objects returned by a provider_factory registered with register_loader_type() to provide support for custom PEP 302 module loaders. An IResourceProvider must have all of the methods defined by IMetadataProvider, above, and must also include these additional methods, where manager is an IResourceManager, and resource_name is the name of the needed resource:

get_resource_filename(manager, resource_name)
Return a true filesystem path for resource_name, co-ordinating the extraction with manager, if the resource must be unpacked to the filesystem.
get_resource_stream(manager, resource_name)
Return a readable file-like object for resource_name.
get_resource_string(manager, resource_name)
Return a string containing the contents of resource_name.
has_resource(resource_name)
Does the package contain the named resource?
resource_isdir(resource_name)
Is the named resource a directory? Return a false value if the resource does not exist or is not a directory.
resource_listdir(resource_name)
Return a list of the contents of the resource directory, a la os.listdir(). Requesting the contents of a non-existent directory may raise an exception.
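
The resource methods can be sketched for the simplest case, a package that already lives in a directory on disk. DirectoryProvider is a hypothetical class, it assumes resource names use "/" separators, and it omits the IMetadataProvider methods for brevity, so it is not a complete IResourceProvider.

```python
import os
import tempfile


class DirectoryProvider:
    """Hypothetical provider for resources stored under a plain directory."""

    def __init__(self, root):
        self.root = root

    def _path(self, resource_name):
        # Resource names are assumed to use "/" separators on every platform.
        return os.path.join(self.root, *resource_name.split("/"))

    def get_resource_filename(self, manager, resource_name):
        # Already on the filesystem, so no extraction via manager is needed.
        return self._path(resource_name)

    def get_resource_stream(self, manager, resource_name):
        return open(self._path(resource_name), "rb")

    def get_resource_string(self, manager, resource_name):
        with self.get_resource_stream(manager, resource_name) as stream:
            return stream.read()

    def has_resource(self, resource_name):
        return os.path.exists(self._path(resource_name))

    def resource_isdir(self, resource_name):
        return os.path.isdir(self._path(resource_name))

    def resource_listdir(self, resource_name):
        return os.listdir(self._path(resource_name))


# Usage against a throwaway directory holding one data file.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "data"))
    with open(os.path.join(root, "data", "hello.txt"), "wb") as f:
        f.write(b"hello")
    provider = DirectoryProvider(root)
    text = provider.get_resource_string(None, "data/hello.txt")
    listing = provider.resource_listdir("data")
    is_dir = provider.resource_isdir("data/hello.txt")
    missing = provider.has_resource("data/absent.txt")
```

A provider for zipped eggs would instead need the manager argument to find out where to unpack files before returning a filename.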

IResourceManager

An IResourceManager is an object that provides support for IResourceProvider objects that need to extract resources to a temporary filesystem location. It provides the following methods:

get_cache_path(archive_name, names=())

Return absolute location in cache for archive_name and names. The parent directory of the resulting path will be created if it does not already exist. archive_name should be a unique name, such as the base filename of the enclosing egg (which may not be the name of the enclosing zipfile!), including the ".egg" extension. names, if provided, should be a sequence of path name parts "under" the egg's extraction location. These names will be joined with archive_name and the current extraction location via os.path.join().

This method should only be called by resource providers that need to obtain an extraction location, and only for names they intend to extract, as it may track the generated names for possible cleanup later.

postprocess(filename)
Perform any platform-specific postprocessing of file filename, such as performing dynamic link library relocation or path relinking. filename should be a filename returned by a previous call to get_cache_path(). Resource providers should call this method ONLY after successfully extracting a compressed resource. They must NOT call it on resources that are already in the filesystem, but must ALWAYS call it following a successful extraction.
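
The calling sequence described above (obtain a cache path, extract, then postprocess) can be sketched as follows. SimpleResourceManager and the extract helper are hypothetical, and this postprocess does nothing, since the platform-specific work is outside the scope of a sketch.

```python
import os
import tempfile


class SimpleResourceManager:
    """Hypothetical manager handing out paths under one extraction directory."""

    def __init__(self, extraction_path):
        self.extraction_path = extraction_path
        self.cached_files = []  # every path handed out, for later cleanup

    def get_cache_path(self, archive_name, names=()):
        target = os.path.join(self.extraction_path, archive_name, *names)
        # Create the parent directory, not the target itself.
        os.makedirs(os.path.dirname(target), exist_ok=True)
        self.cached_files.append(target)
        return target

    def postprocess(self, filename):
        # Placeholder: nothing platform-specific is done in this sketch.
        pass


def extract(manager, archive_name, names, data):
    """What a provider might do: get a path, write the file, postprocess."""
    target = manager.get_cache_path(archive_name, names)
    with open(target, "wb") as f:
        f.write(data)
    manager.postprocess(target)  # only after a successful extraction
    return target


with tempfile.TemporaryDirectory() as cache:
    manager = SimpleResourceManager(cache)
    path = extract(manager, "Foo-1.0.egg", ("foo", "data.bin"), b"\x00\x01")
    relative = os.path.relpath(path, cache)
    extracted_ok = os.path.isfile(path)
```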

Implementation Status

Egg Builder (bdist_egg command)
  • (DONE) format suitable for use with zipimport if no resource access (including extensions) is needed
  • (DONE) add required EGG-INFO metadata directory for storing info about the egg itself
  • (DONE) include PKG-INFO metadata file in the EGG-INFO
  • (DONE) Hand-made EGG-INFO files can be placed in a PackageName.egg-info directory of the distribution
  • (DONE) By default, all .so/.dll/.pyd files are "eager" (listed in EGG-INFO/native_libs.txt)
  • (DONE) build process generates .py/.pyc/.pyo stubs to load extensions via pkg_resources.resource_filename() calls
  • Syntax-check manually-supplied metadata (depends.txt, eager_resources.txt, and namespace_packages.txt) to avoid creating "bad eggs".
  • Need a way to specify features to be excluded from egg distribution
  • Needs an option to omit source code from the .egg (the default is to include source, for IDEs, debugging, etc.)
Resource Manager Open Issues
  • (DONE) Egg-info can include list of "eager resources" which must be extracted all-at-once if any of them are extracted (listed in EGG-INFO/eager_resources.txt)
  • Cleanup on Windows doesn't work, because the .pyd's remain in use as long as Python is still running. An application that really wants to clean up on exit can presumably spawn another process to do something about it, but that kind of sucks. (Michael Dubner suggested using HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\PendingFileRenameOperations to fix the Windows problem, but unfortunately didn't give any details or sample code.) Sample code at http://aspn.activestate.com/ASPN/docs/ActivePython/2.2/PyWin32/win32api__MoveFileEx_meth.html
  • Default cache location: os.path.expanduser() is broken on Windows, as it can result in a directory that isn't actually writable. May need to use registry info to fix this, or default to a temporary directory on Windows. At least an application can choose a fixed location if it wants to ensure future executions will be fast. (Unix home directories aren't necessarily writable either, so maybe a temporary directory really is the right default, or perhaps try to create the home first and fall back to a temporary if the directory isn't writable.)
  • The extraction process needs to be protected by a thread lock and a process lock (i.e., prevent collisions both inside and outside the current process)
  • The extract process treats the file's timestamp in the zipfile as "local" time with "unknown" DST. It's theoretically possible that a DST change could cause the system to think that the file timestamp no longer matches the zip timestamp. Also, the resulting Unix-style timestamp for the extracted file may differ between systems with different timezones. This is an unfortunate side effect of the fact that the zip file format does not include timezone information or a UTC timestamp.
  • Fix home directory location on Windows
  • (DONE) Allow listing or globbing of resources, and unpack entire resource directories if their resource_filename() is requested from a zipped distribution
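
The timestamp issue noted in the list above can be illustrated with a short sketch. It assumes the zip entry's time is available as a six-field (year, month, day, hour, minute, second) tuple, as in zipfile.ZipInfo.date_time; the function names are hypothetical, and this is not necessarily how the extraction code itself is written.

```python
import time


def zip_time_to_timestamp(date_time):
    """Convert a zip entry's (Y, M, D, h, m, s) tuple to a Unix timestamp.

    The tuple is read as local time; the final -1 is tm_isdst, meaning
    "DST unknown", so the C library decides whether DST applies.
    """
    return time.mktime(tuple(date_time) + (0, 0, -1))


def is_stale(extracted_mtime, date_time):
    """Would an extracted file with this mtime be treated as out of date?"""
    return extracted_mtime != zip_time_to_timestamp(date_time)


stamp = zip_time_to_timestamp((2005, 6, 4, 17, 21, 16))
round_trip = time.localtime(stamp)[:6]
```

Because the conversion depends on the local timezone, the same zip entry maps to different Unix timestamps on machines in different timezones, which is the side effect described above.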
Namespace Packages
  • (DONE) Need to implement hook in Distribution.install_on() to fix up existing NS packages, and to flag namespace packages identified from distribution metadata (e.g. EGG-INFO/namespace_packages.txt).
  • (DONE) Need namespace_package() API in pkg_resources
  • (DONE) Implement thread-safety by holding the import lock while manipulating namespace packages
Dependency/Discovery
  • (DONE) Given a sys.path entry, iterate over available distributions, either .egg directories, .egg files, or distributions identified by .egg-info directories.
  • (DONE) Version spec syntax (needs to support specifying "extras" in order to implement optional dependencies)
  • (DONE) require() API call to automatically resolve dependencies and add them to sys.path
  • (DONE) EGG-INFO.in/depends.txt (feature->requirements mapping specifying distribution+version)
  • (DONE) Add support for extending the distribution finder for non-filesystem sys.path entries (by adapting PEP 302 importer objects)
  • (DONE) Add a zip distribution finder to support "baskets" -- i.e., multiple distributions inside a single .egg file.
