Much has happened since our last conda blog post half a year ago. In this post, we focus on two major new development efforts: resolving install dependencies with a SAT solver, and building conda packages from “conda recipes”.
The first iteration of conda used a graph-based algorithm on top of hand-coded constraints to solve the install dependency problem. As more and more packages were added to the Anaconda repository, we started seeing performance problems. This is when we began looking into SAT solvers for the install dependency problem, an approach already used by other package management systems such as libzypp and Zero Install.
Translating the install problem into a Boolean problem is well described in this paper. The basic idea is the following: for each package, you create a Boolean variable, and dependencies and conflicts are then formulated as clauses. After solving the corresponding SAT problem, you know which packages need to be installed. There is a caveat we ran into, however. One is usually interested in the minimal install profile, i.e. the smallest set of packages that must be installed for some requirement to be satisfied, but the Boolean problem usually has exponentially many solutions with respect to the number of packages. Our solution is to reduce the search space before applying the Boolean transformation, and then to compute the SAT solutions for which the number of true literals is minimal. The implementation can be found in this module. This single module, with about 300 lines of code, replaced about 2000 lines of complicated constraint logic, which had become too slow and hard to maintain.
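To make the encoding concrete, here is a minimal sketch in Python. It is not conda's implementation: the package index, the conflict list, and all function names are hypothetical, and a brute-force search over candidate sets stands in for a real SAT solver. It only illustrates the three ideas above: pruning the search space, writing dependencies and conflicts as clauses, and preferring the solution with the fewest packages set to true.

```python
from itertools import combinations

# Hypothetical toy index: package -> list of dependencies. Each dependency is
# a list of alternatives, any one of which satisfies it.
INDEX = {
    "app-1.0": [["lib-1.0", "lib-2.0"], ["zlib-1.2"]],
    "lib-1.0": [],
    "lib-2.0": [["extra-1.0"]],
    "extra-1.0": [],
    "zlib-1.2": [],
    "unrelated-1.0": [],
}
# Hypothetical conflicts: pairs that must not be installed together.
CONFLICTS = [("lib-1.0", "lib-2.0")]


def reachable(root):
    """Reduce the search space: keep only packages reachable from root."""
    seen, stack = set(), [root]
    while stack:
        pkg = stack.pop()
        if pkg not in seen:
            seen.add(pkg)
            for alternatives in INDEX[pkg]:
                stack.extend(alternatives)
    return sorted(seen)


def build_clauses(root, packages):
    """Return clauses as lists of (package, polarity) literals."""
    clauses = [[(root, True)]]  # the requested package must be installed
    for pkg in packages:
        for alternatives in INDEX[pkg]:
            # pkg implies (alt1 or alt2 ...), i.e. (not pkg) or alt1 or alt2 ...
            clauses.append([(pkg, False)] + [(a, True) for a in alternatives])
    for a, b in CONFLICTS:
        if a in packages and b in packages:
            clauses.append([(a, False), (b, False)])  # never both
    return clauses


def minimal_install(root):
    """Return a smallest set of packages satisfying all clauses, or None."""
    packages = reachable(root)
    clauses = build_clauses(root, packages)
    # Brute force (stand-in for a SAT solver): try candidate sets in order of
    # increasing size, so the first satisfying set has the fewest packages.
    for size in range(1, len(packages) + 1):
        for chosen in combinations(packages, size):
            installed = set(chosen)
            if all(any((p in installed) == polarity for p, polarity in clause)
                   for clause in clauses):
                return sorted(installed)
    return None
```

Calling `minimal_install("app-1.0")` on this toy index picks `app-1.0`, `lib-1.0` and `zlib-1.2`: `lib-2.0` would also satisfy the dependency, but it drags in `extra-1.0` and so yields a larger install set.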
The second new major development effort in conda is the ability to build packages from conda recipes using the conda build command. Users can now create their own conda packages in a reproducible way and upload them to Anaconda.org.
When we first started Anaconda a year ago, we needed to build all kinds of packages: not only Python packages, but also Python itself, as well as low-level C or C++ libraries like HDF5, LLVM, bzip2, zlib, etc. We chose to write a bash script for each of these packages, which installs the package into a build prefix. For example, the build script would look like:
    #!/bin/bash
    ./configure --prefix=$PREFIX
    make
    make install
for a package like zlib, and it would be as simple as
    #!/bin/bash
    python setup.py install
for a Python package. Using bash (and .bat scripts on Windows) turned out to make these scripts quite simple and easy to read. Such clarity and conciseness would not have been possible in Python. After running such a script, all new files in the build prefix are bundled up into a tarball. Note that in the second example Python is a build dependency, so it needs to be pre-installed into the build prefix. Only files added by running the bash script may be included in the tarball. These tarballs (with some extra metadata) are the conda packages we’ve been using from the beginning (even before conda itself existed).
In order to build many of these packages efficiently, a lightweight framework was created, which:
- handles some package metadata
- extracts the source tarball of a package
- creates a “build environment” (build dependencies, e.g. Python, are installed here)
- runs the actual build script, with special environment variables set, e.g. PREFIX (the build prefix)
- adds package metadata about the new package into the build prefix
- packages up the new files in the build prefix into a conda package
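The core of the steps above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual conda build code: `list_files` and `build_package` are hypothetical names, the metadata steps are omitted, and the bzip2-compressed tarball is a choice made for this sketch. It assumes the source has already been extracted and the build dependencies are already installed in the prefix.

```python
import os
import subprocess
import tarfile


def list_files(prefix):
    """Return the set of file paths under prefix, relative to prefix."""
    found = set()
    for root, _dirs, files in os.walk(prefix):
        for name in files:
            found.add(os.path.relpath(os.path.join(root, name), prefix))
    return found


def build_package(prefix, source_dir, build_cmd, out_path):
    """Run build_cmd in source_dir, then pack the files it added to prefix.

    out_path should lie outside prefix so the tarball is not seen as a
    new file of the package itself.
    """
    # Snapshot the prefix first: files from build dependencies (e.g. Python)
    # are already present and must not end up in the package.
    before = list_files(prefix)

    # Expose the build prefix to the build script as PREFIX.
    env = dict(os.environ, PREFIX=prefix)
    subprocess.check_call(build_cmd, cwd=source_dir, env=env)

    # Only files that appeared during the build belong to the new package.
    new_files = sorted(list_files(prefix) - before)
    with tarfile.open(out_path, "w:bz2") as tar:
        for rel in new_files:
            tar.add(os.path.join(prefix, rel), arcname=rel)
    return new_files
```

With a bash build script like the ones shown earlier, `build_cmd` could be something like `["bash", "build.sh"]` (the script name is only an example). Taking the snapshot before the build is what keeps pre-installed build dependencies out of the resulting tarball.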
As the community became more interested in Anaconda, it became more important for people to be able to create their own conda packages. We have therefore made the previously internal Anaconda build system part of conda itself, so that the wider community can benefit from our efforts. Using the latest version of conda (run conda update conda to get it), you can try out the build command yourself. We have created a repository with many recipes in order to test the new conda build command ourselves, and to help users get started (pull requests are also welcome):
    $ conda update conda
    $ git clone git@github.com:ContinuumIO/conda-recipes.git
    $ cd conda-recipes
    $ conda build sample/
When the build command has finished, it displays information on how to upload the newly created package to Anaconda.org. To do so, a user account needs to be created. The beta code, which is still required, is “binstar in beta”.
A more complete description of how the system works may be found here.
This work is the basis for wider ideas. We plan to add the ability to build conda packages from recipes on Anaconda.org for all Anaconda-supported platforms, so that people who want to build packages but do not have access to all operating systems can do so. Moreover, we are working on an application building framework for Wakari and Anaconda, which allows users to create applications very easily; these can then be made available through the Anaconda-Launcher. These applications are also conda packages, but contain an icon and an entry point.