Ilan Schnell

New Advances in conda

A lot of developments have happened since our last conda blog post half a year ago. In this post, we want to focus on two new major development efforts: the resolution of install dependencies using a SAT solver and the ability to build conda packages from “conda recipes”.

SAT solver

The first iteration of conda used a graph-based algorithm on top of hand-coded constraints to solve the install dependency problem. As more and more packages were added to the Anaconda repository, we started seeing performance problems. This is when we started looking into SAT solvers, an approach that other package management systems, for example libzypp and Zero Install, already use.

To get more familiar with SAT solvers, and with the way different types of problems are translated into Boolean satisfiability problems, I wrote a SAT-based Sudoku solver.

Translating the install problem into a Boolean problem is well described in this paper. The basic idea is the following: for each package, you create a Boolean variable, and dependencies and conflicts are then formulated as clauses. After solving the corresponding SAT problem, you know which packages need to be installed. There is a caveat we ran into, however. One is usually interested in the minimal install profile, i.e. the minimal set of packages that must be installed in order for some requirement to be satisfied, but the Boolean problem usually has exponentially many solutions (with regard to the number of packages). Our solution is to reduce the search space prior to applying the Boolean transformation, and then to calculate the SAT solutions for which the number of true literals is minimal. The implementation can be found in this module. This single module, with about 300 lines of code, replaced about 2000 lines of complicated constraint logic, which had become too slow and hard to maintain.
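To make the translation concrete, here is a minimal sketch in Python. It is not the conda module mentioned above: the package names, the `depends`/`conflicts` layout and all function names are made up for illustration, and a brute-force search over subsets of increasing size stands in for a real SAT solver. It only shows how dependencies and conflicts become clauses, and what "minimal number of true variables" means.

```python
from itertools import combinations


def build_clauses(depends, conflicts, requested):
    """Translate a toy install problem into CNF clauses.

    depends:   package -> list of requirements; each requirement is a list
               of packages, any one of which satisfies it (hypothetical layout)
    conflicts: pairs of packages that must not be installed together
    requested: packages the user asked for
    A positive literal means "installed", a negative one "not installed".
    """
    var = {name: i + 1 for i, name in enumerate(sorted(depends))}
    clauses = []
    for pkg, requirements in depends.items():
        for alternatives in requirements:
            # pkg implies (alt1 or alt2 or ...)
            clauses.append([-var[pkg]] + [var[alt] for alt in alternatives])
    for a, b in conflicts:
        # a and b cannot both be installed
        clauses.append([-var[a], -var[b]])
    for pkg in requested:
        clauses.append([var[pkg]])
    return var, clauses


def satisfies(true_vars, clauses):
    """Check that every clause contains at least one true literal."""
    return all(
        any((lit > 0) == (abs(lit) in true_vars) for lit in clause)
        for clause in clauses
    )


def minimal_install(depends, conflicts, requested):
    """Return a smallest set of packages satisfying all clauses, or None.

    Brute force (exponential); a real implementation would use a SAT solver.
    """
    var, clauses = build_clauses(depends, conflicts, requested)
    names = {number: name for name, number in var.items()}
    for size in range(len(var) + 1):
        for chosen in combinations(sorted(names), size):
            if satisfies(set(chosen), clauses):
                return sorted(names[v] for v in chosen)
    return None


# Hypothetical toy repository
depends = {
    "scipy-0.12": [["numpy-1.6", "numpy-1.7"], ["python-2.7"]],
    "numpy-1.6": [["python-2.7"]],
    "numpy-1.7": [["python-2.7"]],
    "python-2.7": [],
    "zlib-1.2": [],
}
conflicts = [("numpy-1.6", "numpy-1.7")]

print(minimal_install(depends, conflicts, ["scipy-0.12"]))
# prints ['numpy-1.6', 'python-2.7', 'scipy-0.12']
```

Note that zlib is left out because nothing requires it. The two numpy versions give equally small solutions, and the sketch simply returns the first one it finds; choosing between such ties is not modeled here.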

Build recipes

The second new major development effort in conda is the ability to build packages from conda recipes using the conda build command. Users can now create their own conda packages in a reproducible way and upload them to binstar.org.

When we first started Anaconda a year ago, we needed to build many different types of packages: Python packages, but also Python itself, as well as low-level C or C++ libraries like HDF5, LLVM, bzip2, zlib, etc. The choice was made to create a bash script for each of these packages which installs the package into a build prefix. For example, the build script would look like:

#!/bin/bash
./configure --prefix=$PREFIX
make
make install

for a package like zlib, and it would be as simple as

#!/bin/bash
python setup.py install

for a Python package. Using bash (and .bat scripts on Windows) turned out to make these scripts quite simple and easy to read. Such clarity and conciseness would not have been possible in Python. After running such a script, all new files in the build prefix are bundled up into a tarball. Note that in the second example Python is a build dependency, so it needs to be pre-installed into the build prefix; only files which are added by running the bash script may be included in the tarball. These tarballs (with some extra metadata) are the conda packages we’ve been using from the beginning (even before conda itself existed).

In order to build many of these packages efficiently, a lightweight framework was created, which:

  • handles some package metadata
  • extracts the source tarball of a package
  • creates a “build environment” (build dependencies, e.g. Python, are installed here)
  • runs the actual build script, with special environment variables set, e.g. PREFIX (the build prefix)
  • adds package metadata about the new package into the build prefix
  • packages up the new files in the build prefix into a conda package
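The last three steps above can be sketched in a few lines of Python. This is only an illustration, not the actual conda build code: the function names, the `info/index.json` metadata location and the `build_cmd` argument are assumptions made for the example, and extracting the source tarball and creating the build environment are left out.

```python
import json
import os
import subprocess
import tarfile


def list_files(prefix):
    """Return the set of file paths under prefix, relative to prefix."""
    found = set()
    for root, _dirs, files in os.walk(prefix):
        for name in files:
            found.add(os.path.relpath(os.path.join(root, name), prefix))
    return found


def build_package(build_cmd, prefix, src_dir, out_path, meta):
    """Run a build command and bundle the files it added to the prefix.

    build_cmd: command as a list, e.g. ["bash", "build.sh"] (hypothetical)
    prefix:    build prefix; build dependencies are already installed here
    src_dir:   directory containing the extracted source
    out_path:  where to write the resulting .tar.bz2 (outside the prefix)
    meta:      dict of package metadata (layout is an assumption)
    """
    # Remember what was in the prefix before the build (e.g. Python itself)
    before = list_files(prefix)

    # Run the build script with PREFIX set to the build prefix
    env = dict(os.environ, PREFIX=prefix)
    subprocess.check_call(build_cmd, cwd=src_dir, env=env)

    # Add metadata about the new package into the build prefix
    info_dir = os.path.join(prefix, "info")
    os.makedirs(info_dir, exist_ok=True)
    with open(os.path.join(info_dir, "index.json"), "w") as f:
        json.dump(meta, f)

    # Only files which were not there before the build go into the package
    new_files = sorted(list_files(prefix) - before)
    with tarfile.open(out_path, "w:bz2") as tar:
        for rel in new_files:
            tar.add(os.path.join(prefix, rel), arcname=rel)
    return new_files
```

Taking a snapshot of the prefix before the build and packaging only the difference is what keeps pre-installed build dependencies out of the resulting tarball.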

As the community became more interested in Anaconda, it became more important for people to be able to create their own conda packages. We have therefore made the previously internal Anaconda build system part of conda itself, such that the wider community can benefit from our efforts. Using the latest version of conda (use conda update conda to get the latest version), you can try out the build command yourself. We have created a GitHub repository with many recipes in order to test the new conda build command ourselves, and to allow users to get started (pull requests are also welcome):

$ conda update conda
$ git clone git@github.com:ContinuumIO/conda-recipes.git
$ cd conda-recipes
$ conda build sample/

When the build command has finished, it displays information on how to upload the newly created package to binstar.org. To do so, a user account needs to be created. The beta code, which is still required, is “binstar in beta”.

A more complete description of how the system works may be found here.

Future plans

This work is the basis for wider ideas. We plan to add the ability to build conda packages from recipes for all Anaconda-supported platforms on binstar.org, such that people who want to build packages, but do not have access to all operating systems, can do so. Moreover, we are working on an application building framework for Wakari and Anaconda, which allows users to very easily create applications, which can then be made available through the Anaconda-Launcher. These applications are also conda packages, but contain an icon and an entry point.

Tags: Packaging Anaconda conda