Vlada's technical blog: Tox


Sunday, June 19, 2011

Setting up and using Tox in SymPy

In my previous post on the subject I mentioned we decided on using Tox. In the weeks since, we've established a workflow and successfully used Tox to find various bugs (see my post from last week for a particularly interesting one). As I've found the Tox documentation confusing in places, I've decided to write a "how-to" for setting up Tox and then talk some more about how we use it.

Setting up Tox

A simple "pip install tox" will handle the installation. Then you'll want to create a tox.ini file where your setup.py is located. The .ini can be very simple, just listing the wanted environments and the commands to run in each.

[tox]
envlist = py25, py26, py27
[testenv]
commands=python bin/test []
         python bin/doctest []

Here we are just using some of Tox's default environments (defaults exist for py24-py32, Jython and PyPy) and running our test suite in each. The square brackets are important: our bin/test accepts a test name as an argument and then runs just the specified test(s). The brackets let us replicate this behaviour with Tox, instead of running the whole test suite every time. Tox also makes it very easy to define a custom environment. One that we could use is:
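To make the role of the square brackets concrete, here is a toy model of the substitution in Python. It is only a sketch: expand_posargs is a hypothetical helper, not part of Tox, and it assumes that a bare "[]" token in a command is replaced by whatever extra arguments were given on the Tox command line.

```python
import shlex


def expand_posargs(command, posargs):
    """Replace a bare "[]" token in a command line with the given arguments.

    Toy model only: a hypothetical helper, not Tox's actual implementation.
    """
    expanded = []
    for token in shlex.split(command):
        if token == "[]":
            # With no extra arguments the placeholder simply disappears.
            expanded.extend(posargs)
        else:
            expanded.append(token)
    return expanded


print(expand_posargs("python bin/test []", []))
print(expand_posargs("python bin/test []", ["hydrogen"]))
```

With no extra arguments the command is left as a plain "python bin/test", so the whole suite runs; with a test name, only that test is passed through.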

[testenv:py27-32-gmpy]
basepython=/Library/Frameworks/Python.framework/Versions/2.7/bin/python2.7
deps = http://gmpy.googlecode.com/files/gmpy-1.14.zip
commands = python bin/test []

SymPy can optionally use gmpy to provide better ground types. This test environment tests it with a 32-bit version of Python 2.7; the "basepython" option lets us specify the path to the specific Python interpreter to use. The "deps" option specifies dependencies (multiple dependencies go on separate lines, as with "commands" above). Tox can download them automatically using pip, or you can point it at a zip/tarball as above. You can also see the full Tox specification, but these are the most important options and the ones most likely to be used. The tox.ini.sample file we use has some more examples. (Note: this is how configuration files should be handled in a DVCS: a "canonical" .sample file, which is then copied to .ini and modified as needed. The .ini file itself should be ignored by your DVCS.)

Using Tox

Tox works by creating a virtualenv for each environment specified, by default in a .tox directory, so there is no way it can mess up your system. It will reuse the existing environments, but it's possible to force a rebuild with "tox --recreate". You normally run Tox with a simple "tox", which runs all the environments listed in "envlist". You can optionally specify particular environments with -e (so, "tox -e py25,py26" will run just py25 and 26). A neat feature here is that you don't need to have an environment in your default envlist to run it with "-e".
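If you end up calling Tox from a script, the options mentioned above can be assembled as follows. This is a minimal sketch: build_tox_command and run_tox are hypothetical helper names, and the only Tox options assumed are "-e" and "--recreate" as described above.

```python
import subprocess


def build_tox_command(envs=None, recreate=False):
    """Build the argument list for a Tox run (hypothetical helper)."""
    command = ["tox"]
    if recreate:
        command.append("--recreate")  # force a rebuild of the virtualenvs
    if envs:
        command.extend(["-e", ",".join(envs)])  # run only these environments
    return command


def run_tox(envs=None, recreate=False):
    """Run Tox in the current directory and return its exit code."""
    return subprocess.call(build_tox_command(envs, recreate))


print(build_tox_command(["py25", "py26"]))
```

Keeping the command construction separate from the actual call makes it easy to check what would be run without starting a full test run.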

The basic usage, then, is a plain "tox", which builds all the default environments and runs the whole test suite in each. Unfortunately, our test suite takes around 10 minutes to run on average hardware and we currently support 3 Python versions (going up to 4 or 5 when we finally get Python 3 support). If we add in {gmpy, no-gmpy} and even {32bit, 64bit}, we quickly get to an unrealistic number of combinations to run. Yes, it is important to run these tests occasionally, but it isn't realistic to expect a developer to run all of them for every change they make (that will be the job of our CI server, after all). Still, it's important to do at least some basic testing for every change. As such, I advocate running py25-py27 by default, every time. In my opinion, this is a good compromise between time and test coverage. If a bug occurs in some version, one can then research it further by running more variations (with gmpy, on 32-bit etc.) or preferably just fix it if possible.
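To put rough numbers on this, the full matrix can be enumerated with a few lines of Python. This is only back-of-the-envelope arithmetic: it assumes each configuration takes the full 10 minutes and that everything runs one after another, and full_matrix is a name made up for this illustration.

```python
from itertools import product

MINUTES_PER_RUN = 10  # "around 10 mins" per full test run, as estimated above

python_versions = ["py25", "py26", "py27"]
ground_types = ["gmpy", "no-gmpy"]
architectures = ["32bit", "64bit"]


def full_matrix(versions, grounds, archs):
    """Return every (version, ground type, architecture) combination."""
    return list(product(versions, grounds, archs))


matrix = full_matrix(python_versions, ground_types, architectures)
print(len(matrix), "configurations,", len(matrix) * MINUTES_PER_RUN, "minutes")
print(len(python_versions) * MINUTES_PER_RUN, "minutes for py25-py27 only")
```

Three versions already give 12 configurations, or about two hours of sequential testing, against roughly half an hour for the three default environments. With five Python versions the full matrix grows to 20 configurations.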

Another useful feature is that our test suite supports running just specific tests (as mentioned above). Depending on the code changes, it might be enough to run just the appropriate test suite. This will drastically reduce the time required (and will allow the developer to run in all versions, rather than the reduced list above). Alternatively, if just one test fails after a change, it's easier to debug it if you can run just the specific test.

That's basically it! Tox is very simple to use and quite effective. The biggest challenge is probably persuading developers to use it, but the value is clear. It's also invaluable for testing before a release, per the words of the maintainer of SymPy, Aaron.

Saturday, June 11, 2011

GSoC: week 3

This week I've worked on adapting my work into commit-worthy chunks. The results are pushed to my fork on GitHub, you can see them here. I've tried to make the commits fairly self-explanatory: the first several just fix the warnings "-3" produces, in a manner that's compatible with Python 2.5+. Unfortunately, Googling the warnings didn't produce much information in most cases; because of this, I've created a page where I've collected the solutions I used for each warning. You can find it on the right-hand sidebar. I've since also updated the version of mpmath bundled in SymPy to 0.17 (which supports Python 3 but drops support for Python 2.4, thus representing the first commit not compatible with Python 2.4).

Unfortunately, here I've hit a problem. Namely, mpmath is written with a single code-base that runs unmodified on both Python 2 and Python 3 (this is not usually the recommended course of action, but mpmath didn't need to change much so I guess it makes sense). Normally, this is good, but SymPy will need to use 2to3, and running 2to3 over the bundled mpmath creates errors in SymPy. Why? One banal example is something like:

try:
    from itertools import izip
except ImportError:
    izip = zip

Now, when 2to3 runs on this it removes the "from itertools..." line, leaving the try block with nothing in it, and thus we get an error. This is just one simple case, but there are plenty of others. It's obvious that running 2to3 on code that is already compatible is not going to produce the desired results. And this is where I'm stuck currently. 2to3 doesn't support skipping some directories, so I'll probably have to develop our own "internal" script that will use lib2to3 directly. To do this, I've decided to first try to integrate automatic 2to3 into our setup.py, so that I'd have an idea of how it works. [As a side note, thanks to Lennart Regebro and Benjamin Peterson, who've helped me in this thread on the Python-porting mailing list when I asked how to speed up 2to3 -- using the latest version of it was particularly good advice as it provides ~40% speedup in my case]
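The directory-skipping part of such a script is straightforward. The following is only a sketch of the file selection, not of the conversion itself: files_to_convert is a hypothetical helper, and the directory name "mpmath" in the usage note is an assumption about where the bundled copy lives.

```python
import os


def files_to_convert(root, skip_dirs):
    """Yield .py files under root, ignoring any directory named in skip_dirs."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in sorted(filenames):
            if name.endswith(".py"):
                yield os.path.join(dirpath, name)
```

Something like files_to_convert("sympy", {"mpmath"}) would then produce the list of files to hand over to the conversion step, leaving the already compatible code untouched.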

Once mpmath is integrated properly, I'll be able to continue porting SymPy itself. I feel like I'm on track with the timeline in my proposal, which calls for a working SymPy by the midterm evaluations.

I'd also like to take a moment here to comment on the value of Tox. While porting yesterday, I encountered an error in a polys test. As mpmath is used heavily in this code, I thought it was something to do with the new version. Later, I realized it occurs in master too and quickly bisected it down. And this is where the trouble started, as no one could reproduce the error, with any combination of Python version, architecture, ground-types and cache used. Digging down deeper, we found that the hashes of some functions really were different for everyone, though it was just me getting the error. In the end, someone else confirmed it - Python 2.6, 64-bit, with python ground types (and only on Linux, I think).

The error itself was nasty - ~~int(1) and long(1) do not have the same hash~~ [EDIT: See Aaron's comment below for the details] (I was told similar problems occurred before as well). As the underlying code was hashing lists of lists (possibly deeply nested), the fix was to convert them to tuples of tuples and then hash those. The patch was written by asmeurer, and the whole discussion can be found on the issue page (and the IRC logs from yesterday). The moral of the story? Use Tox! How else would we find something that only appears in such specific circumstances?
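The idea behind the fix can be sketched in a few lines. This is not the actual patch (see the issue page for that), just an illustration of turning nested lists into nested tuples before hashing; freeze is a name I made up for the example.

```python
def freeze(value):
    """Recursively convert lists (however deeply nested) into tuples."""
    if isinstance(value, list):
        return tuple(freeze(item) for item in value)
    return value


nested = [[1, 2], [3, [4, 5]]]
frozen = freeze(nested)
print(frozen)
# Equal nested structures now give equal hashes.
print(hash(frozen) == hash(freeze([[1, 2], [3, [4, 5]]])))
```

Tuples are immutable and hash by their contents, so two structures with equal elements end up with the same hash.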

Saturday, May 28, 2011

Continuous integration and SymPy: Buildbot or Jenkins (with Tox)

In software engineering, continuous integration (CI) implements continuous processes of applying quality control — small pieces of effort, applied frequently. Continuous integration aims to improve the quality of software, and to reduce the time taken to deliver it, by replacing the traditional practice of applying quality control after completing all development. --Wikipedia, Continuous integration

The above definition might seem a bit strange and archaic: isn't most open-source software developed in small chunks and merged in as soon as possible? That's why we have git and other distributed version control systems, as they make this work much more manageable. Continuous integration also means testing, which is especially important for a library, and that's where we hit the first snag. SymPy has a policy that all tests must pass before a change is merged, which is a valid policy, but the simple fact of life is that SymPy is supposed to work on various platforms across multiple Python versions (and different ground types!). A single developer cannot reasonably check all possible combinations, so in the long run subtle bugs creep into the code, especially on e.g. older Python versions. This is where a continuous integration server comes in.

The goal of a continuous integration server is to, well, continually integrate. It controls some slaves, gives them tasks and collates the results. Usually, this means building the project and running the test suite, but it can be anything. This process is then repeated nightly, or after every commit, or started manually (e.g. with specific parameters); in general, CI servers are very powerful and extensible, to fit the needs of specific projects. Part of my GSoC project was to investigate which CI server could be used with SymPy. I have considered Buildbot, which was used in SymPy previously, and Jenkins.

The current SymPy workflow is for another developer to run the test suite on a given pull request before (thinking of) merging. To automate this process, a helper tool called SymPy-bot has been developed. It is a simple Python script which can list all the pull requests and test a particular one. The results are presented in a table on pastehtml and a comment is automatically made in the pull request. In essence, a poor man's continuous integration. :)

Buildbot

Buildbot was used by SymPy before, so it was a natural first choice (the fact that it's widely used, notably by Mozilla and Chrome, is another big plus). Buildbot has a classic master/slave structure, which means that each slave has to be set up separately, with the appropriate environment prepared in advance. As a test, I have created a local buildbot and some slaves and played around with them. Buildbot presents the information in a table format by default (like this), but can easily send it by mail or to an IRC channel or any combination of the above. It is in general a very robust project. A big advantage is the existence of a "TryScheduler", which applies a given patch and runs tests. This closely approximates the current SymPy workflow and I consider it a very desirable feature.

Unfortunately, the robustness of Buildbot comes at the cost of complexity. Setting up a build slave is a non-trivial task and in the end I decided to take a look at Jenkins first, before continuing to work with Buildbot.

Jenkins

Jenkins (formerly Hudson) is a CI server written in Java, providing much of the same functionality as Buildbot. Unfortunately, it does not support Python natively. This is where Tox comes in. Tox is an "automation project" for Python programs, which uses virtualenv to create different environments in which the program can be tested. It is extremely easy to set up; the following tox.ini file is all I needed (plus the appropriate Python versions installed, of course):

[tox]
envlist = py25, py27, docs
[testenv]
changedir=bin
commands=python test []
[testenv:docs]
commands=python doctest []

It is then run with "tox", which automatically creates the necessary virtualenvs (reusing them if they already exist, of course) and then executes the given commands. The [] brackets allow us to replicate the behaviour of "./bin/test hydrogen" (runs just the hydrogen tests) with "tox hydrogen". I like Tox a lot, as it makes it easy for a single developer to test many Python versions (as long as they are installed). Alternatively, it is easy to modify the tox.ini file to test with just the available Python interpreters.
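Finding out which interpreters are actually available can itself be scripted. The sketch below is an illustration only: available_envs and CANDIDATES are hypothetical names, and the executable names (python2.5 and so on) are an assumption about how the interpreters appear on your PATH.

```python
import shutil

# Hypothetical mapping from environment name to interpreter executable name.
CANDIDATES = {"py25": "python2.5", "py26": "python2.6", "py27": "python2.7"}


def available_envs(candidates, which=shutil.which):
    """Return the environment names whose interpreter is found on PATH."""
    return [env for env, exe in sorted(candidates.items()) if which(exe)]


print("envlist =", ", ".join(available_envs(CANDIDATES)))
```

The printed line can be pasted into the [tox] section of a local tox.ini. The which parameter exists only so that the lookup can be swapped out, for example when checking the function without the interpreters installed.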

Moving on, Tox also provides seamless integration with Jenkins (through the use of a "multi-configuration project"), which then provides all the features expected from a CI server, including automatic builds, nice presentation of data and so on (speaking of presentation, it would be a good idea to have our test tool support JUnitXML, which shouldn't be too hard). One disadvantage of the Tox/Jenkins combination is the lack of a TryScheduler like the one Buildbot has. I have spoken with the developers, and they will add it to their GitHub plugin (eventually). In the meantime, it should be possible to program the functionality we need ourselves by accessing GitHub directly (it is possible to request a build with a parameter, which could be the name of a pull request). As this is something SymPy-bot does, I wanted to speak with Ondrej about how exactly to do this.
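To give an idea of what JUnitXML support might involve, here is a minimal sketch using only the standard library. It assumes the commonly seen testsuite/testcase/failure layout; the exact set of attributes Jenkins expects is not something I have checked, and junit_xml, the suite name and the test names are all made up for the example.

```python
import xml.etree.ElementTree as ET


def junit_xml(suite_name, results):
    """Render (test name, failure message or None) pairs as JUnit-style XML."""
    failures = sum(1 for _, message in results if message is not None)
    suite = ET.Element(
        "testsuite",
        name=suite_name,
        tests=str(len(results)),
        failures=str(failures),
    )
    for test_name, message in results:
        case = ET.SubElement(suite, "testcase", classname=suite_name, name=test_name)
        if message is not None:
            # A failed test gets a nested <failure> element with the message.
            ET.SubElement(case, "failure", message=message)
    return ET.tostring(suite, encoding="unicode")


# Hypothetical results: one passing and one failing test.
print(junit_xml("sympy.physics", [("test_hydrogen", None), ("test_wave", "AssertionError")]))
```

The test runner would only need to collect the name and outcome of each test and write the resulting string to a file for the CI server to pick up.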

In conclusion, Tox/Jenkins impressed me enough not to go back to Buildbot. Tox is very simple to configure and can be run locally. Jenkins appears simpler than Buildbot to set up and maintain, while offering almost the same functionality. The lack of a try scheduler is unfortunate, but as we've basically already engineered a solution to the same problem, I'm confident a solution will be found. In general, though, I feel that using Tox/Jenkins is an excellent choice for any Python project concerned with compatibility. [Update: I've written a short guide to using Tox with SymPy on the Wiki, you can read it here]