Vlada's technical blog: GSoC: week 5: SymPy now runs in Python 3!

As the title says, you can now run SymPy in Python 3! No, it's not perfect, there are test failures (or rather, exceptions, some 200 of them), but ~90% tests pass and that's getting somewhere. I haven't tried actually using it, but at least for some definitions of "works", SymPy definitely works in Python 3. I'll document some of the issues I solved below, and I'll also update the porting tips and tricks page when I get the chance.

The cmp issue

While the new unicode/string features are probably the major Python 3 incompatibility for most programs, SymPy's foremost issue was the use of cmp(), which has been discontinued in Python 3 in favor of __lt__ and other rich comparisons. SymPy uses cmp() in various areas of code (sympy.core.Basic for one, and almost everything derives from that). I've tried to approach this from various angles, but most required a deeper understanding of SymPy than I currently posses. In the end, I opted for the most ingenious solution of all - sidestepping the issue. To see how I did this, first lets see the two different ways the removal of cmp() affects us:

a) builtin.sorted() and list.sort() no longer accept the cmp argument providing a comparison function. Use the key argument instead.

Now, this is a good thing! The cmp() function is slow in this case, as it will compare every two elements (or so). In comparison, the key function takes some defined "key" value and sorts based on; this means it's called exactly once on each object. However, if you rely deeply on customized __cmp__ functions, it can be difficult to simply use a key instead of cmp. The following recipe, which provides a cmp_to_key function, allows us to avoid this by doing just what the name says. It's not an elegant solution, but it allows us to move on with porting at least. The recipe has been included in the functools module in Python's 2.7 and 3.2, but we implement it in core.compatibility as we need to support more Python versions.

b) The cmp() function should be treated as gone, and the __cmp__() special method is no longer supported.

This is the root cause of the above point. Again, SymPy uses this quite extensively and fixing everything is beyond me at the moment. However, it is possible to naively define cmp as (a>b) - (a<b) if the builtin is not available, and I did exactly that in core.compatibility. That this works means that we aren't doing anything too complex with out __cmp__ implementations (it does fail in one case, though), which means that porting is possible.

Neither of the above changes is efficient or very nice at all (in fact, I described them as "evil" in a commit message). Still, going over these issues allows me to see the other problems that will arise and work on fixing them. It will allow me to finally have some visible progress on my project. This is important because my goal for the mid-term evaluation is to have a working SymPy in Python 3. Once SymPy is working completely, I will go back and implement a "correct" solution for this.

The other issues I solved are comparatively minor: I fixed some warnings produced by "-3" (the backquote is no longer a shortcut to repr(), tuple parameter unpacking has been removed) and some of the exceptions raised when running with Python3 (file() has been removed and we were encoding to ascii needlessly. I plan to continue working on these exceptions; as a lot of the tests all fail at the same points, I except to rapidly have almost all tests running in Python 3 (if not passing). I've also added a "FIXME-py3k" comment in a couple of files where I identified the issue but couldn't solve it; these are all things to discuss with my mentor. If someone wants to test it out, I've written some instructions on the relevant issue in our bug tracker. Merging will probably wait some time, but I will push for it to happen as soon as possible.

Also during this week, I worked on setting up our Jenkins server, you can see it here. Anonymous users can see just the build history; developers should contact me if they want an account set up. It is currently down, however.This was mostly a test run - we decided to reinstall our server from an ancient Debian to a current Ubuntu and only then run the "live" Jenkins server. This will happen when Ondřej, who runs the server we use, has more time (around the middle of July). I don't expect further issues to arise.

I also have a test branch with a port to distribute (instead of the currently distutils we use). We need to use Distribute to support automatic running of 2to3 when installing. The porting itself was trivial, but I need to read a lot of documentation before I figured out what the differences are (I even asked a question on StackOverflow about it). To say the documentation is confusing would be an understatement. The major problem is how to solve the mpmath issue.

As I wrote in my week 3 report, mpmath is already Py3 compatible and 2to3 shouldn't be run on it. I've since checked it out and I don't see an easy method to skip the mpmath directory. It might be possible, but I don't see an easy way without going around Distribute entirely (which is, again, not the point). I even asked about in on the distutils-SIG mailing list, the answer is a "don't do it, unbudle instead". While I've been in the unbudle camp since the start, the discussion hasn't been so one-sided (I won't summarize it here, you can read the linked issue). In the end, though, if I don't find a way to support it in Python 3, we will have to unbundle (Ronan has a branch doing just this, and it is fairly painless).

Vlada's technical blog

Sunday, June 26, 2011

GSoC: week 5: SymPy now runs in Python 3!

1 comment: