Saturday, June 11, 2011

GSoC: week 3

This week I've worked on breaking my work up into commit-worthy chunks. The results are pushed to my fork on GitHub; you can see them here. I've tried to make the commits fairly self-explanatory: the first several just fix the warnings that "-3" produces, in a manner that's compatible with Python 2.5+. Unfortunately, Googling the warnings didn't turn up much information in most cases, so I've created a page collecting the solutions I used for each warning. You can find it on the right-hand sidebar. I've since also updated the version of mpmath bundled in SymPy to 0.17 (which supports Python 3 but drops support for Python 2.4, making this the first commit that is not compatible with Python 2.4).

Unfortunately, here I've hit a problem. Namely, mpmath is written as a single code-base that can run on both Python 2 and Python 3 unmodified (this is not usually the recommended course of action, but mpmath didn't need to change much, so I guess it makes sense). Normally this would be a good thing, but SymPy will need to use 2to3, and running 2to3 over the bundled mpmath creates errors. Why? One banal example is something like:
    try:
        from itertools import izip
    except ImportError:
        # Python 3: itertools.izip is gone, the built-in zip is already lazy
        izip = zip
Now, when 2to3 runs on this, it removes the "from itertools..." line, and thus we get an error. This is just one simple case, but there are plenty of others. Clearly, running 2to3 on code that is already compatible is not going to produce the desired results. And this is where I'm currently stuck. 2to3 doesn't support skipping some directories, so I'll probably have to develop our own "internal" script that uses lib2to3 directly. To do this, I've decided to first try to integrate automatic 2to3 into our, so that I'd have an idea of how it works. [As a side note, thanks to Lennart Regebro and Benjamin Peterson, who helped me in this thread on the Python-porting mailing list when I asked how to speed up 2to3 -- using the latest version of it was particularly good advice, as it provides a ~40% speedup in my case.]
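To make the "skip some directories" idea concrete, here is a rough sketch of the file-selection half of such a script. The function name `files_to_convert` and the default `skip_dirs` value are made up for illustration, and the hand-off of the selected files to lib2to3 is deliberately left out.

```python
import os


def files_to_convert(root, skip_dirs=("mpmath",)):
    """Return sorted paths of .py files under root, skipping some directories.

    Hypothetical helper: the names in skip_dirs are an assumption, and any
    directory with a matching name is skipped, whatever its depth.
    """
    selected = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into skipped directories.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            if name.endswith(".py"):
                selected.append(os.path.join(dirpath, name))
    return sorted(selected)
```

Only the returned files would then be passed to the converter, so code that already runs on both Python versions is left alone.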

Once mpmath is integrated properly, I'll be able to continue porting SymPy itself. I feel like I'm on track with the timeline in my proposal, which calls for a working SymPy by the midterm evaluations.

I'd also like to take a moment here to comment on the value of Tox. While porting yesterday, I encountered an error in a polys test. As mpmath is used heavily in this code, I thought it had something to do with the new version. Later, I realized it occurs in master too and quickly bisected it down. And this is where the trouble started, as no one could reproduce the error with any combination of Python version, architecture, ground types and cache. Digging deeper, we found that the hashes of some functions really were different for everyone, though I was the only one getting the error. In the end, someone else confirmed it - Python 2.6, 64-bit, with Python ground types (and only on Linux, I think).

The error itself was nasty - int(1) and long(1) do not have the same hash [EDIT: See Aaron's comment below for the details] (I was told similar problems have occurred before as well). As the underlying code was hashing lists of lists (possibly deeply nested), the fix was to convert them to tuples of tuples and then hash those. The patch was written by asmeurer, and the whole discussion can be found on the issue page (and in the IRC logs from yesterday). The moral of the story? Use Tox! How else would we find something that only appears in such specific circumstances?
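For the curious, the "tuples of tuples" idea can be sketched roughly as follows. This is not the actual patch from the issue, just a minimal illustration; `to_hashable` and `coeffs` are names I made up.

```python
def to_hashable(obj):
    """Recursively turn (possibly nested) lists into tuples so they can be hashed."""
    if isinstance(obj, (list, tuple)):
        return tuple(to_hashable(item) for item in obj)
    return obj


# A nested list like this cannot be passed to hash() directly.
coeffs = [[1, 2], [3, [4, 5]]]
key = to_hashable(coeffs)
key_hash = hash(key)
```

Because the hash is computed from the values themselves rather than from a string representation, equal integers contribute equally to it.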

1 comment:

  1. Cool guide. You should put it on the wiki. I'm sure other people would find it useful.

    The problem wasn't that int(1) and long(1) don't have the same hash (they do in fact have the same hash, which is just 1). The problem was that one of the Poly classes was using something like repr(list_of_ints) in __hash__. But repr(int(1)) == "1", whereas repr(long(1)) == "1L", so the repr() values were different, and hence the hashes were different (even though the two objects were otherwise equal, and compared equal using ==). See issue 2472.

    And by the way, the previous problem we had was related to doing "type(a) is int" instead of "type(a) in [int, long]". But it was something that only failed in certain combinations (it didn't fail on 64-bit because the number in question was small enough to be an int on 64-bit but had to be a long on 32-bit), so tox would have helped back then too. See issue 1946.

    And by the way, testing this release candidate would be a nightmare without tox. I'm really glad you helped me to set it up.