Friday, February 10, 2012

Thoughts on Google Code-in 2011

Google Code-in is the high-school equivalent of the Google Summer of Code. The program ran from Nov 21st to Jan 16th, though we've only now gotten around to sending a "summary" mail to the list about it. As Aaron noted, we've had some translation work, some work on SymPy Live and a bevy of documentation and code improvements. With 176 tasks completed, I'd say the whole project was a success for SymPy. I was involved as a mentor, so here are some general thoughts and observations about the process.

E-mail spam. In SymPy we didn't have a clean separation of mentor duties (e.g., KDE only allowed tasks for which someone had volunteered to mentor), so the initial idea was to add all (most) mentors to all tasks. This meant a lot of mails, an effect worsened by the fact that each commenter on an issue starts another "conversation" when viewed from Gmail (which I even reported to Melange as a feature request/bug). At the height of activity, I could get upwards of 30-40 mails ("conversations") daily, which dwarfed my other mail traffic by far. Then, because each comment is basically a separate mail, I wasted a lot of time looking at issues that someone had already addressed (again, most mentors could handle most tasks). For the second round of tasks I didn't add myself to each task, otherwise I'm sure I'd have gotten even more spam. The bug I reported in Melange was fixed, so hopefully this will be less of an issue next year.

Being a mentor takes a lot of time. This was partly a consequence of the above and partly due to the sheer amount of work being done. Many students were unfamiliar with git (and didn't want to read the instructions on development workflow on our excellently written (in my opinion) GCI Landing Page), and solving git issues with them was a constant topic on IRC. Students also lacked follow-through on review comments (or, occasionally, expected the work to be handed to them), which didn't help. Finally, many students were very anxious and didn't appreciate that we are all volunteers and cannot be around 24/7. All of this resulted in a process that was frustrating at times and stressful for mentors.

Regardless of all of the above, a lot of work was done for SymPy. While I didn't look at the stats, my feeling is that the biggest improvements could be seen in our SymPy Live interface (and our webpage) and in our documentation. Yes, we also saw some code improvements, but they were probably a smaller part of the overall contribution (though by no means less important). Interestingly, I think this exposes the two types of tasks the GCI contest is well suited to: tasks where there is no "in-house" expertise (anything web-related in our case) and uninteresting tasks/chores (writing documentation, in our case and probably for most projects). In the first case, we managed to attract experienced developers who could improve our webpage much faster and better than any of the core developers. Writing documentation is also an important task, but one that is shunned by most developers. Still, it is mostly simple work and (more importantly) doesn't usually require in-depth understanding of the code. This made it ideally suited for new contributors. The financial award ($100 for every 3 completed tasks, up to $500) was enough of a motivation for students. The all-around improvements to our documentation are probably the single biggest benefit of our participation in GCI.

Translations. In GCI, tasks were divided into categories and we needed to have at least 5 tasks in every category. While we managed to "fill up" most categories, Translation was probably the biggest problem. Since SymPy is basically a command-line library, it does not make a lot of sense to translate it into other languages. In the end, we created tasks for translating our webpage and tutorial into the languages covered by the development team, and some of these were done, but I consider this a waste of time. Though this issue is "near and dear" to me (I'm not a native speaker of English), I'm of the opinion that it would be impossible for someone without at least a basic knowledge of English to program with SymPy. Simply put, however much effort we put into translating, the class and method names will remain in English and there's no helping that. I very much doubt the newly translated documents will even be used, and they're bound to fall behind as the original documents change. We also had to start using gettext to manage the translations, which is a non-trivial amount of work (and there are still some issues). In my opinion, it adds another layer of complexity (however small) for very little gain.

In conclusion: did we get stuff done? Yes, without a doubt. Would we have gotten more done if the mentors had used their mentoring time for coding? Perhaps, but not necessarily. Are some of the students going to keep contributing? Most likely not. Still, I would consider the whole program, and our participation in it, a success. One idea for next year is to focus more on work none of the core developers can do (e.g., the website work), but we can't really say how far SymPy development will progress during this year or which tasks might be available to students. Hopefully, more people will volunteer to mentor next year, which would help with most of the issues I raised here. It is interesting, though, that even with our normally very fast development process we couldn't handle the influx of student work. It'd be interesting to see how other organizations coped.

Here's to another GCI this year!


  1. I agree about spam. The notifications ended up being useless to me, since there were so many, and I ended up just checking my dashboard for needs review tasks every day. Also, as a result of the increased activity, I am *still* behind on email by about a month.

    There was a pretty lengthy discussion about translations on the Google Summer of Code mentors list, and I think as a result, they will not be required next year.

    1. That's good to know (pity I can't see the discussion). I think that if we use our experiences from this year, we could have a more productive GCI. Still, would you agree that it was a good thing we applied, or do you think we would have accomplished more if the mentors could have used the same amount of time to just code?