Sunday, 22 March 2015

Google Summer of Code - prepost

This year (my last year as a student!) I decided to apply for GSOC (Google Summer of Code). It's been on my academical TODO list for a few years but haven't had the time or the motivation to apply. Since this is my last chance, I decided to have a go.

Now, a (not so) short intro...Google summer of code is a stipend programme for open source software that students (bsc, ms, phd) can apply to. Basically, different organizations like GNOME, Debian, CERN and MIT register and submit their proposals and open source project ideas to Google. Google then goes through the proposals and selects somewhere between 100 and 200 organizations. Once the organizations have been selected, students can contact them regarding different project ideas. These ideas are mostly already defined by the organizations but students can propose their own ideas. Once the students find a couple of organizations of interest, they usually contact the organizations before sending the official proposal to Google and before student registrations are open. It's important to start as early as possible to see about project ideas or even come to an agreement that the organization will pick them as one of the students for the project. Once a student feels confident enough and once Google opens student registrations, he/she can register, provide whatever info Google requires them and submit their proposals. Some organizations will require additional info in the proposal so make sure to check that with your prospective orgs. Organizations assign mentors to specific project ideas, pick student proposals and then Google evaluates the proposals and finally picks the students who will be given a stipend during their GSOC enrollment.

IMPORTANT: Being picked by the organization the student proposal is meant for doesn't mean you'll get accepted by Google -> after the organizations have been selected and after student proposal deadline, Google assigns student slots to organizations. So if the organization picked 4 students but Google assigns 2 slots, only 2 students will be accepted for that organization. But they may get selected for a different organization. That's why most students submit several proposals to different organizations. Basically, Google has a limited number of slots and distributes them between organizations.

Most organizations get 1-2 slots but organizations that have a vast number of student applications and are longtime contributors to GSOC get more slots. For example, Python Software Foundation (PSF) is a longtime member of GSOC and an umbrella organization. What this means is that this organization applied for GSOC, got accepted and then accepts other Python projects under its fold. Even some projects that haven't been accepted by Google (maybe because of limited number of slots) later get into GSOC through these umbrella organizations. Google is often extremely generous with slots given to umbrella organizations like PSF and Apache Software Foundation but if a large number of projects get under an umbrella organization, organizations might still get only 1-2 slots and some might theoretically even miss out. It depends.

Enough of the short intro, let's get back to what I originally wanted to write about. Since Python is the only programming language I trust myself to use to get the job done without the language being the obstacle, I've decided to work exclusively on Python projects. Now, I don't want to get stuck with a project that I'd do only for the money (and the amount Google pays isn't negligible from where I'm from) so I picked up three really cool projects to apply for:

  • GNS3 - higly veracious network simulator of real, physical networks.
  • NetworkXa software package for the creation, manipulation, and study of complex networks.
  • SunPyPython for Solar Physics

Cool, right? All three projects I've mentioned are part of the PSF umbrella organization. PSF requires students to keep a blog of their project progress and that's the reason this blog exists. It did take some of my time from writing the actual proposals but now that I'm writing it I think it's actually a nice idea.
They're ordered based on preference. I'm working on my masters thesis, porting a network simulator developed at my university from FreeBSD to Linux. That's why GNS3 is first on my list. I can elaborate on the idea there and use it in GSOC. Concretely, the project idea is "Docker support for GNS3".

Background: right now GNS3 supports QEMU, VirtualBox and Dynamips (a Cisco IOS emulator). We can think of the nodes in GNS3 and the links between them as virtual machines that have their own network stacks and communicate amongst themselves like separate machine on any other "real" network. While this is nice by itself, QEMU and VirtualBox are "slow" virtualization technologies because they provide full virtualization -> you can run any OS on them. So while QEMU and VirtualBox can run various network services, it's not very efficient. Docker, on the other hand, uses kernel-level virtualization which means it's the same OS but processes are grouped and different groups isolated between themselves, effectively creating a VM. That's why it's extremely fast and can run thousands of GNS3 nodes -> no calls between host and guest systems, it's the same kernel! Docker is quite versatile when it comes to managing custom made kernel-based VMs. It takes the load of the programmer so he/she doesn't have to think about disk space, node startup and isolation etc.

The second project, NetworkX is basically a graph analysis software written in Python. You define your graph with nodes and edges and run various graph algorithms on it. Before Google announced the selected organizations, NetworkX has been on the PSF GSOC wiki page, one of the first. They're the first organization I've contacted. While for GNS3 I just chose one of the already available project ideas, for NetworkX I've proposed to make a Tkinter GUI since it doesn't have one. It would enable users draw edges and graphs without actually writing programs. This wasn't exactly rejected but one of the core developers explained to me in a lengthy email that while they appreciate the effort, NetworkX is moving away from any kind of GUI development and that I should probably pick one of the existing ideas. So I chose to write a backend API for NetworkX.

Background: Until now, graph data has been represented as volatile Python dictionaries. It would make sense to provide a flexible backend interface to easily create modules for graph storage to efficiently access graphs at any time. This would usually include graph databases since their data representation is close what NetworkX does and such databases have efficient algorithms to access graph data but it shouldn't be restricted to such storages. Case in point would be document-store databases that can more or less directly save Python dictionaries as JSON data and load them. SQL databases are somewhat trickier because their data representation isn't directly compatible with graph.

The last project, SunPy, is a software package for solar physics computations. Now, while the previous two projects are more along the line of what I usually do and study, this is more of a whim! I mean, solar physics! Cool! The project ideas is to refactor on of its modules called Lightcurve. I have to admit I don't know a lot (close to nothing) about solar physics but this refactor project has more to do with actual Python refactoring. I'll probably have to learn something about solar physics, which I'd like to but although GNS3 was added to PSF organization list after SunPy, I've put most of my effort into writing the proposal for GNS3  because of my thesis with a touch of regret that I won't work on software that researches the Sun!

Whichever project gets selected, I'm sure it's going to be a fun and educational experience for me. Keeping my fingers crossed for the next post where I'll write in more detail about the project I'll (hopefully) work on.

Cheers