Sunday, March 11, 2012

Open Connectome Project

A couple of days ago I sent out an initial announcement about our planned GraphLab workshop, and I immediately started getting a lot of interesting feedback from my blog readers.

Joshua Vogelstein, a researcher at the Dept. of Applied Mathematics & Statistics, Johns Hopkins University, just sent me a note saying he would very much like to participate in our workshop. Joshua is part of the Open Connectome project, a very interesting project in the area of neuroscience. The project's mission is to provide open access to neuro data for researchers worldwide. Here are some examples of the data they are hosting:

I have no clue what the above pictures mean (although I must admit they look pretty cool)! So I asked Joshua to describe in a little more detail the problems he is working on. This is what I got from him:
We have two very different kinds of data:
1) EM Connectomes - each dataset is a volumetric image of part of some animal brain, ranging in size from 1GB to 10TB. you can look at the data in 2D here
the first project is 10TB. we also designed a RESTful interface to facilitate anybody downloading and processing the data. the instructions for using it are here.
another thing that we have, but haven't yet provided the documentation for, is an annotation database. the idea is that anybody should be able to download some volume, annotate it, and upload it back to the server. we collect and store all the annotations, and can combine them to obtain meta-annotations. i expect that we'll release details for the annotation database in a week or so.

2) MR Connectomes - these are essentially multimodal images of human brains, including both time-varying and non-time-varying. the "multi" part of multimodal means that for each subject we have a number of different kinds of images. our plan for what to do with this stuff is here. Currently, we are organizing the data and pre-processing it. the output of the preprocessing will be, for each subject (there are a few thousand of them), a graph with O(10,000) vertices and O(100,000) edges. our vertices are attributed. in particular, each vertex has a 3d position as well as a whole time-series associated with it. we will implement a kind of spectral clustering on each graph (see this manuscript for the theoretical results of our algorithm).

Another interesting aspect of Joshua's work is that he is part of the Institute for Data Intensive Engineering and Science, which has a 5PB "Data-Scope".

Joshua is interested in exploring GraphLab, as a first step, for spectral clustering of brain image graphs. I promised to help him use our SVD and K-means solvers and try them out on some of his data. I am looking forward to meeting Joshua at our workshop. I also think it would be very interesting if he could give a quick talk describing some of the challenges he is facing and what he needs from GraphLab to help solve them.
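
To make the SVD-plus-K-means idea a bit more concrete, here is a minimal sketch in plain NumPy of one common flavor of spectral clustering: embed each vertex using the top singular vectors of the adjacency matrix, then run K-means on the embedded points. This is not GraphLab code, and it is not necessarily the algorithm from Joshua's manuscript. The function names (spectral_embed, kmeans), the farthest-point initialization, and the toy graph of two 4-vertex cliques joined by a single edge are all assumptions made up for illustration. A dense SVD like this would be fine for a toy graph; for graphs with O(10,000) vertices one would presumably want a sparse or distributed solver instead.

```python
import numpy as np


def spectral_embed(adjacency, dim):
    """Embed vertices using the top `dim` singular vectors of the adjacency matrix."""
    u, s, _ = np.linalg.svd(adjacency)
    # Scale each singular vector by the square root of its singular value.
    return u[:, :dim] * np.sqrt(s[:dim])


def kmeans(points, k, iters=50):
    """Plain Lloyd's K-means with a deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        # Squared distance from every point to its nearest chosen center.
        d = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers)

    for _ in range(iters):
        # Assign each point to its nearest center.
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster becomes empty.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Toy graph (hypothetical data): two 4-vertex cliques joined by one edge.
n = 8
adjacency = np.zeros((n, n))
for block in (range(0, 4), range(4, 8)):
    for i in block:
        for j in block:
            if i != j:
                adjacency[i, j] = 1.0
adjacency[3, 4] = adjacency[4, 3] = 1.0  # the single bridge edge

embedding = spectral_embed(adjacency, dim=2)
labels = kmeans(embedding, k=2)
print(labels.tolist())  # vertices 0-3 should share one label, vertices 4-7 the other
```

On real brain graphs the vertices also carry attributes (3D positions and time series), which this sketch ignores entirely; it clusters on connectivity alone.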


  1. Hi Danny,

    I am one of the readers of your blog. We exchanged a few e-mails back in October about moving my work to graph-lab (if you remember), although I became busy again with work and never had a chance to get myself familiar with graph-lab :(

    I am working on a similar topic (medical imaging, fMRI and connectivity) and I think graph-lab is perfect for such a setting.

    I would like to attend your workshop to talk about it. BTW, where/when is the workshop?


  2. Hi Kayhan!
    Of course I remember. I will continue to post updates on my blog post here regarding dates and venue:

    It is going to be in early July in the Bay Area. You are definitely welcome!!