Large Scale Machine Learning and Other Animals: GraphLab feedback this week

Saturday, November 12, 2011

GraphLab feedback this week

I got this email from Zeno Gantner:

"Hi Danny,

how are things? Any new enhancements for GraphLab planned? Its CF
library really looks impressive. Are there any plans for .NET/Mono bindings?

We had a poster about MyMediaLite at ACM RecSys in Chicago, and
mention GraphLab in the "related work" section:
http://ismll.de/mymedialite/download/mymedialite-recsys2011.pdf

Best regards from Germany,
Zeno"

I looked at the poster above, and the system actually looks pretty interesting. One thing they have covered well is the range of cost functions and evaluation metrics, such as AUC and precision/recall. I will probably follow up on this blog once I learn more.

Last week I visited Ohio State University and had some very interesting meetings. One of the recent GraphLab users there is Josh Ash, a research programmer who is working on hyperspectral imaging. Here is the feedback I got from him:

"Hi Danny,

It was great meeting you as well, and an especially unusual coincidence that you visited the very week after I discovered and started using GraphLab! For the user's page, you can note that I am using GraphLab for "Bayesian analysis of hyperspectral imagery." I am still building out the model (mathematically and in code), but so far I am very happy with the usability and results from GraphLab. Currently I'm getting a 160x speedup using Java-GraphLab on an 8-core machine versus Matlab single-core. In addition to 8 cores, part of the speedup is due to better implementation (2nd time around is always better) and part due to Java being more efficient on looping over discrete random variables.

I will send more detailed notes about my GL experiences (installation, usage, suggested features, etc.) at a later date. Shortly, I'll also likely create a small web presence for this project so I can pass along that address as well. In the meantime, if you want a link for the 'users' page, you can use my home page: http://www2.ece.ohio-state.edu/~ashj/

Cheers!
Josh"

Justin Gardin from Stony Brook University, NY sent me the following description of his project. He finds GraphLab "amazingly fast."

"I am working on bioinformatic sequence motif analysis. The concept is pretty simple actually. Break up a bunch of sequences which are known to contain a binding site (binding site sequence unknown) into a matrix of word counts. Each row corresponds to a single permutation of ACGT in K slots. Each column corresponds to a distance along the sequence. An entry equals how many times the specific "word" occurs in that window of the sequences. Apply non-negative matrix factorization with dimension reduction. Set the reduction equal to the number of motifs you expect to come out of the data (usually trial and error). Then the U and V matrices represent the contribution of each kmer, at each position (respectively), to the motifs in the dataset. If there are big peaks in the U and V matrices at positions and similar words, then there exists a conserved motif in the dataset.

I follow up this analysis with a special case of Gibbs sampling, which is initialized with the expected sequence motif from the U matrix, and weight the sampling by the V matrix. Works really well.

Justin Gardin"
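
To make Justin's description a bit more concrete, here is a minimal Python sketch of the first stage (word-count matrix followed by non-negative matrix factorization). It is not his code and it does not use GraphLab: the function names kmer_position_counts and nmf, the window parameter, and the toy sequences are all hypothetical, and the factorization uses plain NumPy multiplicative updates as a stand-in for whatever NMF solver he actually runs. The Gibbs sampling follow-up is not shown.

```python
import itertools
import numpy as np


def kmer_position_counts(sequences, k, window=1):
    """Build the word-count matrix: one row per k-mer over ACGT,
    one column per window of start positions along the sequence."""
    words = ["".join(p) for p in itertools.product("ACGT", repeat=k)]
    row = {w: i for i, w in enumerate(words)}
    max_len = max(len(s) for s in sequences)
    n_starts = max(1, max_len - k + 1)
    n_cols = -(-n_starts // window)  # ceiling division
    counts = np.zeros((len(words), n_cols))
    for s in sequences:
        for start in range(len(s) - k + 1):
            w = s[start:start + k]
            if w in row:  # skip words containing symbols other than ACGT
                counts[row[w], start // window] += 1
    return words, counts


def nmf(X, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor X ~ U @ V with non-negative U and V using
    multiplicative updates (an assumed, simple solver)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], rank)) + eps
    V = rng.random((rank, X.shape[1])) + eps
    for _ in range(n_iter):
        V *= (U.T @ X) / (U.T @ U @ V + eps)
        U *= (X @ V.T) / (U @ V @ V.T + eps)
    return U, V


if __name__ == "__main__":
    # Toy data (made up): the word ACGT is planted at position 2.
    seqs = ["TTACGTTT", "GGACGTGG", "CCACGTCC"]
    words, X = kmer_position_counts(seqs, k=4)
    U, V = nmf(X, rank=2)  # rank = number of motifs you expect
    for m in range(U.shape[1]):
        top_word = words[int(np.argmax(U[:, m]))]
        top_pos = int(np.argmax(V[m]))
        print("motif", m, "top word", top_word, "top position", top_pos)
```

In this sketch, column m of U scores each kmer for motif m and row m of V scores each position, so the "big peaks" Justin mentions can be read off with an argmax. The rank is the number of motifs you expect, which, as he notes, is usually found by trial and error.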

Mike Davies from Cambridge University is documenting his GraphLab experience on his blog.
