Friday, August 29, 2014
Scalable data science training in Seattle
Together with the University of Washington in Seattle, we are setting up a full day of scalable data science training using GraphLab Create on Wednesday, September 17. Anyone who is interested is welcome to register here, and you are welcome to use the discount code GLABER.
Thursday, August 21, 2014
Do you like "The Killings"? Dive into Seattle police data!
Here is an interesting blog post analyzing Seattle police data. I got it from Carlos Guestrin, our CEO.
Another interesting dataset is the Allstate insurance claims data, which comes from their Kaggle competition.
Wednesday, August 20, 2014
GraphLab Create helps analyze FCC network data
My collaborator Scott Kirkpatrick from the Hebrew University is using GraphLab Create to slice and dice a large corpus of FCC broadband network measurement data. Here are some beautiful resulting plots that illustrate network traffic from different aspects. The data is free; anyone who wants to look at the code is welcome to email me, and I will share the IPython notebook that generates those plots.
GraphLab Create's Boosted decision trees for Kaggle Bike Sharing Competition
My collaborator Jay Gu has just released a blog post which explains how to use Boosted Decision Trees in GraphLab Create to compete in Kaggle's Bike Sharing Competition. Using this simple solution, we reached place 15 on the leaderboard out of 569 competitors!
Sunday, August 17, 2014
DataRobot raises $21M series A
Just got this from my collaborator Jay Gu: DataRobot, a Boston startup that is trying to automate data science, has raised $21M in a series A round. According to this blog post, the investment was led by NEA, which also invested in Databricks (Spark) as well as GraphLab.
A related company is SparkBeyond, an Israeli startup that raised $4M. They also automate data science, by automatically generating features and evaluating them using multiple algorithms.
Tuesday, August 12, 2014
Interesting paper from Dataiku about WSCD 2014
Dataiku recently won first prize at the Yandex WSCD 2014 competition. Here is a paper describing their methodology. Dataiku recently attended our GraphLab Conference. They have a visual, Excel-like environment for data manipulation, cleaning, and prediction.
Friday, August 8, 2014
Sparse K-means
I got the following recent paper from my collaborator Jay Gu: A Single-Pass Algorithm for Efficiently Recovering Sparse Cluster Centers of High-dimensional Data, from ICML 2014. Basically it is K-means with an L1 constraint on the cluster centers. The results are sparse cluster centers, which makes sense, for example, when clustering text documents.
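To give a feel for why an L1 term produces sparse centers, here is a minimal sketch in plain Python. It is not the single-pass algorithm from the paper. It assumes the simpler penalized form, where the center minimizes half the squared distance to the cluster mean plus lam times the L1 norm of the center, which is solved by soft-thresholding each coordinate of the mean. The names soft_threshold, sparse_center and lam are made up for this illustration.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam; anything within [-lam, lam] becomes 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def sparse_center(points, lam):
    """Cluster mean followed by coordinate-wise soft-thresholding.

    Assumption for this sketch: this is the minimizer of
    0.5 * ||c - mean||^2 + lam * ||c||_1, not the paper's method.
    """
    dim = len(points[0])
    mean = [sum(p[j] for p in points) / len(points) for j in range(dim)]
    return [soft_threshold(m, lam) for m in mean]


# Toy "documents" as term-weight vectors: only the first term is used consistently.
cluster = [
    [1.0, 0.1, 0.0, 0.0],
    [0.8, 0.0, 0.1, 0.0],
    [1.2, 0.0, 0.0, 0.1],
]
center = sparse_center(cluster, lam=0.2)
print(center)
```

Terms that appear only occasionally in the cluster are zeroed out, while the dominant term survives with a reduced weight. For text, this means each center keeps only a handful of characteristic words.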
A second relevant paper, which I got from my collaborator Yao Wu, is
Web-Scale K-Means Clustering by Sculley from Google Pittsburgh. The paper uses mini-batches to speed up computation and achieves sparsity using projected gradient descent.
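The mini-batch idea can be sketched in a few lines of plain Python. This is my own rough illustration under stated assumptions, not code from the paper: each step samples a small batch, assigns every sampled point to its nearest center, and moves that center toward the point with a per-center step size of one over the number of points it has seen so far. The sparsity (projection) step is left out, and the function names and the toy data are made up.

```python
import random


def nearest(centers, x):
    """Index of the center closest to x in squared Euclidean distance."""
    return min(
        range(len(centers)),
        key=lambda k: sum((c - v) ** 2 for c, v in zip(centers[k], x)),
    )


def minibatch_kmeans(data, init_centers, batch_size, iterations, seed=0):
    """Mini-batch K-means sketch with per-center step size 1 / count."""
    rng = random.Random(seed)
    centers = [list(c) for c in init_centers]
    counts = [0] * len(centers)
    for _ in range(iterations):
        batch = [rng.choice(data) for _ in range(batch_size)]
        # Cache the assignments before moving any center.
        assigned = [nearest(centers, x) for x in batch]
        for x, k in zip(batch, assigned):
            counts[k] += 1
            eta = 1.0 / counts[k]
            centers[k] = [(1 - eta) * c + eta * v for c, v in zip(centers[k], x)]
    return centers


# Two well-separated toy blobs, with one initial center taken from each.
blob_a = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2]]
blob_b = [[10.0, 10.0], [9.8, 9.9], [9.9, 9.7], [9.7, 9.8]]
centers = minibatch_kmeans(
    blob_a + blob_b, [blob_a[0], blob_b[0]], batch_size=4, iterations=25
)
print(centers)
```

Because each step touches only batch_size points instead of the whole dataset, the cost per iteration stays small no matter how large the data gets, which is the point of the web-scale setting.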
Tuesday, August 5, 2014
Misc News
Collaborative filtering tutorial by Netflix
I got this from my collaborator Alice Zheng: a lecture about collaborative filtering by Xavier Amatriain from Netflix at the MLSS summer school organized by Alex Smola at CMU.
Deep learning @ Spotify
I got the following from my colleague Zach Nation: an interesting blog post from Spotify about using convolutional neural networks to learn latent factors for collaborative filtering. And here is the related NIPS paper.
Cloud Service @ Databricks
Databricks has recently announced their business model: a cloud service running Apache Spark.
Here is the keynote at the Spark summit:
MapGraph: First Multi-GPU Graph Analytics System by Systap
I just heard from my colleague Bryan Thompson from Systap that they have recently released MapGraph: the first distributed graph analytics framework which supports GPUs. Here is their blog post giving additional details.