One of the most impressive lectures in our GraphLab workshop was given by Jeffrey Heer from the HCI group at Stanford, who presented Data Wrangler. Data Wrangler is a visual tool for cleaning large datasets, a time-consuming task that is often ignored when talking about machine learning algorithms. Using Data Wrangler, it is possible to visually specify how to clean the data on a small sample of it, and then automatically generate map/reduce or Python scripts that will run on the full dataset.
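To make the "specify on a sample, run on the full dataset" idea concrete, here is a rough sketch of what a cleaning script of this kind could look like. This is not Data Wrangler's actual generated output: the step names (`strip_whitespace`, `drop_empty_rows`, `fill_down`), the `PIPELINE` list, and the sample data are all made up for illustration. The point is only that the cleaning steps are recorded as an ordered list of transforms, so the same list can be replayed on any input.

```python
import csv
import io

# Hypothetical cleaning steps, written by hand for illustration.
# They are NOT the code that Data Wrangler itself exports.

def strip_whitespace(rows):
    """Trim leading/trailing whitespace from every field."""
    return [[field.strip() for field in row] for row in rows]

def drop_empty_rows(rows):
    """Remove rows that have no fields or only blank fields."""
    return [row for row in rows if any(field for field in row)]

def fill_down(rows, column=0):
    """Copy the last non-empty value of `column` into blank cells below it."""
    filled, last = [], ""
    for row in rows:
        row = list(row)
        if row[column]:
            last = row[column]
        else:
            row[column] = last
        filled.append(row)
    return filled

# The ordered list of transforms worked out on the small sample.
PIPELINE = [strip_whitespace, drop_empty_rows, fill_down]

def clean(rows, pipeline=PIPELINE):
    """Apply each recorded transform in order."""
    for step in pipeline:
        rows = step(rows)
    return rows

def read_rows(text):
    """Parse CSV text into a list of rows (lists of strings)."""
    return list(csv.reader(io.StringIO(text)))

# A tiny made-up sample: a blank line and a missing state name.
SAMPLE = "Alabama, 2004\n, 2005\n\nAlaska, 2004\n"

if __name__ == "__main__":
    for row in clean(read_rows(SAMPLE)):
        print(row)
```

Once the pipeline looks right on the sample, the same `clean` function can be pointed at the rows of the full file, which is the step a tool like this automates.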
Here is a quick video preview (the full lecture will be online soon):
Wrangler Demo Video from Stanford Visualization Group on Vimeo.
Here is a link to the full paper.
By the way, my second advisor, Prof. Joe Hellerstein from Berkeley, is also involved in this nice project.