Large Scale Machine Learning and Other Animals: Hyperspectral imaging using GraphLab

Tuesday, January 24, 2012

Hyperspectral imaging using GraphLab - updated

Here is what I got from Josh Ash, a research scientist at Ohio State University who is working on some pretty cool stuff. He is using GraphLab's Java API to speed up the computation.

Hyperspectral imaging (HSI) sensors record images at hundreds of wavelengths in the electromagnetic spectrum and thus provide significantly more information about a scene than broadband or standard color (tri-band) imagery. As such, HSI is well-suited for remotely classifying materials in a scene by comparing HSI-estimated material reflectances to a spectral library of material fingerprints.

Estimating intrinsic material reflectances from HSI measurements is, however, complicated by localized illumination differences in the scene resulting from in-scene shadows, and by global distortions resulting from radiative transfer through the atmosphere. With the goal of increasing material classification performance, this work combines recent direct, diffuse, and ambient scattering predictors with statistical models imposing spatial structure in the scene. Multiple interacting Markov random fields are utilized to capture spatial correlation among materials in the scene and localized distortions, such as shadows. The inference engine, which dissects a hyperspectral data cube into a material map and a shadow map, toggles between performing loopy belief propagation to generate approximate posteriors on material classifications and performing expectation maximization updates needed in characterizing and inverting the atmospheric distortion.

Computational speed-up is achieved by performing multiple EM and BP updates in parallel on different portions of the associated factor graph. The parallelization was implemented using the GraphLab framework.
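To make the parallelization idea concrete, here is a minimal sketch of running per-node updates on disjoint portions of a graph at the same time. It uses plain `java.util.concurrent` rather than the GraphLab API, and it is not Josh's code: the class name `PartitionedSweep`, the adjacency-array layout, and the neighbor-averaging update are all assumptions made for illustration. A real update would compute BP messages or an EM step instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: not the GraphLab API and not the author's implementation.
class PartitionedSweep {

    // Placeholder update: a node's new value is the mean of its neighbors' old values.
    // In the real application this would be a BP factor/variable update or an EM update.
    static double updateNode(int node, int[][] neighbors, double[] oldState) {
        int[] nbrs = neighbors[node];
        if (nbrs.length == 0) {
            return oldState[node];
        }
        double sum = 0.0;
        for (int n : nbrs) {
            sum += oldState[n];
        }
        return sum / nbrs.length;
    }

    // One synchronous sweep. Every node reads only oldState and writes only its own
    // slot of newState, so disjoint index ranges can be processed concurrently without locks.
    static double[] sweep(int[][] neighbors, double[] oldState, int numThreads) {
        int n = oldState.length;
        double[] newState = new double[n];
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            int chunk = Math.max(1, (n + numThreads - 1) / numThreads);
            for (int start = 0; start < n; start += chunk) {
                final int lo = start;
                final int hi = Math.min(n, start + chunk);
                pending.add(pool.submit(() -> {
                    for (int v = lo; v < hi; v++) {
                        newState[v] = updateNode(v, neighbors, oldState);
                    }
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for the partition and surface any worker exception
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        } finally {
            pool.shutdown();
        }
        return newState;
    }
}
```

Reading from one array and writing to another keeps the sketch free of data races. GraphLab's own schedulers and consistency models handle this differently, so treat the double-buffering here as a simplification.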

Q) Can you provide a bit more detail about the GraphLab implementation?
How much faster is it relative to Matlab? As I recall, you are using our Java API, right?

Yep, using the Java API. Each node has an update function that either performs a BP factor update, a BP variable update, or an EM update--as appropriate for that node in the graph. The update functions, various data structures for different types of nodes and edges, and the graph generation were all done in Java.
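The dispatch Josh describes, where each node runs the update appropriate to its type, could look roughly like the sketch below. This is plain Java with made-up names (`NodeUpdateSketch`, `NodeKind`, `Node`); it does not use GraphLab's Java API and is not his code. The BP updates are the textbook sum-product forms for discrete variables and pairwise factors, and the EM update is a stand-in (a weighted mean), since the post does not say what the actual EM step computes.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of per-node dispatch; all names are illustrative assumptions.
class NodeUpdateSketch {

    enum NodeKind { BP_FACTOR, BP_VARIABLE, EM_PARAMETER }

    static final class Node {
        final NodeKind kind;
        List<double[]> incoming;   // BP_VARIABLE: messages arriving from neighboring factors
        double[][] potential;      // BP_FACTOR: pairwise potential table
        double[] input;            // BP_FACTOR: message arriving from the other variable
        double[] weights;          // EM_PARAMETER: posterior weights
        double[] observations;     // EM_PARAMETER: observed values
        double[] result;           // output of the most recent update

        Node(NodeKind kind) { this.kind = kind; }
    }

    // Scale a non-negative vector so that it sums to 1 (left unchanged if the sum is 0).
    static double[] normalize(double[] v) {
        double sum = 0.0;
        for (double x : v) sum += x;
        if (sum > 0.0) {
            for (int i = 0; i < v.length; i++) v[i] /= sum;
        }
        return v;
    }

    // BP variable update: belief is the normalized elementwise product of incoming messages.
    static double[] variableBelief(List<double[]> incoming) {
        double[] belief = new double[incoming.get(0).length];
        Arrays.fill(belief, 1.0);
        for (double[] msg : incoming) {
            for (int s = 0; s < belief.length; s++) belief[s] *= msg[s];
        }
        return normalize(belief);
    }

    // BP factor update for a pairwise factor: out[j] = sum_i potential[i][j] * in[i].
    static double[] factorMessage(double[][] potential, double[] in) {
        double[] out = new double[potential[0].length];
        for (int i = 0; i < potential.length; i++) {
            for (int j = 0; j < out.length; j++) out[j] += potential[i][j] * in[i];
        }
        return normalize(out);
    }

    // Stand-in for an EM update: re-estimate a scalar as a weighted mean of observations.
    static double weightedMean(double[] weights, double[] observations) {
        double num = 0.0, den = 0.0;
        for (int i = 0; i < weights.length; i++) {
            num += weights[i] * observations[i];
            den += weights[i];
        }
        return num / den;
    }

    // The single update function: choose the computation based on the node's type.
    static void update(Node node) {
        switch (node.kind) {
            case BP_FACTOR:
                node.result = factorMessage(node.potential, node.input);
                break;
            case BP_VARIABLE:
                node.result = variableBelief(node.incoming);
                break;
            case EM_PARAMETER:
                node.result = new double[] { weightedMean(node.weights, node.observations) };
                break;
        }
    }
}
```

Keeping one `update` entry point that switches on the node type mirrors the description above, where a single update function per node does "a BP factor update, a BP variable update, or an EM update" as appropriate.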

Speed-up relative to Matlab is difficult to judge because I don't have apples-to-apples implementations. At one point in the project I recoded everything from Matlab to GraphLab/Java. At that point I got a 160x speedup (Matlab on a Core 2 Duo at 2.66GHz vs. GraphLab on an 8-core i7 at 2.93GHz); however, the Matlab code was NOT optimized and could probably be improved 5-10x.
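As a back-of-the-envelope reading of those numbers: if the Matlab baseline really could be made 5-10x faster, the remaining speedup would be somewhere around 16x to 32x. The tiny sketch below just does that division. The class and method names are made up, and the result is still confounded by the different machines and core counts, so it is an illustration of the arithmetic and not a benchmark.

```java
// Back-of-the-envelope only: the 5-10x range is a rough guess from the interview, not a measurement.
class SpeedupEstimate {

    // Speedup that would remain if the baseline itself ran `baselineGain` times faster.
    static double adjusted(double measuredSpeedup, double baselineGain) {
        return measuredSpeedup / baselineGain;
    }

    public static void main(String[] args) {
        System.out.println(adjusted(160.0, 5.0));   // prints 32.0
        System.out.println(adjusted(160.0, 10.0));  // prints 16.0
    }
}
```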

Q) Can you give some more details about the problem size? How many pixels are there? How many nodes are in the factor graph?

Right now I'm working with toy 100x100 = 10k pixel images. There are 30k nodes in the graph with 130k (unidirectional) edges. In the future, I will be pushing these sizes significantly as I make the model more general. These sizes could grow by a couple orders of magnitude.

Q) What machine are you using? Which OS? How many cores? How much memory?

I use two computers:

1) Red Hat Enterprise Linux Workstation release 6.2
8-core i7
model name : Intel(R) Core(TM) i7 CPU 870 @ 2.93GHz
cpu MHz : 1199.000
8GB ram

2) Red Hat Enterprise Linux Server release 5.5 (Tikanga)
16-core Opteron
model name : Quad-Core AMD Opteron(tm) Processor 8378
cpu MHz : 2400.141
64GB ram

In the near future, I hope to get access to a 512-core shared-memory machine.

Q) What additional features would you like to have in GraphLab?

Two features would be useful: 1) a Java API for distributed (not shared-memory) environments, and 2) support for the 'sync' operation in the Java API.
