Large Scale Machine Learning and Other Animals: Bond Percolation in GraphLab

Sunday, May 12, 2013

Bond Percolation in GraphLab

I was asked by Prof. Scott Kirkpatrick to help and implement bond percolation in GraphLab. It is an oldie but goldie problem which is closely related to the connected components problem.

Here is an explanation about bond percolation from Wikipedia:
A representative question (and the source of the name) is as follows. Assume that some liquid is poured on top of some porous material. Will the liquid be able to make its way from hole to hole and reach the bottom? This physical question is modelled mathematically as a three-dimensional network of n × n × n vertices, usually called "sites", in which the edgeor "bonds" between each two neighbors may be open (allowing the liquid through) with probability p, or closed with probability 1 – p, and they are assumed to be independent. Therefore, for a given p, what is the probability that an open path exists from the top to the bottom? The behavior for large n is of primary interest. This problem, called now bond percolation, was introduced in the mathematics literature by Broadbent & Hammersley (1957), and has been studied intensively by mathematicians and physicists since.

2-d bond percolation problem. Will the water find their way

along the edges of this maze from top to bottom?

The algorithm for finding the connected edges is very simple:

In parallel

- for each edge, record the minimum id of the connected edges

- if there are no more changes in the network, break

This algorithm is frequently used in social networks to find groups of friends. And it is of course distributed and scales to very large problems. The output is the min component found for each edge. From this output we can find the edge clusters.

I have utilized the newest experimental version of GraphLab (v. 2.2) for quickly implementing the above algorithm. The resulting code is surprisingly simple. The main loop of the program:

//create a GraphLab engine
engine_type engine(dc, graph, clopts);
//register the map and combine operations that will be used in the update function

engine.register_map_reduce(BOND_PERCOLATION_MAP_REDUCE, bond_percolation_map, bond_percolation_combine);

  //in a loop
  for (int i=0; i< max_iter; i++){
     //perform update function on each node

     engine.parfor_all_local_vertices(bond_percolation_function);
     //wait until all nodes are done

     engine.wait();
     //count the number of unsatisfied links

     size_t diff = graph.map_reduce_edges<size_t>(count_component);

     //if no more links to explore we are done

     if (diff == 0)

       break;

}

bond_perculation_function which is executed in parallel across all node. In this function, we traverse all connected edges, and compare their minimum edges id using the bond_perculation_combine. As it is easy to verify, each of those implemented function has a single line of code.

//return the min component id found across this edges and its two connecting nodes

unsigned int bond_percolation_map(const graph_type::vertex_type& center,

                         graph_type::edge_type& edge,

                         const graph_type::vertex_type& other) {

   return edge.data().comp_id =  std::min(std::min(center.data().comp_id, edge.data().id), other.data().comp_id);

}

//find min component of two connected edges

void bond_percolation_combine(unsigned int& v1, const unsigned int& v2) {

    v1 = std::min(v1, v2);

}

//the main update function, go over all nodes and selects their min edge id

void bond_percolation_function(engine_type::context_type& context,

                  graph_type::vertex_type& vertex) {

     vertex.data().comp_id =  context.map_reduce<unsigned int>(BOND_PERCOLATION_MAP_REDUCE, graphlab::ALL_EDGES);

}

//count the number of components that are still not satisfied. When the number of components are zero, we are done

size_t count_components(const graph_type::edge_type & edge) {

  return = (edge.source().data().comp_id != edge.target().data().comp_id);

}

The full code is here. It is now part of the graph_analytics toolkit in v2.2. Needless to say it is working.. :-)

Large Scale Machine Learning and Other Animals

Sunday, May 12, 2013

Bond Percolation in GraphLab

No comments:

Post a Comment

Labels

GraphLab Users Google Group

pagerank

google analytics

syntax