## Sunday, May 12, 2013

### Bond Percolation in GraphLab

I was asked by Prof. Scott Kirkpatrick to help and implement bond percolation in GraphLab. It is an oldie but goldie problem which is closely related to the connected components problem.

Here is an explanation about bond percolation from Wikipedia:
A representative question (and the source of the name) is as follows. Assume that some liquid is poured on top of some porous material. Will the liquid be able to make its way from hole to hole and reach the bottom? This physical question is modelled mathematically as a three-dimensional network of n × n × n vertices, usually called "sites", in which the edgeor "bonds" between each two neighbors may be open (allowing the liquid through) with probability p, or closed with probability 1 – p, and they are assumed to be independent. Therefore, for a given p, what is the probability that an open path exists from the top to the bottom? The behavior for large n is of primary interest. This problem, called now bond percolation, was introduced in the mathematics literature by Broadbent & Hammersley (1957), and has been studied intensively by mathematicians and physicists since.

2-d bond percolation problem. Will the water find their way
along the edges of this maze from top to bottom?

The algorithm for finding the connected edges is very simple:
In parallel
- for each edge, record the minimum id of the connected edges
- if there are no more changes in the network, break

This algorithm is frequently used in social networks to find groups of friends. And it is of course distributed and scales to very large problems.  The output is the min component found for each edge. From this output we can find the edge clusters.

I have utilized the newest experimental version of GraphLab (v. 2.2) for quickly implementing the above algorithm. The resulting code is surprisingly simple. The main loop of the program:
 //create a GraphLab engine engine_type engine(dc, graph, clopts); //register the map and combine operations that will be used in the update function engine.register_map_reduce(BOND_PERCOLATION_MAP_REDUCE, bond_percolation_map, bond_percolation_combine); //in a loop   for (int i=0; i< max_iter; i++){ //perform update function on each node engine.parfor_all_local_vertices(bond_percolation_function); //wait until all nodes are done engine.wait(); //count the number of unsatisfied links size_t diff = graph.map_reduce_edges(count_component); //if no more links to explore we are done if (diff == 0) break; }

bond_perculation_function which is executed in parallel across all node. In this function, we traverse all connected edges, and compare their minimum edges id using the  bond_perculation_combine. As it is easy to verify, each of those implemented function has a single line of code.
 //return the min component id found across this edges and its two connecting nodes unsigned int bond_percolation_map(const graph_type::vertex_type& center, graph_type::edge_type& edge, const graph_type::vertex_type& other) { return edge.data().comp_id =  std::min(std::min(center.data().comp_id, edge.data().id), other.data().comp_id); } //find min component of two connected edges void bond_percolation_combine(unsigned int& v1, const unsigned int& v2) { v1 = std::min(v1, v2); } //the main update function, go over all nodes and selects their min edge id void bond_percolation_function(engine_type::context_type& context, graph_type::vertex_type& vertex) { vertex.data().comp_id =  context.map_reduce(BOND_PERCOLATION_MAP_REDUCE, graphlab::ALL_EDGES); }

 //count the number of components that are still not satisfied. When the number of components are zero, we are done
 size_t count_components(const graph_type::edge_type & edge) { return = (edge.source().data().comp_id != edge.target().data().comp_id); }

The full code is here. It is now part of the graph_analytics toolkit in v2.2. Needless to say it is working.. :-)