I got the following instructions from my colleague and friend Aapo Kyrola:
1. INSTALL HADOOP: Must be version 0.20.203 or later.
- This is simple, just download and extract.
2. Set HADOOP_HOME variable to point to the hadoop directory.
3. Set up the Hadoop configuration (under HADOOP_HOME/conf) according
to what is explained here.
* NOTE: set the hdfs directory appropriately: core-site.xml, property
3.5 Start Hadoop:
bin/start-all.sh
4. Install zookeeper
- just download and extract
5. Configure conf/zoo.cfg properly. (Just copy the sample config and change the parameters to sensible values.)
- set clientPort=22181
6. Start up zookeeper:
bin/zkServer.sh start
7. Install and build Giraph as explained at the end of this page:
http://incubator.apache.org/giraph/
8. In HADOOP_HOME, run PageRank:
bin/hadoop jar ../../GraphLab/giraph/giraph/trunk/target/giraph-0.70-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -e 100 -s 5 -V 10000 -w 1 -v
If everything went OK you will get:
11/09/19 18:23:20 INFO mapred.JobClient: Giraph Timers
11/09/19 18:23:20 INFO mapred.JobClient: Total (milliseconds)=260128
11/09/19 18:23:20 INFO mapred.JobClient: Superstep 3 (milliseconds)=54578
11/09/19 18:23:20 INFO mapred.JobClient: Setup (milliseconds)=2771
11/09/19 18:23:20 INFO mapred.JobClient: Shutdown (milliseconds)=92
11/09/19 18:23:20 INFO mapred.JobClient: Vertex input superstep (milliseconds)=2386
11/09/19 18:23:20 INFO mapred.JobClient: Superstep 0 (milliseconds)=8059
11/09/19 18:23:20 INFO mapred.JobClient: Superstep 4 (milliseconds)=70263
11/09/19 18:23:20 INFO mapred.JobClient: Superstep 5 (milliseconds)=1879
11/09/19 18:23:20 INFO mapred.JobClient: Superstep 2 (milliseconds)=66531
11/09/19 18:23:20 INFO mapred.JobClient: Superstep 1 (milliseconds)=53564
11/09/19 18:23:20 INFO mapred.JobClient: Giraph Stats
11/09/19 18:23:20 INFO mapred.JobClient: Aggregate edges=1000000
11/09/19 18:23:20 INFO mapred.JobClient: Superstep=6
11/09/19 18:23:20 INFO mapred.JobClient: Current workers=1
11/09/19 18:23:20 INFO mapred.JobClient: Current master task partition=0
11/09/19 18:23:20 INFO mapred.JobClient: Sent messages=0
11/09/19 18:23:20 INFO mapred.JobClient: Aggregate finished vertices=10000
11/09/19 18:23:20 INFO mapred.JobClient: Aggregate vertices=10000
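The timer counters in the log above come out in no particular order, so it can help to sort them by superstep. Here is a minimal sketch of one way to do that. The function name superstep_times is my own, not part of Giraph or Hadoop, and the sample lines are copied from the run above; in practice you would pipe the saved job output through the same function.

```shell
#!/usr/bin/env bash
# Sample lines copied from the job output above (timer counters only).
giraph_log='11/09/19 18:23:20 INFO mapred.JobClient: Total (milliseconds)=260128
11/09/19 18:23:20 INFO mapred.JobClient: Superstep 3 (milliseconds)=54578
11/09/19 18:23:20 INFO mapred.JobClient: Vertex input superstep (milliseconds)=2386
11/09/19 18:23:20 INFO mapred.JobClient: Superstep 0 (milliseconds)=8059
11/09/19 18:23:20 INFO mapred.JobClient: Superstep 4 (milliseconds)=70263
11/09/19 18:23:20 INFO mapred.JobClient: Superstep 5 (milliseconds)=1879
11/09/19 18:23:20 INFO mapred.JobClient: Superstep 2 (milliseconds)=66531
11/09/19 18:23:20 INFO mapred.JobClient: Superstep 1 (milliseconds)=53564'

# superstep_times (hypothetical helper): read JobClient output on stdin and
# print "N T" for every "Superstep N (milliseconds)=T" line, sorted by N.
# Lines such as "Total" or "Vertex input superstep" do not match and are skipped.
superstep_times() {
  sed -n 's/.*Superstep \([0-9][0-9]*\) (milliseconds)=\([0-9][0-9]*\).*/\1 \2/p' | sort -n
}

# Per-superstep timings in order.
printf '%s\n' "$giraph_log" | superstep_times

# Time spent in numbered supersteps only (excludes setup, input and shutdown).
printf '%s\n' "$giraph_log" | superstep_times |
  awk '{ sum += $2 } END { print "supersteps total (ms): " sum }'
```

For this run the numbered supersteps add up to 254874 ms of the 260128 ms total; most of the remainder is the setup, vertex input and shutdown timers.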
Anyway, Aapo has a great Nordic sense of humor. This is what he sent me later:
For your convenience, I have pasted the documentation of Giraph to this email.
-- Begin --
-- End --
Additionally, a quick start document is available here:
https://github.com/aching/Giraph/wiki/Quick-Start-Guide
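Steps 2 and 5 of the instructions above can be sketched in shell roughly as follows. This is only an illustration under assumptions: the HADOOP_HOME path is a placeholder for wherever you extracted Hadoop, set_client_port is a helper I made up (it is not part of ZooKeeper), and the demo runs on a scratch file whose three settings stand in for the sample config rather than on a real install.

```shell
#!/usr/bin/env bash
# Step 2: point HADOOP_HOME at the extracted Hadoop directory.
# The default path below is a placeholder; adjust it to your own install.
export HADOOP_HOME="${HADOOP_HOME:-$HOME/hadoop-0.20.203.0}"

# Step 5: set clientPort in a ZooKeeper config file.
# set_client_port is a hypothetical helper: set_client_port <cfg-file> <port>
set_client_port() {
  local cfg="$1" port="$2"
  if grep -q '^clientPort=' "$cfg"; then
    # Replace the existing setting; go through a temp file so this works
    # with both GNU and BSD sed.
    sed "s/^clientPort=.*/clientPort=${port}/" "$cfg" > "$cfg.tmp" && mv "$cfg.tmp" "$cfg"
  else
    # No clientPort line yet, so append one.
    echo "clientPort=${port}" >> "$cfg"
  fi
}

# Demo on a scratch file (stand-in for a copy of the sample config).
demo_cfg="$(mktemp)"
printf 'tickTime=2000\ndataDir=/tmp/zookeeper\nclientPort=2181\n' > "$demo_cfg"
set_client_port "$demo_cfg" 22181
cat "$demo_cfg"
```

On a real install, the same idea would be to copy the sample config in the ZooKeeper conf directory to conf/zoo.cfg and then run set_client_port conf/zoo.cfg 22181, or simply edit the clientPort line by hand.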
I have a question: how do I set clientPort=22181?
Thank you so much!
HAMA is another open source implementation of BSP.
Thanks for your note. I did not try it out, but I heard it is not stable yet.
This is great -- are there any examples in Python? I understand that there is some way to interface Giraph through jython, but I do not know how.