Set the number of replicas to the number of nodes you plan to use (4 in this example).
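<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/mnt/tmp/</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/mnt/tmp2/</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/mnt/tmp3/</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>4</value>
    <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.</description>
  </property>
</configuration>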
2) Edit the file conf/slaves and list the DNS names of all of the machines you are going to use. For example:
ec2-67-202-45-10.compute-1.amazonaws.com
ec2-67-202-45-11.compute-1.amazonaws.com
ec2-67-202-45-12.compute-1.amazonaws.com
ec2-67-202-45-13.compute-1.amazonaws.com
3) Edit the file conf/masters and enter the DNS name of the master node. For example:
ec2-67-202-45-10.compute-1.amazonaws.com
Note that the master node can also appear in the slaves list.
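Since the replication factor is meant to match the number of nodes, it can help to count the hosts in the slaves file before setting dfs.replication. The following is a minimal sketch, not part of the original setup: the function name count_hosts and the default SLAVES_FILE path are assumptions (the path is guessed from the /usr/local/hadoop-0.20.2 install location used by the commands in this post).

```shell
#!/bin/sh
# Count the hosts listed in a slaves file, skipping blank lines and '#' comments.
# $1: path to the slaves file.
count_hosts() {
    grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$1" | wc -l | tr -d ' '
}

# Assumed location of the slaves file; override SLAVES_FILE if yours differs.
SLAVES_FILE="${SLAVES_FILE:-/usr/local/hadoop-0.20.2/conf/slaves}"
if [ -f "$SLAVES_FILE" ]; then
    echo "hosts listed in $SLAVES_FILE: $(count_hosts "$SLAVES_FILE")"
fi
```

If the printed count is lower than the dfs.replication value, HDFS will not have enough datanodes to hold every replica.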
5) Edit the file conf/mapred-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://ec2-67-202-45-10.compute-1.amazonaws.com:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>ec2-67-202-45-10.compute-1.amazonaws.com:9001</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/mnt/tmp/</value>
  </property>
  <property>
    <name>mapred.map.tasks</name>
    <value>10</value> <!-- about the number of cores -->
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>10</value> <!-- about the number of cores -->
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>12</value> <!-- slightly more than cores -->
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>12</value> <!-- slightly more than cores -->
  </property>
</configuration>
6) Log in to the master node. For each of the 3 slave machines, copy the DSA public key from the master node:
ssh-copy-id -i ~/.ssh/id_dsa.pub ec2-67-202-45-11.compute-1.amazonaws.com
ssh-copy-id -i ~/.ssh/id_dsa.pub ec2-67-202-45-12.compute-1.amazonaws.com
ssh-copy-id -i ~/.ssh/id_dsa.pub ec2-67-202-45-13.compute-1.amazonaws.com
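With more than a few slaves, typing one command per host gets tedious. As a rough sketch (the function name key_copy_commands is hypothetical, and it assumes the slaves file holds one host name per line), the commands can be generated from the slaves file, skipping the master's own entry:

```shell
#!/bin/sh
# Print one ssh-copy-id command per slave host, skipping the master itself.
# $1: master host name, $2: path to the slaves file.
key_copy_commands() {
    master="$1"
    slaves_file="$2"
    # The "|| [ -n ... ]" part keeps a final line that has no trailing newline.
    while IFS= read -r host || [ -n "$host" ]; do
        # Skip blank lines and the master's own entry.
        [ -z "$host" ] && continue
        [ "$host" = "$master" ] && continue
        # The tilde is printed literally and expands when the command is run.
        echo "ssh-copy-id -i ~/.ssh/id_dsa.pub $host"
    done < "$slaves_file"
}
```

One way to use it is to write the output to a file, review it, and then run that file with sh, for example: key_copy_commands ec2-67-202-45-10.compute-1.amazonaws.com conf/slaves > copy-keys.sh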
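7) To start Hadoop, run the following on the master machine (format the namenode only on first setup, since formatting erases existing HDFS metadata):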
/usr/local/hadoop-0.20.2/bin/hadoop namenode -format
/usr/local/hadoop-0.20.2/bin/start-dfs.sh
/usr/local/hadoop-0.20.2/bin/start-mapred.sh
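After the start scripts return, a quick way to see whether the daemons came up is the JDK's jps tool, which prints one "<pid> <name>" line per running Java process. The helper below is only a sketch: the function name missing_daemons is made up, and the daemon names to check (such as NameNode, SecondaryNameNode and JobTracker on the master, DataNode and TaskTracker on slaves) are the ones I would expect for this Hadoop version rather than something stated in this post.

```shell
#!/bin/sh
# Read jps output on stdin and print each expected daemon that is not running.
# Arguments: the daemon names to look for.
missing_daemons() {
    jps_output=$(cat)
    for daemon in "$@"; do
        # Compare the whole second field so that "NameNode" does not
        # accidentally match "SecondaryNameNode".
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "$daemon"
        fi
    done
}
```

On the master it could be called as: jps | missing_daemons NameNode SecondaryNameNode JobTracker. Empty output means every listed daemon was found.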
8) To stop Hadoop:
/usr/local/hadoop-0.20.2/bin/stop-mapred.sh
/usr/local/hadoop-0.20.2/bin/stop-dfs.sh