Comments on Large Scale Machine Learning and Other Animals: Mahout/Hadoop on Amazon EC2 - part 1 - Installation

K.P. (2012-12-11 10:58):
Hi,
Thank you for your reply. I will try the link and play around with the settings.

Danny Bickson (2012-12-11 10:11):
Try http://www.jarvana.com/jarvana/browse/org/apache/mahout/mahout-distribution/0.4/
But please note there is a newer version of Mahout now, so my instructions are a bit out of date.
K.P. (2012-12-11 10:03):
Hi Danny,
I'm stuck with step 6; I can't get rid of the fatal errors I've encountered. The mirror under Note b) doesn't seem to be working.
Please advise.
Thank you

Danny Bickson (2012-12-10 04:49):
You should copy and paste my command, namely use a single quote ' and not a backquote `. See the explanation here: http://linuxreviews.org/beginner/Bash-Scripting-Introduction-HOWTO/en/x303.html

Anonymous (2012-12-10 02:38):
To generate key-based authentication so that ssh doesn't require a password every time it's invoked, I typed the following:

$ ssh-keygen –t dsa –P ‘’ –f ~/.ssh/id_dsa

It gave me this:

Too many arguments.
usage: ssh-keygen [options]
[...followed by the full ssh-keygen option listing...]

What should I do?

Danny Bickson (2012-05-06 22:41):
It seems you did not properly kill the previously running instance of Hadoop. You should kill it first using the stop-all.sh command.
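[Editor's note] The "Too many arguments" failure above is characteristic of a command retyped with typographic dashes and quotes (–, ‘ ’), e.g. pasted through a word processor: the shell treats them as ordinary text, so ssh-keygen receives extra literal arguments. A quick sketch of the difference (the ssh-keygen line itself is shown commented out, for reference only):

```shell
# The shell does not recognize – or ‘’ as option dashes or quote characters,
# so each typographic token reaches the program as a literal argument:
printf '[%s]' –t ‘’; echo      # prints: [–t][‘’]  (two bogus arguments)
printf '[%s]' -t ''; echo      # prints: [-t][]    (a flag plus an empty string)

# The intended command from the post, retyped in plain ASCII:
# ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
```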
Anonymous (2012-05-06 13:31):
Danny,
Thanks a lot for sharing your experiences. I followed these steps to set up a single instance and ran into the same issue Carlos described previously. When I run on the server:

sudo $HADOOP_HOME/bin/start-all.sh

I get:

namenode running as process 14028. Stop it first.
localhost: Permission denied (publickey).
localhost: Permission denied (publickey).
jobtracker running as process 12087. Stop it first.
localhost: Permission denied (publickey).

I'm thinking this occurs when Hadoop talks to itself over localhost:port. Do you think I need to provide one of the keys for EC2?

Note: I'm able to run this with no problems (or password):
ssh localhost

Any ideas?

Thanks!!!!

ehtsham (2011-10-04 01:58):
Hi Danny, I correctly entered the configurations in the configuration files in the $HADOOP_HOME/conf directory. I tried installing with hadoop-0.20.2 as well, but the problem persists. Eventually I launched a public AMI which already had Hadoop on it, installed Mahout and Hive (I use both extensively) on it, created an image out of it, and then launched a cluster from that, and it worked :)
Thanks for the article; I am following your other tutorials as well.
Ehtsham
Danny Bickson (2011-10-03 00:31):
Hi!
First of all, I just noticed that the syntax highlighter I was using is no longer supported in the new Blogger template, so some <, > were missing in the Hadoop conf file. I hope this did not confuse you; it is now fixed.
After downloading Hadoop and setting the conf files (section 4), try to execute section 8 and tell me if this works.

ehtsham (2011-10-02 21:14):
Hi Danny, I tried with the 0.20.2 version as well, by installing it on the EC2 instance and on my computer, and the problem persists :( I would like to know whether I am following the right procedure for launching the Hadoop cluster. After completing your tutorial, I launch the Hadoop cluster by following the steps under the heading "Getting started" in the original Mahout article, "Mahout on EC2" (on which you have based your tutorial). Thanks, Ehtsham
Danny Bickson (2011-10-02 05:16):
Hi!
It seems that you installed a more recent version of Hadoop. My instructions were tested with hadoop-0.20.2. Can you retry with the version above and let me know if this works?

Best,

Danny

ehtsham (2011-10-01 16:28):
Hi Danny, great article. I followed it completely and successfully generated a custom AMI. But when I try to launch the Hadoop cluster using this, I keep getting this message over and over again:

hadoop-0.20.203.0/src/contrib/ec2/bin/launch-hadoop-master: line 110: [: too many arguments

I would really appreciate your help.
Thanks,
Ehtsham
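[Editor's note] A side note on the "[: too many arguments" message above (a general shell diagnosis, not something confirmed for that particular script): it is what `[` prints when an unquoted variable expands to several words inside a test, which can happen when an argument contains spaces. A minimal reproduction with a hypothetical variable VAL:

```shell
VAL="a b"                                  # value containing a space

# Unquoted: [ receives four words (xa, b, !=, x) and gives up:
[ x$VAL != x ] 2>/dev/null || echo 'unquoted test failed as expected'

# Quoted: [ receives exactly three words and works:
if [ "x$VAL" != "x" ]; then echo 'quoted test works'; fi
```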
Danny Bickson (2011-09-03 06:31), replying inline to a question from VIGNESH PRAJAPATI:
Hi Danny,
Thanks for the great posting on your blog about Hadoop/Mahout on Linux-based operating systems.
I want to know which operating system is best for Hadoop/Mahout, and how to install it on Windows XP.

THANKS

VIGNESH PRAJAPATI

Answer: I advise using Linux if you consider running Hadoop on a cluster. Since Hadoop is written in Java, you can potentially run it on Windows as well. One option is to install Sun's VirtualBox, http://www.virtualbox.org/, and have Linux run on your Windows system.

Best,

DB
Anonymous (2011-08-27 02:18):
Hello sir, check my source .profile:

# ~/.profile: executed by the command interpreter for login shells.
# This file is not read by bash(1), if ~/.bash_profile or ~/.bash_login
# exists.
# see /usr/share/doc/bash/examples/startup-files for examples.
# the files are located in the bash-doc package.

# the default umask is set in /etc/profile; for setting the umask
# for ssh logins, install and configure the libpam-umask package.
#umask 022

# if running bash
if [ -n "$BASH_VERSION" ]; then
    # include .bashrc if it exists
    if [ -f "$HOME/.bashrc" ]; then
        . "$HOME/.bashrc"
    fi
fi

# set PATH so it includes user's private bin if it exists
if [ -d "$HOME/bin" ] ; then
    PATH="$HOME/bin:$PATH"
    export JAVA_HOME=/usr/lib/jvm/java-6-openjdk/
    export HADOOP_HOME=/usr/local/hadoop-0.20.2
    export HADOOP_CONF_DIR=/usr/local/hadoop-0.20.2/conf
    export MAHOUT_HOME=/usr/local/mahout-0.4/
    export MAHOUT_VERSION=0.4-SNAPSHOT
    export MAVEN_OPTS=-Xmx1024m
fi

And the output of echo $JAVA_HOME is blank.
Then I set it using the export command and got usr/lib/jvm/open-6-jdk/

Danny Bickson (2011-08-26 10:49):
Try to "source .profile" and send me the output of "echo $JAVA_HOME".
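[Editor's note] One plausible explanation for the blank $JAVA_HOME (an observation about the .profile pasted above, not something stated in the thread): the export lines sit inside the `if [ -d "$HOME/bin" ]` block, so they only run when a ~/bin directory exists, and ~/.profile is in any case only read by login shells. A sketch of the same exports moved outside the conditional:

```shell
# Tail of ~/.profile with the Hadoop/Mahout exports moved out of the
# "$HOME/bin" check, so they are set whether or not ~/bin exists:
if [ -d "$HOME/bin" ] ; then
    PATH="$HOME/bin:$PATH"
fi

export JAVA_HOME=/usr/lib/jvm/java-6-openjdk
export HADOOP_HOME=/usr/local/hadoop-0.20.2
export HADOOP_CONF_DIR=$HADOOP_HOME/conf
export MAHOUT_HOME=/usr/local/mahout-0.4
export MAHOUT_VERSION=0.4-SNAPSHOT
export MAVEN_OPTS=-Xmx1024m

echo "$JAVA_HOME"    # prints: /usr/lib/jvm/java-6-openjdk
```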
Anonymous (2011-08-26 10:39):
Yes sir, I checked both my hadoop-env.sh and .profile files, where JAVA_HOME is set as above.

And my final error when running the command:

sagar@sagar:/usr/local/hadoop-0.20.2$ bin/start-all.sh
starting namenode, logging to /usr/local/hadoop-0.20.2/bin/../logs/hadoop-sagar-namenode-sagar.out
localhost: starting datanode, logging to /usr/local/hadoop-0.20.2/bin/../logs/hadoop-sagar-datanode-sagar.out
localhost: Error: JAVA_HOME is not set.
localhost: starting secondarynamenode, logging to /usr/local/hadoop-0.20.2/bin/../logs/hadoop-sagar-secondarynamenode-sagar.out
localhost: Error: JAVA_HOME is not set.
starting jobtracker, logging to /usr/local/hadoop-0.20.2/bin/../logs/hadoop-sagar-jobtracker-sagar.out
localhost: starting tasktracker, logging to /usr/local/hadoop-0.20.2/bin/../logs/hadoop-sagar-tasktracker-sagar.out
localhost: Error: JAVA_HOME is not set.

The output of ls -lrt /usr/lib/jvm/java-6-openjdk/:

sagar@sagar:/usr/local/hadoop-0.20.2$ ls -lrt /usr/lib/jvm/java-6-openjdk/
total 20
lrwxrwxrwx 1 root root   41 2011-08-17 22:43 docs -> ../../../share/doc/openjdk-6-jre-headless
drwxr-xr-x 4 root root 4096 2011-08-17 22:43 man
drwxr-xr-x 5 root root 4096 2011-08-18 22:39 jre
lrwxrwxrwx 1 root root   22 2011-08-24 18:31 THIRD_PARTY_README -> jre/THIRD_PARTY_README
lrwxrwxrwx 1 root root   22 2011-08-24 18:31 ASSEMBLY_EXCEPTION -> jre/ASSEMBLY_EXCEPTION
drwxr-xr-x 2 root root 4096 2011-08-24 18:31 bin
drwxr-xr-x 2 root root 4096 2011-08-24 18:31 lib
drwxr-xr-x 3 root root 4096 2011-08-24 18:31 include

And the output of java -version:

sagar@sagar:~$ java -version
java version "1.6.0_22"
OpenJDK Runtime Environment (IcedTea6 1.10.2) (6b22-1.10.2-0ubuntu1~11.04.1)
OpenJDK Server VM (build 20.0-b11, mixed mode)

Now tell me, what can I do?

Thank you.
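[Editor's note] A common resolution for this pattern (daemons launched over ssh report "JAVA_HOME is not set" even though it is set in an interactive shell) is to set JAVA_HOME in conf/hadoop-env.sh, which every Hadoop daemon sources, rather than relying on ~/.profile, which the non-login shells spawned over ssh never read. A sketch using the paths from this thread; the mkdir exists only so the fragment runs standalone:

```shell
HADOOP_HOME=/usr/local/hadoop-0.20.2        # install location used in this thread
mkdir -p "$HADOOP_HOME/conf"                # only so this sketch runs standalone

# hadoop-env.sh is sourced by every daemon that start-all.sh launches,
# including the ones started over "ssh localhost", so JAVA_HOME set here
# reaches the datanode/tasktracker processes too:
echo 'export JAVA_HOME=/usr/lib/jvm/java-6-openjdk' >> "$HADOOP_HOME/conf/hadoop-env.sh"
```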
Danny Bickson (2011-08-25 11:08):
Hi,
please send me the full error. Check your hadoop-env.sh to verify it points to the correct Java. Check your .profile settings as well. Send me also the output of "ls -lrt /usr/lib/jvm/java-6-openjdk/" and of "java -version".

- Danny

Anonymous (2011-08-25 05:54):
Hello, I got an error when I start all the nodes using the start-all.sh command. The error is "JAVA_HOME is not set". Then I set it through the terminal: export JAVA_HOME=/usr/lib/jvm/java-6-openjdk/

So please guide me.

Thanks,

Sagar

thechips (2011-08-11 13:34):
Thanks for the reply! Works fine.
Danny Bickson (2011-08-11 10:42):
Hi Corey,
you should use one of the available editors like vi or nano.
For example: nano $HADOOP_HOME/conf/core-site.xml
then edit the file and save.

thechips (2011-08-10 11:28):
Hi Danny,

I am new to the Ubuntu terminal.

When you say "add the following to $HADOOP_HOME/conf/hadoop-env.sh", how exactly do you go about this?

Same question for:
"add the following to $HADOOP_HOME/conf/core-site.xml and also $HADOOP_HOME/conf/mapred-site.xml"

Thanks!

Corey
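[Editor's note] For readers who prefer not to use an interactive editor, the same conf edits can be scripted. A sketch that writes a minimal pseudo-distributed core-site.xml; the fs.default.name value shown is the usual single-node setting for the Hadoop 0.20 era, so adjust it to whatever the post's step 4 actually specifies, and note the mkdir exists only so the fragment runs standalone:

```shell
HADOOP_HOME=/usr/local/hadoop-0.20.2    # install location used in this thread
mkdir -p "$HADOOP_HOME/conf"            # only so this sketch runs standalone

# Write a minimal single-node core-site.xml in one go with a heredoc:
cat > "$HADOOP_HOME/conf/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
```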
Danny Bickson (2011-06-15 05:06):
Hi Carlos.
This looks like an ssh problem. My step 6 explains how to start Hadoop on a single machine; it seems that you are trying to start it on multiple machines. See my post: http://bickson.blogspot.com/2011/02/mahout-on-amazon-ec2-part-4-running-on.html Try to manually ssh from the machine where you run the start-dfs.sh script into the namenode and jobtracker machines. If you fail to do it, repeat all steps involving key creation and propagation.

Good luck! DB

Carlos (2011-06-14 20:29):
Hi Danny:
Thanks for the detailed instructions. I got stuck in step 6. I think it has to do with the security keys. I get the following error:

starting namenode, logging to /usr/local/hadoop-0.20.2/bin/../logs/hadoop-root-namenode-ip-10-2-34-166.out
localhost: Permission denied (publickey).
localhost: Permission denied (publickey).
starting jobtracker, logging to /usr/local/hadoop-0.20.2/bin/../logs/hadoop-root-jobtracker-ip-10-2-34-166.out
localhost: Permission denied (publickey).

Any suggestions?

Thanks in advance

\Carlos
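[Editor's note] The "Permission denied (publickey)" failures reported in this thread share one shape: the start scripts ssh back into localhost as whichever account runs them (root, when invoked via sudo), and that account needs its own authorized passwordless key even if the regular user already has one. A sketch of a localhost key setup for the account that actually runs the scripts; the filenames are the OpenSSH defaults, and you should run ssh-keygen only if that account has no key yet:

```shell
# As the account that runs start-all.sh (e.g. after "sudo -i" if using sudo):
mkdir -p ~/.ssh && chmod 700 ~/.ssh

# Passwordless RSA key (the post uses -t dsa; rsa behaves the same here):
ssh-keygen -q -t rsa -N '' -f ~/.ssh/id_rsa

# Authorize the key for logins back into this same machine:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys

# Sanity check (requires a running sshd, so it is left commented out):
# ssh localhost true && echo 'key login OK'
```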