1) Log into the cloud login node. The -L option forwards local port 8888 to proxy.opencloud:8888 through the login node:
ssh -L 8888:proxy.opencloud:8888 login.cloud.pdl.cmu.local.
2) Copy the Mahout directory tree into your home folder.
3) Run the Mahout example:
cd mahout-0.4/
export JAVA_HOME=/usr/lib/jvm/java-6-sun/
./examples/bin/build-reuters.sh
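You should see output like the following (this run used sh -x to trace each command, hence the lines starting with +):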
sh -x ./examples/bin/build-reuters.sh
11/02/01 15:13:27 INFO driver.MahoutDriver: Program took 225915 ms
+ ./bin/mahout seqdirectory -i ./examples/bin/work/reuters-out/ -o ./examples/bin/work/reuters-out-seqdir -c UTF-8 -chunk 5
Running on hadoop, using HADOOP_HOME=/usr/local/sw/hadoop
HADOOP_CONF_DIR=/etc/hadoop/conf/global
11/02/01 15:13:38 INFO driver.MahoutDriver: Program took 10087 ms
+ ./bin/mahout seq2sparse -i ./examples/bin/work/reuters-out-seqdir/ -o ./examples/bin/work/reuters-out-seqdir-sparse
Running on hadoop, using HADOOP_HOME=/usr/local/sw/hadoop
HADOOP_CONF_DIR=/etc/hadoop/conf/global
11/02/01 15:13:40 INFO vectorizer.SparseVectorsFromSequenceFiles: Maximum n-gram size is: 1
11/02/01 15:13:40 INFO vectorizer.SparseVectorsFromSequenceFiles: Minimum LLR value: 1.0
11/02/01 15:13:40 INFO vectorizer.SparseVectorsFromSequenceFiles: Number of reduce tasks: 1
11/02/01 15:13:41 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
11/02/01 15:13:42 INFO input.FileInputFormat: Total input paths to process : 3
11/02/01 15:13:47 INFO mapred.JobClient: Running job: job_201101170028_1733
11/02/01 15:13:48 INFO mapred.JobClient: map 0% reduce 0%
11/02/01 15:17:49 INFO mapred.JobClient: map 33% reduce 0%
11/02/01 15:17:55 INFO mapred.JobClient: map 66% reduce 0%
11/02/01 15:18:01 INFO mapred.JobClient: map 100% reduce 0%
11/02/01 15:18:08 INFO mapred.JobClient: Job complete: job_201101170028_1733
11/02/01 15:18:08 INFO mapred.JobClient: Counters: 6
11/02/01 15:18:08 INFO mapred.JobClient: Job Counters
11/02/01 15:18:08 INFO mapred.JobClient: Rack-local map tasks=5
11/02/01 15:18:08 INFO mapred.JobClient: Launched map tasks=5
11/02/01 15:18:08 INFO mapred.JobClient: FileSystemCounters
11/02/01 15:18:08 INFO mapred.JobClient: HDFS_BYTES_READ=13537042
11/02/01 15:18:08 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=11047110
11/02/01 15:18:08 INFO mapred.JobClient: Map-Reduce Framework
11/02/01 15:18:08 INFO mapred.JobClient: Map input records=16115
11/02/01 15:18:08 INFO mapred.JobClient: Spilled Records=0
11/02/01 15:18:08 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
11/02/01 15:18:09 INFO input.FileInputFormat: Total input paths to process : 3
11/02/01 15:18:15 INFO mapred.JobClient: Running job: job_201101170028_1736
11/02/01 15:18:16 INFO mapred.JobClient: map 0% reduce 0%
...
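The log above is long, and the lines worth watching are the "Program took N ms" summaries that MahoutDriver prints after each step. As a minimal sketch (the function name mahout_step_times is made up for illustration and is not part of Mahout), a small shell function can pull those timings out of a saved or piped log:

```shell
# Hypothetical helper, not part of Mahout: reads a build log on stdin and
# prints the elapsed time of each MahoutDriver step, plus the total.
# It matches lines such as:
#   11/02/01 15:13:27 INFO driver.MahoutDriver: Program took 225915 ms
mahout_step_times() {
    awk '/driver\.MahoutDriver: Program took [0-9]+ ms/ {
        ms = $(NF - 1)        # second-to-last field is the millisecond count
        total += ms
        printf "step %d: %.1f s\n", ++n, ms / 1000
    }
    END { printf "total: %.1f s\n", total / 1000 }'
}
```

One way to use it, assuming the log messages go to stderr and that build.log is just an example file name: sh -x ./examples/bin/build-reuters.sh 2>&1 | tee build.log | mahout_step_times. For the two driver lines shown above, this would report roughly 225.9 s and 10.1 s.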