`

mahout使用

 
阅读更多

转自:http://hi.baidu.com/pakko/blog/item/3516fd6e34032bce80cb4afb.html

运行kmeans的简单的例子:

1:将样本数据集放到hdfs中指定文件下,应该在testdata文件夹下
$HADOOP_HOME/bin/hadoop fs -put <PATH TO DATA> testdata
例如:
bin/hadoop fs   -put /home/hadoopuser/mahout-0.3/test/synthetic_control.data  /user/hadoopuser/testdata/

2:使用kmeans算法
$HADOOP_HOME/bin/hadoop jar $MAHOUT_HOME/examples/target/mahout-examples-$MAHOUT_VERSION.job org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
例如:
bin/hadoop jar /home/hadoopuser/mahout-0.3/mahout-examples-0.1.job org.apache.mahout.clustering.syntheticcontrol.kmeans.Job

3:使用canopy算法
$HADOOP_HOME/bin/hadoop jar $MAHOUT_HOME/examples/target/mahout-examples-$MAHOUT_VERSION.job org.apache.mahout.clustering.syntheticcontrol.canopy.Job
例如:
bin/hadoop jar /home/hadoopuser/mahout-0.3/mahout-examples-0.1.job org.apache.mahout.clustering.syntheticcontrol.canopy.Job

4:使用dirichlet 算法
$HADOOP_HOME/bin/hadoop jar $MAHOUT_HOME/examples/target/mahout-examples-$MAHOUT_VERSION.job org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job

5:使用meanshift算法
meanshift : $HADOOP_HOME/bin/hadoop jar $MAHOUT_HOME/examples/target/mahout-examples-$MAHOUT_VERSION.job org.apache.mahout.clustering.syntheticcontrol.meanshift.Job

6:查看一下结果吧
bin/mahout vectordump --seqFile /user/hadoopuser/output/data/part-00000
这个直接把结果显示在控制台上。

Get the data out of HDFS  and have a look 
All example jobs use testdata as input and output to directory output
Use bin/hadoop fs -lsr output to view all outputs
Output:
KMeans is placed into output/points
Canopy and MeanShift results are placed into output/clustered-points

分享到:
评论
1 楼 a420144030 2013-05-09  
你好,我想计算n篇文章的相似度,用mahout能处理吗,如何做能给个例子吗?谢谢

相关推荐

Global site tag (gtag.js) - Google Analytics