
I'm new to Mahout and I have this code:

Mahout: k-means clustering

import java.io.File;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.mahout.clustering.classify.WeightedVectorWritable;
import org.apache.mahout.clustering.kmeans.KMeansDriver;
import org.apache.mahout.clustering.kmeans.Kluster;
import org.apache.mahout.common.HadoopUtil;
import org.apache.mahout.common.distance.EuclideanDistanceMeasure;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;

public class mahout {

    // Nine 2-D points forming two obvious clusters,
    // one around (1.5, 1.5) and one around (8.5, 8.5).
    public static final double[][] points = {
        {1, 1}, {2, 1}, {1, 2}, {2, 2}, {3, 3},
        {8, 8}, {9, 8}, {8, 9}, {9, 9}};

    // Wraps each raw double[] in a Mahout Vector.
    public static List<Vector> getPoints(double[][] raw) {
        List<Vector> points = new ArrayList<Vector>();
        for (int i = 0; i < raw.length; i++) {
            double[] fr = raw[i];
            Vector vec = new RandomAccessSparseVector(fr.length);
            vec.assign(fr);
            points.add(vec);
        }
        return points;
    }

    public static void main(String[] args) throws Exception {

        int k = 2;

        List<Vector> vectors = getPoints(points);

        // Create the local input directories if they do not exist yet.
        File testData = new File("testdata");
        if (!testData.exists()) {
            testData.mkdir();
        }
        testData = new File("testdata/points");
        if (!testData.exists()) {
            testData.mkdir();
        }

        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // ClusterHelper is a helper class from the example this code is based
        // on (not part of Mahout core); it writes the vectors to a
        // SequenceFile that KMeansDriver can read.
        ClusterHelper.writePointsToFile(vectors, conf, new Path("testdata/points/file1"));

        // Seed the algorithm: write the first k points as initial cluster centers.
        Path path = new Path("testdata/clusters/part-00000");
        SequenceFile.Writer writer = new SequenceFile.Writer(fs, conf,
            path, Text.class, Kluster.class);

        for (int i = 0; i < k; i++) {
            Vector vec = vectors.get(i);
            Kluster cluster = new Kluster(vec, i, new EuclideanDistanceMeasure());
            writer.append(new Text(cluster.getIdentifier()), cluster);
        }
        writer.close();

        // Remove output from any previous run, then run k-means
        // (convergence delta 0.001, at most 10 iterations).
        Path output = new Path("output");
        HadoopUtil.delete(conf, output);

        KMeansDriver.run(conf, new Path("testdata/points"), new Path("testdata/clusters"),
            output, new EuclideanDistanceMeasure(), 0.001, 10,
            true, 0.0, false);

        // Read back the clustered points and print which cluster each landed in.
        SequenceFile.Reader reader = new SequenceFile.Reader(fs,
            new Path("output/" + Kluster.CLUSTERED_POINTS_DIR
                + "/part-m-00000"), conf);

        IntWritable key = new IntWritable();
        WeightedVectorWritable value = new WeightedVectorWritable();
        while (reader.next(key, value)) {
            System.out.println(value.toString() + " belongs to cluster "
                + key.toString());
        }
        reader.close();
    }
}

But when I run the code I get these errors:

24-ott-2013 9.50.25 org.apache.hadoop.util.NativeCodeLoader <clinit> 
AVVERTENZA: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
24-ott-2013 9.50.25 org.slf4j.impl.JCLLoggerAdapter info 
INFO: Deleting output 
24-ott-2013 9.50.25 org.slf4j.impl.JCLLoggerAdapter info 
INFO: Input: testdata/points Clusters In: testdata/clusters Out: output Distance: org.apache.mahout.common.distance.EuclideanDistanceMeasure 
24-ott-2013 9.50.25 org.slf4j.impl.JCLLoggerAdapter info 
INFO: convergence: 0.0010 max Iterations: 10 
24-ott-2013 9.50.25 org.apache.hadoop.security.UserGroupInformation doAs 
GRAVE: PriviledgedActionException as:hp cause:java.io.IOException: Failed to set permissions of path: \tmp\hadoop-hp\mapred\staging\hp1776229724\.staging to 0700 
Exception in thread "main" java.io.IOException: Failed to set permissions of path: \tmp\hadoop-hp\mapred\staging\hp1776229724\.staging to 0700 
    at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:689) 
    at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:662) 
    at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509) 
    at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:344) 
    at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189) 
    at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:116) 
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:918) 
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:912) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Unknown Source) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149) 
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:912) 
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:500) 
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530) 
    at org.apache.mahout.clustering.iterator.ClusterIterator.iterateMR(ClusterIterator.java:182) 
    at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:223) 
    at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:143) 
    at mahout.main(mahout.java:69) 

Where is the problem, and how can I fix it?

Answers


It appears that the problem is

Failed to set permissions of path: \tmp\hadoop-hp\mapred\staging\hp1776229724\.staging to 0700 

Check that the user running the code has sufficient permissions on the directory mentioned in the stack trace (see the quick check after this answer).

And this trace line:

Unable to load native-hadoop library for your platform... 

really makes me suspect that nothing is going to run properly ^^
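
A quick way to see what Java itself thinks of those permissions is a throwaway check like the sketch below; the staging path is copied from the stack trace above and is specific to that machine, so adjust it to yours:

import java.io.File;

public class StagingDirCheck {
    public static void main(String[] args) {
        // Path taken from the stack trace; adjust to your own machine.
        File staging = new File("\\tmp\\hadoop-hp\\mapred\\staging");
        System.out.println("exists:   " + staging.exists());
        System.out.println("canRead:  " + staging.canRead());
        System.out.println("canWrite: " + staging.canWrite());
    }
}

Keep in mind this can report writable and the job can still fail: on Windows the failure comes from Hadoop's chmod-style call in FileUtil.setPermission (visible in the stack trace), not necessarily from an actual lack of access. See the next answer.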


It depends on the Hadoop version. It almost certainly cannot work on Windows (it's 1.1, if I remember correctly), and that has nothing to do with the native library. –


And how can I fix this problem? – user2837896


@user2837896 Check that the user running the code has sufficient permissions on the directories mentioned in the stack trace. – Julien


This is a known problem when running Hadoop on Windows.

You can see several JIRA issues for this problem:

https://issues.apache.org/jira/browse/HADOOP-7682

https://issues.apache.org/jira/browse/HADOOP-8089

The only workarounds are either to patch Hadoop with this patch:

https://github.com/congainc/patch-hadoop_7682-1.0.x-win

or to upgrade to Hadoop 2.2, which runs natively on Windows.
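
As a side note (not part of the answer above): for a toy example like this you can often dodge the error entirely by running the driver sequentially, since no MapReduce job is submitted and the staging directory is never touched. This assumes the final boolean parameter of KMeansDriver.run is runSequential in the Mahout version used here, as it is in Mahout 0.7/0.8:

// Same call as in the question, but with the final argument (runSequential)
// set to true: k-means then runs inside the current JVM, so the
// \tmp\...\staging permission check from the stack trace never happens.
KMeansDriver.run(conf, new Path("testdata/points"), new Path("testdata/clusters"),
    output, new EuclideanDistanceMeasure(), 0.001, 10,
    true, 0.0, true);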


I downloaded hadoop 2.2 from here: http://mirror.nohup.it/apache/hadoop/common/hadoop-2.2.0/ but I can't import it into an eclipse project – user2837896


You need to download the binaries, not the src. –


I downloaded "hadoop-2.2.0.tar.gz".. then I imported the jar named hadoop-common-2.2.0.jar, but now I get a java.lang.NoClassDefFoundError: org/apache/hadoop/util/PlatformName error – user2837896
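
For what it's worth, in Hadoop 2.x org.apache.hadoop.util.PlatformName does not live in hadoop-common; it ships in the hadoop-auth jar (hadoop-auth-2.2.0.jar in that release, if memory serves), so importing hadoop-common-2.2.0.jar alone will produce exactly this NoClassDefFoundError. A small sketch to check which jar, if any, is supplying the class on your classpath:

// Fails with ClassNotFoundException while the jar is missing; once the
// right jar is on the classpath, prints where the class was loaded from.
public class PlatformNameCheck {
    public static void main(String[] args) throws Exception {
        Class<?> c = Class.forName("org.apache.hadoop.util.PlatformName");
        System.out.println(c.getProtectionDomain().getCodeSource().getLocation());
    }
}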