2014-03-04

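How to submit a Mahout recommender job using the HDInsight .NET SDK

I am new to HDInsight. I want to learn and practice machine learning, and HDInsight looks like what I need, but there seems to be no direct API for Mahout. Since Mahout recommendations are essentially translated into MapReduce jobs, I followed some of the MapReduce examples in the Windows Azure documentation and wrote the following code: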

// Define the MapReduce job 
MapReduceJobCreateParameters mrJobDefinition = new MapReduceJobCreateParameters() 
{ 
    JarFile = "wasb:///example/jars/mahout-core-0.9-job.jar", 
    ClassName = "org.apache.mahout.cf.taste.hadoop.item.RecommenderJob", 
}; 

mrJobDefinition.Arguments.Add(" -s SIMILARITY_COOCCURRENCE"); 
mrJobDefinition.Arguments.Add(" --input=/reply"); 
mrJobDefinition.Arguments.Add(" --output=/recommend/"); 
mrJobDefinition.Arguments.Add(" --usersFile=/data/users.txt"); 

I have uploaded "mahout-core-0.9-job.jar" to /example/jars in the designated Azure blob storage container.

But I receive the following error message:

14/04/03 12:04:28 ERROR security.UserGroupInformation: PriviledgedActionException as:johnny cause:java.io.IOException: Exception reading file:/c:/apps/temp/hdfs/mapred/local/taskTracker/johnny/jobcache/job_201404031203_0001/jobToken=
java.security.PrivilegedActionException: java.io.IOException: Exception reading file:/c:/apps/temp/hdfs/mapred/local/taskTracker/johnny/jobcache/job_201404031203_0001/jobToken=
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1233)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:951)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:550)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:580)
    at org.apache.mahout.cf.taste.hadoop.preparation.PreparePreferenceMatrixJob.run(PreparePreferenceMatrixJob.java:77)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.run(RecommenderJob.java:164)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.main(RecommenderJob.java:322)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: java.io.IOException: Exception reading file:/c:/apps/temp/hdfs/mapred/local/taskTracker/johnny/jobcache/job_201404031203_0001/jobToken=
    at org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:136)
    at org.apache.hadoop.mapred.JobClient.readTokensFromFiles(JobClient.java:2149)
    at org.apache.hadoop.mapred.JobClient.populateTokenCache(JobClient.java:2185)
    at org.apache.hadoop.mapred.JobClient.access$300(JobClient.java:179)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:964)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:951)
    ... 16 more
Caused by: java.io.FileNotFoundException: File file:/c:/apps/temp/hdfs/mapred/local/taskTracker/johnny/jobcache/job_201404031203_0001/jobToken= does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:427)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:254)
    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:125)
    at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:436)
    at org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:130)
    ... 21 more
Exception in thread "main" java.io.IOException: Exception reading file:/c:/apps/temp/hdfs/mapred/local/taskTracker/johnny/jobcache/job_201404031203_0001/jobToken=
    at org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:136)
    at org.apache.hadoop.mapred.JobClient.readTokensFromFiles(JobClient.java:2149)
    at org.apache.hadoop.mapred.JobClient.populateTokenCache(JobClient.java:2185)
    at org.apache.hadoop.mapred.JobClient.access$300(JobClient.java:179)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:964)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:951)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1233)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:951)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:550)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:580)
    at org.apache.mahout.cf.taste.hadoop.preparation.PreparePreferenceMatrixJob.run(PreparePreferenceMatrixJob.java:77)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.run(RecommenderJob.java:164)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.main(RecommenderJob.java:322)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: java.io.FileNotFoundException: File file:/c:/apps/temp/hdfs/mapred/local/taskTracker/johnny/jobcache/job_201404031203_0001/jobToken= does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:427)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:254)
    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:125)
    at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:436)
    at org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:130)
    ... 21 more
Shutting down the Watcher/KeepAlive thread pool forcefully
templeton: job failed with exit code 1

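From what I found searching online, it seems that some changes need to be made to mapred-site.xml or other Hadoop configuration files. However, I am completely new to Apache Hadoop and do not know much about Linux or Java.

Any help or pointers would be appreciated.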

Answer

Using the latest Hadoop .NET SDK (http://hadoopsdk.codeplex.com/), I was able to submit the Mahout job successfully with the same code. It appears the problem has been fixed in the SDK.
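
For readers who want to try this, here is a minimal sketch of how the job definition from the question might be submitted end to end. It follows the job-submission pattern from the Windows Azure HDInsight documentation of the time and assumes the `Microsoft.WindowsAzure.Management.HDInsight` NuGet package is installed. The subscription ID, certificate friendly name, and cluster name are placeholders, not values from the question, and the exact API surface may differ between SDK versions.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography.X509Certificates;
using System.Threading;
using Microsoft.Hadoop.Client;

class SubmitMahoutJob
{
    static void Main()
    {
        // Placeholders (assumptions): replace with your own values.
        string subscriptionId = "<Azure subscription ID>";
        string certFriendlyName = "<management certificate friendly name>";
        string clusterName = "<HDInsight cluster name>";

        // Job definition, unchanged from the question.
        var mrJobDefinition = new MapReduceJobCreateParameters()
        {
            JarFile = "wasb:///example/jars/mahout-core-0.9-job.jar",
            ClassName = "org.apache.mahout.cf.taste.hadoop.item.RecommenderJob",
        };
        mrJobDefinition.Arguments.Add(" -s SIMILARITY_COOCCURRENCE");
        mrJobDefinition.Arguments.Add(" --input=/reply");
        mrJobDefinition.Arguments.Add(" --output=/recommend/");
        mrJobDefinition.Arguments.Add(" --usersFile=/data/users.txt");

        // Look up the management certificate in the current user's store.
        var store = new X509Store();
        store.Open(OpenFlags.ReadOnly);
        X509Certificate2 cert = store.Certificates
            .Cast<X509Certificate2>()
            .First(c => c.FriendlyName == certFriendlyName);

        // Connect to the cluster and submit the job.
        var creds = new JobSubmissionCertificateCredential(
            new Guid(subscriptionId), cert, clusterName);
        IJobSubmissionClient jobClient = JobSubmissionClientFactory.Connect(creds);
        JobCreationResults results = jobClient.CreateMapReduceJob(mrJobDefinition);

        // Poll every 10 seconds until the job completes or fails.
        JobDetails job = jobClient.GetJob(results.JobId);
        while (job.StatusCode != JobStatusCode.Completed &&
               job.StatusCode != JobStatusCode.Failed)
        {
            Thread.Sleep(TimeSpan.FromSeconds(10));
            job = jobClient.GetJob(job.JobId);
        }
        Console.WriteLine("Job finished with status: " + job.StatusCode);
    }
}
```

If the job still fails with the `jobToken` error shown in the question, checking which SDK version is referenced by the project would be a reasonable first step, since the answer above attributes the fix to a newer SDK release.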