2015-10-01

I am running into an error while importing JSON with the Kite SDK.

I generated the Avro schema with this command:

./kite-dataset json-schema /vagrant/satyam/kite/restaurant-sample.json -o sample.avsc --record-name HGW 
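The contents of restaurant-sample.json are not shown in the question. As a purely hypothetical stand-in (the field names below are assumptions, not the real data), `json-schema` infers a schema from a file with one JSON object per line, e.g.:

```shell
# Hypothetical stand-in for restaurant-sample.json; the real file is not
# shown in the question. One JSON object per line is the layout that
# kite-dataset json-schema / json-import work from.
cat > restaurant-sample.json <<'EOF'
{"name": "Bombay Palace", "cuisine": "Indian", "rating": 4.2}
{"name": "Sakura", "cuisine": "Japanese", "rating": 4.6}
EOF
cat restaurant-sample.json
```

Running `json-schema` on a file like this would produce an Avro record (here named HGW via `--record-name`) with one field per JSON key.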

Then I created an HDFS dataset with:

./kite-dataset create dataset:hdfs:/user/falcon/datasets/hgw --schema sample.avsc 

To import the JSON file, I ran:

./kite-dataset -v json-import /vagrant/satyam/kite/restaurant-sample.json dataset:hdfs:/user/falcon/datasets/hgw 

I get the following error:

1 job failure(s) occurred: 
org.kitesdk.tools.CopyTask: Kite(dataset:file:/tmp/datasets/.temp/3759e9f8-7406-4ced-... ID=1 (1/1)(1): java.io.FileNotFoundException: File does not exist: hdfs://integration.mycorp.kom:8020/tmp/crunch-878994294/p1/REDUCE 
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1122) 
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1114) 
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) 
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1114) 
at org.apache.hadoop.fs.FileSystem.resolvePath(FileSystem.java:750) 
at org.apache.hadoop.mapreduce.v2.util.MRApps.parseDistributedCacheArtifacts(MRApps.java:568) 
at org.apache.hadoop.mapreduce.v2.util.MRApps.setupDistributedCache(MRApps.java:460) 
at org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:93) 
at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:163) 
at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:731) 
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:536) 
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296) 
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293) 
at java.security.AccessController.doPrivileged(Native Method) 
at javax.security.auth.Subject.doAs(Subject.java:415) 
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) 
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293) 
at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob.submit(CrunchControlledJob.java:329) 
at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchJobControl.startReadyJobs(CrunchJobControl.java:204) 
at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchJobControl.pollJobStatusAndStartNewOnes(CrunchJobControl.java:238) 
at org.apache.crunch.impl.mr.exec.MRExecutor.monitorLoop(MRExecutor.java:112) 
at org.apache.crunch.impl.mr.exec.MRExecutor.access$000(MRExecutor.java:55) 
at org.apache.crunch.impl.mr.exec.MRExecutor$1.run(MRExecutor.java:83) 
at java.lang.Thread.run(Thread.java:745) 

Can anyone help me understand what is causing this error? Thanks in advance.

Answer


I think you are using Kite SDK version 1.1.0. I hit the same error when doing a csv-import; when I switched to Kite SDK version 1.0.0, the error went away.

I suggest you switch to Kite SDK version 1.0.0.

Also, there has been no new Kite SDK release after version 1.1.0, and even that version dates from June 2015.
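One way to pin the CLI to 1.0.0 is to fetch the standalone `kite-dataset` binary jar from Maven Central. This is a sketch: the group/artifact coordinates (`org.kitesdk:kite-tools`, `-binary.jar` classifier) follow the pattern from the Kite docs, but verify the exact URL for your mirror before relying on it.

```shell
# Sketch: construct the Maven Central URL for the standalone kite-dataset
# CLI at a pinned version. Coordinates are assumed from the Kite docs;
# verify before use. The download itself is left commented out.
VERSION=1.0.0
URL="http://central.maven.org/maven2/org/kitesdk/kite-tools/${VERSION}/kite-tools-${VERSION}-binary.jar"
echo "$URL"
# curl -L "$URL" -o kite-dataset && chmod +x kite-dataset
```

After downloading, `./kite-dataset --help` should report the pinned version, and the original `create` / `json-import` commands can be re-run unchanged.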


Thanks for the suggestion.