Nutch on EMR: problem reading from S3

Hi, I am trying to run Apache Nutch 1.2 on Amazon's EMR.
To do this, I specify an input directory from S3. I get the following error:

 
Fetcher: java.lang.IllegalArgumentException: 
    This file system object (hdfs://ip-11-202-55-144.ec2.internal:9000) 
    does not support access to the request path 
    's3n://crawlResults2/segments/20110823155002/crawl_fetch' 
    You possibly called FileSystem.get(conf) when you should have called 
    FileSystem.get(uri, conf) to obtain a file system supporting your path.

I understand the difference between FileSystem.get(uri, conf) and FileSystem.get(conf). If I were writing this myself, I would call FileSystem.get(uri, conf), but I am trying to use the existing Nutch code.
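To make the mismatch in the error message concrete, here is a minimal sketch using only java.net.URI. It is not Hadoop's code: the class SchemeCheck and the method supports are hypothetical names, and the logic only approximates the kind of scheme and authority comparison that presumably produces the exception above. The two URIs are taken from the error message.

```java
import java.net.URI;

public class SchemeCheck {
    // Hypothetical helper (not a Hadoop API): returns true when a file system
    // rooted at fsUri could serve the given path, judged only by scheme and authority.
    static boolean supports(URI fsUri, URI path) {
        String scheme = path.getScheme();
        if (scheme == null) {
            // A path without a scheme is resolved against the file system itself.
            return true;
        }
        if (!scheme.equalsIgnoreCase(fsUri.getScheme())) {
            return false;
        }
        String authority = path.getAuthority();
        return authority == null || authority.equalsIgnoreCase(fsUri.getAuthority());
    }

    public static void main(String[] args) {
        // Default file system and requested path, as they appear in the error message.
        URI defaultFs = URI.create("hdfs://ip-11-202-55-144.ec2.internal:9000");
        URI segment = URI.create("s3n://crawlResults2/segments/20110823155002/crawl_fetch");

        // The HDFS file system cannot serve an s3n:// path.
        System.out.println(supports(defaultFs, segment)); // prints false

        // A file system obtained for the path's own URI can.
        System.out.println(supports(URI.create("s3n://crawlResults2"), segment)); // prints true
    }
}
```

This is why FileSystem.get(conf), which returns the default (HDFS) file system, fails for the s3n:// segment path, while FileSystem.get(uri, conf) would select a file system from the path's own scheme.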

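I asked about this and was told that I need to modify hadoop-site.xml to include the following properties: fs.default.name, fs.s3.awsAccessKeyId, fs.s3.awsSecretAccessKey. I updated these properties in core-site.xml (hadoop-site.xml does not exist), but that made no difference. Does anyone have any other ideas? Thanks for the help.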

2011-08-30 Peter H

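I haven't used Nutch, but maybe check whether the resource you are trying to fetch is public (it won't hurt to do that just for testing), and also try replacing (again, just for testing) s3n:// -> s3://. I would expect it to work with s3n and the specified credentials, but more testing won't hurt. – Kris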

Try specifying the following in hadoop-site.xml:

<property> 
    <name>fs.default.name</name> 
    <value>org.apache.hadoop.fs.s3.S3FileSystem</value> 
</property>

This tells Nutch that S3 should be used by default.

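The properties fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey need to be specified only when your S3 objects require authentication (an S3 object can be accessible to all users, or only to authenticated ones).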

2014-03-12 08:49:28 dmitry
