
What directory should I use for hadoop.tmp.dir when running Hadoop in pseudo-distributed mode?

By default, Hadoop sets hadoop.tmp.dir to your /tmp folder. This is a problem because /tmp gets wiped out by Linux when you reboot, leading to this lovely error from the JobTracker:

2012-10-05 07:41:13,618 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 0 time(s).  
...  
2012-10-05 07:41:22,636 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 9 time(s). 
2012-10-05 07:41:22,643 INFO org.apache.hadoop.mapred.JobTracker: problem cleaning system directory: null 
java.net.ConnectException: Call to localhost/127.0.0.1:8020 failed on connection exception: java.net.ConnectException: Connection refused 
    at org.apache.hadoop.ipc.Client.wrapException(Client.java:767)  

The only fix I've found for this is to reformat the namenode, which rebuilds the /tmp/hadoop-root folder, which of course gets wiped out again the next time you reboot.
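
(For reference, the reformat step here is just the standard namenode format command on 0.20.x; a quick sketch, assuming a stock install run from the Hadoop home directory:)

bin/hadoop namenode -format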

So I went ahead and created a folder called /hadoop_temp and gave all users read/write access to it. I then set this property in my core-site.xml:

<property> 
      <name>hadoop.tmp.dir</name> 
      <value>file:///hadoop_temp</value> 
</property> 
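
(The folder setup itself was nothing fancy; it amounted to something along these lines, with world read/write granted via chmod, though the exact permissions are an assumption on my part:)

sudo mkdir /hadoop_temp
sudo chmod 777 /hadoop_temp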

When I reformatted my namenode, Hadoop seemed happy and gave me this message:

12/10/05 07:58:54 INFO common.Storage: Storage directory file:/hadoop_temp/dfs/name has been successfully formatted. 

However, when I looked at /hadoop_temp, I noticed the folder was empty. And when I restarted Hadoop and checked my JobTracker logs, I saw this:

2012-10-05 08:02:41,988 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 0 time(s). 
... 
2012-10-05 08:02:51,010 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 9 time(s). 
2012-10-05 08:02:51,011 INFO org.apache.hadoop.mapred.JobTracker: problem cleaning system directory: null 
java.net.ConnectException: Call to localhost/127.0.0.1:8020 failed on connection exception: java.net.ConnectException: Connection refused 

When I checked my namenode logs, I saw this:

2012-10-05 08:00:31,206 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory /opt/hadoop/hadoop-0.20.2/file:/hadoop_temp/dfs/name does not exist. 
2012-10-05 08:00:31,212 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed. 
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /opt/hadoop/hadoop-0.20.2/file:/hadoop_temp/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible. 

So clearly I haven't configured something correctly. Even though I set hadoop.tmp.dir to /hadoop_temp in core-site.xml, Hadoop still expects to see its files in the /tmp folder. What did I do wrong? What is the accepted "right" value for hadoop.tmp.dir?

Bonus question: what should I use for hbase.tmp.dir?

System info:

Ubuntu 12.04, Apache Hadoop 0.20.2, Apache HBase 0.92.1

Thanks for taking a look!

Answers

3

Thanks to Harsh on the Hadoop mailing list for helping me out with this one. Quoting him:

"Don't use file:/// prefixes for hadoop.tmp.dir on 0.20.x or 1.x based releases."

I took out the file:// prefix and it worked.
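
(In other words, the working core-site.xml entry ends up as a plain path; reusing the /hadoop_temp folder from the question, it looks like this:)

<property> 
      <name>hadoop.tmp.dir</name> 
      <value>/hadoop_temp</value> 
</property>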

0

Also, with HBase 0.94.* you must specify:

<property> 
      <name>hbase.cluster.distributed</name> 
      <value>true</value> 
</property>
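
(This goes in hbase-site.xml, alongside an hbase.rootdir that points at HDFS. A minimal sketch, assuming the NameNode address seen in the logs above; the /hbase path is just an illustrative choice:)

<property> 
      <name>hbase.rootdir</name> 
      <value>hdfs://localhost:8020/hbase</value> 
</property>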
