
I have exactly the same error as in this thread: http://lucene.472066.n3.nabble.com/Multinode-cluster-only-recognizes-1-node-td3997585.html (multinode cluster only recognizes 1 active node).

How can I solve this problem?

EDIT:

We want to run a 2-node cluster. Our code works perfectly. We have one master node and one slave node, and since we want the master node to act as a slave as well, we have configured the masters and slaves files as:

conf/masters:
master

conf/slaves:
master
slave
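
For context, this is how the Hadoop 1.x start scripts use these files, as far as I understand them:

bin/start-dfs.sh     # starts the NameNode locally, the SecondaryNameNode on the hosts in conf/masters, and a DataNode on each host in conf/slaves
bin/start-mapred.sh  # starts the JobTracker locally and a TaskTracker on each host in conf/slaves
bin/start-all.sh     # simply runs both of the above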

When I run bin/start-all.sh on the master node, jps gives these on the master, as expected:
namenode 
secondarynamenode 
jobtracker 
datanode 
tasktracker 
jps 

On the slave node, jps gives these, as expected:

datanode 
tasktracker 
jps 

Everything seems fine. Our mapred-site.xml and core-site.xml know the master's IP and port, and the replication factor is set to 2 in hdfs-site.xml.
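
For reference, a minimal sketch of what those three files contain in our setup (the property names are the standard Hadoop 1.x ones; 10.0.2.15 is the master's IP, as in the console output below, and the JobTracker port 9001 is simply the value we picked):

conf/core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://10.0.2.15:9000</value> <!-- must be identical on both nodes -->
  </property>
</configuration>

conf/mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>10.0.2.15:9001</value> <!-- JobTracker address, identical on both nodes -->
  </property>
</configuration>

conf/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value> <!-- one replica on each of the two datanodes -->
  </property>
</configuration>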

We run our MapReduce application on this configuration, but I think it only runs on the master node's JobTracker: when I look at the JobTracker web UI, the number of nodes is 1.
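
A quick way to double-check this from the command line, using the standard Hadoop 1.x CLI (run on the master):

bin/hadoop job -list-active-trackers  # the tasktrackers the JobTracker knows about
bin/hadoop dfsadmin -report           # the live datanodes the NameNode knows about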

Another scenario:

If I don't use the master as a slave, I change the masters and slaves files like this:

conf/masters: 
master 

conf/slaves: 
slave 

Now jps on the master node gives:

namenode 
secondarynamenode 
jobtracker 
jps 

On the slave node, jps gives these as expected:

datanode 
tasktracker 
jps 

In this configuration it gives me the "could only be replicated to 0 nodes, instead of 1" error. I have added the full console output at the end.

By the way, the hadoop home directory path is the same on both nodes, so that is no longer an issue.

What could be the problem?
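
If it helps with the diagnosis, the check I run on the slave is to look at its DataNode log for connection errors to the NameNode (a sketch; the actual log file name depends on the user and hostname):

tail -n 50 logs/hadoop-adminuser-datanode-slave.log
# repeated "Retrying connect to server: ..." lines here would mean the
# DataNode cannot reach the NameNode address given in core-site.xml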

Full console output:

[[email protected] hadoop-1.1.2]$ bin/hadoop jar /home/adminuser/Desktop/proje/proje.jar arkadasoner.Main hdfs://10.0.2.15:9000/input/id.txt hdfs://10.0.2.15:9000/output/x.txt hdfs://10.0.2.15:9000/output/y.txt hdfs://10.0.2.15:9000/output/z.txt

13/06/16 14:36:59 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same. 
13/06/16 14:36:59 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /tmp/hadoop-adminuser/mapred/staging/adminuser/.staging/job_201306161433_0001/job.jar could only be replicated to 0 nodes, instead of 1 
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1639) 
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:736) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:601) 
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578) 
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393) 
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1389) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:415) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149) 
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387) 

    at org.apache.hadoop.ipc.Client.call(Client.java:1107) 
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229) 
    at com.sun.proxy.$Proxy1.addBlock(Unknown Source) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:601) 
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85) 
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62) 
    at com.sun.proxy.$Proxy1.addBlock(Unknown Source) 
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3686) 
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3546) 
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2600(DFSClient.java:2749) 
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2989) 

13/06/16 14:36:59 WARN hdfs.DFSClient: Error Recovery for block null bad datanode[0] nodes == null 
13/06/16 14:36:59 WARN hdfs.DFSClient: Could not get block locations. Source file "/tmp/hadoop-adminuser/mapred/staging/adminuser/.staging/job_201306161433_0001/job.jar" - Aborting... 
13/06/16 14:36:59 INFO mapred.JobClient: Cleaning up the staging area hdfs://10.0.2.15:9000/tmp/hadoop-adminuser/mapred/staging/adminuser/.staging/job_201306161433_0001 
13/06/16 14:36:59 ERROR security.UserGroupInformation: PriviledgedActionException as:adminuser cause:org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /tmp/hadoop-adminuser/mapred/staging/adminuser/.staging/job_201306161433_0001/job.jar could only be replicated to 0 nodes, instead of 1 
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1639) 
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:736) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:601) 
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578) 
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393) 
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1389) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:415) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149) 
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387) 

Exception in thread "main" org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /tmp/hadoop-adminuser/mapred/staging/adminuser/.staging/job_201306161433_0001/job.jar could only be replicated to 0 nodes, instead of 1 
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1639) 
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:736) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:601) 
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578) 
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393) 
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1389) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:415) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149) 
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387) 

    at org.apache.hadoop.ipc.Client.call(Client.java:1107) 
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229) 
    at com.sun.proxy.$Proxy1.addBlock(Unknown Source) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:601) 
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85) 
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62) 
    at com.sun.proxy.$Proxy1.addBlock(Unknown Source) 
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3686) 
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3546) 
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2600(DFSClient.java:2749) 
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2989) 
13/06/16 14:36:59 ERROR hdfs.DFSClient: Failed to close file /tmp/hadoop-adminuser/mapred/staging/adminuser/.staging/job_201306161433_0001/job.jar 
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /tmp/hadoop-adminuser/mapred/staging/adminuser/.staging/job_201306161433_0001/job.jar could only be replicated to 0 nodes, instead of 1 
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1639) 
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:736) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:601) 
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578) 
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393) 
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1389) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:415) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149) 
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387) 

    at org.apache.hadoop.ipc.Client.call(Client.java:1107) 
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229) 
    at com.sun.proxy.$Proxy1.addBlock(Unknown Source) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:601) 
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85) 
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62) 
    at com.sun.proxy.$Proxy1.addBlock(Unknown Source) 
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3686) 
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3546) 
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2600(DFSClient.java:2749) 
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2989) 

Would you mind typing 'your' question here? Also, showing the logs would help. I don't quite understand the problem. What does jps show on your master and slaves? How many slaves do you have? – Tariq

Answer


Could you please check that, in the configuration files on every node (master and slaves), you use a single name for the master: an IP address or a hostname, like hdfs://master:54310 (and it should be the same on every master and slave). Here, master is the hostname in my /etc/hosts file that points to the master node.

I also faced the same problem, but I was using hdfs://localhost:54310 on all nodes. I then changed it to hdfs://master:54310, or hdfs://xxxx:54310 where xxxx is the address of the master node.
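
As a concrete sketch of what that looks like, assuming the master's IP is 10.0.2.15 as in the question's log (the slave IP below is made up for illustration):

/etc/hosts (same entries on every node):
10.0.2.15   master
10.0.2.16   slave   # illustrative slave IP

conf/core-site.xml (identical on every node):
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:54310</value> <!-- never localhost -->
  </property>
</configuration>

With hdfs://localhost:54310, the DataNode on the slave tries to register with a NameNode on the slave itself, so the real NameNode ends up with zero live datanodes, which is exactly what "could only be replicated to 0 nodes" means.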