
From time to time I get the following errors in Cloudera Manager: the HDFS DataNode disconnects from the NameNode.

This DataNode is not connected to one or more of its NameNode(s). 

The Cloudera Manager agent got an unexpected response from this role's web server. 

(usually both together, sometimes only one of them)

In most SO and Google results for these errors, the problem is a configuration issue (and the DataNode never connects to the NameNode at all).

In my case, the DataNode usually connects at startup but loses the connection after a while, so it does not look like a misconfiguration.

  • Are there other possible causes?
  • Is it possible to force the DataNode to reconnect to the NameNode?
  • Is it possible to "ping" the NameNode from the DataNode (simulating the DataNode's connection attempt)?
  • Could it be some kind of resource problem (too many open files/connections)? (a reachability and open-files sketch follows this list)
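A minimal sketch of both checks, run from the DataNode host. The NameNode address is a hypothetical placeholder (namenode.example.com, IPC port 8020, the usual CDH default); take the real values from fs.defaultFS. It opens a plain TCP connection, so it only tests reachability, not the Hadoop IPC protocol itself:

```python
import os
import resource
import socket

# Hypothetical values: substitute the NameNode host/port from fs.defaultFS.
NAMENODE_HOST = "namenode.example.com"
NAMENODE_IPC_PORT = 8020


def namenode_reachable(host, port, timeout=5.0):
    """Approximate the DataNode's connection attempt with a raw TCP connect.

    Success only proves the socket can be established (no firewall drop,
    no routing problem); it does not speak the Hadoop IPC protocol.
    """
    try:
        socket.create_connection((host, port), timeout=timeout).close()
        return True
    except OSError as exc:
        print("connect to %s:%s failed: %s" % (host, port, exc))
        return False


def report_fd_usage():
    """Compare this process's open-descriptor count to its ulimit.

    To check the DataNode itself, inspect /proc/<datanode-pid>/fd instead.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    open_fds = len(os.listdir("/proc/self/fd"))  # Linux only
    print("open fds: %d (soft limit %d, hard limit %d)" % (open_fds, soft, hard))


if __name__ == "__main__":
    report_fd_usage()
    print("NameNode reachable: %s" % namenode_reachable(NAMENODE_HOST, NAMENODE_IPC_PORT))
```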

Sample log (the error varies from time to time):

2014-02-25 06:39:49,179 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: exception: 
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.56.144.18:50010 remote=/10.56.144.28:48089] 
     at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:165) 
     at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:153) 
     at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:114) 
     at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:504) 
     at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:673) 
     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:338) 
     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:92) 
     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:64) 
     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221) 
     at java.lang.Thread.run(Thread.java:662) 
2014-02-25 06:39:49,180 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.56.144.18:50010, dest: /10.56.144.28:48089, bytes: 132096, op: HDFS_READ, cliID: DFSClient_NONMAPREDUCE_1315770947_27, offset: 0, srvID: DS-990970275-10.56.144.18-50010-1384349167420, blockid: BP-1381780028-10.56.144.16-1384349161741:blk_-8718668700255896235_5121440, duration: 480291679056 
2014-02-25 06:39:49,180 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.56.144.18, storageID=DS-990970275-10.56.144.18-50010-1384349167420, infoPort=50075, ipcPort=50020, storageInfo=lv=-40;cid=cluster16;nsid=7043943;c=0):Got exception while serving BP-1381780028-10.56.144.16-1384349161741:blk_-8718668700255896235_5121440 to /10.56.144.28:48089 
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.56.144.18:50010 remote=/10.56.144.28:48089] 
     at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:165) 
     at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:153) 
     at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:114) 
     at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:504) 
     at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:673) 
     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:338) 
     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:92) 
     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:64) 
     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221) 
     at java.lang.Thread.run(Thread.java:662) 
2014-02-25 06:39:49,181 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: host.com:50010:DataXceiver error processing READ_BLOCK operation src: /10.56.144.28:48089 dest: /10.56.144.18:50010 
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.56.144.18:50010 remote=/10.56.144.28:48089] 
     at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:165) 
     at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:153) 
     at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:114) 
     at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:504) 
     at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:673) 
     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:338) 
     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:92) 
     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:64) 
     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221) 
     at java.lang.Thread.run(Thread.java:662) 

Check whether Hadoop is in safe mode; perhaps it is happening because of that.


It is not related to safe mode (it usually happens when the server is not in safe mode). Some DataNodes are connected and some are not. There is no specific DataNode that tends to lose the connection (each time it is a different one).


Any connectivity problems?

Answer


Hadoop uses specific ports for communication between the DataNode and the NameNode. It may be that a firewall is blocking those ports. Check the default ports on the Cloudera website and test the connection to the NameNode on those specific ports.
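As a rough illustration of that test, a sketch that probes the usual CDH NameNode defaults from a DataNode host. The host name is a hypothetical placeholder, and the ports (8020 for IPC, 50070 for the web UI) are common defaults; confirm the actual values in your hdfs-site.xml and the Cloudera ports documentation:

```python
import socket

# Hypothetical host; substitute your NameNode. Ports are common CDH
# defaults - verify them against your configuration before relying on this.
NAMENODE = "namenode.example.com"
PORTS = {
    8020: "NameNode IPC (fs.defaultFS)",
    50070: "NameNode web UI (dfs.namenode.http-address)",
}

for port, label in sorted(PORTS.items()):
    try:
        # A successful connect rules out a firewall block on this port.
        socket.create_connection((NAMENODE, port), timeout=5).close()
        print("%5d open    - %s" % (port, label))
    except OSError as exc:
        # A timeout usually means packets are being dropped (firewall);
        # "connection refused" means the host answered but nothing listens.
        print("%5d blocked - %s (%s)" % (port, label, exc))
```

Running this from each DataNode host while the disconnect is happening helps distinguish a blocked port from a NameNode that is merely slow to respond.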