2016-03-01 59 views
4

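I deployed a long-running Storm topology. After a few hours of running, the entire topology went down. I checked the worker logs and found the entries below. As they say, the ZooKeeper client session timed out, which triggered a reconnect. I suspect this is related to my topology going down, so now I am trying to find out what could make the client time out. What causes a ZooKeeper client session timeout?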

2016-02-29T10:34:12.386+0800 o.a.s.z.ClientCnxn [INFO] Client session timed out, have not heard from server in 23789ms for sessionid 0x252f862028c0083, closing socket connection and attempting reconnect 
2016-02-29T10:34:12.986+0800 o.a.s.c.f.s.ConnectionStateManager [INFO] State change: SUSPENDED 
2016-02-29T10:34:13.059+0800 b.s.cluster [WARN] Received event :disconnected::none: with disconnected Zookeeper. 
2016-02-29T10:34:13.197+0800 o.a.s.z.ClientCnxn [INFO] Opening socket connection to server zk-3.cloud.mos/172.16.13.147:2181. Will not attempt to authenticate using SASL (unknown error) 
2016-02-29T10:34:13.241+0800 o.a.s.z.ClientCnxn [WARN] Session 0x252f862028c0083 for server null, unexpected error, closing socket connection and attempting reconnect 
java.net.ConnectException: Connection refused 
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[na:1.8.0_31] 
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716) ~[na:1.8.0_31] 
    at org.apache.storm.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361) ~[storm-core-0.9.6.jar:0.9.6] 
    at org.apache.storm.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081) ~[storm-core-0.9.6.jar:0.9.6] 

Answers

1

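Your client can no longer talk to the ZooKeeper server. The first thing that happens is that a heartbeat goes unanswered within the negotiated session timeout: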

2016-02-29T10:34:12.386+0800 o.a.s.z.ClientCnxn [INFO] Client session timed out, have not heard from server in 23789ms for sessionid 0x252f862028c0083, closing socket connection and attempting reconnect

Then, when it tries to reconnect, the connection is refused:

2016-02-29T10:34:13.241+0800 o.a.s.z.ClientCnxn [WARN] Session 0x252f862028c0083 for server null, unexpected error, closing socket connection and attempting reconnect java.net.ConnectException: Connection refused

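This means your ZooKeeper server is either:

  • unreachable (the network connection is down)
  • dead (so nothing is listening on the socket)
  • GCing itself to death and unable to communicate (although that might produce a connection timeout error instead; I'm not sure)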
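As a rough first check from a worker host, the cases above can be partly told apart by how a plain TCP connection attempt fails. The sketch below is only an illustration: `probe_zookeeper` is a hypothetical helper name, and the 5 second timeout is an arbitrary assumption. It only tests the TCP port and does not speak the ZooKeeper protocol.

```python
import socket


def probe_zookeeper(host, port=2181, timeout=5.0):
    """Classify a single TCP connection attempt to a ZooKeeper endpoint.

    Hypothetical helper, not part of Storm or ZooKeeper.
    """
    try:
        # Open and immediately close a TCP connection to the server port.
        with socket.create_connection((host, port), timeout=timeout):
            return "listening"
    except ConnectionRefusedError:
        # The host answered, but no process is bound to the port.
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout.
        return "timeout"
    except OSError:
        # DNS failure, no route to host, and similar network errors.
        return "unreachable"
```

Calling `probe_zookeeper("zk-3.cloud.mos")` for each server in the ensemble gives a quick picture. "refused" matches the `Connection refused` in the log and suggests the server process is not running, while "timeout" or "unreachable" points at the network or the host. "listening" does not prove the server is healthy, because the operating system can accept a connection even when the JVM is stalled.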

To tell you more, you would need to check the ZooKeeper server logs on your (Hadoop?) cluster.
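To know which timestamps to look up in the server logs, it can help to first list every session timeout recorded on the worker side. This is a minimal sketch that assumes the worker log lines look exactly like the ones quoted in the question; `summarize` and the regular expressions are my own illustration, not something Storm provides.

```python
import re

# Matches the ClientCnxn "session timed out" line as quoted in the question.
TIMEOUT_RE = re.compile(
    r"^(?P<ts>\S+) .*Client session timed out, have not heard from server in "
    r"(?P<ms>\d+)ms for sessionid (?P<sid>0x[0-9a-fA-F]+)"
)
# Matches the exception line that follows a failed reconnect attempt.
REFUSED_RE = re.compile(r"java\.net\.ConnectException: Connection refused")


def summarize(lines):
    """Return ([(timestamp, silence_ms, session_id), ...], refused_count)."""
    timeouts = []
    refused = 0
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            timeouts.append(
                (match.group("ts"), int(match.group("ms")), match.group("sid"))
            )
        elif REFUSED_RE.search(line):
            refused += 1
    return timeouts, refused
```

Feeding it the lines of a worker log, for example `summarize(open(path))` with `path` pointing at your own log file, returns the timestamps at which the client gave up waiting, how long it had heard nothing, and how many reconnect attempts were refused. Those timestamps are the ones to search for in the ZooKeeper server logs.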

+0

I'm facing the same problem. If I do have a GC problem, how can I solve it? – user5520049

0

Always use 2181 as the port number for ZooKeeper connections, unless you have configured ZooKeeper to use a different port!