2011-08-04
4

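Mnesia can't connect to another node

I'm setting up a RabbitMQ cluster and ran into a problem at one of the steps. The step comes straight from the RabbitMQ clustering guide.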

[email protected]:~# rabbitmqctl status 
Status of node [email protected] ... 
[{pid,20410}, 
{running_applications,[{rabbit,"RabbitMQ","2.5.1"}, 
         {os_mon,"CPO CXC 138 46","2.2.4"}, 
         {sasl,"SASL CXC 138 11","2.1.8"}, 
         {mnesia,"MNESIA CXC 138 12","4.4.12"}, 
         {stdlib,"ERTS CXC 138 10","1.16.4"}, 
         {kernel,"ERTS CXC 138 10","2.13.4"}]}, 
{os,{unix,linux}}, 
{erlang_version,"Erlang R13B03 (erts-5.7.4) [source] [64-bit] [smp:4:4] [rq:4] [async-threads:30] [hipe] [kernel-poll:true]\n"}, 
{memory,[{total,25296704}, 
      {processes,9680280}, 
      {processes_used,9662720}, 
      {system,15616424}, 
      {atom,1099393}, 
      {atom_used,1082732}, 
      {binary,89768}, 
      {code,11606637}, 
      {ets,726848}]}] 
...done. 
[email protected]:~# rabbitmqctl cluster_status 
Cluster status of node [email protected] ... 
[{nodes,[{disc,[[email protected]]}]},{running_nodes,[[email protected]]}] 
...done. 
[email protected]:~# rabbitmqctl stop_app 
Stopping node [email protected] ... 
...done. 
[email protected]:~# rabbitmqctl reset 
Resetting node [email protected] ... 
...done. 
[email protected]:~# rabbitmqctl cluster [email protected] 
Clustering node [email protected] with [[email protected]] ... 
Error: {failed_to_cluster_with,[[email protected]], 
           "Mnesia could not connect to some nodes."} 

What are the possible reasons for one node being unable to connect to another?

Here is the guide I'm following: http://www.rabbitmq.com/clustering.html

Answers

5

I jumped into the #rabbitmq channel on freenode. Here is the discussion that followed:

14:29 shakakai: hey all, i'm having a little issue with clustering rabbitmq http://stackoverflow.com/questions/6948624/mnesia-cant-connect-to-another-node 
14:30 shakakai: has anyone run into that problem before? 
14:30 daysmen has left IRC (Read error: Connection reset by peer) 
14:30 antares_: shakakai: make sure that epmd is running on every node 
14:30 antares_: shakakai: and that port it uses (4369) is open in your firewall 
14:31 |Blaze|: shakakai: is your dns correct? Can you ping worker1 from celery and celery from worker1 
14:31 shakakai: |Blaze|: hmm...i'll check 
14:31 daysmen has joined ([email protected]) 
14:32 shakakai: |Blaze|: this is where I'm a little confused, the rabbitmq nodename is [email protected] but the fqdn to ping the box is "ping worker1.mydomain.com" 
14:33 |Blaze|: can you "ping worker1" 
14:34 shakakai: |Blaze|: no 
14:34 |Blaze|: k, you'll need to fix that 
14:34 hyperboreean has left IRC (Ping timeout: 250 seconds) 
14:37 shakakai: |Blaze|: gotcha, so I setup a hosts file and i should be good 
14:37 |Blaze|: yup 
14:37 |Blaze|: in both directions 

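TL;DR

Make sure you can ping the rabbit node names from every box you are clustering. If you can't, set up a hosts file with an entry for each rabbit node name.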
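The two checks suggested in the IRC log (the peer's short hostname resolves, and the epmd port 4369 is reachable) could be scripted roughly as follows. This is only a sketch: the function names resolves, port_open and check_peer are made up for illustration, and a successful TCP connection only shows that something is listening on the port, not that the listener is epmd.

```python
import socket

EPMD_PORT = 4369  # epmd port mentioned in the IRC log


def resolves(hostname):
    """Return True if the local resolver (hosts file or DNS) knows hostname."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


def port_open(hostname, port=EPMD_PORT, timeout=3.0):
    """Return True if a TCP connection to hostname:port succeeds."""
    try:
        with socket.create_connection((hostname, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_peer(hostname):
    """Hypothetical helper: run both checks for one peer's short hostname."""
    name_ok = resolves(hostname)
    return {
        "host": hostname,
        "resolves": name_ok,
        # Only try the port when the name resolves at all.
        "epmd_reachable": name_ok and port_open(hostname),
    }
```

Calling check_peer("worker1") on one box, and the equivalent call with the other box's short name on worker1, covers the "in both directions" advice from the log.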

+0

I don't think it's taboo to accept your own answer, especially since it's a good one. – scvalex

+0

Oops, forgot about that :P – Shakakai

0

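There are a few things to check before the cluster will run properly:

0) Set up the network so that the servers can ping each other.
1) Run exactly the same version of RabbitMQ on every node.
2) Cookie: every server must have exactly the same Erlang cookie in its .erlang.cookie file.

A useful trick is to run this command on one node to see whether it can reach another node from within RabbitMQ:

rabbitmqctl eval 'net_adm:ping(rabbit@othernode).'

It should print pang if the node is unreachable, or pong if everything is OK. Don't forget the dot at the end of the eval expression.

I got it working after a few hours of unsuccessful attempts.

3) Keep in mind that when you restart the nodes of a cluster, a node that was not the last one to be stopped may have trouble: it will not start until the last-stopped node has been started again. If everything above (0-2) is correct, 3 may be the root cause of your problem...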
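For anyone who wants to script the net_adm:ping check, a rough sketch follows. The helper names ping_expression, interpret and can_reach are hypothetical, and the output parsing assumes rabbitmqctl prints the bare word pong or pang somewhere in its output. Passing the arguments as a list sidesteps shell quoting, and the node name is wrapped in single quotes so that hostnames containing dots or hyphens are still valid Erlang atoms.

```python
import subprocess


def ping_expression(node):
    """Build the Erlang expression: quoted node atom, terminated by a dot."""
    if "'" in node:
        raise ValueError("node name must not contain a single quote")
    return "net_adm:ping('%s')." % node


def interpret(output):
    """Map command output to True (pong), False (pang) or None (unclear)."""
    words = output.split()
    if "pong" in words:
        return True
    if "pang" in words:
        return False
    return None


def can_reach(node, rabbitmqctl="rabbitmqctl"):
    """Run the check from the local node; assumes rabbitmqctl is on PATH."""
    result = subprocess.run(
        [rabbitmqctl, "eval", ping_expression(node)],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return interpret(result.stdout)
```

A None result means the output contained neither word, for example because the local node itself is not running.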

Hope this helps. Cheers, JB

-1

One thing I've read is that the Erlang cookie needs to be the same on all cluster nodes for them to communicate. I believe it lives in /var/lib/rabbitmq/.erlang.cookie
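One way to compare cookies across nodes without pasting the secret around is to compare a hash of the file on each node. The sketch below assumes the path mentioned above is correct on your installation; cookie_fingerprint and cookies_match are illustrative names. It hashes the raw bytes, so a stray trailing newline on one node would show up as a mismatch even if the cookie text looks the same.

```python
import hashlib

# Path taken from the answer above; it may differ on your system.
DEFAULT_COOKIE_PATH = "/var/lib/rabbitmq/.erlang.cookie"


def cookie_fingerprint(path=DEFAULT_COOKIE_PATH):
    """Return the SHA-256 hex digest of the cookie file's raw bytes."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def cookies_match(fingerprints):
    """Given {node_name: fingerprint}, return True if all fingerprints agree."""
    return len(set(fingerprints.values())) == 1
```

Run cookie_fingerprint() on each node (reading the file usually requires root or the rabbitmq user), collect the digests, and pass them to cookies_match.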
