
I have a 3-node Infinispan cluster with numOwners = 2, and I am running into a problem with the cluster view when a node is disconnected from the network and rejoins: Infinispan and JGroups form an incorrect merge view. Here are the logs:

(Incoming-1,BrokerPE-0-28575) ISPN000094: Received new cluster view for channel ISPN: [BrokerPE-0-28575|2] (3) [BrokerPE-0-28575, SEM03VVM-201-59385, SEM03VVM-202-33714]

ISPN000094: Received new cluster view for channel ISPN: [BrokerPE-0-28575|3] (2) [BrokerPE-0-28575, SEM03VVM-202-33714] -> one node disconnected

ISPN000093: Received new, MERGED cluster view for channel ISPN: MergeView::[BrokerPE-0-28575|4] (2) [BrokerPE-0-28575, SEM03VVM-201-59385], 2 subgroups: [BrokerPE-0-28575|3] (2) [BrokerPE-0-28575, SEM03VVM-202-33714], [BrokerPE-0-28575|2] (3) [BrokerPE-0-28575, SEM03VVM-201-59385, SEM03VVM-202-33714] -> incorrect merge

Here is my JGroups configuration:

<config xmlns="urn:org:jgroups" 
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
     xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups-3.6.xsd"> 
    <TCP 
      bind_addr="${jgroups.tcp.address:127.0.0.1}" 
     bind_port="${jgroups.tcp.port:7800}" 
     loopback="true" 
     port_range="30" 
     recv_buf_size="20m" 
     send_buf_size="640k" 
     max_bundle_size="31k" 
     use_send_queues="true" 
     enable_diagnostics="false" 
     sock_conn_timeout="300" 
     bundler_type="old" 

     thread_naming_pattern="pl" 

     timer_type="new3" 
     timer.min_threads="4" 
     timer.max_threads="10" 
     timer.keep_alive_time="3000" 
     timer.queue_max_size="500" 


     thread_pool.enabled="true" 
     thread_pool.min_threads="2" 
     thread_pool.max_threads="30" 
     thread_pool.keep_alive_time="60000" 
     thread_pool.queue_enabled="true" 
     thread_pool.queue_max_size="100" 
     thread_pool.rejection_policy="Discard" 

     oob_thread_pool.enabled="true" 
     oob_thread_pool.min_threads="2" 
     oob_thread_pool.max_threads="30" 
     oob_thread_pool.keep_alive_time="60000" 
     oob_thread_pool.queue_enabled="false" 
     oob_thread_pool.queue_max_size="100" 
     oob_thread_pool.rejection_policy="Discard" 

     internal_thread_pool.enabled="true" 
     internal_thread_pool.min_threads="1" 
     internal_thread_pool.max_threads="10" 
     internal_thread_pool.keep_alive_time="60000" 
     internal_thread_pool.queue_enabled="true" 
     internal_thread_pool.queue_max_size="100" 
     internal_thread_pool.rejection_policy="Discard" 
     /> 

    <!-- Ergonomics, new in JGroups 2.11, are disabled by default in TCPPING until JGRP-1253 is resolved --> 
    <TCPPING timeout="3000" initial_hosts="${jgroups.tcpping.initial_hosts:HostA[7800],HostB[7801]}" 
      port_range="2" 
      num_initial_members="3" 
      ergonomics="false" 
     /> 

    <!-- <MPING bind_addr="${jgroups.bind_addr:127.0.0.1}" break_on_coord_rsp="true" 
     mcast_addr="${jboss.default.multicast.address:228.2.4.6}" 
     mcast_port="${jgroups.mping.mcast_port:43366}" 
     ip_ttl="${jgroups.udp.ip_ttl:2}" 
     num_initial_members="3"/> --> 
    <MERGE3 max_interval="30000" min_interval="10000"/> 

    <FD_SOCK bind_addr="${jgroups.bind_addr}"/> 
    <FD timeout="3000" max_tries="3"/> 
    <VERIFY_SUSPECT timeout="3000"/> 
    <!-- <BARRIER /> --> 
    <!-- <pbcast.NAKACK use_mcast_xmit="false" retransmit_timeout="300,600,1200,2400,4800" discard_delivered_msgs="true"/> --> 
    <pbcast.NAKACK2 use_mcast_xmit="false" 
        xmit_interval="1000" 
        xmit_table_num_rows="100" 
        xmit_table_msgs_per_row="10000" 
        xmit_table_max_compaction_time="10000" 
        max_msg_batch_size="100" discard_delivered_msgs="true"/> 
    <UNICAST3 xmit_interval="500" 
      xmit_table_num_rows="20" 
      xmit_table_msgs_per_row="10000" 
      xmit_table_max_compaction_time="10000" 
      max_msg_batch_size="100" 
      conn_expiry_timeout="0"/> 

    <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000" max_bytes="400000"/> 
    <pbcast.GMS print_local_addr="true" join_timeout="3000" view_bundling="true" merge_timeout="6000"/> 
    <tom.TOA/> <!-- the TOA is only needed for total order transactions--> 

    <UFC max_credits="2m" min_threshold="0.40"/> 
    <!-- <MFC max_credits="2m" min_threshold="0.40"/> --> 
    <FRAG2 frag_size="30k"/> 
    <RSVP timeout="60000" resend_interval="500" ack_on_delivery="false" /> 
    <!-- <pbcast.STATE_TRANSFER/> --> 
</config> 

I am using Infinispan 7.0.2 and JGroups 3.6.1. I have tried many configurations, but nothing has worked. Your help would be much appreciated.

[UPDATE] The problem went away after setting the internal_thread_pool.min_threads property to a value greater than 1; a sketch of the change follows.
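For reference, the fix amounts to this change in the internal thread pool attributes inside the <TCP> element of the configuration above. The value 2 is only illustrative; per the update, anything above 1 resolved the problem:

     internal_thread_pool.enabled="true" 
     internal_thread_pool.min_threads="2" 
     internal_thread_pool.max_threads="10" 
     internal_thread_pool.keep_alive_time="60000" 
     internal_thread_pool.queue_enabled="true" 
     internal_thread_pool.queue_max_size="100" 
     internal_thread_pool.rejection_policy="Discard" 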


Have you tried a newer Infinispan version, e.g. 8.2.4.Final? –


@DanBerindei I haven't, but the problem here seems to be with the JGroups cluster merge. – geekprogrammer


@DanBerindei We also tried Infinispan 8.2.4 and got the same issue. – geekprogrammer

Answer


So, to simplify, we have:

  • View broker|2 = {broker, 201, 202}
  • 201 leaves, the view is now broker|3 = {broker, 202}
  • Then a merge happens between views broker|3 and broker|2, leading to the incorrect view broker|4 = {broker, 201}

I created [1] to investigate what is going on here. For one thing, the subviews of the merge view should include 202 as a subgroup coordinator, but that is not the case.

Can you describe exactly what happened? Is this reproducible? It would be nice to have TRACE-level logs for FD, FD_ALL, MERGE3 and GMS...

[1] https://issues.jboss.org/browse/JGRP-2128
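To collect the requested traces, here is a minimal log4j2.xml sketch that enables TRACE for exactly those protocols. This assumes the application logs through Log4j 2 (adapt the logger names and levels to whatever backend is on the classpath); the file name jgroups-trace.log is arbitrary:

    <?xml version="1.0" encoding="UTF-8"?> 
    <Configuration> 
     <Appenders> 
      <!-- keep the protocol traces in their own file so they stay readable --> 
      <File name="JGroupsTrace" fileName="jgroups-trace.log"> 
       <PatternLayout pattern="%d %-5p [%c{1}] (%t) %m%n"/> 
      </File> 
     </Appenders> 
     <Loggers> 
      <!-- TRACE only the failure-detection and merge/membership protocols --> 
      <Logger name="org.jgroups.protocols.FD" level="TRACE"/> 
      <Logger name="org.jgroups.protocols.FD_ALL" level="TRACE"/> 
      <Logger name="org.jgroups.protocols.MERGE3" level="TRACE"/> 
      <Logger name="org.jgroups.protocols.pbcast.GMS" level="TRACE"/> 
      <Root level="INFO"> 
       <AppenderRef ref="JGroupsTrace"/> 
      </Root> 
     </Loggers> 
    </Configuration> 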


Yes, it is consistently reproducible in our environment when we manually disconnect one of our nodes from the network and then connect it back. Thanks for creating the bug; I will add the trace logs. – geekprogrammer


Is there any workaround for this issue? – geekprogrammer