2014-06-09
4

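MariaDB Galera cluster setup issue

I'm trying to get a MariaDB cluster up and running, but it isn't working for me. I'm using MariaDB Galera 5.5.36 on 64-bit Red Hat ES6 machines. I installed MariaDB from this repo: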

[mariadb] 
name = MariaDB 
baseurl = http://yum.mariadb.org/5.5-galera/rhel6-amd64/ 
gpgkey=https://yum.mariadb.org/RPM-GPG-KEY-MariaDB 
gpgcheck=1 

In server.conf on server 1 I have the following:

[mariadb] 
log_error=/var/log/mariadb.log 
query_cache_size=0 
query_cache_type=0 
binlog_format=ROW 
default_storage_engine=innodb 
innodb_autoinc_lock_mode=2 
wsrep_provider=/usr/lib64/galera/libgalera_smm.so 
wsrep_cluster_address=gcomm://192.168.211.133 
wsrep_cluster_name='cluster' 
wsrep_node_address='192.168.211.132' 
wsrep_node_name='cluster1' 
wsrep_sst_method=rsync 

And on server 2 I have:

[mariadb] 
log_error=/var/log/mariadb.log 
query_cache_size=0 
query_cache_type=0 
binlog_format=ROW 
default_storage_engine=innodb 
innodb_autoinc_lock_mode=2 
wsrep_provider=/usr/lib64/galera/libgalera_smm.so 
wsrep_cluster_address=gcomm://192.168.211.132 
wsrep_cluster_name='cluster' 
wsrep_node_address='192.168.211.133' 
wsrep_node_name='cluster2' 
wsrep_sst_method=rsync 

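When I start server 1 with the command sudo service mysql start --wsrep-new-cluster, it starts fine, and if I open mysql and check the wsrep status, it shows everything up and running, which is good. But when I try to start the second server with sudo service mysql start, I get the following in the error log: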

140609 14:47:55 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql 
140609 14:47:56 mysqld_safe WSREP: Running position recovery with --log_error='/var/lib/mysql/wsrep_recovery.i5qfm2' --pid-file='/var/lib/mysql/localhost.localdomain-recover.pid' 
140609 14:47:57 mysqld_safe WSREP: Recovered position 85448d73-ebe8-11e3-9c20-fbc1995fee11:0 
140609 14:47:57 [Note] WSREP: wsrep_start_position var submitted: '85448d73-ebe8-11e3-9c20-fbc1995fee11:0' 
140609 14:47:57 [Note] WSREP: Read nil XID from storage engines, skipping position init 
140609 14:47:57 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera/libgalera_smm.so' 
140609 14:47:57 [Note] WSREP: wsrep_load(): Galera 25.3.2(r170) by Codership Oy <[email protected]> loaded successfully. 
140609 14:47:57 [Note] WSREP: CRC-32C: using hardware acceleration. 
140609 14:47:57 [Note] WSREP: Found saved state: 85448d73-ebe8-11e3-9c20-fbc1995fee11:-1 
140609 14:47:57 [Note] WSREP: Passing config to GCS: base_host = 192.168.211.133; base_port = 4567; cert.log_conflicts = no; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; repl.causal_read_timeout = PT30S; repl.commit_order = 3; repl.key_format = FLAT8; repl.proto_max = 5 
140609 14:47:57 [Note] WSREP: Assign initial position for certification: 0, protocol version: -1 
140609 14:47:57 [Note] WSREP: wsrep_sst_grab() 
140609 14:47:57 [Note] WSREP: Start replication 
140609 14:47:57 [Note] WSREP: Setting initial position to 85448d73-ebe8-11e3-9c20-fbc1995fee11:0 
140609 14:47:57 [Note] WSREP: protonet asio version 0 
140609 14:47:57 [Note] WSREP: Using CRC-32C (optimized) for message checksums. 
140609 14:47:57 [Note] WSREP: backend: asio 
140609 14:47:57 [Note] WSREP: GMCast version 0 
140609 14:47:57 [Note] WSREP: (0c085f34-efe5-11e3-9f6b-8bfd1706e2a4, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567 
140609 14:47:57 [Note] WSREP: (0c085f34-efe5-11e3-9f6b-8bfd1706e2a4, 'tcp://0.0.0.0:4567') multicast: , ttl: 1 
140609 14:47:57 [Note] WSREP: EVS version 0 
140609 14:47:57 [Note] WSREP: PC version 0 
140609 14:47:57 [Note] WSREP: gcomm: connecting to group 'cluster', peer '192.168.211.132:,192.168.211.134:' 
140609 14:48:00 [Warning] WSREP: no nodes coming from prim view, prim not possible 
140609 14:48:00 [Note] WSREP: view(view_id(NON_PRIM,0c085f34-efe5-11e3-9f6b-8bfd1706e2a4,1) memb { 
     0c085f34-efe5-11e3-9f6b-8bfd1706e2a4,0 
} joined { 
} left { 
} partitioned { 
}) 
140609 14:48:01 [Warning] WSREP: last inactive check more than PT1.5S ago (PT3.50775S), skipping check 
140609 14:48:31 [Note] WSREP: view((empty)) 
140609 14:48:31 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out) 
     at gcomm/src/pc.cpp:connect():141 
140609 14:48:31 [ERROR] WSREP: gcs/src/gcs_core.c:gcs_core_open():196: Failed to open backend connection: -110 (Connection timed out) 
140609 14:48:31 [ERROR] WSREP: gcs/src/gcs.c:gcs_open():1291: Failed to open channel 'cluster' at 'gcomm://192.168.211.132,192.168.211.134': -110 (Connection timed out) 
140609 14:48:31 [ERROR] WSREP: gcs connect failed: Connection timed out 
140609 14:48:31 [ERROR] WSREP: wsrep::connect() failed: 7 
140609 14:48:31 [ERROR] Aborting 

140609 14:48:31 [Note] WSREP: Service disconnected. 
140609 14:48:32 [Note] WSREP: Some threads may fail to exit. 
140609 14:48:32 [Note] /usr/sbin/mysqld: Shutdown complete 

140609 14:48:32 mysqld_safe mysqld from pid file /var/lib/mysql/localhost.localdomain.pid ended 

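I'm at a loss as to why the second server cannot detect that the cluster is up and running. The machines can communicate with each other: I can SSH from one to the other, and they can ping each other. I've tried deleting the Galera cache, downgrading my MariaDB Galera version, disabling SELinux, running the mysql service as a different user, confirming that the correct ports are open, running the nodes as two VMs on separate computers with different IP addresses, and so on. Does anyone know what is going on here? I've spent three days trying to solve this, and none of the solutions I've found work for me.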
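Ping and SSH only show that ICMP and port 22 get through; they say nothing about the port Galera listens on (4567 in the log above). The following is a minimal Python sketch for probing that port from the joining node. The function names and the default port list are illustrative assumptions, not part of any MariaDB tooling, and the sketch assumes the default ports have not been changed.

```python
import socket

# Ports that stay open on a running node, assuming defaults:
# 3306 for MySQL clients, 4567 for Galera group communication.
# (4568 for IST and 4444 for rsync SST normally listen only while
# a state transfer is in progress, so they are not probed here.)
DEFAULT_PORTS = (3306, 4567)


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_peer(host, ports=DEFAULT_PORTS, timeout=3.0):
    """Return {port: True/False} for one peer."""
    return {port: port_is_open(host, port, timeout) for port in ports}
```

Calling check_peer("192.168.211.132") on server 2 while server 1 is running would show whether 4567 is actually reachable; a False there points at a firewall or routing problem rather than at the Galera configuration.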

Answers

1

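I believe you need to list all of the IPs in the wsrep_cluster_address parameter.

wsrep_cluster_address = gcomm://192.168.211.132,192.168.211.133

This should be done on both hosts. By the way, you probably want three nodes rather than two, to avoid split-brain situations.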
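To make the "same full list on every host" idea concrete, here is a small Python sketch that renders the wsrep lines for each node from a single node list. The helper name wsrep_settings is hypothetical; the option names, IPs, and node names are the ones from the question.

```python
def wsrep_settings(nodes, cluster_name="cluster"):
    """Return {node_name: [config lines]}; every node lists all addresses.

    nodes is a list of (node_name, ip) pairs.
    """
    # One shared address string, so no node ends up with a partial list.
    address = "gcomm://" + ",".join(ip for _, ip in nodes)
    return {
        name: [
            f"wsrep_cluster_address={address}",
            f"wsrep_cluster_name='{cluster_name}'",
            f"wsrep_node_address='{ip}'",
            f"wsrep_node_name='{name}'",
        ]
        for name, ip in nodes
    }


NODES = [("cluster1", "192.168.211.132"), ("cluster2", "192.168.211.133")]
for node_name, lines in wsrep_settings(NODES).items():
    print(f"# {node_name}")
    print("\n".join(lines))
```

Only wsrep_node_address and wsrep_node_name differ between hosts; the cluster address is identical everywhere.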

1

Here is how I solved my similar problem.

CentOS 7 w/ MariaDB Galera 10.1.

On node 2 I was seeing this:

2016-12-27 15:40:38 140703512762624 [Warning] WSREP: no nodes coming from prim view, prim not possible 

After doing some reading, I tried running this on node1:

service mysql start --wsrep-new-cluster 

But that failed, and in the log I found this:

2016-12-27 15:44:08 140438853814528 [ERROR] WSREP: It may not be safe to bootstrap the cluster from this node. It was not the last one to leave the cluster and may not contain all the updates. To force cluster bootstrap with this node, edit the grastate.dat file manually and set safe_to_bootstrap to 1 . 

So I edited the file /var/lib/mysql/grastate.dat and changed safe_to_bootstrap to 1.
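Before editing the file by hand, it can help to look at what it currently says. Below is a rough Python sketch that reads the key: value lines of a grastate.dat. The function names are made up for illustration, and the sample content is an assumed example of the file layout, not output copied from my servers.

```python
def parse_grastate(text):
    """Parse 'key: value' lines from grastate.dat content into a dict."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the comment header
        key, _, value = line.partition(":")
        state[key.strip()] = value.strip()
    return state


def safe_to_bootstrap(text):
    """True only if the file explicitly says safe_to_bootstrap: 1."""
    return parse_grastate(text).get("safe_to_bootstrap") == "1"


# Illustrative sample only (assumed layout; the uuid is a placeholder).
SAMPLE = """# GALERA saved state
version: 2.1
uuid:    00000000-0000-0000-0000-000000000000
seqno:   -1
safe_to_bootstrap: 0
"""

print(parse_grastate(SAMPLE))
print(safe_to_bootstrap(SAMPLE))
```

In practice the text would come from open("/var/lib/mysql/grastate.dat").read(), run with enough privileges to read the data directory.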

I was then able to start the primary node with:

service mysql start --wsrep-new-cluster 

Then on the others, I just used:

service mysql start 

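Note: this was a demo, pre-production environment. Everything got out of whack right after I rebooted all the servers at the same time, but I knew there had been no writes and that the databases were in sync. If you are in production and this happens, you can use the following to determine which node to run "new-cluster" on, which is akin to saying "make me the primary":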

mysqld_safe --wsrep-recover 
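The recovery run logs a "Recovered position" line of the form uuid:seqno (the question's log shows one), and the node with the highest seqno is the one to bootstrap from. Here is a rough Python sketch of that comparison. The helper names are hypothetical, and the regex is an assumption that tolerates an optional colon after "position", since the exact wording may differ between versions.

```python
import re

# Matches e.g. "WSREP: Recovered position 85448d73-...-fbc1995fee11:0"
RECOVERED = re.compile(r"WSREP: Recovered position:?\s*([0-9a-fA-F-]+):(-?\d+)")


def recovered_position(log_text):
    """Return (uuid, seqno) from the last 'Recovered position' line, or None."""
    matches = RECOVERED.findall(log_text)
    if not matches:
        return None
    uuid, seqno = matches[-1]
    return uuid, int(seqno)


def pick_bootstrap_node(positions):
    """Given {node: (uuid, seqno)}, return the node with the highest seqno."""
    uuids = {uuid for uuid, _ in positions.values()}
    if len(uuids) != 1:
        # Different UUIDs mean the nodes do not share one cluster history.
        raise ValueError("expected exactly one cluster UUID across nodes")
    return max(positions, key=lambda node: positions[node][1])
```

Collect the log from each node, feed it through recovered_position, and pass the results to pick_bootstrap_node. If two nodes tie on seqno, either is as good as the other.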

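If this is a production issue, I strongly recommend reading this article, and backing up w/ CloneZilla before you start throwing commands at the command line!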

https://www.percona.com/blog/2014/09/01/galera-replication-how-to-recover-a-pxc-cluster/

0

The cluster must be started with this command on the primary node:

galera_new_cluster 

After starting the first node, you can start the other nodes in the cluster successfully.
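Once the other nodes are up, the wsrep status variables show whether they really joined. Here is a minimal Python sketch that checks the tab-separated output of something like mysql -N -B -e "SHOW STATUS LIKE 'wsrep_%'". The function names and the choice of which variables to check are my own assumptions about what a healthy node looks like.

```python
# Values a joined, healthy node is assumed to report.
EXPECTED = {
    "wsrep_cluster_status": "Primary",
    "wsrep_connected": "ON",
    "wsrep_ready": "ON",
}


def parse_status(output):
    """Parse 'name<TAB>value' lines into a dict, ignoring malformed lines."""
    status = {}
    for line in output.splitlines():
        name, sep, value = line.partition("\t")
        if sep:
            status[name.strip()] = value.strip()
    return status


def problems(status, expected_size):
    """Return a list of human-readable mismatches; empty means healthy."""
    issues = [
        f"{name}={status.get(name)!r}, expected {want!r}"
        for name, want in EXPECTED.items()
        if status.get(name) != want
    ]
    if status.get("wsrep_cluster_size") != str(expected_size):
        issues.append(
            f"wsrep_cluster_size={status.get('wsrep_cluster_size')!r}, "
            f"expected {str(expected_size)!r}"
        )
    return issues
```

An empty list from problems(parse_status(output), expected_size=2) on each node means both nodes see a Primary component of the expected size.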