2015-03-08

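keepalived script makes failover go crazy

What happens: when I start keepalived, everything works fine. When node01 fails and PostgreSQL cannot start on it, node01 keeps trying to force an election anyway, even though PostgreSQL is still down. An election now takes place every second.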

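What I want to achieve: while node02 is master, keepalived should check whether PostgreSQL has started on node01, but it should not force an election all the time. Can someone help me get this right?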

Here is my code.

Stop PostgreSQL:

#!/usr/bin/python 

import sys 
import subprocess 

sys.exit(
    subprocess.call(['/usr/bin/systemctl', 'stop', 'postgresql.service']) 
) 

notify:

#!/usr/bin/python 

import sys 
import subprocess 

# keepalived passes: $1 = "GROUP" or "INSTANCE", $2 = name, $3 = target state
state = sys.argv[3]

with open('/var/run/keepalived.pgsql.state', 'w+') as f: 
    f.write(state) 

if state == 'MASTER':
    sys.exit(
        subprocess.call(['/usr/bin/systemctl', 'start', 'postgresql.service'])
    )

if state == 'BACKUP':
    sys.exit(
        subprocess.call(['/usr/bin/systemctl', 'stop', 'postgresql.service'])
    )

if state == 'FAULT':
    sys.exit(
        subprocess.call(['/usr/bin/systemctl', 'stop', 'postgresql.service'])
    )

check-pgsql:

#!/usr/bin/python 

import sys 
import subprocess 
from time import sleep 

sleep(1) 

with open('/var/run/keepalived.pgsql.state', 'r') as f: 
    state = f.read().strip().strip("\n") 

# status 0: Postgresql is running 
# status 3: Postgresql has been stopped 
status = subprocess.call(['/usr/bin/systemctl', 'status', 'postgresql.service']) 

if status == 0 and state == 'MASTER': 
    sys.exit(0) 

if status == 0 and state == 'BACKUP': 
    sys.exit(3) 

if status == 3 and state == 'MASTER': 
    sys.exit(3) 

if status == 3 and state == 'BACKUP': 
    sys.exit(0) 

keepalived configuration:

vrrp_script chk_pgsql { 
    script  "/etc/keepalived/check-pgsql" 
    interval 1 
    fall 3 
    rise 3 
    weight -4 
} 

vrrp_instance pgsql_vip { 
    state EQUAL 
    interface eth0 
    virtual_router_id 4 
    priority 100    # 100 on node01, 99 on node02
    advert_int 1 
    authentication { 
     auth_type PASS 
     auth_pass 1111 
    } 
    track_script { 
     chk_pgsql 
    } 
    virtual_ipaddress { 
     192.168.1.20 
    } 
    notify "/etc/keepalived/notify" 
    notify_stop "/etc/keepalived/stop" 
} 

Answer

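After node01 dies, node02 is elected master. Your check script then runs on node01. It sees that node01 is now in the BACKUP state with PostgreSQL stopped, and returns 0. Once the check script has returned 0 three times (rise 3 in your VRRP configuration), node01 considers itself healthy. Then, because node01 has a higher priority than node02, it takes control through the election process. At that point the check script fails, because node01 is in the MASTER state and PostgreSQL is stopped. This is what makes keepalived oscillate between the nodes.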
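
To make the cycle concrete, here is a rough simulation of the sequence described above. It is only a sketch under simplifying assumptions: one check per round (the fall 3 / rise 3 counters are ignored), PostgreSQL on node01 never starts, and a tie leaves the current master in place. The names `check_exit_code`, `simulate`, `NODES`, `WEIGHT` and `BROKEN` are made up for illustration; only the decision table, priorities and weight come from the question.

```python
# Simplified model of the setup in the question (hypothetical helper names).
NODES = {'node01': 100, 'node02': 99}   # base priorities from the config
WEIGHT = -4                              # vrrp_script weight from the config
BROKEN = {'node01'}                      # assumption: PostgreSQL cannot start here


def check_exit_code(pg_running, state):
    """Decision table of the check-pgsql script from the question."""
    if pg_running:
        return 0 if state == 'MASTER' else 3
    return 3 if state == 'MASTER' else 0


def simulate(rounds, priorities=NODES, master='node01'):
    """Return the node that is master after each round."""
    history = []
    for _ in range(rounds):
        effective = {}
        for node, base in priorities.items():
            state = 'MASTER' if node == master else 'BACKUP'
            # notify starts PostgreSQL on MASTER and stops it on BACKUP;
            # on a broken node the start never succeeds.
            pg_running = state == 'MASTER' and node not in BROKEN
            ok = check_exit_code(pg_running, state) == 0
            # A negative weight lowers the priority while the check fails.
            effective[node] = base if ok else base + WEIGHT
        best = max(effective.values())
        # Assumption: on a tie the current master keeps its role.
        if effective[master] != best:
            master = max(effective, key=effective.get)
        history.append(master)
    return history


print(simulate(4))  # ['node02', 'node01', 'node02', 'node01']
```

In this model node01 drops to 96 while it is a broken MASTER, climbs back to 100 as soon as it is a "healthy" BACKUP, and then outbids node02 (99) again, so the master role flips every round.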

I think you can fix this in one of two ways:

  1. Give node01 and node02 the same priority
  2. Change your check script so that it only returns the status of PostgreSQL
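
For the second option, a check along the following lines might be enough. This is a minimal sketch, not a tested drop-in replacement: it swaps `systemctl status` for `systemctl is-active --quiet`, and the function name `pgsql_exit_code` and the injectable `run` parameter are my own additions for illustration. Note that the notify script in the question stops PostgreSQL on a BACKUP node, so with this check a backup would report failure; whether that is acceptable depends on your setup.

```python
import subprocess

SYSTEMCTL = '/usr/bin/systemctl'   # same path as in the question's scripts
UNIT = 'postgresql.service'


def pgsql_exit_code(run=subprocess.call):
    """Exit code for keepalived: 0 if PostgreSQL is active, non-zero otherwise.

    `run` is injectable only so the function can be tried without systemd.
    """
    # `systemctl is-active --quiet` prints nothing and exits 0 only when the
    # unit is active; the VRRP state of this node is deliberately ignored.
    return run([SYSTEMCTL, 'is-active', '--quiet', UNIT])
```

Used as the check script, the file would end with `sys.exit(pgsql_exit_code())` (after `import sys`), and the state file written by notify would no longer be read by the check.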