MHA 0.58: masterha_check_repl does not validate key parameters in the configuration file

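While testing MHA, I (still a beginner) happened to notice that masterha_check_repl does not verify whether repl_user and repl_password in masterha_default.cnf are correct. I am recording the details here; corrections are welcome.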
1. Environment
node1 (manager): 192.168.99.183
node2: 192.168.99.184
node3: 192.168.99.185
vip: 192.168.99.253
MHA::MasterMonitor version 0.58.
 
2. When the replication account or its password in masterha_default.cnf is wrong, masterha_check_repl does not detect the problem (it only checks the replication status of the slaves):
#masterha_check_repl --conf=/etc/masterha/app1.conf
Thu May 31 19:50:35 2018 - [info] Reading default configuration from /etc/masterha_default.cnf..
Thu May 31 19:50:35 2018 - [info] Reading application default configuration from /etc/masterha/app1.conf..
Thu May 31 19:50:35 2018 - [info] Reading server configuration from /etc/masterha/app1.conf..
Thu May 31 19:50:35 2018 - [info] MHA::MasterMonitor version 0.58.
Thu May 31 19:50:37 2018 - [info] GTID failover mode = 1
Thu May 31 19:50:37 2018 - [info] Dead Servers:
Thu May 31 19:50:37 2018 - [info] Alive Servers:
Thu May 31 19:50:37 2018 - [info]   192.168.99.183(192.168.99.183:3307)
Thu May 31 19:50:37 2018 - [info]   192.168.99.184(192.168.99.184:3307)
Thu May 31 19:50:37 2018 - [info]   192.168.99.185(192.168.99.185:3307)
Thu May 31 19:50:37 2018 - [info] Alive Slaves:
Thu May 31 19:50:37 2018 - [info]   192.168.99.183(192.168.99.183:3307)  Version=5.7.20-log (oldest major version between slaves) log-bin:enabled
Thu May 31 19:50:37 2018 - [info]     GTID ON
Thu May 31 19:50:37 2018 - [info]     Replicating from 192.168.99.185(192.168.99.185:3307)
Thu May 31 19:50:37 2018 - [info]   192.168.99.184(192.168.99.184:3307)  Version=5.7.20-log (oldest major version between slaves) log-bin:enabled
Thu May 31 19:50:37 2018 - [info]     GTID ON
Thu May 31 19:50:37 2018 - [info]     Replicating from 192.168.99.185(192.168.99.185:3307)
Thu May 31 19:50:37 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu May 31 19:50:37 2018 - [info] Current Alive Master: 192.168.99.185(192.168.99.185:3307)
Thu May 31 19:50:37 2018 - [info] Checking slave configurations..
Thu May 31 19:50:37 2018 - [info]  read_only=1 is not set on slave 192.168.99.183(192.168.99.183:3307).
Thu May 31 19:50:37 2018 - [info]  read_only=1 is not set on slave 192.168.99.184(192.168.99.184:3307).
Thu May 31 19:50:37 2018 - [info] Checking replication filtering settings..
Thu May 31 19:50:37 2018 - [info]  binlog_do_db= , binlog_ignore_db=
Thu May 31 19:50:37 2018 - [info]  Replication filtering check ok.
Thu May 31 19:50:37 2018 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Thu May 31 19:50:37 2018 - [info] Checking SSH publickey authentication settings on the current master..
Thu May 31 19:50:37 2018 - [info] HealthCheck: SSH to 192.168.99.185 is reachable.
Thu May 31 19:50:37 2018 - [info]
192.168.99.185(192.168.99.185:3307) (current master)
+--192.168.99.183(192.168.99.183:3307)
+--192.168.99.184(192.168.99.184:3307)

Thu May 31 19:50:37 2018 - [info] Checking replication health on 192.168.99.183..
Thu May 31 19:50:37 2018 - [info]  ok.
Thu May 31 19:50:37 2018 - [info] Checking replication health on 192.168.99.184..
Thu May 31 19:50:37 2018 - [info]  ok.
Thu May 31 19:50:37 2018 - [info] Checking master_ip_failover_script status:
Thu May 31 19:50:37 2018 - [info]   /etc/masterha/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.99.185 --orig_master_ip=192.168.99.185 --orig_master_port=3307  --orig_master_ssh_port=3322
Thu May 31 19:50:37 2018 - [info]  OK.
Thu May 31 19:50:37 2018 - [warning] shutdown_script is not defined.
Thu May 31 19:50:37 2018 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.


3. In the failover test, however, the new master and the VIP came up normally, but replication could not be re-established:
Master failover log:
192.168.99.184(192.168.99.184:3307): OK: Applying all logs succeeded.
192.168.99.184(192.168.99.184:3307): OK: Activated master IP address.
192.168.99.185(192.168.99.185:3307): ERROR: Starting slave failed.
Master failover to 192.168.99.184(192.168.99.184:3307) done, but recovery on slave partially failed.
 
Running show slave status\G on the slave reports:
Last_IO_Error: error connecting to master 'repl@192.168.99.184:3307' - retry-time: 60  retries: 13

4. The error.log on the new master contains the key message:
2018-05-30T13:16:21.305305Z 63 [Note] Access denied for user 'repl'@'db5' (using password: YES)
2018-05-30T13:17:21.307134Z 64 [Note] Access denied for user 'repl'@'db5' (using password: YES)
2018-05-30T13:18:21.308872Z 65 [Note] Access denied for user 'repl'@'db5' (using password: YES)
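
The "Access denied" entries suggest that the slave is connecting with credentials the new master rejects. One possible way to repair an affected slave by hand is to re-issue its replication credentials. The sketch below is not part of MHA: repl_fix_sql is a hypothetical helper that only prints the statements, the root admin account in the usage comment is an assumption, and the password is assumed to contain no single quote (no escaping is done).

```shell
# Print the statements that reset a slave's replication user and password.
# Only the credentials are changed; the other replication settings on the
# slave are left as they are.
repl_fix_sql() {
    printf "STOP SLAVE; CHANGE MASTER TO MASTER_USER='%s', MASTER_PASSWORD='%s'; START SLAVE;\n" "$1" "$2"
}

# Example, run on the affected slave (admin account is an assumption):
#   mysql -h 127.0.0.1 -P 3307 -u root -p -e "$(repl_fix_sql repl 'correct_password')"
```

After the statements run, show slave status\G should no longer report the Last_IO_Error, provided the account really exists on the new master with that password.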

5. After correcting the password in masterha_default.cnf, everything worked normally.
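
Since masterha_check_repl does not catch this, one way to check the credentials before a failover is to log in with the configured replication account directly. The following is a minimal sketch, not an MHA feature: get_param and check_repl_login are hypothetical helper names, and it assumes the config file uses plain key=value lines and that the mysql client is installed. It should be run from each slave host, because the grant may be host-specific (the error log above shows 'repl'@'db5').

```shell
# Print the value of parameter $2 from config file $1
# (surrounding whitespace stripped; the last matching line wins).
get_param() {
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" \
        | sed 's/[[:space:]]*$//' \
        | tail -n 1
}

# Try to log in to host $2, port $3 with repl_user / repl_password
# read from config file $1.
# Note: passing the password on the command line exposes it to other
# local users; acceptable for a one-off test only.
check_repl_login() {
    conf=$1
    host=$2
    port=$3
    user=$(get_param "$conf" repl_user)
    pass=$(get_param "$conf" repl_password)
    if mysql -h "$host" -P "$port" -u "$user" -p"$pass" -e 'SELECT 1' >/dev/null 2>&1; then
        echo "OK: $user can log in to $host:$port"
    else
        echo "FAILED: $user cannot log in to $host:$port"
        return 1
    fi
}

# Example, run on a slave against the candidate master:
#   check_repl_login /etc/masterha_default.cnf 192.168.99.184 3307
```

A successful login only proves that the account and password are accepted; it does not confirm that the account holds the REPLICATION SLAVE privilege.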
