
Analysis of the MySQL MHA Failover Process

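Startup
MHA is started with the masterha_manager script (default path after installation: /usr/local/bin/masterha_manager). During startup it actively checks that SSH connectivity between the nodes and the master-slave replication status are healthy. While running, the manager calls the masterha_master_monitor script (which in turn calls XXX/mha4mysql-manager-0.5?/lib/MHA/MasterMonitor.pm, HealthCheck.pm, and related modules) to probe the state of each node. The probe interval is set by the ping_interval parameter in the manager configuration file; if the master fails to respond to three consecutive probes, it is judged to be down.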
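To make the "three missed probes" rule concrete, here is a minimal Python sketch of such a monitoring loop. It is not MHA's actual implementation (that lives in HealthCheck.pm); the function name monitor_master and the probe and sleep callables are hypothetical, and only ping_interval and the threshold of three come from the description above.

```python
import time
from typing import Callable


def monitor_master(probe: Callable[[], bool],
                   ping_interval: float = 3.0,
                   max_failures: int = 3,
                   sleep: Callable[[float], None] = time.sleep) -> int:
    """Block until `probe` fails `max_failures` times in a row.

    Returns the total number of probes issued before the master
    was declared dead. `probe` is a hypothetical stand-in for the
    real health check and returns True when the master responds.
    """
    failures = 0
    probes = 0
    while True:
        probes += 1
        if probe():
            failures = 0          # any success resets the counter
        else:
            failures += 1
            if failures >= max_failures:
                return probes     # master is judged to be down
        sleep(ping_interval)      # wait one interval between probes
```

With a fake probe that answers True, False, True, False, False, False, the loop declares the master dead on the sixth probe, because the single success in the middle resets the failure counter.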
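Master election on failure
---Check the configuration file for a candidate master, i.e. a node with candidate_master=1. If one exists and check_repl_delay=0 is also set for it, that node is promoted to the new master.
---If no candidate master is specified, compare how much of the master's log each slave has received, and promote the slave closest to the master.
---Otherwise, pick the new master by the order in which the nodes appear in the configuration file.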
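The three rules above can be sketched in Python as follows. This is a simplified illustration, not MHA's selection code: the Slave class, the single log_pos number, and the function name pick_new_master are all hypothetical, and the config-file order is modeled as the order of the input list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Slave:
    host: str
    log_pos: int                    # hypothetical: how much of the master's log was received
    candidate_master: bool = False  # mirrors candidate_master=1
    check_repl_delay: bool = True   # mirrors check_repl_delay (False means =0)


def pick_new_master(slaves: List[Slave]) -> Optional[Slave]:
    """Pick a new master; `slaves` is given in configuration-file order."""
    if not slaves:
        return None
    # Rule 1: an explicit candidate with check_repl_delay=0 wins.
    for s in slaves:
        if s.candidate_master and not s.check_repl_delay:
            return s
    # Rule 2: otherwise take the slave closest to the old master.
    # Rule 3: max() returns the first maximal element, so ties
    # fall back to configuration-file order.
    return max(slaves, key=lambda s: s.log_pos)
```

For example, with slaves a (log_pos 100), b (log_pos 120) and c (log_pos 120), b is chosen; marking a as candidate_master with check_repl_delay=0 makes a win despite being behind.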
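Data compensation
---Check SSH connectivity to the master. If it is reachable, use the "save_binary_logs" script to send the missing binlog events to the slaves and apply them;
---If the master is unreachable, use the "apply_diff_relay_logs" script to compute the relay log differences between the slaves and apply them to the slaves that are behind.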
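The branch between the two scripts can be pictured with a small Python sketch. It only plans which script would run between which hosts and does not call the MHA scripts themselves; the function name plan_compensation, the "master" label, and the single relay-log position per slave are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def plan_compensation(master_ssh_ok: bool,
                      relay_pos: Dict[str, int]) -> List[Tuple[str, str, str]]:
    """Return (script, source, target) steps for data compensation.

    `relay_pos` maps each slave host to a hypothetical relay-log position.
    """
    if not relay_pos:
        return []
    if master_ssh_ok:
        # Master host is reachable over SSH: ship its binlog to every slave.
        return [("save_binary_logs", "master", host) for host in relay_pos]
    # Master is unreachable: the most advanced slave becomes the source,
    # and only slaves that are behind it need the relay-log difference.
    latest = max(relay_pos, key=relay_pos.get)
    return [("apply_diff_relay_logs", latest, host)
            for host, pos in relay_pos.items()
            if pos < relay_pos[latest]]
```

With relay_pos = {"s1": 100, "s2": 80} and the master unreachable, the plan contains a single step that brings s2 up to s1.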
Role switch
The newly elected master drops its slave role, and the remaining slaves set up replication from the new master.
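As a rough picture of what the role switch amounts to in GTID mode (the log below shows "GTID failover mode = 1"), the following Python sketch builds the MySQL 5.7 statements one might run by hand on each node. It is not the literal sequence MHA issues; the function name role_switch_statements is hypothetical, and the replication user and password are assumed to be already configured on the slaves.

```python
from typing import Dict, List


def role_switch_statements(new_master: str, port: int,
                           other_slaves: List[str]) -> Dict[str, List[str]]:
    """Map each host to the SQL statements that switch its role."""
    # The promoted node stops replicating and forgets its old master.
    plan = {new_master: ["STOP SLAVE", "RESET SLAVE ALL"]}
    # Remaining slaves are repointed; with GTID auto-positioning
    # no binlog file name or offset has to be supplied.
    change = (f"CHANGE MASTER TO MASTER_HOST='{new_master}', "
              f"MASTER_PORT={port}, MASTER_AUTO_POSITION=1")
    for host in other_slaves:
        plan[host] = ["STOP SLAVE", change, "START SLAVE"]
    return plan
```

Calling role_switch_statements("172.171.172.172", 3307, ["172.171.172.173"]) yields two entries: one that detaches the new master, and one that repoints the remaining slave at it.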
VIP failover
The virtual IP is bound to the new master.
 
Thoughts
What happens if the master recovers while a failover is in progress?
It depends: the failover may continue, or it may be aborted. Below is the log of an aborted failover.
Sat Jan 20 09:27:28 2024 - [warning] Got timeout on MySQL Ping(SELECT) child process and killed it! at /usr/local/share/perl5/MHA/HealthCheck.pm line 431.
Sat Jan 20 09:27:28 2024 - [info] Executing SSH check script: exit 0
Sat Jan 20 09:27:32 2018 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.171.172.171' (4))
Sat Jan 20 09:27:32 2018 - [warning] Connection failed 2 time(s)..
Sat Jan 20 09:27:34 2024 - [warning] HealthCheck: Got timeout on checking SSH connection to 172.171.172.171! at /usr/local/share/perl5/MHA/HealthCheck.pm line 342.
Sat Jan 20 09:27:35 2024 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.171.172.171' (4))
Sat Jan 20 09:27:35 2024 - [warning] Connection failed 3 time(s)..
Sat Jan 20 09:27:38 2024 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.171.172.171' (4))
Sat Jan 20 09:27:38 2024 - [warning] Connection failed 4 time(s)..
Sat Jan 20 09:27:38 2024 - [warning] Master is not reachable from health checker!
Sat Jan 20 09:27:38 2024 - [warning] Master 172.171.172.171(172.171.172.171:3307) is not reachable!
Sat Jan 20 09:27:38 2024 - [warning] SSH is NOT reachable.
Sat Jan 20 09:27:38 2024 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /data/mhacnf/qqweixinod.cnf again, and trying to connect to all servers to check server status..
Sat Jan 20 09:27:38 2024 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Jan 20 09:27:38 2024 - [info] Reading application default configuration from /data/mhacnf/qqweixinod.cnf..
Sat Jan 20 09:27:38 2024 - [info] Reading server configuration from /data/mhacnf/qqweixinod.cnf..
Sat Jan 20 09:27:39 2024 - [info] GTID failover mode = 1
Sat Jan 20 09:27:39 2024 - [info] Dead Servers:
Sat Jan 20 09:27:39 2024 - [info]   172.171.172.171(172.171.172.171:3307)
Sat Jan 20 09:27:39 2024 - [info] Alive Servers:
Sat Jan 20 09:27:39 2024 - [info]   172.171.172.172(172.171.172.172:3307)
Sat Jan 20 09:27:39 2024 - [info]   172.171.172.173(172.171.172.173:3307)
Sat Jan 20 09:27:39 2024 - [info] Alive Slaves:
Sat Jan 20 09:27:39 2024 - [info]   172.171.172.172(172.171.172.172:3307)  Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Sat Jan 20 09:27:39 2024 - [info]     GTID ON
Sat Jan 20 09:27:39 2024 - [info]     Replicating from 172.171.172.171(172.171.172.171:3307)
Sat Jan 20 09:27:39 2024 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Jan 20 09:27:39 2024 - [info]   172.171.172.173(172.171.172.173:3307)  Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Sat Jan 20 09:27:39 2024 - [info]     GTID ON
Sat Jan 20 09:27:39 2024 - [info]     Replicating from 172.171.172.171(172.171.172.171:3307)
Sat Jan 20 09:27:39 2024 - [info] Checking slave configurations..
Sat Jan 20 09:27:39 2024 - [info] Checking replication filtering settings..
Sat Jan 20 09:27:39 2024 - [info]  Replication filtering check ok.
Sat Jan 20 09:27:39 2024 - [info] Master is down!
Sat Jan 20 09:27:39 2024 - [info] Terminating monitoring script.
Sat Jan 20 09:27:39 2024 - [info] Got exit code 20 (Master dead).
Sat Jan 20 09:27:39 2024 - [info] MHA::MasterFailover version 0.56.
Sat Jan 20 09:27:39 2024 - [info] Starting master failover.
Sat Jan 20 09:27:39 2024 - [info]
Sat Jan 20 09:27:39 2024 - [info] * Phase 1: Configuration Check Phase..
Sat Jan 20 09:27:39 2024 - [info]
Sat Jan 20 09:27:40 2024 - [info] GTID failover mode = 1
Sat Jan 20 09:27:40 2024 - [info] Dead Servers:
Sat Jan 20 09:27:40 2024 - [info]   172.171.172.171(172.171.172.171:3307)
Sat Jan 20 09:27:40 2018 - [info] Checking master reachability via MySQL(double check)...
Sat Jan 20 09:27:40 2018 - [error][/usr/local/share/perl5/MHA/MasterFailover.pm, ln218] The master 172.171.172.171(172.171.172.171:3307) is reachable via MySQL (error=1:Connection Succeeded) ! Stop failover.
Sat Jan 20 09:27:40 2018 - [error][/usr/local/share/perl5/MHA/ManagerUtil.pm, ln177] Got ERROR:  at /usr/local/bin/masterha_manager line 65.
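Note: 3307 in the log is the database port, so don't be surprised by it.
If the master is found to have recovered during the "Checking master reachability via MySQL(double check)" step (or before it), the failover is abandoned. The MHA manager process also exits (it is killed), so masterha_manager has to be restarted manually.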
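Because the manager exits in this situation, it can help to detect the aborted failover from the manager log. The Python sketch below scans log lines for the "is reachable via MySQL ... Stop failover." message quoted above; the function name aborted_failover_master and the regular expression are my own, written against the log format shown in this post, and may need adjusting for other MHA versions.

```python
import re
from typing import Iterable, Optional

# Matches the error line MasterFailover.pm writes when the double check
# finds the supposedly dead master reachable again.
ABORT_RE = re.compile(
    r"\[error\].*The master (\S+) is reachable via MySQL .*Stop failover\."
)


def aborted_failover_master(lines: Iterable[str]) -> Optional[str]:
    """Return the host info of the master that aborted a failover, if any."""
    for line in lines:
        match = ABORT_RE.search(line)
        if match:
            return match.group(1)
    return None
```

Fed the log above, it returns 172.171.172.171(172.171.172.171:3307), which is a signal that masterha_manager is no longer running and needs to be started again.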
Checking master reachability via MySQL(double check) ---MasterFailover.pm
The source code is as follows:
# quick check that the dead server is really dead
# not double check when ping_type is insert,
# because check_connection_fast_util can return true if insert-check detects I/O failure.
if ( $servers_config[0]->{ping_type} ne $MHA::ManagerConst::PING_TYPE_INSERT )
{
  $log->info("Checking master reachability via MySQL(double check)...");
  if (
    my $rc = MHA::DBHelper::check_connection_fast_util(
      $dead_master->{hostname}, $dead_master->{port},
      $dead_master->{user},     $dead_master->{password}
    )
    )
  {
    $log->error(
      sprintf(
        "The master %s is reachable via MySQL (error=%s) ! Stop failover.",
        $dead_master->get_hostinfo(), $rc
      )
    );
    croak;
  }
  $log->info(" ok.");
}
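For readers less familiar with Perl, the same control flow can be restated as a short Python sketch. It mirrors the logic of the excerpt above and is not part of MHA: the names double_check_master, FailoverAborted, and can_connect are hypothetical, and the string "INSERT" is assumed to be the value of PING_TYPE_INSERT.

```python
from typing import Callable


class FailoverAborted(RuntimeError):
    """Raised when the 'dead' master turns out to be reachable (like croak)."""


def double_check_master(can_connect: Callable[[], bool],
                        ping_type: str = "SELECT") -> str:
    """Re-test the dead master right before failover starts.

    `can_connect` is a hypothetical stand-in for
    MHA::DBHelper::check_connection_fast_util.
    """
    # The check is skipped for insert-type pings, as in the Perl code.
    if ping_type.upper() == "INSERT":
        return "skipped"
    if can_connect():
        # Master is back: abort instead of promoting a slave.
        raise FailoverAborted("master is reachable via MySQL; stop failover")
    return "ok"
```

The key point is that a successful connection is the error case here: the exception propagates up and ends the manager process, which matches the "Stop failover." lines in the log.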
 

Source: https://www.cnblogs.com/xuliuzai/p/17978546
