Redis Sentinel failover test

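1. Sentinel parameter configuration

1) Subjective down time (down-after-milliseconds): 20s. If a Sentinel receives no valid reply from a Redis instance for 20s, that Sentinel considers the instance down (S_DOWN).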

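2) Objective down: once enough Sentinels (the configured quorum; the Sentinel log below shows "#quorum 8/6", i.e. 8 Sentinels agreeing against a quorum of 6) have confirmed the subjective down state, the master enters the objectively down state (O_DOWN). O_DOWN triggers a failover, which is carried out by the elected Sentinel leader.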

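3) failover-timeout: if a failover started by the leader does not complete within this time, it is treated as failed and the Sentinels can attempt the failover again, so that one failed attempt does not leave the master down.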
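
As a rough sketch, the three parameters above map onto sentinel.conf directives as shown below. The master name, address and quorum of 6 are taken from the Sentinel log later in this post, and 20000 ms is the 20s setting above. The failover-timeout of 180000 ms is only an assumed value (the Redis default), since the value used in the test is not stated. The helper name sentinel_conf is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print a sentinel.conf stanza for one monitored master.
# Arguments: name ip port quorum down_after_ms failover_timeout_ms
sentinel_conf() {
    printf 'sentinel monitor %s %s %s %s\n' "$1" "$2" "$3" "$4"
    printf 'sentinel down-after-milliseconds %s %s\n' "$1" "$5"
    printf 'sentinel failover-timeout %s %s\n' "$1" "$6"
}

# The failover-timeout of 180000 is an assumption, not a value from the test.
sentinel_conf twemproxy-9208-master01 192.168.40.178 8085 6 20000 180000
```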

2. Failover test and observations

Data write script

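#!/bin/bash
# Usage: <script> <host> <port>
# Once per second: print a timestamp, then set and delete a random key,
# so that failed writes can be dated in the output.
host=$1
port=$2
redis_cli=/apps/svr/redis/bin/redis-cli

for ((i = 1; i <= 1000000; i++))
do
    nu=$RANDOM
    date
    printf 'set %s %s\r\n' "$nu" "$nu" | "$redis_cli" -h "$host" -p "$port"
    printf 'del %s\r\n' "$nu" | "$redis_cli" -h "$host" -p "$port"
    sleep 1
done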

1) Write log analysis

OK
1

Thu Apr 30 18:05:29 CST 2015     # time of the first error
ERR Connection refused

....

ERR Connection refused

Thu Apr 30 18:05:50 CST 2015  # time of the last error; writes failed for the 21s in between
OK

Failover duration: 21s
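
The 21s figure can also be pulled out of the write log mechanically. The sketch below assumes the log layout shown above (a date line, then the replies of that round) and that all timestamps fall on the same day; outage_seconds is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: read the write log on stdin and print the number of
# seconds between the first failed round and the first successful round after it.
outage_seconds() {
    awk '
    # A line printed by date(1), e.g. "Thu Apr 30 18:05:29 CST 2015"
    /^[A-Z][a-z][a-z] [A-Z][a-z][a-z] / {
        split($4, t, ":")
        now = t[1] * 3600 + t[2] * 60 + t[3]
        next
    }
    /Connection refused/ {
        if (first == "") first = now   # timestamp of the first failing round
        failing = 1
        next
    }
    /^OK/ {
        if (failing && last == "") last = now   # first success after the errors
        failing = 0
    }
    END { if (first != "" && last != "") print last - first }
    '
}

# The excerpt from this post, without the inline comments.
outage_seconds <<'EOF'
OK
1

Thu Apr 30 18:05:29 CST 2015
ERR Connection refused
....
ERR Connection refused
Thu Apr 30 18:05:50 CST 2015
OK
EOF
```

On the excerpt this prints 21.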

2) Master log

...skipping...
[64965] 30 Apr 18:05:29.342 * Removing the pid file.        # shutdown time
[64965] 30 Apr 18:05:29.342 # Redis is now ready to exit, bye bye...

3) Slave Redis log

[6217] 30 Apr 18:05:49.732 * Connecting to MASTER 192.168.40.178:8085
[6217] 30 Apr 18:05:49.732 * MASTER <-> SLAVE sync started
[6217] 30 Apr 18:05:49.732 # Error condition on socket for SYNC: Connection refused
[6217] 30 Apr 18:05:49.815 * Discarding previously cached master state.
[6217] 30 Apr 18:05:49.815 * MASTER MODE enabled (user request)          # slave switches to master mode
[6217] 30 Apr 18:05:49.816 # CONFIG REWRITE executed with success.

The slave switched to master mode and then executed CONFIG REWRITE.
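
One way to confirm the promotion from the outside is to read the role field of INFO replication on the former slave. A minimal sketch, where role_of is a made-up helper name and the canned reply stands in for a live instance:

```shell
#!/bin/sh
# Hypothetical helper: print the value of the "role" field from
# INFO replication output read on stdin. The reply uses CRLF line
# endings, so carriage returns are stripped first.
role_of() {
    tr -d '\r' | awk -F: '$1 == "role" { print $2 }'
}

# Against a live instance this would be (not run here):
#   /apps/svr/redis/bin/redis-cli -h 192.168.40.177 -p 8085 info replication | role_of
# Demonstration on a canned reply:
printf '# Replication\r\nrole:master\r\nconnected_slaves:0\r\n' | role_of
```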

4) Sentinel log

[17421] 30 Apr 18:05:49.439 # +sdown master twemproxy-9208-master01 192.168.40.178 8085    # subjectively down time
[17421] 30 Apr 18:05:49.503 # +odown master twemproxy-9208-master01 192.168.40.178 8085 #quorum 8/6    # objectively down time
[17421] 30 Apr 18:05:49.503 # +new-epoch 9490 
[17421] 30 Apr 18:05:49.503 # +try-failover master twemproxy-9208-master01 192.168.40.178 8085
[17421] 30 Apr 18:05:49.574 # +vote-for-leader 0f7a9f82cd7b954a612678e0d587c2bc2ea71449 9490
[17421] 30 Apr 18:05:49.575 # 192.168.55.76:26379 voted for 85d43c165ad0e6a2b08d726c4e77cdf9416543fd 9490
[17421] 30 Apr 18:05:49.575 # 192.168.55.75:26379 voted for 85d43c165ad0e6a2b08d726c4e77cdf9416543fd 9490
[17421] 30 Apr 18:05:49.575 # 192.168.56.66:26379 voted for 814ef0eef1b0b5acd89dfff1470bbf65740270f1 9490
[17421] 30 Apr 18:05:49.578 # 192.168.55.79:26379 voted for b59a31537966b250a5451bd8e61d6824b6d4aad2 9490
[17421] 30 Apr 18:05:49.580 # 192.168.56.69:26379 voted for da24c25199866de685d59f110a69b53fe3793b1e 9490
[17421] 30 Apr 18:05:49.585 # 192.168.56.65:26379 voted for 85d43c165ad0e6a2b08d726c4e77cdf9416543fd 9490
[17421] 30 Apr 18:05:49.595 # 192.168.55.77:26379 voted for 85d43c165ad0e6a2b08d726c4e77cdf9416543fd 9490
[17421] 30 Apr 18:05:49.603 # 192.168.56.68:26379 voted for 85d43c165ad0e6a2b08d726c4e77cdf9416543fd 9490
[17421] 30 Apr 18:05:49.616 # 192.168.55.78:26379 voted for 85d43c165ad0e6a2b08d726c4e77cdf9416543fd 9490
[17421] 30 Apr 18:05:50.659 # +config-update-from sentinel 192.168.55.75:26379 192.168.55.75 26379 @ twemproxy-9208-master01 192.168.40.178 8085
[17421] 30 Apr 18:05:50.659 # +switch-master twemproxy-9208-master01 192.168.40.178 8085 192.168.40.177 8085
[17421] 30 Apr 18:05:50.659 * +slave slave 192.168.40.178:8085 192.168.40.178 8085 @ twemproxy-9208-master01 192.168.40.177 8085
[17421] 30 Apr 18:05:50.828 # -script-error /apps/sh/redis/reconfig.py 0 2    # script execution time
[17421] 30 Apr 18:05:50.855 # +reset-master master twemproxy-9208-master01 192.168.40.177 8085
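
The "voted for" lines are easier to read when tallied per candidate. The sketch below counts them by run ID; tally_votes is a made-up helper name, and this Sentinel's own "+vote-for-leader" line is deliberately not counted because it has a different format.

```shell
#!/bin/sh
# Hypothetical helper: count "voted for" lines per candidate run ID in a
# Sentinel log read on stdin, most-voted candidate first.
tally_votes() {
    # The run ID is the second-to-last field; the last field is the epoch.
    awk '/ voted for / { votes[$(NF - 1)]++ }
         END { for (id in votes) print votes[id], id }' | sort -k1,1nr -k2,2
}

# The vote lines from the excerpt above.
tally_votes <<'EOF'
[17421] 30 Apr 18:05:49.574 # +vote-for-leader 0f7a9f82cd7b954a612678e0d587c2bc2ea71449 9490
[17421] 30 Apr 18:05:49.575 # 192.168.55.76:26379 voted for 85d43c165ad0e6a2b08d726c4e77cdf9416543fd 9490
[17421] 30 Apr 18:05:49.575 # 192.168.55.75:26379 voted for 85d43c165ad0e6a2b08d726c4e77cdf9416543fd 9490
[17421] 30 Apr 18:05:49.575 # 192.168.56.66:26379 voted for 814ef0eef1b0b5acd89dfff1470bbf65740270f1 9490
[17421] 30 Apr 18:05:49.578 # 192.168.55.79:26379 voted for b59a31537966b250a5451bd8e61d6824b6d4aad2 9490
[17421] 30 Apr 18:05:49.580 # 192.168.56.69:26379 voted for da24c25199866de685d59f110a69b53fe3793b1e 9490
[17421] 30 Apr 18:05:49.585 # 192.168.56.65:26379 voted for 85d43c165ad0e6a2b08d726c4e77cdf9416543fd 9490
[17421] 30 Apr 18:05:49.595 # 192.168.55.77:26379 voted for 85d43c165ad0e6a2b08d726c4e77cdf9416543fd 9490
[17421] 30 Apr 18:05:49.603 # 192.168.56.68:26379 voted for 85d43c165ad0e6a2b08d726c4e77cdf9416543fd 9490
[17421] 30 Apr 18:05:49.616 # 192.168.55.78:26379 voted for 85d43c165ad0e6a2b08d726c4e77cdf9416543fd 9490
EOF
```

On this excerpt, 6 of the 9 replies name 85d43c165ad0e6a2b08d726c4e77cdf9416543fd, and the other three candidates get one vote each.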

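Conclusions:
1) The whole master/slave switch completes in slightly more than down-after-milliseconds (within about +2s); in this test, writes failed for 21s against a 20s setting.
2) The Sentinel leader election takes place after +odown is raised.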
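
The log timestamps above allow a quick check of conclusion 1. A small sketch, assuming both timestamps fall on the same day and that the clocks of the hosts involved are in sync; the helper names to_ms and elapsed_ms are made up here.

```shell
#!/bin/sh
# Hypothetical helper: convert an HH:MM:SS.mmm log timestamp to milliseconds
# since midnight.
to_ms() {
    echo "$1" | awk -F'[:.]' '{
        ms = ($1 * 3600 + $2 * 60 + $3) * 1000 + $4
        printf "%d\n", ms
    }'
}

# Hypothetical helper: milliseconds elapsed between two same-day timestamps.
elapsed_ms() {
    echo $(( $(to_ms "$2") - $(to_ms "$1") ))
}

elapsed_ms 18:05:29.342 18:05:49.439   # master exit -> +sdown
elapsed_ms 18:05:49.439 18:05:50.659   # +sdown -> +switch-master
```

The first interval is 20097 ms, essentially the 20s down-after-milliseconds, and the second is 1220 ms, so almost all of the outage is spent waiting for the subjective down timer.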
