Building a Highly Available Redis Cluster on CentOS 7

Preface

Software environment

Software   Version
CentOS     7.9
Redis      6.0.6

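Cluster node planning

A Redis cluster with replication needs at least 6 nodes in total: 3 Master nodes and 3 Slave nodes, with each Master paired with 1 Slave. The node count grows with the number of Slaves per Master:

  • 1 Master –> 1 Slave: the Redis cluster needs 6 nodes, as shown in the figure
  • 1 Master –> 2 Slaves: the Redis cluster needs 9 nodes, and so on, as shown in the figure

Name     IP              Port
Master   192.168.1.109   7001
Master   192.168.1.109   7002
Master   192.168.1.109   7003
Slave    192.168.1.109   7004
Slave    192.168.1.109   7005
Slave    192.168.1.109   7006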
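
The node counts in the list above follow from a simple product. As a minimal sketch (cluster_node_count is a hypothetical helper name, not a Redis command), the arithmetic can be written as:

```shell
# Total nodes = masters * (1 + replicas per master).
# cluster_node_count is a hypothetical helper, not part of Redis.
cluster_node_count() {
    masters=$1
    replicas_per_master=$2
    echo $(( masters * (1 + replicas_per_master) ))
}

cluster_node_count 3 1   # 3 Masters with 1 Slave each
cluster_node_count 3 2   # 3 Masters with 2 Slaves each
```

The two calls correspond to the 6-node and 9-node layouts described in the list.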

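Redis cluster characteristics

Advantages of Redis Cluster

  • Decentralized architecture: there is no central node, and service is provided in a distributed way.
  • Data is distributed by slot across multiple Redis instances.
  • Slaves serve as standby data replicas for failover, so the cluster recovers quickly.
  • Failover is automatic: nodes exchange state information through the gossip protocol, and a voting mechanism promotes a Slave to Master.
  • Nodes can be added or removed online, which lowers hardware and operations costs and improves scalability and availability.

Disadvantages of Redis Cluster

  • Client implementation is complex: the driver must be a Smart Client that caches the slot mapping and refreshes it promptly.
  • At the time of writing, only JedisCluster is relatively mature, and its exception handling is still incomplete. Immature clients affect application stability and make development harder.
  • A node that blocks for longer than cluster-node-timeout is judged to be offline. The resulting failover is unnecessary; Sentinel mode has the same switching scenario.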

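Building the Redis cluster

System initialization

# Setting 1: kernel parameters
# echo "net.core.somaxconn = 1024" >> /etc/sysctl.conf
# echo "vm.overcommit_memory = 1" >> /etc/sysctl.conf

# Setting 2: disable transparent huge pages at boot
# echo "echo never > /sys/kernel/mm/transparent_hugepage/enabled" >> /etc/rc.local
# On CentOS 7, rc.local only runs at boot if it is executable
# chmod +x /etc/rc.d/rc.local

# Reboot the system
# reboot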
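
Appending with echo and >> adds a duplicate line every time the step is repeated. A minimal sketch of an idempotent append follows; ensure_line is a hypothetical helper name, and the demonstration writes to a scratch file rather than the real /etc/sysctl.conf:

```shell
# ensure_line LINE FILE: append LINE to FILE only if no identical line exists.
# ensure_line is a hypothetical helper, not a system command.
ensure_line() {
    grep -qxF -- "$1" "$2" 2>/dev/null || printf '%s\n' "$1" >> "$2"
}

# Demonstration against a scratch file instead of /etc/sysctl.conf.
conf=$(mktemp)
ensure_line "net.core.somaxconn = 1024" "$conf"
ensure_line "vm.overcommit_memory = 1" "$conf"
ensure_line "net.core.somaxconn = 1024" "$conf"   # repeated call adds nothing
cat "$conf"
rm -f "$conf"
```

On a real host the second argument would be /etc/sysctl.conf and the function would be run as root.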

Creating the Redis user

# Create the redis group
# groupadd redis

# Create the redis user (its shell is /bin/false, so it cannot log in)
# useradd -g redis redis -s /bin/false

Compiling and installing Redis

Every Redis release can be downloaded from the official site; this guide uses version 6.0.6.

# Install dependencies (devtoolset-9 comes from the SCL repository, so enable that repository first)
# yum install -y centos-release-scl
# yum install -y devtoolset-9 scl-utils-build tcl

# Temporarily enable the GCC 9 toolchain (the GCC 4.8.5 shipped with CentOS 7 is too old to build Redis 6)
# scl enable devtoolset-9 bash

# Download the source archive
# wget http://download.redis.io/releases/redis-6.0.6.tar.gz

# Extract the archive
# tar -xvf redis-6.0.6.tar.gz

# Enter the extracted directory
# cd redis-6.0.6

# Compile
# make

# Install
# make install PREFIX=/usr/local/redis

# Create symbolic links (optional)
# ln -s /usr/local/redis/bin/redis-benchmark /usr/local/bin/redis-benchmark
# ln -s /usr/local/redis/bin/redis-check-aof /usr/local/bin/redis-check-aof
# ln -s /usr/local/redis/bin/redis-check-rdb /usr/local/bin/redis-check-rdb
# ln -s /usr/local/redis/bin/redis-sentinel /usr/local/bin/redis-sentinel
# ln -s /usr/local/redis/bin/redis-server /usr/local/bin/redis-server
# ln -s /usr/local/redis/bin/redis-cli /usr/local/bin/redis-cli

# Copy the configuration file
# cp redis.conf /usr/local/redis

# Create the log directory
# mkdir -p /var/log/redis

# Set ownership
# chown -R redis:redis /var/log/redis
# chown -R redis:redis /usr/local/redis

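Edit the base Redis configuration as follows. Several of the configured file names include the port number, so that the nodes can later be told apart by port. The bind line stays commented out so that the node listens on all interfaces.

# Edit the base configuration
# vim /usr/local/redis/redis.conf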

io-threads 2
daemonize yes
# bind 127.0.0.1
protected-mode no
masterauth 123456
requirepass 123456
dbfilename dump_6379.rdb
pidfile /var/run/redis_6379.pid
cluster-config-file nodes_6379.conf
appendfilename "appendonly_6379.aof"
logfile "/var/log/redis/redis_6379.log"

Verifying the Redis installation

# Switch to the redis user (its login shell is /bin/false, so pass a shell explicitly)
# su -s /bin/bash redis

# Enter the installation directory
$ cd /usr/local/redis

# Start Redis
$ ./bin/redis-server redis.conf

# Check that Redis is running
$ ps -aux|grep redis

# View the Redis startup log
$ more /var/log/redis/redis_6379.log

# Shut down Redis
$ ./bin/redis-cli
127.0.0.1:6379> auth 123456
127.0.0.1:6379> shutdown

Setting up the Redis cluster

Create the installation files for each cluster node, change every port-related setting (for example port, pidfile, dbfilename, logfile, cluster-config-file), and enable cluster support.

# Create the cluster directory
# mkdir -p /usr/local/redis-cluster

# Copy the installation files for each node
# cp -r /usr/local/redis /usr/local/redis-cluster/redis-7001
# cp -r /usr/local/redis /usr/local/redis-cluster/redis-7002
# cp -r /usr/local/redis /usr/local/redis-cluster/redis-7003
# cp -r /usr/local/redis /usr/local/redis-cluster/redis-7004
# cp -r /usr/local/redis /usr/local/redis-cluster/redis-7005
# cp -r /usr/local/redis /usr/local/redis-cluster/redis-7006

# Change every port-related setting in each node
# sed -i "s/6379/7001/g" /usr/local/redis-cluster/redis-7001/redis.conf
# sed -i "s/6379/7002/g" /usr/local/redis-cluster/redis-7002/redis.conf
# sed -i "s/6379/7003/g" /usr/local/redis-cluster/redis-7003/redis.conf
# sed -i "s/6379/7004/g" /usr/local/redis-cluster/redis-7004/redis.conf
# sed -i "s/6379/7005/g" /usr/local/redis-cluster/redis-7005/redis.conf
# sed -i "s/6379/7006/g" /usr/local/redis-cluster/redis-7006/redis.conf

# Enable cluster support on each node
# sed -i "s/# cluster-enabled/cluster-enabled/g" $(find /usr/local/redis-cluster -type f -name "redis.conf")
# sed -i "s/# cluster-config-file/cluster-config-file/g" $(find /usr/local/redis-cluster -type f -name "redis.conf")
# sed -i "s/# cluster-node-timeout/cluster-node-timeout/g" $(find /usr/local/redis-cluster -type f -name "redis.conf")

# Set ownership
# chown -R redis:redis /usr/local/redis-cluster
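
The copy and sed commands above repeat the same two steps once per port, so they can also be driven by a loop. The following is a minimal sketch under that assumption; make_nodes is a hypothetical helper name, and it writes the rewritten redis.conf from the source copy rather than editing in place:

```shell
# make_nodes SRC_DIR CLUSTER_DIR PORT...: copy SRC_DIR once per port and
# write a redis.conf into each copy with every 6379 replaced by that port.
# make_nodes is a hypothetical helper, not part of Redis.
make_nodes() {
    src=$1
    cluster=$2
    shift 2
    mkdir -p "$cluster"
    for port in "$@"; do
        node="$cluster/redis-$port"
        cp -r "$src" "$node"
        sed "s/6379/$port/g" "$src/redis.conf" > "$node/redis.conf"
    done
}

# Usage with the layout from this article (run as root, then chown to redis):
# make_nodes /usr/local/redis /usr/local/redis-cluster 7001 7002 7003 7004 7005 7006
```

Like the sed commands above, this replaces every occurrence of 6379 in the file, so it relies on the base configuration not using 6379 for anything other than the port-specific names.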

Copying the Redis cluster management tool

# Enter the extracted Redis source directory
# cd redis-6.0.6

# Copy the cluster management tool
# cp src/redis-trib.rb /usr/local/redis-cluster

# Set ownership
# chown -R redis:redis /usr/local/redis-cluster/redis-trib.rb

Create a shell script that starts all nodes of the Redis cluster in one go

# vim /usr/local/redis-cluster/start-cluster.sh

#!/bin/bash
# Start each node from inside its own directory, as relative paths in redis.conf resolve against it
REDIS_CLUSTER_HOME=/usr/local/redis-cluster
for port in 7001 7002 7003 7004 7005 7006; do
    cd "$REDIS_CLUSTER_HOME/redis-$port" || exit 1
    ./bin/redis-server redis.conf
done

Make the shell script executable

# Set permissions and ownership
# chmod +x /usr/local/redis-cluster/start-cluster.sh
# chown -R redis:redis /usr/local/redis-cluster/start-cluster.sh

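Setting a password for the Redis cluster

To protect the cluster nodes with a password, requirepass and masterauth must both be set, and to the same password; otherwise authentication fails when a master-slave switch happens. Note that the password is set differently depending on whether the cluster is built with redis-trib.rb or with redis-cli:

  • redis-trib.rb: when building the cluster with the redis-trib.rb tool, do not configure a password before the cluster is built. Once it is built, run the following commands on each node in turn; the nodes do not need to be restarted
$ redis-cli -c -p 7001
config set masterauth 123456
config set requirepass 123456
config rewrite
  • redis-cli: when building the cluster with redis-cli, first configure the password (both requirepass and masterauth) in each node's redis.conf, then add the -a password option to the cluster creation command, where password is the password of the cluster nodes
masterauth 123456
requirepass 123456

$ redis-cli -a 123456 --cluster create \
192.168.1.109:7001 \
192.168.1.109:7002 \
192.168.1.109:7003 \
192.168.1.109:7004 \
192.168.1.109:7005 \
192.168.1.109:7006 \
--cluster-replicas 1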
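
Since masterauth and requirepass have to match on every node, it can help to check each redis.conf before building the cluster. The following is a minimal sketch; conf_value and auth_consistent are hypothetical helper names, and the parsing assumes simple unquoted "directive value" lines:

```shell
# conf_value KEY FILE: print the value of the last uncommented "KEY value" line.
conf_value() {
    awk -v key="$1" '$1 == key { value = $2 } END { print value }' "$2"
}

# auth_consistent FILE: succeed only if masterauth and requirepass are both
# set and identical. Both helpers are hypothetical, not part of Redis.
auth_consistent() {
    master=$(conf_value masterauth "$1")
    require=$(conf_value requirepass "$1")
    [ -n "$master" ] && [ "$master" = "$require" ]
}

# Demonstration on a scratch file that mimics the settings above.
sample=$(mktemp)
printf '# requirepass foobared\nmasterauth 123456\nrequirepass 123456\n' > "$sample"
if auth_consistent "$sample"; then echo "consistent"; else echo "mismatch"; fi
rm -f "$sample"
```

On the layout from this article, the check could be looped over /usr/local/redis-cluster/redis-700*/redis.conf.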

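Building and starting the Redis cluster

First run the shell script to start all Redis nodes. Never start Redis as the root user; doing so creates a serious security risk for the system.

# Switch to the redis user (its login shell is /bin/false, so pass a shell explicitly)
# su -s /bin/bash redis

# Start the cluster nodes
$ /usr/local/redis-cluster/start-cluster.sh

# Check that every node is running
$ ps -aux|grep redis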
redis 32641 0.0 0.0 181880 7688 ? Ssl 21:33 0:00 ./bin/redis-server *:7001 [cluster]
redis 32649 0.0 0.0 181880 7688 ? Ssl 21:33 0:00 ./bin/redis-server *:7002 [cluster]
redis 32657 0.0 0.0 181880 7688 ? Ssl 21:33 0:00 ./bin/redis-server *:7003 [cluster]
redis 20814 0.0 0.0 181880 7688 ? Ssl 21:33 0:00 ./bin/redis-server *:7004 [cluster]
redis 20822 0.0 0.0 181880 7688 ? Ssl 21:33 0:00 ./bin/redis-server *:7005 [cluster]
redis 20830 0.0 0.0 181880 7688 ? Ssl 21:33 0:00 ./bin/redis-server *:7006 [cluster]

In 6.0.6, running redis-trib.rb to build a cluster only prints a notice telling you to use redis-cli instead, along with the equivalent redis-cli command (so Ruby no longer has to be installed separately). The original redis-trib.rb command looked like this:

$ ./redis-trib.rb create --replicas 1 \
192.168.1.109:7001 \
192.168.1.109:7002 \
192.168.1.109:7003 \
192.168.1.109:7004 \
192.168.1.109:7005 \
192.168.1.109:7006

The suggested redis-cli command is shown below. Building the cluster with redis-cli is recommended, because it does not require installing Ruby.

$ redis-cli -a 123456 --cluster create \
192.168.1.109:7001 \
192.168.1.109:7002 \
192.168.1.109:7003 \
192.168.1.109:7004 \
192.168.1.109:7005 \
192.168.1.109:7006 \
--cluster-replicas 1
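
Typing six endpoints by hand is easy to get wrong, so the list can be generated instead. A minimal sketch, where cluster_endpoints is a hypothetical helper name and the host and port range are the ones planned in this article:

```shell
# cluster_endpoints HOST FIRST_PORT LAST_PORT: print one HOST:PORT per line.
# cluster_endpoints is a hypothetical helper, not part of redis-cli.
cluster_endpoints() {
    port=$2
    while [ "$port" -le "$3" ]; do
        printf '%s:%s\n' "$1" "$port"
        port=$((port + 1))
    done
}

cluster_endpoints 192.168.1.109 7001 7006

# Possible use (left commented out because it contacts live nodes):
# redis-cli -a 123456 --cluster create $(cluster_endpoints 192.168.1.109 7001 7006) --cluster-replicas 1
```

The command substitution is deliberately left unquoted so that each line becomes a separate argument.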

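The two commands are nearly identical in form and meaning. --cluster-replicas 1 sets the replica-to-master ratio to 1, that is, one replica per master. By default the first three ip:port entries are treated as masters and the remaining ones as their replicas in order, although the final assignment is whatever the tool lists for confirmation. When building a Redis cluster across several servers in production, a master and its replica must not be placed on the same server: if they share a server, losing that server takes out both and the replica serves little purpose, so stagger the pairs across servers. That way high availability holds (any single server can fail without affecting the cluster): when a master goes down, its replica takes over as master and the cluster keeps running.

After running the cluster creation command (it only needs to run once), Redis lists the planned master/replica assignment and asks for confirmation; type yes to complete the cluster creation.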

>>> Performing hash slots allocation on 6 nodes...
Master[0] -> Slots 0 - 5460
Master[1] -> Slots 5461 - 10922
Master[2] -> Slots 10923 - 16383
Adding replica 192.168.1.109:7006 to 192.168.1.109:7001
Adding replica 192.168.1.109:7003 to 192.168.1.109:7004
Adding replica 192.168.1.109:7005 to 192.168.1.109:7002
M: 225e37e5bb340467fb58b6f9d14cfb1893bf92d5 192.168.1.109:7001
slots:[0-5460] (5461 slots) master
M: 283abb498445ffd6206f24c451ac0b9fb7129383 192.168.1.109:7002
slots:[10923-16383] (5461 slots) master
M: 7a1229732ada6ae8d8eb51ae7b7cac6242a6f8d4 192.168.1.109:7004
slots:[5461-10922] (5462 slots) master
S: cde86683e2d314fd52cf8708f78935c6648ea3c6 192.168.1.109:7003
replicates 7a1229732ada6ae8d8eb51ae7b7cac6242a6f8d4
S: 1f3f441d619ceeac55ae91015a3f46ede37352bb 192.168.1.109:7005
replicates 283abb498445ffd6206f24c451ac0b9fb7129383
S: f8a5d94e9928ed615514f23ddaabd259134af709 192.168.1.109:7006
replicates 225e37e5bb340467fb58b6f9d14cfb1893bf92d5
Can I set the above configuration? (type 'yes' to accept):
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join
.
>>> Performing Cluster Check (using node 192.168.1.109:7001)
M: 225e37e5bb340467fb58b6f9d14cfb1893bf92d5 192.168.1.109:7001
slots:[0-5460] (5461 slots) master
1 additional replica(s)
M: 7a1229732ada6ae8d8eb51ae7b7cac6242a6f8d4 192.168.1.109:7004
slots:[5461-10922] (5462 slots) master
1 additional replica(s)
S: f8a5d94e9928ed615514f23ddaabd259134af709 192.168.1.109:7006
slots: (0 slots) slave
replicates 225e37e5bb340467fb58b6f9d14cfb1893bf92d5
S: 1f3f441d619ceeac55ae91015a3f46ede37352bb 192.168.1.109:7005
slots: (0 slots) slave
replicates 283abb498445ffd6206f24c451ac0b9fb7129383
M: 283abb498445ffd6206f24c451ac0b9fb7129383 192.168.1.109:7002
slots:[10923-16383] (5461 slots) master
1 additional replica(s)
S: cde86683e2d314fd52cf8708f78935c6648ea3c6 192.168.1.109:7003
slots: (0 slots) slave
replicates 7a1229732ada6ae8d8eb51ae7b7cac6242a6f8d4
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
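
The three slot ranges at the top of the output come from splitting 16384 slots as evenly as possible across the masters. The following is a sketch that reproduces those ranges with rounding; slot_ranges is a hypothetical helper name, and this is an illustration of the arithmetic rather than the redis-cli source:

```shell
# slot_ranges MASTERS: print the slot range assigned to each master when
# 16384 slots are split evenly, rounding each upper bound to the nearest slot.
# slot_ranges is a hypothetical helper, not part of redis-cli.
slot_ranges() {
    awk -v n="$1" 'BEGIN {
        total = 16384
        per = total / n            # fractional slots per master
        first = 0
        cursor = 0
        for (i = 0; i < n; i++) {
            last = int(cursor + per - 1 + 0.5)
            if (i == n - 1 || last > total - 1) last = total - 1
            printf "Master[%d] -> Slots %d - %d\n", i, first, last
            first = last + 1
            cursor += per
        }
    }'
}

slot_ranges 3
```

With 3 masters this prints the same three Master[...] lines as the output above.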

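Testing the Redis cluster

Log in to one of the cluster nodes with the Redis client, supplying the password. In the session below, the key hashes to slot [12182], which belongs to the node 192.168.1.109:7002, so the client is redirected to 192.168.1.109:7002 and reads the value back from there.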

$ redis-cli -c -p 7001 -a 123456

127.0.0.1:7001> set foo hello
-> Redirected to slot [12182] located at 192.168.1.109:7002
OK
192.168.1.109:7002> get foo
"hello"
192.168.1.109:7002>
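
The slot number in the redirect is the CRC16 of the key modulo 16384. As a minimal sketch in plain shell arithmetic (crc16_xmodem and key_slot are hypothetical helper names; the sketch ignores hash tags in braces, which Redis treats specially):

```shell
# crc16_xmodem STRING: CRC16 with polynomial 0x1021 and initial value 0,
# processed byte by byte, most significant bit first.
crc16_xmodem() {
    crc=0
    for byte in $(printf '%s' "$1" | od -An -v -tu1); do
        crc=$(( crc ^ (byte << 8) ))
        bit=0
        while [ "$bit" -lt 8 ]; do
            if [ $(( crc & 0x8000 )) -ne 0 ]; then
                crc=$(( ((crc << 1) ^ 0x1021) & 0xFFFF ))
            else
                crc=$(( (crc << 1) & 0xFFFF ))
            fi
            bit=$((bit + 1))
        done
    done
    echo "$crc"
}

# key_slot KEY: slot for a key without a hash tag.
key_slot() {
    echo $(( $(crc16_xmodem "$1") % 16384 ))
}

key_slot foo   # prints 12182, the slot shown in the redirect above
```

Slot 12182 falls in the range 10923-16383, which is why the client was sent to the node on port 7002.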

View information about the current Redis cluster

$ redis-cli -c -p 7001 -a 123456

127.0.0.1:7001> cluster info
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:7
cluster_my_epoch:1
cluster_stats_messages_ping_sent:3154
cluster_stats_messages_pong_sent:3377
cluster_stats_messages_fail_sent:4
cluster_stats_messages_auth-ack_sent:1
cluster_stats_messages_sent:6536
cluster_stats_messages_ping_received:3372
cluster_stats_messages_pong_received:3154
cluster_stats_messages_meet_received:5
cluster_stats_messages_auth-req_received:1
cluster_stats_messages_received:6532
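
For scripted health checks, the cluster_state field is usually the one to look at. The following is a minimal sketch that extracts it from cluster info text; cluster_state is a hypothetical helper name, and the sample input is abbreviated from the output above:

```shell
# cluster_state: read `cluster info` text on stdin and print the value of
# the cluster_state field. Carriage returns are stripped first, because the
# reply may use CRLF line endings. Hypothetical helper, not part of redis-cli.
cluster_state() {
    tr -d '\r' | awk -F: '$1 == "cluster_state" { print $2 }'
}

# Demonstration on text abbreviated from the output above.
printf '%s\n' 'cluster_state:ok' 'cluster_slots_assigned:16384' 'cluster_known_nodes:6' | cluster_state

# Possible use against a live node (left commented out):
# redis-cli -p 7001 -a 123456 cluster info | cluster_state
```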

View the state of a specific Redis node

$ redis-cli --cluster check 192.168.1.109:7003 -a 123456

Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
192.168.1.109:7003 (cde86683...) -> 0 keys | 5462 slots | 1 slaves.
192.168.1.109:7002 (283abb49...) -> 1 keys | 5461 slots | 1 slaves.
192.168.1.109:7001 (225e37e5...) -> 0 keys | 5461 slots | 1 slaves.
[OK] 1 keys in 3 masters.
0.00 keys per slot on average.
>>> Performing Cluster Check (using node 192.168.1.109:7003)
M: cde86683e2d314fd52cf8708f78935c6648ea3c6 192.168.1.109:7003
slots:[5461-10922] (5462 slots) master
1 additional replica(s)
S: 1f3f441d619ceeac55ae91015a3f46ede37352bb 192.168.1.109:7005
slots: (0 slots) slave
replicates 283abb498445ffd6206f24c451ac0b9fb7129383
S: 7a1229732ada6ae8d8eb51ae7b7cac6242a6f8d4 192.168.1.109:7004
slots: (0 slots) slave
replicates cde86683e2d314fd52cf8708f78935c6648ea3c6
M: 283abb498445ffd6206f24c451ac0b9fb7129383 192.168.1.109:7002
slots:[10923-16383] (5461 slots) master
1 additional replica(s)
S: f8a5d94e9928ed615514f23ddaabd259134af709 192.168.1.109:7006
slots: (0 slots) slave
replicates 225e37e5bb340467fb58b6f9d14cfb1893bf92d5
M: 225e37e5bb340467fb58b6f9d14cfb1893bf92d5 192.168.1.109:7001
slots:[0-5460] (5461 slots) master
1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.

View information about all Redis cluster nodes

$ redis-cli -c -p 7001 -a 123456

127.0.0.1:7001> cluster nodes
7a1229732ada6ae8d8eb51ae7b7cac6242a6f8d4 192.168.1.109:7004@17004 master - 0 1616460018217 3 connected 5461-10922
225e37e5bb340467fb58b6f9d14cfb1893bf92d5 192.168.1.109:7001@17001 myself,master - 0 1616460015000 1 connected 0-5460
f8a5d94e9928ed615514f23ddaabd259134af709 192.168.1.109:7006@17006 slave 225e37e5bb340467fb58b6f9d14cfb1893bf92d5 0 1616460018000 1 connected
1f3f441d619ceeac55ae91015a3f46ede37352bb 192.168.1.109:7005@17005 slave 283abb498445ffd6206f24c451ac0b9fb7129383 0 1616460016000 2 connected
283abb498445ffd6206f24c451ac0b9fb7129383 192.168.1.109:7002@17002 master - 0 1616460016000 2 connected 10923-16383
cde86683e2d314fd52cf8708f78935c6648ea3c6 192.168.1.109:7003@17003 slave 7a1229732ada6ae8d8eb51ae7b7cac6242a6f8d4 0 1616460017000 3 connected
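
The cluster nodes output identifies each replica's master only by node ID, which is hard to read at a glance. A minimal sketch that resolves the IDs to addresses follows; replica_map is a hypothetical helper name, and the sample lines are two of the rows above:

```shell
# replica_map: read `cluster nodes` output on stdin and print
# "replica-address -> master-address" for every replica.
# replica_map is a hypothetical helper, not part of redis-cli.
replica_map() {
    awk '
        { addr[$1] = $2; sub(/@.*/, "", addr[$1]) }   # node id -> ip:port
        $3 ~ /slave/ { replica[$1] = $4 }             # replica id -> master id
        END { for (id in replica) print addr[id] " -> " addr[replica[id]] }
    ' | sort
}

# Demonstration on two rows taken from the output above.
replica_map <<'EOF'
7a1229732ada6ae8d8eb51ae7b7cac6242a6f8d4 192.168.1.109:7004@17004 master - 0 1616460018217 3 connected 5461-10922
cde86683e2d314fd52cf8708f78935c6648ea3c6 192.168.1.109:7003@17003 slave 7a1229732ada6ae8d8eb51ae7b7cac6242a6f8d4 0 1616460017000 3 connected
EOF
```

For these two rows the result is a single line pairing 192.168.1.109:7003 with its master 192.168.1.109:7004.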

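To verify master-slave switching: the cluster information above shows that 192.168.1.109:7003 is the Slave of 192.168.1.109:7004. Kill the process of the Master node 192.168.1.109:7004, then watch whether 192.168.1.109:7003 is elected as the new Master; if it is, the switch succeeded. The log of 192.168.1.109:7003 at that point looks like this: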

11970:S 21 Jul 2020 22:48:40.080 * Connecting to MASTER 192.168.1.109:7004
11970:S 21 Jul 2020 22:48:40.080 * MASTER <-> REPLICA sync started
11970:S 21 Jul 2020 22:48:40.081 # Error condition on socket for SYNC: Operation now in progress
11970:S 21 Jul 2020 22:48:40.982 # Starting a failover election for epoch 7.
11970:S 21 Jul 2020 22:48:40.985 # Failover election won: I'm the new master.
11970:S 21 Jul 2020 22:48:40.985 # configEpoch set to 7 after successful failover
11970:M 21 Jul 2020 22:48:40.985 * Discarding previously cached master state.
11970:M 21 Jul 2020 22:48:40.985 # Setting secondary replication ID to 00c7b21f3980b471d3373792d9d61bedf7e424e6, valid up to offset: 2059. New replication ID is c9f299ab0a8124a56d76e0e8a458135893b45336
11970:M 21 Jul 2020 22:48:40.985 # Cluster state changed: ok

Finally, restart the 192.168.1.109:7004 node; it rejoins as a Slave of 192.168.1.109:7003

7a1229732ada6ae8d8eb51ae7b7cac6242a6f8d4 192.168.1.109:7004@17004 slave cde86683e2d314fd52cf8708f78935c6648ea3c6 0 1616461490000 7 connected
225e37e5bb340467fb58b6f9d14cfb1893bf92d5 192.168.1.109:7001@17001 myself,master - 0 1616461492000 1 connected 0-5460
f8a5d94e9928ed615514f23ddaabd259134af709 192.168.1.109:7006@17006 slave 225e37e5bb340467fb58b6f9d14cfb1893bf92d5 0 1616461492000 1 connected
1f3f441d619ceeac55ae91015a3f46ede37352bb 192.168.1.109:7005@17005 slave 283abb498445ffd6206f24c451ac0b9fb7129383 0 1616461492010 2 connected
283abb498445ffd6206f24c451ac0b9fb7129383 192.168.1.109:7002@17002 master - 0 1616461491000 2 connected 10923-16383
cde86683e2d314fd52cf8708f78935c6648ea3c6 192.168.1.109:7003@17003 master - 0 1616461493010 7 connected 5461-10922

Rebuilding (re-initializing) the Redis cluster

If the Redis cluster becomes unusable, rebuilding it with the steps below may resolve the problem. These steps delete all Redis RDB snapshot data, so back up the data before running them.

# Stop Redis on every node server
$ pkill -9 redis

# Run the following on every node server (back up the Redis snapshot data first)
$ find /usr/local/redis-cluster -type f -iname "dump*.rdb" | xargs rm -rf
$ find /usr/local/redis-cluster -type f -iname "nodes_*.conf" | xargs rm -rf
$ rm -rf /var/log/redis/*

# Start Redis on every node server
$ /usr/local/redis-cluster/start-cluster.sh

# Build the cluster
$ redis-cli -a 123456 --cluster create \
192.168.1.109:7001 \
192.168.1.109:7002 \
192.168.1.109:7003 \
192.168.1.109:7004 \
192.168.1.109:7005 \
192.168.1.109:7006 \
--cluster-replicas 1

# Query the cluster information and state
$ redis-cli -c -p 7001 -a 123456
127.0.0.1:7001> cluster info
127.0.0.1:7001> cluster nodes

Reference blogs