PGPool-II+PG流复制实现HA主备切换

架构图

基础信息

PostgreSQL

版本:10.4
主库db:192.168.1.15 监听端口5432
备库db:192.168.1.16 监听端口5432
vip:192.168.1.88

PGPool-II

版本:3.7.3

机器信息

1
2
3
4
# cat /etc/redhat-release
CentOS Linux release 7.4.1708 (Core)
# uname -a
Linux 192-168-1-16 3.10.0-693.5.2.el7.x86_64 #1 SMP Fri Oct 20 20:32:50 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

安装PGPool-II(RPM方式)

安装依赖

1
yum -y install libmemcached-devel

准备RPM包

下载地址

1
2
3
4
pgpool-II-pg10-3.7.3-1pgdg.rhel7.x86_64.rpm
pgpool-II-pg10-debuginfo-3.7.3-1pgdg.rhel7.x86_64.rpm
pgpool-II-pg10-devel-3.7.3-1pgdg.rhel7.x86_64.rpm
pgpool-II-pg10-extensions-3.7.3-1pgdg.rhel7.x86_64.rpm

安装

1
rpm -ivh *.rpm

配置

默认配置文件目录:/etc/pgpool-II

配置ssh秘钥

配置秘钥使master和slave的postgres用户能免密连接

先修改postgres的密码,在root用户下

1
passwd postgres

在master,slave机器上都生成ssh

1
2
3
4
[root@localhost ~]# su - postgres
[postgres@localhost ~]$ ssh-keygen -t rsa
[postgres@localhost ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[postgres@localhost ~]$ chmod 600 ~/.ssh/authorized_keys

分别将master的公钥复制到slave,slave的公钥复制到master

配置系统命令权限

配置 ifconfig, arping 执行权限 ,执行failover_stream.sh需要用到,可以让其他普通用户执行

1
2
chmod u+s /sbin/ifconfig 
chmod u+s /usr/sbin

配置pcp.conf

pcp.conf是配置pgpool-II自己的用户名和密码,pgpool提供pcp接口,可以查看,管理pgpool的状态,并且可以远程操作pgpool

1
2
3
4
5
6
7
# 使用pg_md5生成密码
pg_md5 password
5f4dcc3b5aa765d61d8327deb882cf99
# 添加到pcp.conf配置文件
vim pcp.conf
# 编辑内容如下
postgres:5f4dcc3b5aa765d61d8327deb882cf99

配置pool_hba.conf

和PostgreSQL的pg_hba.conf文件一样, Pgpool-II使用名为pool_hba.conf的配置文件支持类似的客户端认证功能,要和pg的pg_hba.conf保持一致,要么都是trust,要么都是md5验证方式

1
2
3
4
5
6
7
# TYPE  DATABASE    USER        CIDR-ADDRESS          METHOD

# "local" is for Unix domain socket connections only
local all all md5
# IPv4 local connections:
host all all 0.0.0.0/0 md5
host all all ::0/0 md5

在pgpool中添加pg数据库的用户名和密码

由于Pgpool-II是一个在PostgreSQL服务器和PostgreSQL数据库客户端之间工作的中间件 ,所以当客户端应用程序连接到Pgpool-II时,Pgpool-II inturn会使用相同的凭据连接到PostgreSQL服务器,以便为客户端连接提供服务。
执行如下命令,会在配置目录生成pool_passwd文件

1
2
3
pg_md5 -p -m -u postgres pool_passwd
cat pool_passwd
postgres:md5a3556571e93b0d20722ba62be61e8c2d

配置pgpool.conf

pgpool.conf是Pgpool-II的主要配置文件

创建健康检查用户

1
2
create user pgpool_healthchk with password 'xxxx';
grant all privileges on database postgres to pgpool_healthchk;

master配置

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
# CONNECTIONS
listen_addresses = '*'
port = 9999
pcp_listen_addresses = '*'
pcp_port = 9898


# - Backend Connection Settings -

backend_hostname0 = 'master'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/opt/PostgreSQL/data'
backend_flag0 = 'ALLOW_TO_FAILOVER'

backend_hostname1 = 'slave'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/opt/PostgreSQL/data'
backend_flag1 = 'ALLOW_TO_FAILOVER'

# - Authentication -
enable_pool_hba = on
pool_passwd = 'pool_passwd'

# FILE LOCATIONS
pid_file_name = '/opt/pgpool/pgpool.pid'

replication_mode = off
master_slave_mode = on
master_slave_sub_mode = 'stream'

sr_check_period = 5
sr_check_user = 'repuser'
sr_check_password = 'xxxx'
sr_check_database = 'postgres'

health_check_period = 10 # Health check period
# Disabled (0) by default
health_check_timeout = 20
# Health check timeout
# 0 means no timeout
health_check_user = 'pgpool_healthchk'
# Health check user
health_check_password = 'xxxx'
# Password for health check user
health_check_database = 'postgres'
#必须设置,否则primary数据库down了,pgpool不知道,不能及时切换。从库流复制还在连接数据,报连接失败。
#只有下次使用pgpool登录时,发现连接不上,然后报错,这时候,才知道挂了,pgpool进行切换。


#主备切换的命令行配置
#------------------------------------------------------------------------------
# FAILOVER AND FAILBACK
#------------------------------------------------------------------------------

failover_command = '/opt/pgpool/failover_stream.sh %H'

#------------------------------------------------------------------------------
# WATCHDOG
#------------------------------------------------------------------------------

# - Enabling -
use_watchdog = on
# - Watchdog communication Settings -

wd_hostname = 'master'
# Host name or IP address of this watchdog
# (change requires restart)
wd_port = 9000
# port number for watchdog service
# (change requires restart)
# - Virtual IP control Setting -

delegate_IP = '192.168.1.88'
# delegate IP address
# If this is empty, virtual IP never bring up.
# (change requires restart)
if_cmd_path = '/sbin'
# path to the directory where if_up/down_cmd
# (change requires restart)
if_up_cmd = 'ifconfig eth0:0 inet $_IP_$ netmask 255.255.255.0'
# startup delegate IP command
# (change requires restart)
if_down_cmd = 'ifconfig eth0:0 down'
# shutdown delegate IP command
# (change requires restart)
# -- heartbeat mode --

wd_heartbeat_port = 9694
# Port number for receiving heartbeat signal
# (change requires restart)
wd_heartbeat_keepalive = 2
# Interval time of sending heartbeat signal (sec)
# (change requires restart)
wd_heartbeat_deadtime = 30
# Deadtime interval for heartbeat signal (sec)
# (change requires restart)
heartbeat_destination0 = 'slave'
# Host name or IP address of destination 0
# for sending heartbeat signal.
# (change requires restart)
heartbeat_destination_port0 = 9694
# Port number of destination 0 for sending
# heartbeat signal. Usually this is the
# same as wd_heartbeat_port.
# (change requires restart)
heartbeat_device0 = 'eth0'
# Name of NIC device (such like 'eth0')
# used for sending/receiving heartbeat
# signal to/from destination 0.
# This works only when this is not empty
# and pgpool has root privilege.
# (change requires restart)
# - Other pgpool Connection Settings -

other_pgpool_hostname0 = 'slave' # Host name or IP address to connect to for
# (change requires restart)
other_pgpool_port0 = 9999
# Port number for othet pgpool 0
# (change requires restart)
other_wd_port0 = 9000
# Port number for othet watchdog 0
# (change requires restart)

slave配置

仅列出与master不同的配置项

1
2
heartbeat_destination0 = 'master'
other_pgpool_hostname0 = 'master'

创建failover_stream脚本

在master\slave分别操作

1
2
3
4
mkdir /opt/pgpool 
chown postgres:postgres /opt/pgpool
cd /opt/pgpool
vim failover_stream.sh
1
2
3
4
5
6
7
8
9
10
11
12
13
#! /bin/sh 
# Failover command for streaming replication.
# Arguments: $1: new master hostname.

PGHOME='/usr/pgsql-10'
PGDATA='/opt/PostgreSQL/data'
new_master=$1
trigger_command="$PGHOME/bin/pg_ctl promote -D $PGDATA"

# Prompte standby database.
/usr/bin/ssh -T $new_master $trigger_command

exit 0;
1
chown postgres:postgres failover_stream.sh &&chmod 777 failover_stream.sh

启动

创建相关文件夹

1
2
3
4
mkdir /var/log/pgpool
chown -R postgres.postgres /var/log/pgpool
mkdir /var/run/pgpool
chown -R postgres.postgres /var/run/pgpool

启动

1
2
3
4
# 启动
systemctl start pgpool.service
# 停止
systemctl stop pgpool.service

查看集群状态

1
2
3
4
5
6
7
8
9
10
11
12
13
[root@172-16-200-15 ~]# psql -h vip -p 9999 -U postgres
用户 postgres 的口令:
psql (10.4)
输入 "help" 来获取帮助信息.

postgres=# show pool_nodes;
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------
0 | master | 5432 | up | 0.500000 | primary | 0 | true | 0
1 | slave | 5432 | up | 0.500000 | standby | 0 | false | 0
(2 行记录)

#在slave上节点也是psql -h vip -p 9999,双pgpool使用虚拟ip,做到高可用。

HA测试

模拟pgpool宕机

在master节点操作

1
2
3
4
5
6
7
8
9
10
11
12
13
14
# 停止pgpool
systemctl stop pgpool
# 访问
[postgres@postgres ~]# psql -h vip -p 9999 -U postgres
用户 postgres 的口令:
psql (10.4)
输入 "help" 来获取帮助信息.

postgres=# show pool_nodes;
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------
0 | master | 5432 | up | 0.500000 | primary | 0 | true | 0
1 | slave | 5432 | up | 0.500000 | standby | 0 | false | 0
(2 行记录)

模拟PG主库宕机

在master操作

1
2
# 停止postgresql
systemctl stop postgresql-10.service

恢复master节点

配置recovery.conf

1
2
find  / -name "recovery.conf.sample"
cp /usr/pgsql-10/share/recovery.conf.sample $PGDATA/recovery.done

然后在recovery.done中添加如下内容

1
2
3
4
recovery_target_timeline = 'latest'
standby_mode = on # 指明是否开启服务器作为一个备机。在流复制里,这个参数必须要开启。
primary_conninfo = 'host=salve port=5432 user=repl password=xxxx' # 指明用于备服务器连接到主服务器的连接字符串。
trigger_file = '/opt/PostgreSQL/pg.trigger' # 指定一个触发文件让备服务器感觉到它的时候就会停止流复制(即:故障转移),不要创建这个文件。当你想主从切换的时候才需要创建它。

启动

1
systemctl start postgresql-10.service

加入节点

1
pcp_attach_node -d -U postgres -h vip -p 9898 -n 0

参考
  1. http://www.pgpool.net/docs/latest/en/html/example-configs.html
  2. http://www.pgpool.net/pgpool-web/contrib_docs/watchdog/en.html#more_switch
  3. https://www.jianshu.com/p/ef183d0a9213
  4. https://www.jianshu.com/p/41d857a5d743