[PostgreSQL - Repmgr 구성 & Failover test] part 1

이번엔엔 postgresql의 db서버들을 고가용성 구성을 해볼까 합니다.

DB server 3대와 witness 서버를 repmgr로 이중화를 시킨뒤, 윗단에 proxy server를 두어 connection pool과 로드밸런싱 기능으로 구성할 생각입니다.

이번 글에선 repmgr을 이용하여 DB서버를 이중화 하려고 합니다. 전체적인 구성은 아래 사진과 같습니다. 

최종 구성도
아키텍처

 

 

 테스트 환경

 
IP
Version
Primary
172.40.40.64
PostgreSQL 12.3
Replica1
172.40.40.65
PostgreSQL 12.3
Replica2
172.40.40.66
PostgreSQL 12.3
witness
172.40.40.67
 

 

 

 

 사전 준비

 

 편의를 위해 Alias 추가 (모든 노드)

[postgres@primary ~]$ vi ~/.bash_profile

alias reload='/usr/pgsql-12/bin/pg_ctl reload -D /mnt/pgsql/12/data/'
alias restart='/usr/pgsql-12/bin/pg_ctl -D /mnt/pgsql/12/data/ -l /mnt/pgsql/12/pgsql.log restart'
alias start='/usr/pgsql-12/bin/pg_ctl -D /mnt/pgsql/12/data/ -l /mnt/pgsql/12/pgsql.log start'
alias stop='/usr/pgsql-12/bin/pg_ctl -D /mnt/pgsql/12/data/ -l /mnt/pgsql/12/pgsql.log stop'
export PS1='\u@\h:$PWD# '
export PATH=$PATH:/usr/pgsql-12/bin

###REPMGR
alias show='repmgr cluster show'
alias event='repmgr cluster event'
alias log='tail -200f /mnt/pgsql/12/data/log/repmgr.log'
 
 
 

ssh키 공유 (모든 노드)

db서버에 ssh접근시 password 입력 필요없이 접속하기위해 해당 작업 수행

[root@primary ~]# su - postgres
마지막 로그인: 화  9월  7 06:55:45 KST 2021 일시 pts/0


## 전부 Enter 입력
[postgres@primary ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/postgres/.ssh/id_rsa):
Created directory '/home/postgres/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/postgres/.ssh/id_rsa.
Your public key has been saved in /home/postgres/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:cV0W4vFWAPkI2cMqns/GTwfsgsI9hiEF9x+phRvqFSc postgres@primary
The key's randomart image is:
+---[RSA 2048]----+
|    . .    ++o=o.|
|     o . .o+=* . |
|      . E *oo+o  |
|     . ..%.o...  |
|    . o.So. o    |
|     + =o. . .   |
|      = =+. o .  |
|       o .=o .   |
|         . ..    |
+----[SHA256]-----+


[postgres@primary ~]$ cd .ssh/
[postgres@primary .ssh]$ cat id_rsa.pub >> authorized_keys

# chmod 700 .ssh/*


ssh-copy-id -i ~/.ssh/id_rsa.pub primary
ssh-copy-id -i ~/.ssh/id_rsa.pub replica1
ssh-copy-id -i ~/.ssh/id_rsa.pub replica2
ssh-copy-id -i ~/.ssh/id_rsa.pub witness
 
 
 
 
 

 테스트 시작

 

repmgr (모든 노드)

[root@primary ~]# yum install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm


[root@primary ~]# yum list | grep repmgr*
repmgr_12.x86_64                          5.2.1-1.rhel7              @pgdg12    
repmgr_12-devel.x86_64                    5.2.1-1.rhel7              @pgdg12    
repmgr10.x86_64                           5.2.0-1.rhel7              pgdg10     
repmgr11.x86_64                           5.2.0-1.rhel7              pgdg11     
repmgr11-devel.x86_64                     5.2.0-1.rhel7              pgdg11     
repmgr12.x86_64                           5.2.0-1.rhel7              pgdg12     
repmgr12-devel.x86_64                     5.2.0-1.rhel7              pgdg12     
repmgr13.x86_64                           5.2.0-1.rhel7              pgdg13     
repmgr13-devel.x86_64                     5.2.0-1.rhel7              pgdg13     
repmgr96.x86_64                           5.2.0-1.rhel7              pgdg96     
repmgr_10.x86_64                          5.2.1-1.rhel7              pgdg10     
repmgr_11.x86_64                          5.2.1-1.rhel7              pgdg11     
repmgr_11-devel.x86_64                    5.2.1-1.rhel7              pgdg11     
repmgr_13.x86_64                          5.2.1-1.rhel7              pgdg13     
repmgr_13-devel.x86_64                    5.2.1-1.rhel7              pgdg13     
repmgr_96.x86_64                          5.2.1-1.rhel7              pgdg96     
yum -y install repmgr_12*
[root@primary ~]# vi /etc/repmgr/12/repmgr.conf  --> node_id / node_name / conninfo 각각 서버에서 변경
cluster='failover'
node_id=101
node_name='primary'
conninfo='host=primary dbname=repmgr user=repmgr'
data_directory='/mnt/pgsql/12/data'
config_directory='/mnt/pgsql/12/data'
log_file='/mnt/pgsql/12/data/log/repmgr.log'
repmgrd_service_start_command = '/usr/pgsql-12/bin/repmgrd -d'
repmgrd_service_stop_command = 'kill `cat $(/usr/pgsql-12/bin/repmgrd --show-pid-file)`'
promote_command='/usr/pgsql-12/bin/repmgr standby promote  --siblings-follow --log-to-file'
follow_command='/usr/pgsql-12/bin/repmgr standby follow  --log-to-file --upstream-node-id=%n'
failover=automatic
reconnect_attempts=3
reconnect_interval=3
ssh_options='-q -o StrictHostKeyChecking=no -o ConnectTimeout=10'
monitoring_history=yes
monitor_interval_secs=10

 

 

User & DB생성 (Primary)

postgres=# create user repmgr with superuser replication createdb;
CREATE ROLE

postgres=# alter user repmgr password'repmgr';
ALTER ROLE

postgres=# create database repmgr owner repmgr;
CREATE DATABASE

 

 

configuration authentication in pg_hba.conf config추가( 모든 노드 ) 

[root@primary ~]# vi /mnt/pgsql/12/data/postgresql.conf
listen_addresses = '*'
wal_level = 'replica'
max_wal_senders = 10
wal_keep_segments = 8
archive_mode = on
archive_command = 'true'
shared_preload_libraries='repmgr'



**파라미터 설명
wal_level
WAL (Write Ahead Log)에 기록되는 정보의 양을 결정한다. 9.4이하 버전에서는 archive 이상, 9.6이상 버전에서는 replica 이상으로 설정한다.
minimal: 기본값은 충돌 또는 즉시 셧다운으로부터 복구하기 위해 필요한 정보만 기록한다.
archive: Wal 아카이브에 필요한 로깅만 추가한다. (9.6 버전부터는 hot_standby와 archive 모두 replica로 대체 되었다.)
replica: 9.4버전까지는 hot_standby였다. 9.6 버전부터는 replica라는 값으로 대체 되었고, Slave 노드에서 읽기 전용 쿼리에 필요한 정보를 좀 더 추가 한다.
logical: 논리적 디코딩을 지원하는데 필요한 정보를 추가한다.
wal_level을 logical 까지 올리면 replica에서 사용하는 정보 뿐만 아니라 WAL로부터 논리적 변경세트를 사용하는데 필요한 정보도 로깅 되기 때문에, WAL의 볼륨이 증가한다.
max_wal_sanders
스트리밍 기반의 백업 클라이언트로부터의 동시 연결 최대 수를 지정한다. 기본값은 복제를 하지않는 0이며, WAL Sender 프로세서는 max_connections 보다 큰 값을 설정 할 수 없다. 스트리밍 클라이언트의 연결이 갑작스럽게 끊어지면 타임아웃이 될때까지 orphan slot이 생기기 때문에, 예상 되는 클라이언트의 최대 수 보다 약간 크게 설정해야 한다. 그래서 일반적으로 slave의 수+1 로 많이 설정을 한다.
wal_keep_segments
Slave노드가 스트리밍 복제를 위해 예전 로그파일을 가져와야 하는 경우 pg_xlog 디렉토리에 저장되는 과거 로그 파일 세그먼트의 최소 수를 지정한다. 각 wal segment의 사이즈는 16MB이다. wal segment가 너무 빨리 갱신 되어 빠른 속도로 사라지게 되고, slave 노드에 기록되는 wal segment가 master노드의 wal의 갱신 속도를 따라가지 못한다면, replication이 중단되며, Downstream 연결 역시 실패 한다. Maximum 수가 지정 되어 있지 않기 때문에서 서버의 디스크 공간, 그리고 DB 트랜잭션에 따른 wal의 갱신 속도를 고려해서 설정값을 찾아야 한다.
archive_mode
아카이브 모드가 사용되는것으로 설정되면, wal segment가 archive_command설정에 의해   아카이브 저장소로 전달 된다. 쉽게 생각하면 pg_xlog에 생성되는 wal segment를 다른 archive 디렉토리를 만들어서 그 곳으로 전달한다고 보면 된다. pg_xlog에 생성되는 wal segment들은 wal_keep_segments에 의해 지정된 숫자를 넘기면 삭제되기 때문에, 지나간 wal segment를 아카이브로 저장하는 것이다. wal_level이 minimal인 경우는 아카이브 모드를 사용할 수 없다.
archive_command
완료된 wal segment를 아카이브 하기위해 실행하는 로컬 쉘 명령. %p는 아카이브할 파일의 경로명으로 대체 되고, %f는 파일명으로 대체된다 (%문자를 명령에 포함하려면 %%으로 사용).
ex) archive_command = ‘cp %p /var/lib/edb/as9.6/archive/%f’ 이렇게 설정하면 pg_xlog에 있는 wal segment를 var/lib/edb/as9.6/archive/밑에 파일로 복사하라는 것이 된다.
archive_command = ‘true’로 설정하면 wal segment는 계속 만들어지지만, 아카이빙은 되지 않기 때문에 복구의 필요한 WAL 파일의 체인이 단절되는 현상이 발생한다.  지나간 WAL파일을 사용할 수 없다는 이야기다.




root@primary:/root# vi /mnt/pgsql/12/data/pg_hba.conf
local   replication   repmgr                                   trust
host    replication   repmgr      127.0.0.1/32            trust
host    replication   repmgr      172.40.40.0/24         trust
 
local   repmgr        repmgr                                    trust
host    repmgr        repmgr      127.0.0.1/32             trust
host    repmgr        repmgr      172.40.40.0/24          trust

 

 

Replcation & Repmgr register

<PRIMARY>
postgres@primary:/home/postgres# /usr/pgsql-12/bin/repmgr primary register
WARNING: the following problems were found in the configuration file:
  parameter "cluster" is deprecated and will be ignored
INFO: connecting to primary database...
NOTICE: attempting to install extension "repmgr"
NOTICE: "repmgr" extension successfully installed
NOTICE: primary node record (ID: 101) registered




※아래 에러 발생 시 db가 꺼져있는 상태라 db켜주면 된다.
WARNING: the following problems were found in the configuration file:
  parameter "cluster" is deprecated and will be ignored
INFO: connecting to primary database...
ERROR: connection to database failed
DETAIL:
could not connect to server: Connection refused
    Is the server running on host "primary" (172.40.40.64) and accepting
    TCP/IP connections on port 5432?
DETAIL: attempted to connect using:
  user=repmgr dbname=repmgr host=primary connect_timeout=2 fallback_application_name=repmgr options=-csearch_path=




postgres@primary:/home/postgres# /usr/pgsql-12/bin/repmgr daemon start
WARNING: the following problems were found in the configuration file:
  parameter "cluster" is deprecated and will be ignored
NOTICE: executing: "/usr/pgsql-12/bin/repmgrd -d"
NOTICE: repmgrd was successfully started




postgres@primary:/home/postgres# /usr/pgsql-12/bin/repmgr daemon status
WARNING: the following problems were found in the configuration file:
  parameter "cluster" is deprecated and will be ignored
ID | Name    | Role    | Status    | Upstream | repmgrd | PID  | Paused? | Upstream last seen
----+---------+---------+-----------+----------+---------+------+---------+--------------------
101 | primary | primary | * running |          | running | 3358 | no      | n/a     





<Replica1, Replica2>  
stop 상태로 primary의 clone을 떠옵니다. (기존에 data파일이 받아져 있는게 있으면 -F 옵션줘서 강제로 덮거나 data dir삭제 후 하면됩니다.)

postgres@replica1:/home/postgres# /usr/pgsql-12/bin/repmgr -h primary -U repmgr -d repmgr standby clone
postgres@replica2:/home/postgres# /usr/pgsql-12/bin/repmgr -h primary -U repmgr -d repmgr standby clone
WARNING: the following problems were found in the configuration file:
  parameter "cluster" is deprecated and will be ignored
WARNING: following problems with command line parameters detected:
  "config_directory" set in repmgr.conf, but --copy-external-config-files not provided
NOTICE: destination directory "/mnt/pgsql/12/data" provided
INFO: connecting to source node
DETAIL: connection string is: host=primary user=repmgr dbname=repmgr
DETAIL: current installation size is 31 MB
INFO: replication slot usage not requested;  no replication slot will be set up for this standby
NOTICE: checking for available walsenders on the source node (2 required)
NOTICE: checking replication connections can be made to the source server (2 required)
WARNING: data checksums are not enabled and "wal_log_hints" is "off"
DETAIL: pg_rewind requires "wal_log_hints" to be enabled
WARNING: directory "/mnt/pgsql/12/data" exists but is not empty
NOTICE: -F/--force provided - deleting existing data directory "/mnt/pgsql/12/data"
NOTICE: starting backup (using pg_basebackup)...
HINT: this may take some time; consider using the -c/--fast-checkpoint option
INFO: executing:
  /usr/pgsql-12/bin/pg_basebackup -l "repmgr base backup"  -D /mnt/pgsql/12/data -h primary -p 5432 -U repmgr -X stream
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: pg_ctl -D /mnt/pgsql/12/data start
HINT: after starting the server, you need to register this standby with "repmgr standby register"




※5432포트 open해줍니다
firewall-cmd --permanent --add-port=5432/tcp




## 복제 후 DB 기동
postgres@replica1:/home/postgres# start
서버를 시작하기 위해 기다리는 중.... 완료
서버 시작됨

postgres@replica2:/home/postgres# start
서버를 시작하기 위해 기다리는 중.... 완료
서버 시작됨



##Replica repmgr 등록
postgres@replica1:/home/postgres# /usr/pgsql-12/bin/repmgr standby register -h primary -U repmgr

postgres@replica2:/home/postgres# /usr/pgsql-12/bin/repmgr standby register -h primary -U repmgr
WARNING: the following problems were found in the configuration file:
  parameter "cluster" is deprecated and will be ignored
INFO: connecting to local node "replica2" (ID: 103)
WARNING: database connection parameters not required when the standby to be registered is running
DETAIL: repmgr uses the "conninfo" parameter in "repmgr.conf" to connect to the standby
INFO: connecting to primary database
WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID 101)
INFO: standby registration complete
NOTICE: standby node "replica2" (ID: 103) successfully registered





postgres@replica1:/home/postgres# /usr/pgsql-12/bin/repmgr daemon start
WARNING: the following problems were found in the configuration file:
  parameter "cluster" is deprecated and will be ignored
NOTICE: executing: "/usr/pgsql-12/bin/repmgrd -d"
NOTICE: repmgrd was successfully started




postgres@replica2:/home/postgres# /usr/pgsql-12/bin/repmgr daemon status
WARNING: the following problems were found in the configuration file:
  parameter "cluster" is deprecated and will be ignored
ID | Name     | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+----------+---------+-----------+----------+---------+-------+---------+--------------------
101 | primary  | primary | * running |          | running | 3358  | no      | n/a                
102 | replica1 | standby |   running | primary  | running | 31673 | no      | 9 second(s) ago    
103 | replica2 | standby |   running | primary  | running | 14775 | no      | 8 second(s) ago 

 
 
 

Replication 조회

<primary>
postgres=# select pid, usename, client_addr, backend_start, state, sync_state from pg_stat_replication;
  pid  | usename | client_addr  |         backend_start         |   state   | sync_state
-------+---------+--------------+-------------------------------+-----------+------------
11015 | repmgr  | 172.40.40.65 | 2021-09-07 17:27:39.932476+09 | streaming | async
11711 | repmgr  | 172.40.40.66 | 2021-09-07 17:37:43.674404+09 | streaming | async
(2 rows)



<standby>
postgres=# \x
Expanded display is on.

postgres=# select * from pg_stat_wal_receiver;
-[ RECORD 1 ]---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
pid                   | 31367
status                | streaming
receive_start_lsn     | 0/4000000
receive_start_tli     | 1
received_lsn          | 0/6CC6E20
received_tli          | 1
last_msg_send_time    | 2021-09-10 08:41:07.486016+09
last_msg_receipt_time | 2021-09-10 08:41:07.485994+09
latest_end_lsn        | 0/6CC6E20
latest_end_time       | 2021-09-10 08:41:07.486016+09
slot_name             |
sender_host           | primary
sender_port           | 5432
conninfo              | user=repmgr passfile=/home/postgres/.pgpass dbname=replication host=primary port=5432 application_name=replica1 fallback_application_name=walreceiver sslmode=disable sslcompression=0 gssencmode=disable krbsrvname=postgres target_session_attrs=any




#Primary 작업 수행
postgres=# create database test2 owner postgres;
CREATE DATABASE

postgres=# create table test(a int,v char(100));
CREATE TABLE

postgres=# insert into test values(1,'asd');
INSERT 0 1




#standby 조회
postgres=# select * from pg_stat_wal_receiver;
-[ RECORD 1 ]---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
pid                   | 31367
status                | streaming
receive_start_lsn     | 0/4000000
receive_start_tli     | 1
received_lsn          | 0/6CDE6C0
received_tli          | 1
last_msg_send_time    | 2021-09-10 08:42:56.895763+09
last_msg_receipt_time | 2021-09-10 08:42:56.89563+09
latest_end_lsn        | 0/6CDE6C0
latest_end_time       | 2021-09-10 08:42:56.895763+09
slot_name             |
sender_host           | primary
sender_port           | 5432
conninfo              | user=repmgr passfile=/home/postgres/.pgpass dbname=replication host=primary port=5432 application_name=replica1 fallback_application_name=walreceiver sslmode=disable sslcompression=0 gssencmode=disable krbsrvname=postgres target_session_attrs=any
 

 

PostgreSQL Withness Configuration

-Primary DB에 문제가 발생하여 새 마스터를 투표하기를 원하는 경우 별도의 Withness 서버가 있으면 좋습니다.
-감시를 설정할때 주의해야하는 사항은 withness가 자체 repmgr DB를 사용해여 합니다.
-replica Server를 withness로 만들면 안됩니다. 

 

root@repmgr:/root# vi /mnt/pgsql/12/data/postgresql.conf
listen_addresses='*'
shared_preload_libraries='repmgr'

추가 후 postgresql 재시작


postgres@repmgr:/home/postgres# psql
psql (12.3, server 12.8)
Type "help" for help.


postgres=# create user repmgr replication createdb superuser createrole password 'repmgr';
CREATE ROLE

postgres=# create database repmgr owner repmgr;
CREATE DATABASE



-맨 마지막줄에 추가-
postgres@repmgr:/home/postgres# vi /mnt/pgsql/12/data/pg_hba.conf

local   replication   repmgr                              trust
host    replication   repmgr      127.0.0.1/32            trust
host    replication   repmgr      172.40.40.0/24          trust

local   repmgr        repmgr                              trust
host    repmgr        repmgr      127.0.0.1/32            trust
host    repmgr        repmgr      172.40.40.0/24          trust




##레지스터 등록
postgres@repmgr:/home/postgres# /usr/pgsql-12/bin/repmgr witness register -h primary -U repmgr
WARNING: the following problems were found in the configuration file:
  parameter "cluster" is deprecated and will be ignored
INFO: connecting to witness node "repmgr" (ID: 104)
INFO: connecting to primary node
NOTICE: attempting to install extension "repmgr"
NOTICE: "repmgr" extension successfully installed
INFO: witness registration complete
NOTICE: witness node "repmgr" (ID: 104) successfully registered





root@repmgr:/root# mkdir -p /mnt/pgsql/12/log
root@repmgr:/mnt/pgsql/12# chown -R postgres.postgres /mnt/pgsql/12/






## repmgr 데몬 실행
postgres@repmgr:/home/postgres# /usr/pgsql-12/bin/repmgr daemon start
WARNING: the following problems were found in the configuration file:
  parameter "cluster" is deprecated and will be ignored
NOTICE: executing: "/usr/pgsql-12/bin/repmgrd -d"
NOTICE: repmgrd was successfully started





##각 노드별로 firewall을 꺼주거나 5432포트를 등록해줘야 서로 통신됨. 나는 그냥 꺼버림 systemctl stop firewalld.service
postgres@repmgr:/home/postgres# /usr/pgsql-12/bin/repmgr daemon status
WARNING: the following problems were found in the configuration file:
  parameter "cluster" is deprecated and will be ignored
ID | Name     | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+----------+---------+-----------+----------+---------+-------+---------+--------------------
101 | primary  | primary | * running |          | running | 3358  | no      | n/a                
102 | replica1 | standby |   running | primary  | running | 31673 | no      | 9 second(s) ago    
103 | replica2 | standby |   running | primary  | running | 14775 | no      | 8 second(s) ago    
104 | repmgr   | witness | * running | primary  | running | 23561 | no      | 1 second(s) ago   





postgres@repmgr:/home/postgres# /usr/pgsql-12/bin/repmgr cluster show
WARNING: the following problems were found in the configuration file:
  parameter "cluster" is deprecated and will be ignored
ID  | Name     | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                      
-----+----------+---------+-----------+----------+----------+----------+----------+-----------------------------------------
101 | primary  | primary | * running |          | default  | 100      | 1        | host=primary dbname=repmgr user=repmgr
102 | replica1 | standby |   running | primary  | default  | 100      | 1        | host=replica1 dbname=repmgr user=repmgr
103 | replica2 | standby |   running | primary  | default  | 100      | 1        | host=replica2 dbname=repmgr user=repmgr
104 | repmgr   | witness | * running | primary  | default  | 0        | n/a      | host=repmgr dbname=repmgr user=repmgr  
 
 
※ primary노드로부터 clone을 떠서 넣었기 때문에 upstream이 primary로 되어 있어야합니다.





## replication에 등록된 cluster의 event들 보기
postgres@replica2:/home/postgres# /usr/pgsql-12/bin/repmgr cluster event
WARNING: the following problems were found in the configuration file:
  parameter "cluster" is deprecated and will be ignored
Node ID | Name     | Event                   | OK | Timestamp           | Details                                                                        
---------+----------+-------------------------+----+---------------------+---------------------------------------------------------------------------------
101     | primary  | child_node_new_connect  | t  | 2021-09-10 09:25:09 | new witness "repmgr" (ID: 104) has connected                                   
104     | repmgr   | repmgrd_start           | t  | 2021-09-10 09:20:18 | witness monitoring connection to primary node "primary" (ID: 101)              
101     | primary  | child_node_new_connect  | t  | 2021-09-10 08:55:47 | new witness "repmgr" (ID: 104) has connected                                   
104     | repmgr   | witness_register        | t  | 2021-09-10 08:55:47 | witness registration succeeded; upstream node ID is 101                        
103     | replica2 | repmgrd_start           | t  | 2021-09-10 08:38:26 | monitoring connection to upstream node "primary" (ID: 101)                     
102     | replica1 | repmgrd_start           | t  | 2021-09-07 17:31:39 | monitoring connection to upstream node "primary" (ID: 101)                     
101     | primary  | child_node_new_connect  | t  | 2021-09-07 17:29:53 | new standby "replica1" (ID: 102) has connected                                 
102     | replica1 | standby_register        | t  | 2021-09-07 17:29:45 | standby registration succeeded; upstream node ID is 101                        
101     | primary  | child_node_new_connect  | t  | 2021-09-07 17:28:43 | new standby "replica2" (ID: 103) has connected                                 
103     | replica2 | standby_register        | t  | 2021-09-07 17:28:37 | standby registration succeeded; upstream node ID is 101                        
103     | replica2 | standby_clone           | t  | 2021-09-07 17:26:29 | cloned from host "primary", port 5432; backup method: pg_basebackup; --force: Y
102     | replica1 | standby_clone           | t  | 2021-09-07 17:23:48 | cloned from host "primary", port 5432; backup method: pg_basebackup; --force: Y
101     | primary  | repmgrd_local_reconnect | t  | 2021-09-07 17:22:23 | reconnected to primary node after 40 seconds, resuming monitoring              
101     | primary  | repmgrd_start           | t  | 2021-09-07 15:46:13 | monitoring cluster primary "primary" (ID: 101)                                 
101     | primary  | primary_register        | t  | 2021-09-07 15:06:27 |                                                                                
101     | primary  | cluster_created         | t  | 2021-09-07 15:06:27 |                                                                               
WARNING: the following problems were found in the configuration file:
  parameter "cluster" is deprecated and will be ignored
Node ID | Name   | Event           | OK | Timestamp           | Details
---------+--------+-----------------+----+---------------------+---------
104     | repmgr | cluster_created | t  | 2021-09-10 08:55:47 |        
 
 
 
 

Switchover

Primary -> Replica1 (DB Switchover 시 DB 재부팅됨)

※ primary로 만들 곳에 가서 해당 명령어 실행 ( replica1을 primary로 만들거기 때문에 replica1에서 명령어 실행함)

<replica1에서 실행>
postgres@replica1:/var/lib/pgsql# repmgr -h replica1 standby switchover --siblings-follow
WARNING: the following problems were found in the configuration file:
  parameter "cluster" is deprecated and will be ignored
WARNING: following problems with command line parameters detected:
  database connection parameters not required when executing STANDBY SWITCHOVER
NOTICE: executing switchover on node "replica1" (ID: 102)
WARNING: the following problems were found in the configuration file:
  parameter "cluster" is deprecated and will be ignored
NOTICE: local node "replica1" (ID: 102) will be promoted to primary; current primary "primary" (ID: 101) will be demoted to standby
NOTICE: stopping current primary node "primary" (ID: 101)
WARNING: the following problems were found in the configuration file:
  parameter "cluster" is deprecated and will be ignored
NOTICE: issuing CHECKPOINT on node "primary" (ID: 101)
DETAIL: executing server command "/usr/pgsql-12/bin/pg_ctl  -D '/mnt/pgsql/12/data' -W -m fast stop"
INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")
INFO: checking for primary shutdown; 2 of 60 attempts ("shutdown_check_timeout")
WARNING: the following problems were found in the configuration file:
  parameter "cluster" is deprecated and will be ignored
NOTICE: current primary has been cleanly shut down at location 0/7000028
NOTICE: promoting standby to primary
DETAIL: promoting server "replica1" (ID: 102) using pg_promote()
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "replica1" (ID: 102) was successfully promoted to primary
WARNING: the following problems were found in the configuration file:
  parameter "cluster" is deprecated and will be ignored
INFO: local node 101 can attach to rejoin target node 102
DETAIL: local node's recovery point: 0/7000028; rejoin target node's fork point: 0/70000A0
NOTICE: setting node 101's upstream to node 102
WARNING: unable to ping "host=primary dbname=repmgr user=repmgr"
DETAIL: PQping() returned "PQPING_NO_RESPONSE"
NOTICE: starting server using "/usr/pgsql-12/bin/pg_ctl  -w -D '/mnt/pgsql/12/data' start"
NOTICE: NODE REJOIN successful
DETAIL: node 101 is now attached to node 102
NOTICE: node  "replica1" (ID: 102) promoted to primary, node "primary" (ID: 101) demoted to standby
NOTICE: executing STANDBY FOLLOW on 2 of 2 siblings
정보:  node 104 received notification to follow node 102
INFO: STANDBY FOLLOW successfully executed on all reachable sibling nodes
NOTICE: switchover was successful
DETAIL: node "replica1" is now primary and node "primary" is attached as standby
NOTICE: STANDBY SWITCHOVER has completed successfully





postgres@replica1:/var/lib/pgsql# show
WARNING: the following problems were found in the configuration file:
  parameter "cluster" is deprecated and will be ignored
ID  | Name     | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                      
-----+----------+---------+-----------+----------+----------+----------+----------+-----------------------------------------
101 | primary  | standby |   running | replica1 | default  | 100      | 1        | host=primary dbname=repmgr user=repmgr
102 | replica1 | primary | * running |          | default  | 100      | 2        | host=replica1 dbname=repmgr user=repmgr
103 | replica2 | standby |   running | replica1 | default  | 100      | 1        | host=replica2 dbname=repmgr user=repmgr
104 | repmgr   | witness | * running | replica1 | default  | 0        | n/a      | host=repmgr dbname=repmgr user=repmgr  




<Primary repmgr.log 확인>
root@primary:/root# vi /mnt/pgsql/12/data/log/repmgr.log
[2021-09-10 09:29:09] [INFO] monitoring primary node "primary" (ID: 101) in normal state
[2021-09-10 09:30:39] [WARNING] unable to ping "host=primary dbname=repmgr user=repmgr"
[2021-09-10 09:30:39] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2021-09-10 09:30:39] [WARNING] connection to node "primary" (ID: 101) lost
[2021-09-10 09:30:39] [DETAIL]
치명적오류:  관리자 요청에 의해서 연결을 끝냅니다
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
[2021-09-10 09:30:39] [INFO] attempting to reconnect to node "primary" (ID: 101)
[2021-09-10 09:30:39] [ERROR] connection to database failed
[2021-09-10 09:30:39] [DETAIL]
could not connect to server: Connection refused
        Is the server running on host "primary" (172.40.40.64) and accepting
        TCP/IP connections on port 5432?
[2021-09-10 09:30:39] [DETAIL] attempted to connect using:
  user=repmgr dbname=repmgr host=primary connect_timeout=2 fallback_application_name=repmgr options=-csearch_path=
[2021-09-10 09:30:39] [WARNING] reconnection to node "primary" (ID: 101) failed
[2021-09-10 09:30:39] [WARNING] unable to connect to local node
[2021-09-10 09:30:39] [WARNING] unable to connect to local node
[2021-09-10 09:30:39] [INFO] checking state of node 101, 1 of 3 attempts
[2021-09-10 09:30:39] [WARNING] unable to ping "user=repmgr dbname=repmgr host=primary connect_timeout=2 fallback_application_name=repmgr"
[2021-09-10 09:30:39] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2021-09-10 09:30:39] [INFO] sleeping 3 seconds until next reconnection attempt
[2021-09-10 09:30:42] [INFO] checking state of node 101, 2 of 3 attempts
[2021-09-10 09:30:42] [NOTICE] node 101 has recovered, reconnecting
[2021-09-10 09:30:42] [INFO] connection to node 101 succeeded
[2021-09-10 09:30:42] [NOTICE] reconnected to local node after 3 seconds
정보:  set_repmgrd_pid(): provided pidfile is /tmp/repmgrd.pid
[2021-09-10 09:30:42] [NOTICE] node is now a standby, switching to standby monitoring




<Primary DB log>
root@primary:/mnt/pgsql/12# vi /mnt/pgsql/12/pgsql.log
2021-09-10 09:30:38.085 KST [30923] 치명적오류:  관리자 요청에 의해서 >연결을 끝냅니다
2021-09-10 09:30:38.086 KST [27738] 치명적오류:  관리자 요청에 의해서 >연결을 끝냅니다
2021-09-10 09:30:38.087 KST [27678] 치명적오류:  관리자 요청에 의해서 >연결을 끝냅니다
2021-09-10 09:30:38.088 KST [11282] 치명적오류:  관리자 요청에 의해서 >연결을 끝냅니다
2021-09-10 09:30:38.089 KST [10653] 치명적오류:  관리자 요청에 의해서 >연결을 끝냅니다
2021-09-10 09:30:38.090 KST [10651] 치명적오류:  관리자 요청에 의해서 >연결을 끝냅니다
2021-09-10 09:30:38.097 KST [10637] 로그:  백그라운드 작업자 "logical replication launcher" (PID 10645) 프로그램은 1 코드로 마쳤습니다
2021-09-10 09:30:38.097 KST [10639] 로그:  서비스를 멈추고 있습니다
2021-09-10 09:30:38.160 KST [31966] 치명적오류:  데이터베이스 시스템이 중지 중입니다
2021-09-10 09:30:38.164 KST [31969] 치명적오류:  데이터베이스 시스템이 중지 중입니다
2021-09-10 09:30:38.165 KST [31970] 치명적오류:  데이터베이스 시스템이 중지 중입니다
2021-09-10 09:30:38.168 KST [10637] 로그:  데이터베이스 시스템 서비스를
중지했습니다
 
 

Failover test                        Primary -> Replica1

root@replica1:/root# systemctl restart repmgr12.service
Job for repmgr12.service failed because a configured resource limit was exceeded. See "systemctl status repmgr12.service" and "journalctl -xe" for details.


root@replica1:/root# systemctl stop repmgr12.service
root@replica1:/root# su - postgres
마지막 로그인: 금  9월 10 09:52:19 KST 2021 일시 pts/0


postgres@primary:/home/postgres# repmgr daemon stop
postgres@primary:/home/postgres# repmgr daemon start
postgres@primary:/home/postgres# stop


postgres@replica1:/mnt/pgsql/12# repmgr daemon status
WARNING: the following problems were found in the configuration file:
  parameter "cluster" is deprecated and will be ignored
ID | Name     | Role    | Status    | Upstream | repmgrd     | PID   | Paused? | Upstream last seen
----+----------+---------+-----------+----------+-------------+-------+---------+--------------------
101 | primary  | primary | - failed  | ?        | n/a         | n/a   | n/a     | n/a                
102 | replica1 | primary | * running |          | running     | 30065 | no      | n/a                
103 | replica2 | standby |   running | replica1 | not running | n/a   | n/a     | n/a                
104 | repmgr   | witness | * running | replica1 | running     | 23561 | no      | 0 second(s) ago    

WARNING: following issues were detected
  - unable to  connect to node "primary" (ID: 101)



## primary노드가 끊어지고 replica1 노드가 primary가 되었습니다
postgres@primary:/home/postgres# vi /mnt/pgsql/12/data/log/repmgr.log
[2021-09-10 13:05:22] [INFO] switching to primary monitoring mode
[2021-09-10 13:05:22] [NOTICE] monitoring cluster primary "replica1" (ID: 102)
[2021-09-10 13:05:22] [INFO] child node "replica2" (ID: 103) is attached
[2021-09-10 13:05:22] [INFO] child node "repmgr" (ID: 104) is not yet attached
[2021-09-10 13:05:32] [NOTICE] new witness "repmgr" (ID: 104) has connected
[2021-09-10 13:10:23] [INFO] monitoring primary node "replica1" (ID: 102) in normal state




postgres@replica1:/# show
WARNING: the following problems were found in the configuration file:
  parameter "cluster" is deprecated and will be ignored
ID  | Name     | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                      
-----+----------+---------+-----------+----------+----------+----------+----------+-----------------------------------------
101 | primary  | primary | - failed  | ?        | default  | 100      |          | host=primary dbname=repmgr user=repmgr
102 | replica1 | primary | * running |          | default  | 100      | 6        | host=replica1 dbname=repmgr user=repmgr
103 | replica2 | standby |   running | replica1 | default  | 100      | 6        | host=replica2 dbname=repmgr user=repmgr
104 | repmgr   | witness | * running | replica1 | default  | 0        | n/a      | host=repmgr dbname=repmgr user=repmgr  
 
WARNING: following issues were detected
  - unable to connect to node "primary" (ID: 101)
 
HINT: execute with --verbose option to see connection error messages




<replica2 repmgr.log>
[2021-09-10 13:06:19] [ERROR] unable to execute repmgr.get_new_primary()
[2021-09-10 13:06:19] [DETAIL]
no connection to the server

[2021-09-10 13:06:19] [DETAIL] query text is:
SELECT repmgr.get_new_primary()
[2021-09-10 13:06:20] [WARNING] no notification received from new primary after 60 seconds
[2021-09-10 13:06:20] [WARNING] unable to ping "host=primary dbname=repmgr user=repmgr"
[2021-09-10 13:06:20] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2021-09-10 13:06:20] [ERROR] unable to determine if server is in recovery
[2021-09-10 13:06:20] [DETAIL]
no connection to the server

[2021-09-10 13:06:20] [DETAIL] query text is:
SELECT pg_catalog.pg_is_in_recovery()
[2021-09-10 13:06:20] [ERROR] unable to execute "SELECT repmgr.repmgrd_is_paused()"
[2021-09-10 13:06:20] [DETAIL] no connection to the server

[2021-09-10 13:06:20] [ERROR] get_active_sibling_records(): unable to execute query
[2021-09-10 13:06:20] [DETAIL]
no connection to the server

[2021-09-10 13:06:20] [WARNING] local connection is not available, unable to update monitoring history
[2021-09-10 13:06:20] [INFO] attempting to reconnect to node "replica2" (ID: 103)
[2021-09-10 13:06:20] [INFO] reconnected to node "replica2" (ID: 103)
정보:  set_repmgrd_pid(): provided pidfile is /tmp/repmgrd.pid
[2021-09-10 13:06:20] [ERROR] unable to determine if server is in recovery
[2021-09-10 13:06:20] [DETAIL]
치명적오류:  관리자 요청에 의해서 연결을 끝냅니다
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

[2021-09-10 13:06:20] [DETAIL] query text is:
SELECT pg_catalog.pg_is_in_recovery()
[2021-09-10 13:06:20] [NOTICE] local node "replica2" (ID: 103)'s upstream appears to have changed, restarting monitoring
[2021-09-10 13:06:20] [DETAIL] currently monitoring upstream 101; new upstream is 102
[2021-09-10 13:11:20] [INFO] node "replica2" (ID: 103) monitoring upstream node "replica1" (ID: 102) in normal state
[2021-09-10 13:11:20] [DETAIL] last monitoring statistics update was 10 seconds ago
[2021-09-10 13:16:21] [INFO] node "replica2" (ID: 103) monitoring upstream node "replica1" (ID: 102) in normal state
[2021-09-10 13:16:21] [DETAIL] last monitoring statistics update was 10 seconds ago
 

 

Failover로 새로운 primary Node로 승격시킬때 기존 primary가 Online 상태가 되면 repmgr  데이터의 부정확성이 초래됩니다.  따라서 기존 Primary를 띄우면안됩니다. 

postgres@primary:/tmp# stop
서버를 멈추기 위해 기다리는 중.... 완료
서버 멈추었음



postgres@primary:/home/postgres# show
WARNING: the following problems were found in the configuration file:
  parameter "cluster" is deprecated and will be ignored
ID  | Name     | Role    | Status               | Upstream   | Location | Priority | Timeline | Connection string                      
-----+----------+---------+----------------------+------------+----------+----------+----------+-----------------------------------------
101 | primary  | primary | * running            |            | default  | 100      | 5        | host=primary dbname=repmgr user=repmgr
102 | replica1 | standby | ! running as primary |            | default  | 100      | 6        | host=replica1 dbname=repmgr user=repmgr
103 | replica2 | standby |   running            | ! replica1 | default  | 100      | 6        | host=replica2 dbname=repmgr user=repmgr
104 | repmgr   | witness | * running            | ! replica1 | default  | 0        | n/a      | host=repmgr dbname=repmgr user=repmgr  

WARNING: following issues were detected
  - node "replica1" (ID: 102) is registered as standby but running as primary
  - node "replica2" (ID: 103) reports a different upstream (reported: "replica1", expected "primary")
  - node "repmgr" (ID: 104) reports a different upstream (reported: "replica1", expected "primary")



postgres@replica1:/# show
WARNING: the following problems were found in the configuration file:
  parameter "cluster" is deprecated and will be ignored
ID  | Name     | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                      
-----+----------+---------+-----------+----------+----------+----------+----------+-----------------------------------------
101 | primary  | primary | ! running |          | default  | 100      | 5        | host=primary dbname=repmgr user=repmgr
102 | replica1 | primary | * running |          | default  | 100      | 6        | host=replica1 dbname=repmgr user=repmgr
103 | replica2 | standby |   running | replica1 | default  | 100      | 6        | host=replica2 dbname=repmgr user=repmgr
104 | repmgr   | witness | * running | replica1 | default  | 0        | n/a      | host=repmgr dbname=repmgr user=repmgr  

WARNING: following issues were detected
  - node "primary" (ID: 101) is running but the repmgr node record is inactive

 
 
 
 

 Failover 이후 기존 Primary 작업 ( 기존 Primary Unregister 후에 현재 Primary DB backup이용하여 Replication 재구성 ) 

현재 primary로 노드인 replica1로 가서 원래 primary노드인 primary를 레지스터에서 지운뒤 작업 진행
postgres@replica1:/# show
WARNING: the following problems were found in the configuration file:
  parameter "cluster" is deprecated and will be ignored
ID  | Name     | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                      
-----+----------+---------+-----------+----------+----------+----------+----------+-----------------------------------------
101 | primary  | primary | - failed  | ?        | default  | 100      | 5        | host=primary dbname=repmgr user=repmgr
102 | replica1 | primary | * running |          | default  | 100      | 6        | host=replica1 dbname=repmgr user=repmgr
103 | replica2 | standby |   running | replica1 | default  | 100      | 6        | host=replica2 dbname=repmgr user=repmgr
104 | repmgr   | witness | * running | replica1 | default  | 0        | n/a      | host=repmgr dbname=repmgr user=repmgr  
WARNING: following issues were detected
  - node "primary" (ID: 101) is running but the repmgr node record is inactive




## unregister
postgres@replica1:/# repmgr primary unregister --node-id=101 --force
WARNING: the following problems were found in the configuration file:
  parameter "cluster" is deprecated and will be ignored
INFO: node "primary" (ID: 101) was successfully unregistered
postgres@replica1:/# show
WARNING: the following problems were found in the configuration file:
  parameter "cluster" is deprecated and will be ignored
ID  | Name     | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                      
-----+----------+---------+-----------+----------+----------+----------+----------+-----------------------------------------
102 | replica1 | primary | * running |          | default  | 100      | 6        | host=replica1 dbname=repmgr user=repmgr
103 | replica2 | standby |   running | replica1 | default  | 100      | 6        | host=replica2 dbname=repmgr user=repmgr
104 | repmgr   | witness | * running | replica1 | default  | 0        | n/a      | host=repmgr dbname=repmgr user=repmgr





##이후에 기존 Primary Data를 지운후 Replication 을 다시 맺어준다.
postgres@primary:/mnt/pgsql/12# cd /mnt/pgsql/12/
postgres@primary:/mnt/pgsql/12# ll
합계 36
drwxr-xr-x.  2 postgres postgres  4096  9월  7 06:54 bin
drwx------. 20 postgres postgres  4096  9월 10 13:22 data
drwxr-xr-x.  6 postgres postgres  4096  9월  7 06:54 include
drwxr-xr-x.  4 postgres postgres  4096  9월  7 06:54 lib
drwxrwxr-x.  2 postgres postgres     6  9월  7 10:01 man
-rw-------.  1 postgres postgres 15839  9월 10 13:22 pgsql.log
drwxr-xr-x.  6 postgres postgres  4096  9월  7 06:54 share



postgres@primary:/mnt/pgsql/12# rm -rf data          //data dir 삭제





## 현재 Primary인 replica1 노드의 클론 작업
postgres@primary:/mnt/pgsql/12# repmgr -h replica1 -U repmgr -d repmgr standby clone
WARNING: the following problems were found in the configuration file:
  parameter "cluster" is deprecated and will be ignored
WARNING: following problems with command line parameters detected:
  "config_directory" set in repmgr.conf, but --copy-external-config-files not provided
NOTICE: destination directory "/mnt/pgsql/12/data" provided
INFO: connecting to source node

DETAIL: connection string is: host=replica1 user=repmgr dbname=repmgr
DETAIL: current installation size is 41 MB
INFO: replication slot usage not requested;  no replication slot will be set up for this standby
NOTICE: checking for available walsenders on the source node (2 required)
NOTICE: checking replication connections can be made to the source server (2 required)
WARNING: data checksums are not enabled and "wal_log_hints" is "off"
DETAIL: pg_rewind requires "wal_log_hints" to be enabled
INFO: creating directory "/mnt/pgsql/12/data"...
NOTICE: starting backup (using pg_basebackup)...
HINT: this may take some time; consider using the -c/--fast-checkpoint option
INFO: executing:
  /usr/pgsql-12/bin/pg_basebackup -l "repmgr base backup"  -D /mnt/pgsql/12/data -h replica1 -p 5432 -U repmgr -X stream
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: pg_ctl -D /mnt/pgsql/12/data start
HINT: after starting the server, you need to register this standby with "repmgr standby register"






## db 기동 후 레지스터 등록 -> 현재 primary가 replica1이므로 -h 경로에 replica1을 기재
postgres@primary:/mnt/pgsql/12# start
서버를 시작하기 위해 기다리는 중.... 완료
서버 시작됨



postgres@primary:/mnt/pgsql/12# /usr/pgsql-12/bin/repmgr standby register -h replica1 -U repmgr
WARNING: the following problems were found in the configuration file:
  parameter "cluster" is deprecated and will be ignored
INFO: connecting to local node "primary" (ID: 101)
WARNING: database connection parameters not required when the standby to be registered is running
DETAIL: repmgr uses the "conninfo" parameter in "repmgr.conf" to connect to the standby
INFO: connecting to primary database
WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID 102)
INFO: standby registration complete
NOTICE: standby node "primary" (ID: 101) successfully registered




postgres@primary:/mnt/pgsql/12# show
WARNING: the following problems were found in the configuration file:
  parameter "cluster" is deprecated and will be ignored
ID  | Name     | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                      
-----+----------+---------+-----------+----------+----------+----------+----------+-----------------------------------------
101 | primary  | standby |   running | replica1 | default  | 100      | 6        | host=primary dbname=repmgr user=repmgr
102 | replica1 | primary | * running |          | default  | 100      | 6        | host=replica1 dbname=repmgr user=repmgr
103 | replica2 | standby |   running | replica1 | default  | 100      | 6        | host=replica2 dbname=repmgr user=repmgr
104 | repmgr   | witness | * running | replica1 | default  | 0        | n/a      | host=repmgr dbname=repmgr user=repmgr

 

이상으로 구성 및 failover test까지 진행했습니다.
다음 글에서 proxy server를 구성해보겠습니다

 ( repmgr 다운로드 )