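While setting up PostgreSQL, I used Pacemaker to handle failover and failback.
Pacemaker is the high-availability cluster software shipped and supported by Red Hat; it is developed as the open-source ClusterLabs project.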
- Corosync: cluster infrastructure support (quorum management, messaging, etc.)
- Pacemaker: cluster resource manager
- pcs: management program that makes corosync and pacemaker easy to administer
Test environment
| Item | node1 | node2 |
|---|---|---|
| hostname | jh-cluster001 | jh-cluster002 |
| OS | centos7.3 | centos7.3 |
| Public IP | 118.67.132.251 | 27.96.134.40 |
| Private IP | 10.41.43.141 / 192.168.100.60 | 10.41.142.140 / 192.168.100.61 |
| VIP | 192.168.100.62 | |
Preparation
■ Network interface configuration (jh-cluster-001, jh-cluster-002)
[root@jh-cluster-001 network-scripts]# vi ifcfg-eth1
DEVICE=eth1
BOOTPROTO=static
IPADDR=192.168.100.60
NETMASK=255.255.255.0
ONBOOT=yes
[root@jh-cluster-001 network-scripts]# ifup ifcfg-eth1
[root@jh-cluster-002 network-scripts]# vi ifcfg-eth1
DEVICE=eth1
BOOTPROTO=static
IPADDR=192.168.100.61
NETMASK=255.255.255.0
ONBOOT=yes
[root@jh-cluster-002 network-scripts]# ifup ifcfg-eth1
■ Host resolution / nameserver configuration (jh-cluster-001, jh-cluster-002)
[root@jh-cluster001 ~]# vi /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.100.60 jh-cluster001
192.168.100.61 jh-cluster002
[root@jh-cluster001 ~]# vi /etc/resolv.conf
; generated by /usr/sbin/dhclient-script
search ncloud.com
nameserver 10.250.255.11
nameserver 10.250.255.12
nameserver 8.8.8.8
[root@jh-cluster-001 network-scripts]# ping jh-cluster-002
PING jh-cluster-002 (192.168.100.61) 56(84) bytes of data.
64 bytes from jh-cluster-002 (192.168.100.61): icmp_seq=1 ttl=64 time=0.739 ms
64 bytes from jh-cluster-002 (192.168.100.61): icmp_seq=2 ttl=64 time=0.277 ms
64 bytes from jh-cluster-002 (192.168.100.61): icmp_seq=3 ttl=64 time=0.287 ms
^C
--- jh-cluster-002 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.277/0.434/0.739/0.216 ms
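Before moving on, it can help to confirm that every node name resolves through the hosts file. The following is a minimal sketch and not part of the original procedure: `hosts_lookup` is a hypothetical helper that prints the address mapped to a name, and the node names are the ones from the /etc/hosts entries above.

```shell
# hosts_lookup NAME [FILE]: print the first address mapped to NAME in a hosts file.
# hosts_lookup is a hypothetical helper; FILE defaults to /etc/hosts.
hosts_lookup() {
  awk -v h="$1" '
    /^[[:space:]]*#/ { next }                 # skip comment lines
    { for (i = 2; i <= NF; i++)               # field 1 is the address, the rest are names
        if ($i == h) { print $1; exit } }
  ' "${2:-/etc/hosts}"
}

# Report each cluster node name (names assumed from the hosts entries above).
for node in jh-cluster001 jh-cluster002; do
  addr=$(hosts_lookup "$node")
  echo "$node -> ${addr:-not found}"
done
```

Running this on both nodes should print the 192.168.100.x address for each name; a "not found" line points to a missing or misspelled hosts entry.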
■ Package installation (jh-cluster-001, jh-cluster-002)
[root@jh-cluster001 ~]# yum install -y pacemaker corosync pcs psmisc policycoreutils-python
## Start the pcs daemon
[root@jh-cluster-001 ~]# systemctl start pcsd.service
[root@jh-cluster-001 ~]# systemctl status pcsd.service
● pcsd.service - PCS GUI and remote configuration interface
Loaded: loaded (/usr/lib/systemd/system/pcsd.service; disabled; vendor preset: disabled)
Active: active (running) since Thu 2021-08-19 10:43:36 KST; 7s ago
Docs: man:pcsd(8)
man:pcs(8)
Main PID: 52561 (pcsd)
CGroup: /system.slice/pcsd.service
└─52561 /usr/bin/ruby /usr/lib/pcsd/pcsd
Aug 19 10:43:35 jh-cluster-001 systemd[1]: Starting PCS GUI a...
Aug 19 10:43:36 jh-cluster-001 systemd[1]: Started PCS GUI an...
Hint: Some lines were ellipsized, use -l to show in full.
## Enable the service so that it also starts after a system reboot
[root@jh-cluster-001 ~]# systemctl enable pcsd.service
Created symlink from /etc/systemd/system/multi-user.target.wants/pcsd.service to /usr/lib/systemd/system/pcsd.service.
■ Set the hacluster account password (jh-cluster001, jh-cluster002)
>> The hacluster account is created automatically when the packages are installed
[root@jh-cluster001 ~]# systemctl enable pcsd.service
Created symlink from /etc/systemd/system/multi-user.target.wants/pcsd.service to /usr/lib/systemd/system/pcsd.service.
[root@jh-cluster001 ~]# cat /etc/passwd | grep "hacluster"
hacluster:x:189:189:cluster user:/home/hacluster:/sbin/nologin
[root@jh-cluster001 ~]# passwd hacluster
Changing password for user hacluster.
New password:
BAD PASSWORD: The password is shorter than 8 characters
Retype new password:
passwd: all authentication tokens updated successfully.
■ corosync setup: node authentication (run on one node only, jh-cluster001)
[root@jh-cluster001 ~]# pcs cluster auth jh-cluster001 jh-cluster002
Username: hacluster
Password:
jh-cluster002: Authorized
jh-cluster001: Authorized
■ Create and synchronize the corosync configuration (jh-cluster-001)
[root@jh-cluster001 ~]# pcs cluster setup --name tcluster jh-cluster001 jh-cluster002
Destroying cluster on nodes: jh-cluster001, jh-cluster002...
jh-cluster-002: Stopping Cluster (pacemaker)...
jh-cluster-001: Stopping Cluster (pacemaker)...
jh-cluster-002: Successfully destroyed cluster
jh-cluster-001: Successfully destroyed cluster
Sending 'pacemaker_remote authkey' to 'jh-cluster001', 'jh-cluster002'
jh-cluster-001: successful distribution of the file 'pacemaker_remote authkey'
jh-cluster-002: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
jh-cluster-001: Succeeded
jh-cluster-002: Succeeded
Synchronizing pcsd certificates on nodes jh-cluster001, jh-cluster002...
jh-cluster-002: Success
jh-cluster-001: Success
Restarting pcsd on the nodes in order to reload the certificates...
jh-cluster-002: Success
jh-cluster-001: Success
■ Start the cluster and confirm it runs (jh-cluster-001)
[root@jh-cluster001 ~]# pcs cluster start --all
jh-cluster001: Starting Cluster (corosync)...
jh-cluster002: Starting Cluster (corosync)...
jh-cluster002: Starting Cluster (pacemaker)...
jh-cluster001: Starting Cluster (pacemaker)...
■ Check cluster communication
<jh-cluster001>
[root@jh-cluster001 ~]# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
id = 192.168.100.60
status = ring 0 active with no faults
<jh-cluster002>
[root@jh-cluster002 ~]# corosync-cfgtool -s
Printing ring status.
Local node ID 2
RING ID 0
id = 192.168.100.61
status = ring 0 active with no faults
■ Check membership and quorum (jh-cluster-001)
[root@jh-cluster001 ~]# corosync-cmapctl | egrep -i members
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(192.168.100.60)
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(192.168.100.61)
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined
<jh-cluster001>
[root@jh-cluster001 ~]# pcs status corosync
Membership information
----------------------
Nodeid Votes Name
1 1 jh-cluster001 (local)
2 1 jh-cluster002
[root@jh-cluster001 ~]# pcs status
Cluster name: tcluster
WARNINGS:
No stonith devices and stonith-enabled is not false
Stack: corosync
Current DC: jh-cluster-001 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
Last updated: Thu Aug 19 11:06:18 2021
Last change: Thu Aug 19 11:03:39 2021 by hacluster via crmd on jh-cluster-001
2 nodes configured
0 resource instances configured
Online: [ jh-cluster001 jh-cluster002 ]
No resources
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
<jh-cluster002>
[root@jh-cluster002 ~]# pcs status corosync
Membership information
----------------------
Nodeid Votes Name
1 1 jh-cluster001
2 1 jh-cluster002 (local)
[root@jh-cluster002 ~]# pcs status
Cluster name: tcluster
WARNINGS:
No stonith devices and stonith-enabled is not false
Stack: corosync
Current DC: jh-cluster-001 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
Last updated: Thu Aug 19 11:07:15 2021
Last change: Thu Aug 19 11:03:39 2021 by hacluster via crmd on jh-cluster-001
2 nodes configured
0 resource instances configured
Online: [ jh-cluster001 jh-cluster002 ]
No resources
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/disabled
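In the Daemon Status lines above, the first word is the current state and the second is the boot-time state, so `active/disabled` means the service is running now but will not start automatically after a reboot. A minimal sketch for spotting this, assuming the `pcs status` layout shown above (`disabled_daemons` is a hypothetical helper name):

```shell
# disabled_daemons: read `pcs status` text on stdin and print every daemon
# whose boot-time state is "disabled". Hypothetical helper for illustration.
disabled_daemons() {
  awk '$2 ~ /\/disabled$/ { sub(/:$/, "", $1); print $1 }'
}

# Sample input copied from the output above.
# On a live node the equivalent would be: pcs status | disabled_daemons
disabled_daemons <<'EOF'
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
EOF
```

If automatic start is wanted, `pcs cluster enable --all` enables corosync and pacemaker on all nodes. Leaving them disabled can also be a deliberate choice, so that a node that failed does not rejoin the cluster before someone has inspected it.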
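■ Final validation (jh-cluster-001)
Creating an active/passive cluster
※ STONITH is enabled by default to protect data integrity, so the first run of crm_verify reports errors because no STONITH device has been defined yet. After STONITH is disabled, running the check again reports no errors.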
[root@jh-cluster001 ~]# crm_verify -L -V
error: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
error: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
error: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid
[root@jh-cluster001 ~]# pcs property set stonith-enabled=false
[root@jh-cluster001 ~]# crm_verify -L -V
VIP creation (jh-cluster-001)
## Create a resource named VirtualIP with VIP 192.168.100.62, a 24-bit netmask, and a 30-second monitoring interval
[root@jh-cluster-001 ~]# pcs resource create VirtualIP ocf:heartbeat:IPaddr2 ip=192.168.100.62 cidr_netmask=24 op monitor interval=30s
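The `cidr_netmask` parameter takes a prefix length, while the interface files earlier use the dotted form `NETMASK=255.255.255.0`. A minimal sketch of the conversion between the two, so the VIP can be given the same prefix as the interface it lives on (`mask_to_prefix` is a hypothetical helper; it counts bits per octet and does not check that the mask is contiguous):

```shell
# mask_to_prefix MASK: convert a dotted netmask to a prefix length.
# Hypothetical helper; each octet must be one of the valid mask values.
mask_to_prefix() {
  local IFS=. bits=0 octet
  for octet in $1; do               # unquoted on purpose: split on the dots
    case "$octet" in
      255) bits=$((bits + 8)) ;;
      254) bits=$((bits + 7)) ;;
      252) bits=$((bits + 6)) ;;
      248) bits=$((bits + 5)) ;;
      240) bits=$((bits + 4)) ;;
      224) bits=$((bits + 3)) ;;
      192) bits=$((bits + 2)) ;;
      128) bits=$((bits + 1)) ;;
      0) ;;
      *) echo "invalid netmask octet: $octet" >&2; return 1 ;;
    esac
  done
  echo "$bits"
}

mask_to_prefix 255.255.255.0
```

For the eth1 netmask used in this setup the result is 24, which matches the `cidr_netmask=24` passed to `pcs resource create`.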
## Check the resource created in the cluster
[root@jh-cluster001 ~]# pcs status
Cluster name: tcluster
Stack: corosync
Current DC: jh-cluster002 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
Last updated: Mon Aug 30 14:48:16 2021
Last change: Mon Aug 30 14:48:09 2021 by root via cibadmin on jh-cluster001
2 nodes configured
1 resource instance configured
Online: [ jh-cluster001 jh-cluster002 ]
Full list of resources:
VirtualIP (ocf::heartbeat:IPaddr2): Started jh-cluster001 // shows which node currently holds the VIP
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
## Check that the VIP is configured on the interface
[root@jh-cluster001 ~]# ip a | grep secondary
inet 192.168.100.62/24 brd 192.168.100.255 scope global secondary eth1
## Check the VIP resource details
[root@jh-cluster-001 ~]# pcs resource show VirtualIP
Resource: VirtualIP (class=ocf provider=heartbeat type=IPaddr2)
Attributes: cidr_netmask=20 ip=192.168.100.62
Operations: monitor interval=30s (VirtualIP-monitor-interval-30s)
start interval=0s timeout=20s (VirtualIP-start-interval-0s)
stop interval=0s timeout=20s (VirtualIP-stop-interval-0s)
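For scripted checks, the node currently running a resource can be pulled out of the `pcs status` text. A minimal sketch, assuming the resource line layout shown in the status output above (`resource_node` is a hypothetical helper name):

```shell
# resource_node NAME: read `pcs status` text on stdin and print the node
# on which resource NAME is reported as Started. Hypothetical helper.
resource_node() {
  awk -v name="$1" '$1 == name && $3 == "Started" { print $4 }'
}

# Sample line in the layout shown above.
# On a live node the equivalent would be: pcs status | resource_node VirtualIP
printf ' VirtualIP\t(ocf::heartbeat:IPaddr2):\tStarted jh-cluster001\n' | resource_node VirtualIP
```

An empty result means the resource is not reported as Started on any node.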
How to delete the VIP
■ Delete the VIP
[root@cluster01 ~]# pcs status
Cluster name: tcluster
Stack: corosync
Current DC: cluster01 (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum
Last updated: Tue Jun 25 12:03:30 2019
Last change: Tue Jun 25 12:03:25 2019 by root via cibadmin on cluster01
2 nodes configured
1 resource configured
Online: [ cluster01 cluster02 ]
Full list of resources:
VirtualIP (ocf::heartbeat:IPaddr2): Started cluster01
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
[root@cluster01 ~]# pcs resource delete VirtualIP
Attempting to stop: VirtualIP... Stopped
[root@cluster01 ~]# ip a | grep secondary
[root@cluster01 ~]# pcs status
Cluster name: tcluster
Stack: corosync
Current DC: cluster01 (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum
Last updated: Tue Jun 25 12:03:55 2019
Last change: Tue Jun 25 12:03:37 2019 by root via cibadmin on cluster01
2 nodes configured
0 resources configured
Online: [ cluster01 cluster02 ]
No resources
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
The fields of the resource "ocf:heartbeat:IPaddr2" are as follows.
ocf:heartbeat:IPaddr2
 ┃      ┃        ┖-> name of the resource script (agent)
 ┃      ┖-> resource provider
 ┖-> resource standard
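The three fields can be separated with plain shell parameter expansion. A minimal sketch, assuming a three-part `standard:provider:type` name such as the one above (`split_agent` is a hypothetical helper name):

```shell
# split_agent SPEC: split a standard:provider:type resource name into its fields.
# Hypothetical helper; assumes SPEC has exactly three colon-separated parts.
split_agent() {
  local spec="$1"
  local standard="${spec%%:*}"   # text before the first colon
  local rest="${spec#*:}"        # text after the first colon
  local provider="${rest%%:*}"
  local type="${rest#*:}"
  printf 'standard=%s provider=%s type=%s\n' "$standard" "$provider" "$type"
}

split_agent "ocf:heartbeat:IPaddr2"
```

The call prints `standard=ocf provider=heartbeat type=IPaddr2`, which lines up with the three `pcs resource` listing commands: standards, providers, and agents.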
■ How to list the resource standards
[root@jh-cluster001 ~]# pcs resource standards
lsb
ocf
service
systemd
■ List the resource providers
[root@jh-cluster001 ~]# pcs resource providers
heartbeat
openstack
pacemaker
■ List the resource script (agent) names
[root@jh-cluster001 ~]# pcs resource agents ocf:heartbeat
aliyun-vpc-move-ip
apache
aws-vpc-move-ip
aws-vpc-route53
awseip
awsvip
azure-events
azure-lb
clvm
conntrackd
CTDB
db2
Delay
dhcpd
docker
Dummy
ethmonitor
exportfs
Filesystem
galera
garbd
iface-vlan
IPaddr
IPaddr2
IPsrcaddr
iSCSILogicalUnit
iSCSITarget
LVM
LVM-activate
lvmlockd
MailTo
mysql
nagios
named
nfsnotify
nfsserver
nginx
NodeUtilization
oraasm
oracle
oralsnr
pgsql
portblock
postfix
rabbitmq-cluster
redis
Route
rsyncd
SendArp
slapd
Squid
sybaseASE
symlink
tomcat
vdo-vol
VirtualDomain
Xinetd
■ How to view the pcs configuration
[root@jh-cluster-001 ~]# pcs config show
Cluster Name: tcluster
Corosync Nodes:
jh-cluster-001 jh-cluster-002
Pacemaker Nodes:
jh-cluster-001 jh-cluster-002
Resources:
Resource: VirtualIP (class=ocf provider=heartbeat type=IPaddr2)
Attributes: cidr_netmask=20 ip=192.168.100.62
Operations: monitor interval=30s (VirtualIP-monitor-interval-30s)
start interval=0s timeout=20s (VirtualIP-start-interval-0s)
stop interval=0s timeout=20s (VirtualIP-stop-interval-0s)
Stonith Devices:
Fencing Levels:
Location Constraints:
Ordering Constraints:
Colocation Constraints:
Ticket Constraints:
Alerts:
No alerts defined
Resources Defaults:
No defaults set
Operations Defaults:
No defaults set
Cluster Properties:
cluster-infrastructure: corosync
cluster-name: tcluster
dc-version: 1.1.23-1.el7_9.1-9acf116022
have-watchdog: false
stonith-enabled: false
Quorum:
Options:
■ Failover test
>> Stop jh-cluster001 to run the failover test
<jh-cluster001>
## Stop jh-cluster001
[root@jh-cluster001 ~]# pcs cluster stop jh-cluster001
jh-cluster001: Stopping Cluster (pacemaker)...
jh-cluster001: Stopping Cluster (corosync)...
## Once the cluster is stopped on this node, its status can no longer be queried from here
[root@jh-cluster001 ~]# pcs status
Error: cluster is not currently running on this node
[root@jh-cluster001 ~]# ip a | grep secondary
<jh-cluster002>
[root@jh-cluster002 ~]# pcs status
Cluster name: tcluster
WARNINGS:
No stonith devices and stonith-enabled is not false
Stack: corosync
Current DC: jh-cluster002 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
Last updated: Mon Aug 30 14:45:57 2021
Last change: Mon Aug 30 14:44:05 2021 by hacluster via crmd on jh-cluster002
2 nodes configured
0 resource instances configured
Online: [ jh-cluster001 jh-cluster002 ]
No resources
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
[root@jh-cluster002 ~]# pcs status
Cluster name: tcluster
Stack: corosync
Current DC: jh-cluster002 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
Last updated: Mon Aug 30 14:59:33 2021
Last change: Mon Aug 30 14:48:09 2021 by root via cibadmin on jh-cluster001
2 nodes configured
1 resource instance configured
Online: [ jh-cluster002 ]
OFFLINE: [ jh-cluster001 ]
Full list of resources:
VirtualIP (ocf::heartbeat:IPaddr2): Started jh-cluster002
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
## Confirm that the VIP that was on jh-cluster001 has moved over
[root@jh-cluster002 ~]# ip a | grep secondary
inet 192.168.100.62/24 brd 192.168.100.255 scope global secondary eth1
※ Even after jh-cluster001 is started again, the VIP keeps running on jh-cluster002; it does not fail back automatically.
<jh-cluster001>
[root@jh-cluster001 ~]# pcs cluster start jh-cluster001
jh-cluster001: Starting Cluster (corosync)...
jh-cluster001: Starting Cluster (pacemaker)...
[root@jh-cluster001 ~]# pcs status; ip a | grep secondary
Cluster name: tcluster
Stack: corosync
Current DC: jh-cluster002 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
Last updated: Mon Aug 30 15:03:46 2021
Last change: Mon Aug 30 14:48:09 2021 by root via cibadmin on jh-cluster001
2 nodes configured
1 resource instance configured
Online: [ jh-cluster001 jh-cluster002 ]
Full list of resources:
VirtualIP (ocf::heartbeat:IPaddr2): Started jh-cluster002
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
<jh-cluster002>
[root@jh-cluster002 ~]# pcs status
Cluster name: tcluster
Stack: corosync
Current DC: jh-cluster002 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
Last updated: Mon Aug 30 15:04:34 2021
Last change: Mon Aug 30 14:48:09 2021 by root via cibadmin on jh-cluster001
2 nodes configured
1 resource instance configured
Online: [ jh-cluster001 jh-cluster002 ]
Full list of resources:
VirtualIP (ocf::heartbeat:IPaddr2): Started jh-cluster002
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
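In this test no location constraint prefers jh-cluster001, so Pacemaker has no reason to move the running VIP back once that node returns. A minimal sketch of a manual failback, which is not part of the original test: `failback_plan` is a hypothetical helper that only prints the pcs commands so they can be reviewed before being run by hand.

```shell
# failback_plan RESOURCE NODE: print the pcs commands for a manual failback.
# Hypothetical helper; it prints the commands and does not execute them.
failback_plan() {
  local resource="$1" node="$2"
  # Move the resource; pcs does this by adding a location constraint for the target node.
  echo "pcs resource move $resource $node"
  # Remove that constraint afterwards so later failovers are not pinned to one node.
  echo "pcs resource clear $resource"
}

failback_plan VirtualIP jh-cluster001
```

For a standing preference instead of a one-off move, `pcs constraint location VirtualIP prefers jh-cluster001=50` expresses it as a score; the value 50 is only an illustration.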
https://serverstudy.tistory.com/208