[Pacemaker & corosync setup: VIP failover]

While setting up PostgreSQL, I configured Pacemaker to handle failover/failback.

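Pacemaker is the high-availability cluster software used in Red Hat's HA stack.
  • Corosync: cluster infrastructure layer (quorum management, messaging, and so on)
  • Pacemaker: cluster resource manager
  • pcs: a management program that makes corosync and pacemaker easy to administer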

 

 

 Test environment

Item         node1             node2
hostname     jh-cluster001     jh-cluster002
OS           CentOS 7.3        CentOS 7.3
Public IP    118.67.132.251    27.96.134.40
Private IP   10.41.43.141      10.41.142.140
             192.168.100.60    192.168.100.61
VIP          192.168.100.62
 

 

 

 Preliminary work

 

Network interface configuration (jh-cluster-001, jh-cluster-002)

[root@jh-cluster-001 network-scripts]# vi ifcfg-eth1
DEVICE=eth1
BOOTPROTO=static
IPADDR=192.168.100.60
NETMASK=255.255.255.0
ONBOOT=yes

[root@jh-cluster-001 network-scripts]# ifup ifcfg-eth1


[root@jh-cluster-002 network-scripts]# vi ifcfg-eth1
DEVICE=eth1
BOOTPROTO=static
IPADDR=192.168.100.61
NETMASK=255.255.255.0
ONBOOT=yes

[root@jh-cluster-002 network-scripts]# ifup ifcfg-eth1
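
The two ifcfg-eth1 files above differ only in IPADDR. As a minimal sketch, a small helper can print the file body so both nodes start from the same template. The function name render_ifcfg is hypothetical and is not part of any tool used in this post.

```shell
# render_ifcfg DEVICE IPADDR NETMASK
# Prints a static ifcfg body like the ones shown above (hypothetical helper).
render_ifcfg() {
    local device=$1 ipaddr=$2 netmask=$3
    printf 'DEVICE=%s\n' "$device"
    printf 'BOOTPROTO=static\n'
    printf 'IPADDR=%s\n' "$ipaddr"
    printf 'NETMASK=%s\n' "$netmask"
    printf 'ONBOOT=yes\n'
}

# Example: print the body for node2 to stdout. Review it before redirecting
# it into /etc/sysconfig/network-scripts/ifcfg-eth1.
render_ifcfg eth1 192.168.100.61 255.255.255.0
```

Printing to stdout instead of writing the file directly keeps the helper harmless to run and leaves the final redirect as a deliberate step.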
 
 

 Host resolution / nameserver configuration (jh-cluster-001, jh-cluster-002)

[root@jh-cluster001 ~]# vi /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.100.60 jh-cluster001
192.168.100.61 jh-cluster002


[root@jh-cluster001 ~]# vi /etc/resolv.conf
; generated by /usr/sbin/dhclient-script
search ncloud.com
nameserver 10.250.255.11
nameserver 10.250.255.12
nameserver 8.8.8.8


[root@jh-cluster-001 network-scripts]# ping jh-cluster-002
PING jh-cluster-002 (192.168.100.61) 56(84) bytes of data.
64 bytes from jh-cluster-002 (192.168.100.61): icmp_seq=1 ttl=64 time=0.739 ms
64 bytes from jh-cluster-002 (192.168.100.61): icmp_seq=2 ttl=64 time=0.277 ms
64 bytes from jh-cluster-002 (192.168.100.61): icmp_seq=3 ttl=64 time=0.287 ms
^C
--- jh-cluster-002 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.277/0.434/0.739/0.216 ms
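
Since pcs addresses the nodes by name, it can help to confirm that every node name is present in /etc/hosts on both machines before going further. The following is only a sketch; hosts_has_entry is a hypothetical helper name, and the node names are the ones from the /etc/hosts listing above.

```shell
# hosts_has_entry FILE NAME
# Succeeds when NAME appears as a hostname or alias on a non-comment line
# of FILE (hypothetical helper; FILE is normally /etc/hosts).
hosts_has_entry() {
    awk -v name="$2" '
        /^[[:space:]]*#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit(found ? 0 : 1) }
    ' "$1"
}

# Example: report any cluster node that is missing from /etc/hosts.
for node in jh-cluster001 jh-cluster002; do
    hosts_has_entry /etc/hosts "$node" || echo "missing from /etc/hosts: $node"
done
```

The loop starts at field 2 so that only hostnames and aliases are compared, never the address column.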
 

 

Package installation (jh-cluster-001, jh-cluster-002)

[root@jh-cluster001 ~]# yum install -y pacemaker corosync pcs psmisc policycoreutils-python

## Start the pcs daemon

[root@jh-cluster-001 ~]# systemctl  start pcsd.service
[root@jh-cluster-001 ~]# systemctl status pcsd.service
● pcsd.service - PCS GUI and remote configuration interface
   Loaded: loaded (/usr/lib/systemd/system/pcsd.service; disabled; vendor preset: disabled)
   Active: active (running) since Thu 2021-08-19 10:43:36 KST; 7s ago
     Docs: man:pcsd(8)
           man:pcs(8)
Main PID: 52561 (pcsd)
   CGroup: /system.slice/pcsd.service
           └─52561 /usr/bin/ruby /usr/lib/pcsd/pcsd

Aug 19 10:43:35 jh-cluster-001 systemd[1]: Starting PCS GUI a...
Aug 19 10:43:36 jh-cluster-001 systemd[1]: Started PCS GUI an...
Hint: Some lines were ellipsized, use -l to show in full.


## Enable the service so that it also starts after a system reboot
[root@jh-cluster-001 ~]# systemctl enable pcsd.service
Created symlink from /etc/systemd/system/multi-user.target.wants/pcsd.service to /usr/lib/systemd/system/pcsd.service.

 

 

Set the hacluster account password (jh-cluster001, jh-cluster002)

>> The hacluster account is created automatically when the packages are installed

[root@jh-cluster001 ~]# systemctl enable pcsd.service
Created symlink from /etc/systemd/system/multi-user.target.wants/pcsd.service to /usr/lib/systemd/system/pcsd.service.


[root@jh-cluster001 ~]# cat /etc/passwd | grep "hacluster"
hacluster:x:189:189:cluster user:/home/hacluster:/sbin/nologin


[root@jh-cluster001 ~]# passwd hacluster
Changing password for user hacluster.
New password:
BAD PASSWORD: The password is shorter than 8 characters
Retype new password:
passwd: all authentication tokens updated successfully.

 

corosync setup: node authentication (run on one node only, jh-cluster001)

[root@jh-cluster001 ~]# pcs cluster auth jh-cluster001 jh-cluster002
Username: hacluster
Password:
jh-cluster002: Authorized
jh-cluster001: Authorized

 

 

corosync configuration and synchronization (jh-cluster-001)

[root@jh-cluster001 ~]# pcs cluster setup --name tcluster jh-cluster001 jh-cluster002
Destroying cluster on nodes: jh-cluster001, jh-cluster002...
jh-cluster-002: Stopping Cluster (pacemaker)...
jh-cluster-001: Stopping Cluster (pacemaker)...
jh-cluster-002: Successfully destroyed cluster
jh-cluster-001: Successfully destroyed cluster

Sending 'pacemaker_remote authkey' to 'jh-cluster001', 'jh-cluster002'
jh-cluster-001: successful distribution of the file 'pacemaker_remote authkey'
jh-cluster-002: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
jh-cluster-001: Succeeded
jh-cluster-002: Succeeded

Synchronizing pcsd certificates on nodes jh-cluster001, jh-cluster002...
jh-cluster-002: Success
jh-cluster-001: Success
Restarting pcsd on the nodes in order to reload the certificates...
jh-cluster-002: Success
jh-cluster-001: Success

 

 

Start the cluster and verify operation (jh-cluster-001)

[root@jh-cluster001 ~]# pcs cluster start --all
jh-cluster001: Starting Cluster (corosync)...
jh-cluster002: Starting Cluster (corosync)...
jh-cluster002: Starting Cluster (pacemaker)...
jh-cluster001: Starting Cluster (pacemaker)...

 

 

Verify cluster communication

<jh-cluster001>
[root@jh-cluster001 ~]# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
    id    = 192.168.100.60
    status    = ring 0 active with no faults


<jh-cluster002>
[root@jh-cluster002 ~]# corosync-cfgtool -s
Printing ring status.
Local node ID 2
RING ID 0
    id    = 192.168.100.61
    status    = ring 0 active with no faults
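
The ring status above can also be checked in a script instead of by eye. This sketch assumes the output format shown in this post (one "status = ..." line per ring); ring_faults is a hypothetical helper name.

```shell
# ring_faults: read `corosync-cfgtool -s` output on stdin and print the number
# of rings whose status line does not report "active with no faults".
ring_faults() {
    awk '
        /status[[:space:]]*=/ && !/active with no faults/ { bad++ }
        END { print bad + 0 }
    '
}

# Usage on a cluster node (not executed here):
#   corosync-cfgtool -s | ring_faults
```

A result of 0 means every ring reported in the input is healthy; any other number is the count of rings that need a closer look.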

 

 

Verify membership and quorum (jh-cluster-001)

[root@jh-cluster001 ~]# corosync-cmapctl | egrep -i members
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(192.168.100.60)
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(192.168.100.61)
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined
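
For a two-node cluster, both members should show "joined" in the listing above. As a rough sketch, the count can be extracted like this; joined_members is a hypothetical name, and the pattern is based only on the key format visible in this output.

```shell
# joined_members: read `corosync-cmapctl` output on stdin and print how many
# members have status "joined".
joined_members() {
    grep -Ec 'members\.[0-9]+\.status \(str\) = joined'
}

# Usage on a cluster node (not executed here):
#   corosync-cmapctl | joined_members
```

grep -E is used here in place of egrep; the two behave the same for this pattern.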


<jh-cluster001>
[root@jh-cluster001 ~]# pcs status corosync
Membership information
----------------------
    Nodeid      Votes Name
         1          1 jh-cluster001 (local)
         2          1 jh-cluster002


[root@jh-cluster001 ~]# pcs status
Cluster name: tcluster
WARNINGS:
No stonith devices and stonith-enabled is not false
Stack: corosync
Current DC: jh-cluster-001 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
Last updated: Thu Aug 19 11:06:18 2021
Last change: Thu Aug 19 11:03:39 2021 by hacluster via crmd on jh-cluster-001
2 nodes configured
0 resource instances configured
Online: [ jh-cluster001 jh-cluster002 ]
No resources
Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled


<jh-cluster002>
[root@jh-cluster002 ~]# pcs status corosync

Membership information
----------------------
    Nodeid      Votes Name
         1          1 jh-cluster001
         2          1 jh-cluster002 (local)


[root@jh-cluster002 ~]# pcs status
Cluster name: tcluster

WARNINGS:
No stonith devices and stonith-enabled is not false

Stack: corosync
Current DC: jh-cluster-001 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
Last updated: Thu Aug 19 11:07:15 2021
Last change: Thu Aug 19 11:03:39 2021 by hacluster via crmd on jh-cluster-001

2 nodes configured
0 resource instances configured

Online: [ jh-cluster001 jh-cluster002 ]

No resources


Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/disabled
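
In the "Daemon Status" block, the value after the slash shows whether the daemon is enabled at boot. A small sketch for pulling out the daemons that are running but not enabled, assuming the `pcs status` layout shown above; boot_disabled_daemons is a hypothetical helper name.

```shell
# boot_disabled_daemons: read `pcs status` output on stdin and print the names
# of daemons in the "Daemon Status" section that are not enabled at boot.
boot_disabled_daemons() {
    awk '
        /^Daemon Status:/ { in_section = 1; next }
        in_section && /\/disabled[[:space:]]*$/ {
            name = $1
            sub(/:$/, "", name)
            print name
        }
    '
}

# Usage on a cluster node (not executed here):
#   pcs status | boot_disabled_daemons
```

In this transcript corosync and pacemaker are listed as active/disabled, so they would not start by themselves after a reboot. If that is not what you want, `pcs cluster enable --all` is the pcs command that enables them on every node.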

 

 

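Final validation (jh-cluster-001)

Create an active/passive cluster
※ STONITH is enabled by default to protect data integrity, so the first run of the check reports errors. After STONITH is disabled, running the check again reports no errors. Disabling it is acceptable for a test like this one, but as the error text itself notes, clusters with shared data need STONITH.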
[root@jh-cluster001 ~]# crm_verify -L -V
   error: unpack_resources:    Resource start-up disabled since no STONITH resources have been defined
   error: unpack_resources:    Either configure some or disable STONITH with the stonith-enabled option
   error: unpack_resources:    NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid

[root@jh-cluster001 ~]# pcs property set stonith-enabled=false
[root@jh-cluster001 ~]# crm_verify -L -V
 
 

Create the VIP (jh-cluster-001)

## Create a resource named VirtualIP with VIP 192.168.100.62, a 24-bit netmask (cidr_netmask=24), and a 30-second monitoring interval
[root@jh-cluster-001 ~]# pcs resource create VirtualIP ocf:heartbeat:IPaddr2 ip=192.168.100.62 cidr_netmask=24 op monitor interval=30s


## Check the resource that was created in the cluster
[root@jh-cluster001 ~]# pcs status
Cluster name: tcluster
Stack: corosync
Current DC: jh-cluster002 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
Last updated: Mon Aug 30 14:48:16 2021
Last change: Mon Aug 30 14:48:09 2021 by root via cibadmin on jh-cluster001

2 nodes configured
1 resource instance configured

Online: [ jh-cluster001 jh-cluster002 ]

Full list of resources:

VirtualIP    (ocf::heartbeat:IPaddr2):    Started jh-cluster001  // shows which node currently holds the VIP

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled



## Check whether the VIP has been assigned
[root@jh-cluster001 ~]# ip a | grep secondary
    inet 192.168.100.62/24 brd 192.168.100.255 scope global secondary eth1
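
`grep secondary` matches any secondary address, not just the VIP. As a sketch of a stricter check, the helper below compares the address field exactly, so 192.168.100.6 does not match 192.168.100.62. has_ipv4 is a hypothetical helper name.

```shell
# has_ipv4 ADDRESS: read `ip a` (or `ip -o -4 addr show`) output on stdin and
# succeed when ADDRESS is assigned to any interface (hypothetical helper).
has_ipv4() {
    awk -v ip="$1" '
        {
            for (i = 1; i < NF; i++)
                if ($i == "inet") {
                    split($(i + 1), a, "/")
                    if (a[1] == ip) found = 1
                }
        }
        END { exit(found ? 0 : 1) }
    '
}

# Usage on a cluster node (not executed here):
#   ip a | has_ipv4 192.168.100.62 && echo "VIP is on this node"
```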



## Check the VIP details
[root@jh-cluster-001 ~]# pcs resource show VirtualIP

Resource: VirtualIP (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: cidr_netmask=20 ip=192.168.100.62
  Operations: monitor interval=30s (VirtualIP-monitor-interval-30s)
              start interval=0s timeout=20s (VirtualIP-start-interval-0s)
              stop interval=0s timeout=20s (VirtualIP-stop-interval-0s)

 

 

 

 How to delete the VIP

 

Delete the VIP

[root@cluster01 ~]# pcs status 
Cluster name: tcluster 
Stack: corosync 
Current DC: cluster01 (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum 
Last updated: Tue Jun 25 12:03:30 2019 
Last change: Tue Jun 25 12:03:25 2019 by root via cibadmin on cluster01 

2 nodes configured 
1 resource configured 

Online: [ cluster01 cluster02 ] 

Full list of resources: 

VirtualIP    (ocf::heartbeat:IPaddr2):    Started cluster01 


Daemon Status:   
  corosync: active/disabled   
  pacemaker: active/disabled   
  pcsd: active/enabled 


[root@cluster01 ~]# pcs resource delete VirtualIP 
Attempting to stop: VirtualIP... Stopped 



[root@cluster01 ~]# ip a | grep secondary 

[root@cluster01 ~]# pcs status 
Cluster name: tcluster 
Stack: corosync 
Current DC: cluster01 (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum 
Last updated: Tue Jun 25 12:03:55 2019 
Last change: Tue Jun 25 12:03:37 2019 by root via cibadmin on cluster01 

2 nodes configured
0 resources configured 


Online: [ cluster01 cluster02 ] 

No resources 

Daemon Status:   
  corosync: active/disabled   
  pacemaker: active/disabled   
  pcsd: active/enabled 

 

The fields of the resource "ocf:heartbeat:IPaddr2" are as follows.
ocf:heartbeat:IPaddr2
 │      │        ┖-> name of the resource script (agent)
 │      ┖-> resource provider
 ┖-> resource standard
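
As a small sketch of the three fields, the spec can be split on the colons with plain shell. split_agent is a hypothetical helper name, and it only handles three-field specs such as the ocf one used here.

```shell
# split_agent SPEC
# Splits "standard:provider:type" and prints one "key=value" line per field.
split_agent() {
    local standard provider type
    IFS=: read -r standard provider type <<EOF
$1
EOF
    printf 'standard=%s\nprovider=%s\ntype=%s\n' "$standard" "$provider" "$type"
}

split_agent ocf:heartbeat:IPaddr2
# prints:
#   standard=ocf
#   provider=heartbeat
#   type=IPaddr2
```

These are the same three values that `pcs resource show` prints as class, provider, and type.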

 

How to list the resource standards

[root@jh-cluster001 ~]# pcs resource standards
lsb
ocf
service
systemd
 

List the resource providers

[root@jh-cluster001 ~]# pcs resource providers
heartbeat
openstack
pacemaker

 

 

List the resource script (agent) names

[root@jh-cluster001 ~]# pcs resource agents ocf:heartbeat
aliyun-vpc-move-ip
apache
aws-vpc-move-ip
aws-vpc-route53
awseip
awsvip
azure-events
azure-lb
clvm
conntrackd
CTDB
db2
Delay
dhcpd
docker
Dummy
ethmonitor
exportfs
Filesystem
galera
garbd
iface-vlan
IPaddr
IPaddr2
IPsrcaddr
iSCSILogicalUnit
iSCSITarget
LVM
LVM-activate
lvmlockd
MailTo
mysql
nagios
named
nfsnotify
nfsserver
nginx
NodeUtilization
oraasm
oracle
oralsnr
pgsql
portblock
postfix
rabbitmq-cluster
redis
Route
rsyncd
SendArp
slapd
Squid
sybaseASE
symlink
tomcat
vdo-vol
VirtualDomain
Xinetd

 

 

How to view the pcs config

[root@jh-cluster-001 ~]# pcs config show
Cluster Name: tcluster
Corosync Nodes:
jh-cluster-001 jh-cluster-002
Pacemaker Nodes:
jh-cluster-001 jh-cluster-002

Resources:
Resource: VirtualIP (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: cidr_netmask=20 ip=192.168.100.62
  Operations: monitor interval=30s (VirtualIP-monitor-interval-30s)
              start interval=0s timeout=20s (VirtualIP-start-interval-0s)
              stop interval=0s timeout=20s (VirtualIP-stop-interval-0s)

Stonith Devices:
Fencing Levels:

Location Constraints:
Ordering Constraints:
Colocation Constraints:
Ticket Constraints:

Alerts:
No alerts defined

Resources Defaults:
No defaults set
Operations Defaults:
No defaults set

Cluster Properties:
cluster-infrastructure: corosync
cluster-name: tcluster
dc-version: 1.1.23-1.el7_9.1-9acf116022
have-watchdog: false
stonith-enabled: false

Quorum:
  Options:

 

 

■ Failover test

>> Stop Cluster001 to run the failover test

<jh-cluster001>
## Stop jh-cluster001
[root@jh-cluster001 ~]# pcs cluster stop jh-cluster001
jh-cluster001: Stopping Cluster (pacemaker)...
jh-cluster001: Stopping Cluster (corosync)...



## Once the cluster is stopped on this node, its status can no longer be queried from it
[root@jh-cluster001 ~]# pcs status
Error: cluster is not currently running on this node


[root@jh-cluster001 ~]# ip a | grep secondary

 

<jh-cluster002>
[root@jh-cluster002 ~]# pcs status
Cluster name: tcluster
WARNINGS:
No stonith devices and stonith-enabled is not false

Stack: corosync
Current DC: jh-cluster002 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
Last updated: Mon Aug 30 14:45:57 2021
Last change: Mon Aug 30 14:44:05 2021 by hacluster via crmd on jh-cluster002

2 nodes configured
0 resource instances configured

Online: [ jh-cluster001 jh-cluster002 ]

No resources

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
[root@jh-cluster002 ~]# pcs status
Cluster name: tcluster
Stack: corosync
Current DC: jh-cluster002 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
Last updated: Mon Aug 30 14:59:33 2021
Last change: Mon Aug 30 14:48:09 2021 by root via cibadmin on jh-cluster001

2 nodes configured
1 resource instance configured

Online: [ jh-cluster002 ]
OFFLINE: [ jh-cluster001 ]

Full list of resources:

VirtualIP    (ocf::heartbeat:IPaddr2):    Started jh-cluster002

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled


## Confirm that the VIP that was on jh-cluster001 has moved over
[root@jh-cluster002 ~]# ip a | grep secondary
    inet 192.168.100.62/24 brd 192.168.100.255 scope global secondary eth1
 
 

※ Even after Cluster001 is started again, the VIP keeps running on Cluster002 and does not fail back.

<jh-cluster001>
[root@jh-cluster001 ~]# pcs cluster start jh-cluster001
jh-cluster001: Starting Cluster (corosync)...
jh-cluster001: Starting Cluster (pacemaker)...


[root@jh-cluster001 ~]# pcs status; ip a | grep secondary
Cluster name: tcluster
Stack: corosync
Current DC: jh-cluster002 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
Last updated: Mon Aug 30 15:03:46 2021
Last change: Mon Aug 30 14:48:09 2021 by root via cibadmin on jh-cluster001
2 nodes configured
1 resource instance configured
Online: [ jh-cluster001 jh-cluster002 ]
Full list of resources:
VirtualIP    (ocf::heartbeat:IPaddr2):    Started jh-cluster002
Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
 
 
<jh-cluster002>
[root@jh-cluster002 ~]# pcs status
Cluster name: tcluster
Stack: corosync
Current DC: jh-cluster002 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
Last updated: Mon Aug 30 15:04:34 2021
Last change: Mon Aug 30 14:48:09 2021 by root via cibadmin on jh-cluster001
2 nodes configured
1 resource instance configured
Online: [ jh-cluster001 jh-cluster002 ]
Full list of resources:
VirtualIP    (ocf::heartbeat:IPaddr2):    Started jh-cluster002
Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
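
To follow where the VIP is during a test like this without reading the whole status output, the node name can be pulled from the resource line. This is a sketch that assumes the `pcs status` line format shown above; resource_node is a hypothetical helper name.

```shell
# resource_node NAME: read `pcs status` output on stdin and print the node on
# which resource NAME is reported as Started (prints nothing otherwise).
resource_node() {
    awk -v name="$1" 'NF >= 3 && $1 == name && $(NF - 1) == "Started" { print $NF }'
}

# Usage on a cluster node (not executed here):
#   pcs status | resource_node VirtualIP
```

If the VIP should return to jh-cluster001, `pcs resource move VirtualIP jh-cluster001` asks the cluster to relocate it. It does so by adding a location constraint, which `pcs resource clear VirtualIP` removes afterwards. Whether moving it back is desirable depends on the service, so treat this as an option rather than a recommendation.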
 
 
 
 
