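The primary/standby switchover described in this document was performed in a personal virtual machine environment and completed without errors. If you do run into errors, you can troubleshoot them with the steps below.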
Environment
Role | Hostname | IP address | OS version | DB version |
Primary | opendb01 | 192.168.40.160 | CentOS 7.9 x86_64 | openGauss 6.0.0 |
Standby | opendb02 | 192.168.40.161 | CentOS 7.9 x86_64 | openGauss 6.0.0 |
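Why both nodes show Primary
The primary and standby nodes can both show Primary for the following reasons:
Under heavy business load, the primary/standby switchover takes a long time. No action is needed in this case.
While another standby is running a build, the primary has to send its logs to that standby before it can be demoted, which makes the switchover slow. No action is needed in this case either, but avoid switching over during a build where possible.
During the switchover, a network failure, a full disk, or a similar problem breaks the connection between the primary and standby instances, leaving two primaries.
Note: Once the cluster is in the dual-primary state, restore the normal primary/standby state with the steps below. Otherwise data may be lost.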
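Since a full disk and a network failure are both named above as triggers, it can help to rule them out before starting the recovery. The following is a minimal sketch, not part of the original procedure: the function names and the 90% default threshold are assumptions chosen for illustration, and the `-W` timeout flag assumes the Linux `ping`.

```shell
#!/usr/bin/env bash
# Print the usage percentage (without the % sign) of the filesystem
# that holds the given directory, using POSIX-format df output.
disk_usage_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed when usage is at or above the threshold (default 90, an
# arbitrary value for this sketch). Return 2 if usage cannot be read.
disk_is_nearly_full() {
    local dir=$1 threshold=${2:-90} pct
    pct=$(disk_usage_pct "$dir")
    [ -n "$pct" ] || return 2
    [ "$pct" -ge "$threshold" ]
}

# Succeed when the peer answers a single ping within 2 seconds.
peer_reachable() {
    ping -c 1 -W 2 "$1" >/dev/null 2>&1
}
```

For the environment in this document, one might run `disk_is_nearly_full /opt/huawei/install/data/dn && echo "disk almost full"` on each node, and `peer_reachable 192.168.40.161 || echo "peer unreachable"` from opendb01.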
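Resolution steps
Check the primary/standby status
Run this on either node. If the result shows both instances in the Primary state, the cluster is in an abnormal state.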
su - omm
gs_om -t status --detail
Output:
[omm@opendb01 ~]$ gs_om -t status --detail
[ Cluster State ]
cluster_state : Normal
redistributing : No
current_az : AZ_ALL
[ Datanode State ]
node node_ip port instance state
-----------------------------------------------------------------------------------------------
1 opendb01 192.168.40.160 5432 6001 /opt/huawei/install/data/dn P Primary Normal
2 opendb02 192.168.40.161 5432 6002 /opt/huawei/install/data/dn P Primary Normal
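The check above can also be scripted. The sketch below counts the Datanode rows that report the Primary role and flags the dual-primary state. The function names are hypothetical, and the parsing assumes the row layout shown in the output above (a numeric node id first, with the role as a separate word); other versions or cluster layouts may print it differently.

```shell
#!/usr/bin/env bash
# Count instances whose role is Primary in the output of
# `gs_om -t status --detail`, read from stdin.
count_primaries() {
    awk '
        # Datanode rows start with a numeric node id.
        $1 ~ /^[0-9]+$/ {
            for (i = 2; i <= NF; i++)
                if ($i == "Primary") n++
        }
        END { print n + 0 }
    '
}

# Return non-zero when more than one Primary instance is reported.
check_dual_primary() {
    local count
    count=$(count_primaries)
    if [ "$count" -gt 1 ]; then
        echo "ABNORMAL: $count Primary instances found"
        return 1
    fi
    echo "OK: $count Primary instance(s) found"
}
```

Usage, as the omm user: `gs_om -t status --detail | check_dual_primary`.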
Choose the node to demote to standby, then run the following commands on that node to stop the service
su - omm
gs_ctl stop -D /opt/huawei/install/data/dn
Parameter: -D /opt/huawei/install/data/dn is the data directory of the node being demoted to standby
Output:
[omm@opendb01 ~]$ gs_ctl stop -D /opt/huawei/install/data/dn
[2024-10-25 05:10:19.526][8045][][gs_ctl]: gs_ctl stopped ,datadir is /opt/huawei/install/data/dn
waiting for server to shut down.... done
server stopped
Start the standby node in standby mode
su - omm
gs_ctl start -D /opt/huawei/install/data/dn -M standby
Parameters: -D /opt/huawei/install/data/dn is the data directory of the standby node
-M standby sets the startup mode to standby
Output:
[omm@opendb01 ~]$ gs_ctl start -D /opt/huawei/install/data/dn -M standby
[2024-10-25 05:11:53.989][8093][][gs_ctl]: gs_ctl started,datadir is /opt/huawei/install/data/dn
[2024-10-25 05:11:54.030][8093][][gs_ctl]: waiting for server to start...
.0 LOG: [Alarm Module]can not read GAUSS_WARNING_TYPE env.
0 LOG: [Alarm Module]Host Name: opendb01
0 LOG: [Alarm Module]Host IP: opendb01. Copy hostname directly in case of taking 10s to use 'gethostbyname' when /etc/hosts does not contain <HOST IP>
0 LOG: [Alarm Module]Cluster Name: cluster_dxj
0 LOG: [Alarm Module]Invalid data in AlarmItem file! Read alarm English name failed! line: 58
0 WARNING: failed to open feature control file, please check whether it exists: FileName=gaussdb.version, Errno=2, Errmessage=No such file or directory.
0 WARNING: failed to parse feature control file: gaussdb.version.
0 WARNING: Failed to load the product control file, so gaussdb cannot distinguish product version.
2024-10-25 05:11:54.111 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: base_page_saved_interval is 400, ori is 400.
2024-10-25 05:11:54.116 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 DB010 0 [REDO] LOG: Recovery parallelism, cpu count = 1, max = 4, actual = 1
2024-10-25 05:11:54.116 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 DB010 0 [REDO] LOG: ConfigRecoveryParallelism, true_max_recovery_parallelism:4, max_recovery_parallelism:4
2024-10-25 05:11:54.123 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: [Alarm Module]can not read GAUSS_WARNING_TYPE env.
2024-10-25 05:11:54.123 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: [Alarm Module]Host Name: opendb01
2024-10-25 05:11:54.123 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: [Alarm Module]Host IP: opendb01. Copy hostname directly in case of taking 10s to use 'gethostbyname' when /etc/hosts does not contain <HOST IP>
2024-10-25 05:11:54.123 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: [Alarm Module]Cluster Name: cluster_dxj
2024-10-25 05:11:54.123 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: [Alarm Module]Invalid data in AlarmItem file! Read alarm English name failed! line: 58
2024-10-25 05:11:54.125 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: loaded library "security_plugin"
2024-10-25 05:11:54.128 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 01000 0 [BACKEND] WARNING: could not create any HA TCP/IP sockets
2024-10-25 05:11:54.130 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: InitNuma numaNodeNum: 1 numa_distribute_mode: none inheritThreadPool: 0.
2024-10-25 05:11:54.130 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 01000 0 [BACKEND] WARNING: Failed to initialize the memory protect for g_instance.attr.attr_storage.cstore_buffers (1024 Mbytes) or shared memory (3630 Mbytes) is larger.
2024-10-25 05:11:54.192 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 00000 0 [CACHE] LOG: set data cache size(805306368)
2024-10-25 05:11:54.532 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 00000 0 [SEGMENT_PAGE] LOG: Segment-page constants: DF_MAP_SIZE: 8156, DF_MAP_BIT_CNT: 65248, DF_MAP_GROUP_EXTENTS: 4175872, IPBLOCK_SIZE: 8168, EXTENTS_PER_IPBLOCK: 1021, IPBLOCK_GROUP_SIZE: 4090, BMT_HEADER_LEVEL0_TOTAL_PAGES: 8323072, BktMapEntryNumberPerBlock: 2038, BktMapBlockNumber: 25, BktBitMaxMapCnt: 512
2024-10-25 05:11:54.571 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: gaussdb: fsync file "/opt/huawei/install/data/dn/gaussdb.state.temp" success
2024-10-25 05:11:54.571 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: create gaussdb state file success: db state(STARTING_STATE), server mode(Standby), connection index(1)
2024-10-25 05:11:54.572 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: max_safe_fds = 974, usable_fds = 1000, already_open = 16
[2024-10-25 05:11:55.037][8093][][gs_ctl]: done
[2024-10-25 05:11:55.037][8093][][gs_ctl]: server started (/opt/huawei/install/data/dn)
[omm@opendb01 ~]$
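The stop and start steps above can be wrapped into one function so that the start is never attempted after a failed stop. This is only a sketch: `restart_as_standby` is a hypothetical name, and it assumes `gs_ctl` returns a non-zero exit status on failure. The two `gs_ctl` invocations are the same ones used in this document.

```shell
#!/usr/bin/env bash
# Stop the instance in the given data directory and start it again in
# standby mode. Returns 2 on a missing argument, 1 if either step fails.
restart_as_standby() {
    local datadir=$1
    if [ -z "$datadir" ]; then
        echo "usage: restart_as_standby <data_dir>" >&2
        return 2
    fi
    # Step 1: shut down the instance that is to be demoted.
    gs_ctl stop -D "$datadir" || { echo "gs_ctl stop failed" >&2; return 1; }
    # Step 2: bring the same instance back up as a standby.
    gs_ctl start -D "$datadir" -M standby || { echo "gs_ctl start failed" >&2; return 1; }
    echo "instance in $datadir restarted in standby mode"
}
```

Run it as the omm user on the node being demoted, for example `restart_as_standby /opt/huawei/install/data/dn`.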
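Save the primary/standby host information
Run this on either node. It regenerates the dynamic configuration file with the host information of all nodes.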
su - omm
gs_om -t refreshconf
Output:
[ ]$ gs_om -t refreshconf
Generating dynamic configuration file for all nodes.
Successfully generated dynamic configuration file.
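Check the primary/standby status
Run this on either node to confirm that the instance states have recovered: 161 is now the primary and 160 is the standby.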
su - omm
gs_om -t status --detail
Output:
[omm@opendb01 ~]$ gs_om -t status --detail
[ Cluster State ]
cluster_state : Normal
redistributing : No
current_az : AZ_ALL
[ Datanode State ]
node node_ip port instance state
-----------------------------------------------------------------------------------------------
1 opendb01 192.168.40.160 5432 6001 /opt/huawei/install/data/dn S Standby Normal
2 opendb02 192.168.40.161 5432 6002 /opt/huawei/install/data/dn P Primary Normal
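To confirm the recovery without reading the table by eye, the status output can be reduced to one line per node and checked automatically. The sketch below uses a hypothetical `verify_recovered` function and assumes the role appears as the word Primary or Standby, followed by the instance state, as in the output shown in this document.

```shell
#!/usr/bin/env bash
# Read `gs_om -t status --detail` output from stdin, print
# "<hostname> <role> <state>" for each Datanode row, and succeed only
# if there is exactly one Primary, at least one Standby, and every
# instance is in the Normal state.
verify_recovered() {
    awk '
        $1 ~ /^[0-9]+$/ {
            role = ""; state = ""
            for (i = 3; i <= NF; i++) {
                if ($i == "Primary" || $i == "Standby") {
                    role = $i
                    # Everything after the role word is the state.
                    for (j = i + 1; j <= NF; j++)
                        state = state (j > i + 1 ? " " : "") $j
                    break
                }
            }
            print $2, role, state
            if (role == "Primary") primaries++
            if (role == "Standby") standbys++
            if (state != "Normal") bad++
        }
        END { exit !(primaries == 1 && standbys >= 1 && bad == 0) }
    '
}
```

Usage: `gs_om -t status --detail | verify_recovered && echo "cluster recovered"`.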