openGauss 6.0: Analyzing and Fixing Primary and Standby Nodes That Both Report Primary

Digest   2024-11-12 17:30   Guangdong

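Because this was done in a personal virtual machine environment, the primary/standby switchover in this document did not hit any errors. If you do run into errors, you can troubleshoot with the steps below.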

Environment

Role  Hostname  IPADDR          OS Version         DB version
      opendb01  192.168.40.160  CentOS 7.9 x86_64  openGauss 6.0.0
      opendb02  192.168.40.161  CentOS 7.9 x86_64  openGauss 6.0.0

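Why both nodes report Primary

The primary and standby nodes can both end up as Primary for the following reasons:

  • Under heavy business load, the switchover between the primary and standby instances takes a long time. No action is needed in this case.

  • While another standby is being built, the primary must finish sending logs to that standby before it can be demoted, so the switchover takes a long time. No action is needed here either, but avoid switching over during a build where possible.

  • During the switchover, a network failure, a full disk, or a similar problem breaks the connection between the primary and standby instances, leaving the cluster with two primaries.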
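Since a full disk is one of the causes listed above, it may be worth checking disk usage on the data directory before a switchover. The following is only a sketch: the helper names `df_use_percent` and `disk_ok` and the 90% default threshold are assumptions made for illustration, not part of openGauss.

```shell
#!/bin/sh
# Print the usage percentage (without the % sign) from the output of
# `df -P <path>` read on stdin. With -P, the figure is the 5th field of
# the 2nd line. Assumes the filesystem name contains no spaces.
df_use_percent() {
    awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# Succeed if the filesystem holding directory $1 is below $2 percent used.
# The default of 90 is an arbitrary assumption; pick what suits your site.
disk_ok() {
    used=$(df -P "$1" | df_use_percent)
    [ "$used" -lt "${2:-90}" ]
}

# Hypothetical usage with the data directory from this article:
#   disk_ok /opt/huawei/install/data/dn 90 || echo "data disk is nearly full"
```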

Note: Once the cluster is in a dual-primary state, restore a normal primary/standby state with the steps below. Otherwise data may be lost.

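Procedure

Check the primary/standby status

This can be run on either node. If the result shows both instances as Primary, the cluster is in an abnormal state.

su - omm
gs_om -t status --detail

Output: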

[omm@opendb01 ~]$ gs_om -t status --detail
[   Cluster State   ]

cluster_state   : Normal
redistributing  : No
current_az      : AZ_ALL

[  Datanode State   ]

    node     node_ip         port  instance                          state
-----------------------------------------------------------------------------------------------
1  opendb01  192.168.40.160  5432  6001 /opt/huawei/install/data/dn  P Primary Normal
2  opendb02  192.168.40.161  5432  6002 /opt/huawei/install/data/dn  P Primary Normal
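The check above can also be scripted. This sketch counts how many datanode rows in the status output carry the role Primary. It assumes the row layout shown in this article (a numeric node index first, the role as a separate word later in the row), and `count_primary` is a hypothetical helper name.

```shell
#!/bin/sh
# Count datanode rows whose role is Primary in the text of
# `gs_om -t status --detail` read from stdin.
# Assumption: each datanode row starts with a numeric node index and the
# role appears as a whole word somewhere after it.
count_primary() {
    awk '$1 ~ /^[0-9]+$/ {
             for (i = 2; i <= NF; i++)
                 if ($i == "Primary") { n++; break }
         }
         END { print n + 0 }'
}

# Hypothetical usage on a cluster host, as the omm user:
#   if [ "$(gs_om -t status --detail | count_primary)" -gt 1 ]; then
#       echo "more than one Primary: dual-primary state"
#   fi
```

A result greater than 1 corresponds to the abnormal state described above.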

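Decide which node will be demoted to standby, then run the following commands on that node to stop the service

su - omm
gs_ctl stop -D /opt/huawei/install/data/dn

Parameter: -D /opt/huawei/install/data/dn is -D followed by the data directory of the node that will become the standby.

Output: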

[omm@opendb01 ~]$ gs_ctl stop -D /opt/huawei/install/data/dn
[2024-10-25 05:10:19.526][8045][][gs_ctl]: gs_ctl stopped ,datadir is /opt/huawei/install/data/dn
waiting for server to shut down.... done
server stopped

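Start the standby node in standby mode

su - omm
gs_ctl start -D /opt/huawei/install/data/dn -M standby

Parameters: -D /opt/huawei/install/data/dn is -D followed by the standby node's data directory; -M standby sets the startup mode to standby.

Output: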

[omm@opendb01 ~]$ gs_ctl start -D /opt/huawei/install/data/dn -M standby
[2024-10-25 05:11:53.989][8093][][gs_ctl]: gs_ctl started,datadir is /opt/huawei/install/data/dn
[2024-10-25 05:11:54.030][8093][][gs_ctl]: waiting for server to start...
.0 LOG:  [Alarm Module]can not read GAUSS_WARNING_TYPE env.
0 LOG: [Alarm Module]Host Name: opendb01
0 LOG: [Alarm Module]Host IP: opendb01. Copy hostname directly in case of taking 10s to use 'gethostbyname' when /etc/hosts does not contain <HOST IP>
0 LOG: [Alarm Module]Cluster Name: cluster_dxj
0 LOG: [Alarm Module]Invalid data in AlarmItem file! Read alarm English name failed! line: 58
0 WARNING: failed to open feature control file, please check whether it exists: FileName=gaussdb.version, Errno=2, Errmessage=No such file or directory.
0 WARNING: failed to parse feature control file: gaussdb.version.
0 WARNING: Failed to load the product control file, so gaussdb cannot distinguish product version.
2024-10-25 05:11:54.111 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: base_page_saved_interval is 400, ori is 400.
2024-10-25 05:11:54.116 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 DB010 0 [REDO] LOG: Recovery parallelism, cpu count = 1, max = 4, actual = 1
2024-10-25 05:11:54.116 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 DB010 0 [REDO] LOG: ConfigRecoveryParallelism, true_max_recovery_parallelism:4, max_recovery_parallelism:4
2024-10-25 05:11:54.123 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: [Alarm Module]can not read GAUSS_WARNING_TYPE env.
2024-10-25 05:11:54.123 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: [Alarm Module]Host Name: opendb01
2024-10-25 05:11:54.123 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: [Alarm Module]Host IP: opendb01. Copy hostname directly in case of taking 10s to use 'gethostbyname' when /etc/hosts does not contain <HOST IP>
2024-10-25 05:11:54.123 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: [Alarm Module]Cluster Name: cluster_dxj
2024-10-25 05:11:54.123 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: [Alarm Module]Invalid data in AlarmItem file! Read alarm English name failed! line: 58
2024-10-25 05:11:54.125 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: loaded library "security_plugin"
2024-10-25 05:11:54.128 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 01000 0 [BACKEND] WARNING: could not create any HA TCP/IP sockets
2024-10-25 05:11:54.130 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: InitNuma numaNodeNum: 1 numa_distribute_mode: none inheritThreadPool: 0.
2024-10-25 05:11:54.130 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 01000 0 [BACKEND] WARNING: Failed to initialize the memory protect for g_instance.attr.attr_storage.cstore_buffers (1024 Mbytes) or shared memory (3630 Mbytes) is larger.
2024-10-25 05:11:54.192 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 00000 0 [CACHE] LOG: set data cache size(805306368)
2024-10-25 05:11:54.532 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 00000 0 [SEGMENT_PAGE] LOG: Segment-page constants: DF_MAP_SIZE: 8156, DF_MAP_BIT_CNT: 65248, DF_MAP_GROUP_EXTENTS: 4175872, IPBLOCK_SIZE: 8168, EXTENTS_PER_IPBLOCK: 1021, IPBLOCK_GROUP_SIZE: 4090, BMT_HEADER_LEVEL0_TOTAL_PAGES: 8323072, BktMapEntryNumberPerBlock: 2038, BktMapBlockNumber: 25, BktBitMaxMapCnt: 512
2024-10-25 05:11:54.571 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: gaussdb: fsync file "/opt/huawei/install/data/dn/gaussdb.state.temp" success
2024-10-25 05:11:54.571 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: create gaussdb state file success: db state(STARTING_STATE), server mode(Standby), connection index(1)
2024-10-25 05:11:54.572 671ab81a.1 [unknown] 140462587550400 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: max_safe_fds = 974, usable_fds = 1000, already_open = 16
[2024-10-25 05:11:55.037][8093][][gs_ctl]: done
[2024-10-25 05:11:55.037][8093][][gs_ctl]: server started (/opt/huawei/install/data/dn)
[omm@opendb01 ~]$
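The stop and standby start shown above can be wrapped in one small function so the two commands always run in order on the node being demoted. This is a minimal sketch, not an official tool: `run`, `demote_to_standby` and the `DRY_RUN` variable are hypothetical names, and the default data directory is simply the one used in this article. Only the `gs_ctl stop -D` and `gs_ctl start -D ... -M standby` invocations come from the steps above.

```shell
#!/bin/sh
# Echo a command, then execute it unless DRY_RUN=1 is set.
run() {
    echo "+ $*"
    [ "${DRY_RUN:-0}" = "1" ] || "$@"
}

# Stop the local datanode and start it again in standby mode.
# $1: data directory of the node to demote (defaults to this article's path).
# The start is skipped if the stop fails.
demote_to_standby() {
    data_dir=${1:-/opt/huawei/install/data/dn}
    run gs_ctl stop -D "$data_dir" &&
    run gs_ctl start -D "$data_dir" -M standby
}

# Hypothetical usage, as the omm user on the node chosen for demotion:
#   DRY_RUN=1; demote_to_standby      # print the commands only
#   DRY_RUN=0; demote_to_standby      # actually run them
```

Running it with DRY_RUN=1 first gives a chance to confirm the data directory before the service is stopped.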

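Save the database primary/standby host information

This can be run on any node; it dynamically saves the host information of all nodes.

su - omm
gs_om -t refreshconf

Output: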

[omm@opendb01 ~]$ gs_om -t refreshconf
Generating dynamic configuration file for all nodes.
Successfully generated dynamic configuration file.

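Check the primary/standby status again

This can be run on either node. Confirm that the instance states have recovered: 192.168.40.161 is now the primary and 192.168.40.160 is the standby.

su - omm
gs_om -t status --detail

Output: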

[omm@opendb01 ~]$ gs_om -t status --detail
[   Cluster State   ]

cluster_state   : Normal
redistributing  : No
current_az      : AZ_ALL

[  Datanode State   ]

    node     node_ip         port  instance                          state
-----------------------------------------------------------------------------------------------
1  opendb01  192.168.40.160  5432  6001 /opt/huawei/install/data/dn  P Primary Normal
2  opendb02  192.168.40.161  5432  6002 /opt/huawei/install/data/dn  S Standby Normal
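For a final check that is easier to read than the raw table, the status output can be reduced to one "hostname role" pair per node. The sketch below assumes the row layout shown in this article and only recognizes the words Primary and Standby; `node_roles` and `roles_look_healthy` are hypothetical helper names.

```shell
#!/bin/sh
# Print "<hostname> <role>" for each datanode row of
# `gs_om -t status --detail` read from stdin.
# Assumption: a row starts with a numeric node index, the hostname is the
# 2nd field, and the role is a separate word later in the row.
node_roles() {
    awk '$1 ~ /^[0-9]+$/ {
             for (i = 3; i <= NF; i++)
                 if ($i == "Primary" || $i == "Standby") { print $2, $i; break }
         }'
}

# Succeed only if exactly one node reports Primary.
roles_look_healthy() {
    [ "$(node_roles | grep -c ' Primary$')" -eq 1 ]
}

# Hypothetical usage, as the omm user:
#   gs_om -t status --detail | node_roles
#   gs_om -t status --detail | roles_look_healthy && echo "one primary, OK"
```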
