Deploying a One-Primary, One-Standby openGauss 6.0.0 Enterprise Edition Cluster as a Regular User

Digest   2024-11-20 17:30   Hong Kong, China

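1. Introduction

openGauss 6.0.0 Enterprise Edition is a long-term support release published by the openGauss team on 2024-09-30, with a 3-year lifecycle. One notable change in this release is that a primary/standby cluster can be installed by a regular (non-root) user, so users without root privileges can also install the database. The installation described here does not involve the CM component. (OM fully supports non-root installation, but when the CM component is involved, the file handle limit may need to be raised to >= 640000.)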

Feature description: https://docs-opengauss.osinfra.cn/zh/docs/latest/docs/AboutopenGauss/%E6%95%B0%E6%8D%AE%E5%BA%93%E5%AE%89%E8%A3%85%E6%B5%81%E7%A8%8B%E8%A7%A3%E9%99%A4%E5%AF%B9root%E7%94%A8%E6%88%B7%E7%9A%84%E4%BE%9D%E8%B5%96.html

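2. Prerequisites

  • Make sure the regular user exists on both the primary and standby nodes; if it does not, create it beforehand

  • Python 3 is supported in versions 3.6 to 3.10

  • Packages that must be installed: libaio-devel, readline-devel, expect

yum install -y libaio-devel readline-devel expect
  • Disable the firewall, or make sure the ports used by the cluster are open (port, port+1, port+4, port+5, and 22 must all be open)

systemctl disable firewalld.service
systemctl stop firewalld.service
  • Disable transparent huge pages on the machine

echo "never" > /sys/kernel/mm/transparent_hugepage/enabled
  • Keep the system time consistent between the primary and standby machines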
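
Two of the prerequisites above (the Python version range and the list of ports) are easy to get wrong by hand. The following is a minimal sketch of how they could be checked; the function names `python_version_supported` and `required_ports` are hypothetical helpers written for this article and are not part of openGauss or OM.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not shipped with openGauss or OM.

# Return 0 if a "major.minor[.patch]" version string lies in the 3.6..3.10 range.
python_version_supported() {
    major=${1%%.*}          # text before the first dot
    rest=${1#*.}            # text after the first dot
    minor=${rest%%.*}       # minor version number
    [ "$major" = "3" ] && [ "$minor" -ge 6 ] 2>/dev/null && [ "$minor" -le 10 ]
}

# Print the ports listed in the prerequisites for a given database base port:
# port, port+1, port+4, port+5 and the SSH port 22.
required_ports() {
    base=$1
    echo "$base $((base + 1)) $((base + 4)) $((base + 5)) 22"
}
```

For example, `required_ports 11000` lists the ports to open for the `dataPortBase` used later in this article, and `python_version_supported "$(python3 -c 'import platform; print(platform.python_version())')"` checks the interpreter found on the PATH.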

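3. How OM installs the database

4. Installation steps

4.1. Check the operating system

Download the installation package for your system from the openGauss website. My machine runs openEuler 20.03 LTS x86, so the rest of this article uses that as the example.

uname -a
Linux yc-0003 4.19.90-2003.4.0.0036.oe1.x86_64 #1 SMP Mon Mar 23 19:10:41 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux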

4.2. Download and extract the installation package

wget https://opengauss.obs.cn-south-1.myhuaweicloud.com/6.0.0/openEuler20.03/x86/openGauss-All-6.0.0-openEuler20.03-x86_64.tar.gz
tar -xf openGauss-All-6.0.0-openEuler20.03-x86_64.tar.gz
tar -xf openGauss-OM-6.0.0-openEuler20.03-x86_64.tar.gz

Set the permissions of the installation package to 755 and make the current user its owner. For example, my installation package is in the directory /data/lh/600:

chmod -R 755 /data/lh/600 && chown -R lh:lh /data/lh/600

After OM is extracted, the current directory looks like this:

ll
total 471M
-rw-r-----  1 lh lh   44 Oct 31 11:40 hosts
drwx------ 20 lh lh 4.0K Oct 31 11:40 lib
-rw-------  1 lh lh 149M Sep 29 22:27 openGauss-All-6.0.0-openEuler20.03-x86_64.tar.gz
-rw-------  1 lh lh    0 Sep 29 18:59 openGauss-CM-6.0.0-openEuler20.03-x86_64.sha256
-rw-------  1 lh lh  22M Sep 29 18:59 openGauss-CM-6.0.0-openEuler20.03-x86_64.tar.gz
-rw-------  1 lh lh   65 Sep 29 18:57 openGauss-OM-6.0.0-openEuler20.03-x86_64.sha256
-rw-------  1 lh lh  23M Sep 29 18:57 openGauss-OM-6.0.0-openEuler20.03-x86_64.tar.gz
-rw-------  1 lh lh 173M Oct 31 11:41 openGauss-Package-bak_aee4abd5.tar.gz
-rw-------  1 lh lh   65 Sep 29 18:59 openGauss-Server-6.0.0-openEuler20.03-x86_64.sha256
-rw-------  1 lh lh 105M Sep 29 18:59 openGauss-Server-6.0.0-openEuler20.03-x86_64.tar.bz2
drwx------ 11 lh lh 4.0K Sep 29 18:57 script
-rw-------  1 lh lh   65 Sep 29 18:56 upgrade_sql.sha256
-rw-------  1 lh lh 552K Sep 29 18:56 upgrade_sql.tar.gz
-rw-r-----  1 lh lh   47 Sep 29 18:56 version.cfg
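
The listing includes `.sha256` files next to the archives, which suggests the extracted packages can be checked for corruption before installing. Below is a rough sketch of such a check. It assumes each `.sha256` file holds the hex digest as its first field (the 65-byte size is consistent with a bare digest plus a newline, but this is an assumption), and `verify_sha256` is a hypothetical helper, not an OM tool.

```shell
#!/bin/sh
# Hypothetical helper: compare a file's SHA-256 digest with the digest stored
# in a sidecar file. Assumption: the sidecar's first field is the hex digest.
# Returns 0 on match, 1 on mismatch, 2 if the sidecar records no digest.
verify_sha256() {
    file=$1
    sidecar=$2
    expected=$(awk 'NR==1 {print $1}' "$sidecar")
    if [ -z "$expected" ]; then
        echo "no digest recorded in $sidecar" >&2
        return 2
    fi
    actual=$(sha256sum "$file" | awk '{print $1}')
    [ "$expected" = "$actual" ]
}
```

A call such as `verify_sha256 openGauss-OM-6.0.0-openEuler20.03-x86_64.tar.gz openGauss-OM-6.0.0-openEuler20.03-x86_64.sha256` would then report whether the archive matches. Note that the CM `.sha256` file in the listing above is 0 bytes, which this sketch reports as return code 2 rather than as a mismatch.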

4.3. Prepare the XML file

This is a one-primary, one-standby setup. Note that the regular user must have permission to operate on every directory configured here.

<?xml version="1.0" encoding="UTF-8"?>
<ROOT>
    <CLUSTER>
        <PARAM name="clusterName" value="opengauss" />
        <PARAM name="nodeNames" value="yc-0003,yc-0002" />
        <PARAM name="gaussdbAppPath" value="/data/lh/openGauss/app" />
        <PARAM name="gaussdbLogPath" value="/data/lh/openGauss/log/omm" />
        <PARAM name="tmpMppdbPath" value="/data/lh/openGauss/tmp" />
        <PARAM name="gaussdbToolPath" value="/data/lh/openGauss/om" />
        <PARAM name="corePath" value="/data/lh/openGauss/corefile" />
        <PARAM name="backIp1s" value="192.168.0.141,192.168.0.176"/>
    </CLUSTER>
    <DEVICELIST>
        <DEVICE sn="100003">
            <PARAM name="name" value="yc-0003"/>
            <PARAM name="azName" value="AZ1"/>
            <PARAM name="azPriority" value="1"/>
            <PARAM name="backIp1" value="192.168.0.141"/>
            <PARAM name="sshIp1" value="192.168.0.141"/>
            <PARAM name="dataNum" value="1"/>
            <PARAM name="dataPortBase" value="11000"/>
            <PARAM name="dataNode1" value="/data/lh/openGauss/data/dn1,yc-0002,/data/lh/openGauss/data/dn1" />
            <PARAM name="dataNode1_syncNum" value="0"/>
        </DEVICE>
        <DEVICE sn="100002">
            <PARAM name="name" value="yc-0002"/>
            <PARAM name="azName" value="AZ1"/>
            <PARAM name="azPriority" value="1"/>
            <PARAM name="backIp1" value="192.168.0.176"/>
            <PARAM name="sshIp1" value="192.168.0.176"/>
        </DEVICE>
    </DEVICELIST>
</ROOT>
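
In this file, `nodeNames` and `backIp1s` are parallel comma-separated lists, so a quick sanity check before pre-installation is to confirm they have the same number of entries. The sketch below does this with `sed` and `awk`; `xml_param` and `count_csv` are hypothetical helpers, and the pattern assumes one `<PARAM>` element per line, as in a conventionally formatted cluster XML file.

```shell
#!/bin/sh
# Hypothetical helpers for a quick consistency check of the cluster XML.
# Assumption: each <PARAM .../> element sits on its own line.

# Print the value of the first PARAM with the given name.
# $1 = XML file, $2 = PARAM name
xml_param() {
    sed -n "s/.*<PARAM name=\"$2\" value=\"\([^\"]*\)\".*/\1/p" "$1" | head -n 1
}

# Print the number of comma-separated entries in a string.
count_csv() {
    printf '%s\n' "$1" | awk -F',' '{print NF}'
}
```

With these, comparing `count_csv "$(xml_param /data/lh/om_xml/ins2.xml nodeNames)"` against `count_csv "$(xml_param /data/lh/om_xml/ins2.xml backIp1s)"` should give 2 on both sides for the file above.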

4.4. Run pre-installation

./gs_preinstall -U lh -G lh -X /data/lh/om_xml/ins2.xml --sep-env-file=/data/lh/env/env1         ## --sep-env-file is the option for using a separate environment file

The console output is as follows:

Parsing the configuration file.
Successfully parsed the configuration file.
Installing the tools on the local node.
Successfully installed the tools on the local node.
Creating SSH trust for [lh] user.
Please enter password for current user[lh].
Password: 
Checking network information.
All nodes in the network are Normal.
Successfully checked network information.
Creating SSH trust.
Creating the local key file.
Successfully created the local key files.
Appending local ID to authorized_keys.
Successfully appended local ID to authorized_keys.
Updating the known_hosts file.
Successfully updated the known_hosts file.
Appending authorized_key on the remote node.
Successfully appended authorized_key on all remote node.
Checking common authentication file content.
Successfully checked common authentication content.
Distributing SSH trust file to all node.
Distributing trust keys file to all node successfully.
Successfully distributed SSH trust file to all node.
Verifying SSH trust on all hosts.
Verifying SSH trust on all hosts by ip.
Successfully verified SSH trust on all hosts by ip.
Successfully verified SSH trust on all hosts.
Start set cron for lh
Successfully to set cron for lh
Successfully created SSH trust.
Successfully created SSH trust for [lh] user.
Setting host ip env
Successfully set host ip env.
Distributing package.
Begin to distribute package to tool path.
Successfully distribute package to tool path.
Begin to distribute package to package path.
Successfully distribute package to package path.
Successfully distributed package.
Preparing SSH service.
Successfully prepared SSH service.
Installing the tools in the cluster.
Successfully installed the tools in the cluster.
Checking hostname mapping.
Successfully checked hostname mapping.
Checking OS software.
Successfully check OS software.
Checking OS version.
Successfully checked OS version.
Checking cpu instructions.
Successfully checked cpu instructions.
Creating cluster's path.
Successfully created cluster's path.
Set and check OS parameter.
Set and check OS parameter completed.
Preparing CRON service.
Successfully prepared CRON service.
Setting user environmental variables.
Successfully set user environmental variables.
Setting the dynamic link library.
Successfully set the dynamic link library.
Fixing server package owner.
Setting finish flag.
Successfully set finish flag.
Preinstallation succeeded.

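4.5. Run the installation

First source the environment variables: if a separate environment file was used, source that file; otherwise source ~/.bashrc. Then run the installation.

In my case, pre-installation used a separate environment file, so I source that file:

source /data/lh/env/env1
gs_install -X /data/lh/om_xml/ins2.xml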
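
The choice between the separate environment file and ~/.bashrc can be written down as a small helper, shown here as a sketch only. `pick_env_file` is a hypothetical function; it falls back to ~/.bashrc whenever the given path is empty or missing, which is an assumption about the desired behaviour rather than something OM does itself.

```shell
#!/bin/sh
# Hypothetical helper: print the file that should be sourced before gs_install.
# $1 = path that was passed to --sep-env-file during pre-installation (may be empty)
pick_env_file() {
    if [ -n "$1" ] && [ -f "$1" ]; then
        echo "$1"               # separate environment file exists: use it
    else
        echo "$HOME/.bashrc"    # otherwise fall back to the user's .bashrc
    fi
}
```

It could be used as `. "$(pick_env_file /data/lh/env/env1)"` followed by the `gs_install` command.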

The installation proceeds as follows:

Parsing the configuration file.
Successfully checked gs_uninstall on every node.
Check preinstall on every node.
Successfully checked preinstall on every node.
Creating the backup directory.
Successfully created the backup directory.
begin deploy..
Installing the cluster.
begin prepare Install Cluster..
Checking the installation environment on all nodes.
begin install Cluster..
Installing applications on all nodes.
Successfully installed APP.
begin init Instance..
encrypt cipher and rand files for database.
Please enter password for database:             ##  prompts for the database password; it must contain three different character types and be at least 8 characters long
Please repeat for database:                     ##  confirm the database password
begin to create CA cert files
The sslcert will be generated in /data/lh/openGauss/app/share/sslcert/om
NO cm_server instance, no need to create CA for CM.
Non-dss_ssl_enable, no need to create CA for DSS
Cluster installation is completed.
Configuring.
Deleting instances from all nodes.
Successfully deleted instances from all nodes.
Checking node configuration on all nodes.
Initializing instances on all nodes.
Updating instance configuration on all nodes.
Check consistence of memCheck and coresCheck on database nodes.
Successfully check consistence of memCheck and coresCheck on all nodes.
Configuring pg_hba on all nodes.
Configuration is completed.
The cluster status is Normal.
Successfully started cluster.
Successfully installed application.
end deploy..

4.6. Check the database status

gs_om -t status --detail

The output is as follows:

[   Cluster State   ]
cluster_state : Normal
redistributing : No
current_az : AZ_ALL
[ Datanode State ]
node node_ip port instance state
----------------------------------------------------------------------------------------------
1 yc-0003 192.168.0.141 11000 6001 /data/lh/openGauss/data/dn1 P Primary Normal
2 yc-0002 192.168.0.176 11000 6002 /data/lh/openGauss/data/dn1 S Standby Normal

##  The fields shown are: hostname, IP, port, database instance ID, database data directory, primary/standby role, and whether the node is normal
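
For scripted health checks, the status text can be parsed instead of read by eye. The following sketch extracts the `cluster_state` value and counts nodes by role. `cluster_state` and `count_role` are hypothetical helpers that read the status text from standard input, and they assume the output layout shown in this article.

```shell
#!/bin/sh
# Hypothetical helpers that parse the status text from standard input.

# Print the value after "cluster_state :", with surrounding blanks removed.
cluster_state() {
    awk -F':' '$1 ~ /^cluster_state/ {gsub(/[ \t]/, "", $2); print $2; exit}'
}

# Print how many lines contain the given role as a whole word,
# for example Primary or Standby.
count_role() {
    grep -c -w "$1"
}
```

For example, `gs_om -t status --detail | cluster_state` would print the cluster state, and `gs_om -t status --detail | count_role Standby` would print the number of standby lines.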

4.7. Scale in (remove a node)

gs_dropnode -U lh -G lh -h 192.168.0.176

Console output:

The target node to be dropped is (['yc-0002']) 
Do you want to continue to drop the target node (yes/no)?y
The cluster will have only one standalone node left after the operation!
Do you want to continue to drop the target node (yes/no)? y
Drop node start without CM node.
[gs_dropnode]Start to drop nodes of the cluster.
[gs_dropnode]Start to stop the target node yc-0002.
[gs_dropnode]End of stop the target node yc-0002.
[gs_dropnode]Start to backup parameter config file on yc-0003.
[gs_dropnode]End to backup parameter config file on yc-0003.
[gs_dropnode]The backup file of yc-0003 is /data/lh/openGauss/tmp/gs_dropnode_backup20241031143102/parameter_yc-0003.tar
[gs_dropnode]Start to parse parameter config file on yc-0003.
Command for Checking VIP mode: cm_ctl res --list | awk -F "|" '{print $2}' | grep -w *** 
The current cluster does not support VIP.
[gs_dropnode]End to parse parameter config file on yc-0003.
[gs_dropnode]Start to parse backup parameter config file on yc-0003.
[gs_dropnode]End to parse backup parameter config file yc-0003.
[gs_dropnode]Start to set openGauss config file on yc-0003.
[gs_dropnode]End of set openGauss config file on yc-0003.
[gs_dropnode]Start of set pg_hba config file on yc-0003.
[gs_dropnode]End of set pg_hba config file on yc-0003.
[gs_dropnode]Start to set repl slot on yc-0003.
[gs_dropnode]Start to get repl slot on yc-0003.
[gs_dropnode]End of set repl slot on yc-0003.
[gs_dropnode]Start to modify the cluster static conf.
[gs_dropnode]End of modify the cluster static conf.
[gs_dropnode]Remove the dynamic conf.
Only one primary node is left. It is recommended to restart the node.
Do you want to restart the primary node now (yes/no)? y
[gs_dropnode]Start to stop the target node yc-0003.
[gs_dropnode]End of stop the target node yc-0003.
[gs_dropnode]Start to start the target node.
2024-10-31 14:31:14.364 67232432.1 [unknown] 140321508899200 [unknown] 0 dn_6001_6002 01000  0 [BACKEND] WARNING:  could not create any HA TCP/IP sockets
2024-10-31 14:31:14.364 67232432.1 [unknown] 140321508899200 [unknown] 0 dn_6001_6002 01000  0 [BACKEND] WARNING:  could not create any HA TCP/IP sockets
2024-10-31 14:31:14.366 67232432.1 [unknown] 140321508899200 [unknown] 0 dn_6001_6002 01000  0 [BACKEND] WARNING:  Failed to initialize the memory protect for g_instance.attr.attr_storage.cstore_buffers (1024 Mbytes) or shared memory (4482 Mbytes) is larger.
[gs_dropnode]Success to drop the target nodes.

4.8. Scale out (add a node)

gs_expansion -U lh -G lh -h 192.168.0.176 -X /data/lh/om_xml/ins2.xml -L

Console output:

The cluster no need create ssh trust
Start expansion without cluster manager component.
Database on standby nodes installed finished.
Checking gaussdb and gs_om version.
End to check gaussdb and gs_om version.
Start to establish the relationship.
Start to build standby 192.168.0.176.
Build standby 192.168.0.176 success.
Start to generate and send cluster static file.
End to generate and send cluster static file.
Expansion results:
192.168.0.176:  Success
Expansion Finish.

4.9. Uninstall the database

gs_uninstall --delete-data

Console output:

Checking uninstallation.
Successfully checked uninstallation.
Stopping the cluster.
Successfully stopped the cluster.
Successfully deleted instances.
Uninstalling application.
Successfully uninstalled application.
No need to clear dss disk.
Successfully deleted log.
Uninstallation succeeded.

5. Common problems when installing the database with OM

5.1. SSH mutual trust problem: bad interpreter: No such file or directory

Investigation showed that the command being executed was Cmd:echo "kJG6H*:Wl*nf^uxZOXth*U7ZJ*7I05Sj" | /bin/sh /data/lh/latest/script/./local/sshexkey_encrypt_tool.sh sshkeygen /home/lh/.ssh/id_om /home/lh/.ssh/id_om.pub

Error: Generating mutual trust files\n/data/lh/latest/script/./local/sshexkey_encrypt_tool.sh: /home/lh/gauss_om/script/ssh-keygen: /bin/bash^M: bad interpreter: No such file or directory

Contents of the sshexkey_encrypt_tool.sh script:

ssh-keygen -t ed25519 -N \"$passwd\" -f ~/.ssh/id_om

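Meaning: generate an SSH key pair and store it in id_om and id_om.pub

Approach: check whether the files involved have the wrong line-ending format. The ^M after /bin/bash in the error is a carriage return.

CRLF is the Windows line-ending format

Running the command again showed that other files also had the wrong format

Go to the directory from the error above, /home/lh/gauss_om, and convert the line endings of all SSH-related files there to the Unix (LF) format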
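
One way to find and fix such files is sketched below. `has_crlf` and `strip_cr` are hypothetical helpers written for this article; tools such as `dos2unix`, where installed, do the same job. The sketch removes every carriage return in the file, so it should only be applied to text scripts, never to binaries.

```shell
#!/bin/sh
# Hypothetical helpers; apply them to text scripts only, never to binaries.

# Return 0 if the file contains at least one carriage return character.
has_crlf() {
    cr=$(printf '\r')
    grep -q "$cr" "$1"
}

# Remove all carriage returns from the file in place.
# The file is rewritten through a temporary copy so its mode is preserved.
strip_cr() {
    tmp=$(mktemp) || return 1
    tr -d '\r' < "$1" > "$tmp" && cat "$tmp" > "$1"
    status=$?
    rm -f "$tmp"
    return $status
}
```

For the error shown above, this would be run against the script named in the message, for example `has_crlf /home/lh/gauss_om/script/ssh-keygen && strip_cr /home/lh/gauss_om/script/ssh-keygen`, and then repeated for any other script that reports the same error.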

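5.2. Read-ahead block problem

Cause: the expected read-ahead value checked by checkos differs from the actual value. This can be fixed by changing the setting manually (and only manually).

Solution:

First query the I/O scheduler of the disks on the machine, then change the I/O scheduler of the disks reported in the error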

/sys/block/%s/queue/scheduler

echo "mq-deadline" > /sys/block/%s/queue/scheduler

%s stands for the disk names reported in the error, for example sdj and sdk
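
The scheduler file lists all available schedulers on one line and marks the active one with square brackets, for example `mq-deadline kyber [bfq] none`. The sketch below extracts the active entry so it can be compared before and after the change. `active_scheduler` and `disk_scheduler` are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helpers for reading the active I/O scheduler.

# Print the bracketed (active) entry from a scheduler line such as
# "mq-deadline kyber [bfq] none". Prints nothing if no entry is bracketed.
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Print the active scheduler of one disk, given its name (for example sdj).
disk_scheduler() {
    active_scheduler "$(cat "/sys/block/$1/queue/scheduler")"
}
```

After writing `mq-deadline` to the file as shown above, `disk_scheduler sdj` should print `mq-deadline` if the change took effect.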

5.3. Transparent huge pages not disabled

Solution:

echo "never" > /sys/kernel/mm/transparent_hugepage/enabled

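5.4. Creating mutual trust: Failed to append local ID to authorized_keys on remote node

How it works: what actually runs is mkdir .ssh, touch .ssh/authorized_keys, touch .ssh/known_hosts

and similar commands. Once the files are created, it runs echo \"%s #OM\" >> .ssh/authorized_keys && echo ok ok ok % localID

Investigation: running the command on its own showed that the disk was full

Freeing up disk space resolves the problem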
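
A quick way to rule out this cause before creating mutual trust is to check the free space on the filesystem that holds the user's home directory on each node. The following is a sketch; `free_kb` and `has_free_space` are hypothetical helpers, and the threshold in the usage example is an arbitrary value chosen for illustration.

```shell
#!/bin/sh
# Hypothetical helpers for checking free disk space.

# Print the available space, in 1 KiB blocks, of the filesystem holding $1.
free_kb() {
    df -Pk "$1" | awk 'NR==2 {print $4}'
}

# Return 0 if at least $2 KiB are available on the filesystem holding $1.
has_free_space() {
    [ "$(free_kb "$1")" -ge "$2" ]
}
```

For example, `has_free_space "$HOME" 1024 || echo "home filesystem is nearly full"` flags a node where even a small append to `.ssh/authorized_keys` is likely to fail.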

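5.5. Mutual trust problem: got xxx expected xxx

Cause:

The user had already set up mutual trust earlier, and when it is set up again the strings from the two runs do not match

Solution:

Delete the previous mutual trust and set it up again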
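
As described in section 5.4, OM appends its keys to `.ssh/authorized_keys` with a trailing `#OM` marker, so part of the old trust can be identified by that marker. The sketch below prints an `authorized_keys` file without those entries. `strip_om_keys` is a hypothetical helper, and whether removing these entries alone is enough to clear the old trust is an assumption; the `id_om` key pair and `known_hosts` entries may also need attention.

```shell
#!/bin/sh
# Hypothetical helper: print the given authorized_keys file without the
# entries that end with the "#OM" marker. The file itself is not modified.
strip_om_keys() {
    grep -v '#OM[[:space:]]*$' "$1"
}
```

A cautious way to use it is to keep a backup first, for example `cp ~/.ssh/authorized_keys ~/.ssh/authorized_keys.bak`, then write the output of `strip_om_keys ~/.ssh/authorized_keys.bak` back to `~/.ssh/authorized_keys` on each node before re-running pre-installation.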

