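1. Introduction
openGauss 6.0.0 Enterprise Edition is a long-term support (LTS) release published by the openGauss team on 2024-09-30, with a three-year lifecycle. A notable change in this release is that a primary/standby cluster can be installed as an ordinary (non-root) user, so users without root privileges can still install the database. This walkthrough does not involve the CM component. (OM fully supports non-root installation; if the CM component is involved, you may need to raise the open-file-handle limit to at least 640000.)
Feature description: https://docs-opengauss.osinfra.cn/zh/docs/latest/docs/AboutopenGauss/%E6%95%B0%E6%8D%AE%E5%BA%93%E5%AE%89%E8%A3%85%E6%B5%81%E7%A8%8B%E8%A7%A3%E9%99%A4%E5%AF%B9root%E7%94%A8%E6%88%B7%E7%9A%84%E4%BE%9D%E8%B5%96.html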
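2. Prerequisites
Make sure the ordinary user exists on both the primary and the standby node; if it does not, create it beforehand.
Python 3 must be a version between 3.6 and 3.10.
Required packages (installing them needs root or sudo): libaio-devel, readline-devel, expect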
yum install -y libaio-devel readline-devel expect
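Disable the firewall (as root), or make sure the ports used by the cluster are open: port, port+1, port+4, port+5 and 22 must all be reachable, where port is the database port (dataPortBase in the XML file).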
systemctl disable firewalld.service
systemctl stop firewalld.service
Disable transparent huge pages on every machine (as root):
echo "never" > /sys/kernel/mm/transparent_hugepage/enabled
Keep the clocks of the primary and standby machines synchronized.
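The Python version, port and transparent-huge-page requirements above can be checked before running OM. The following is a minimal sketch with hypothetical helper names (they are not part of openGauss OM); the port list simply encodes the port, port+1, port+4, port+5, 22 rule from this section, and 11000 is the dataPortBase used in the XML of section 4.3.

```shell
# Pre-flight helpers for a non-root openGauss install (hypothetical names, a sketch).

# Return 0 if a "major.minor[.patch]" version string is within 3.6..3.10.
py_version_ok() {
  local major minor
  IFS=. read -r major minor _ <<< "$1"
  [[ "$major" == 3 && "$minor" =~ ^[0-9]+$ ]] || return 1
  (( 10#$minor >= 6 && 10#$minor <= 10 ))
}

# Print the ports that must be open for a given base port:
# port, port+1, port+4, port+5, and 22 for SSH.
required_ports() {
  local base=$1
  echo "$base $((base + 1)) $((base + 4)) $((base + 5)) 22"
}

# Return 0 if the THP control file has "never" selected, e.g. "always madvise [never]".
thp_disabled() {
  local f=${1:-/sys/kernel/mm/transparent_hugepage/enabled}
  grep -q '\[never\]' "$f"
}

# Example, run on every node:
#   py_version_ok "$(python3 -c 'import platform; print(platform.python_version())')" \
#     || echo "python3 is outside 3.6-3.10"
#   thp_disabled || echo "transparent huge pages are still enabled"
#   required_ports 11000    # prints the five ports to open
```

If you would rather keep firewalld running, root can open just these ports instead, e.g. `firewall-cmd --permanent --add-port=11000/tcp` for each port, followed by `firewall-cmd --reload`.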
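3. How OM installs the database
4. Installation steps
4.1. Check the machine's OS
Download the installation package that matches your OS from the openGauss website. My machines run openEuler 20.03 LTS on x86, which I use as the example below.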
uname -a
Linux yc-0003 4.19.90-2003.4.0.0036.oe1.x86_64 #1 SMP Mon Mar 23 19:10:41 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
4.2. Download and extract the installation package
wget https://opengauss.obs.cn-south-1.myhuaweicloud.com/6.0.0/openEuler20.03/x86/openGauss-All-6.0.0-openEuler20.03-x86_64.tar.gz
tar -xf openGauss-All-6.0.0-openEuler20.03-x86_64.tar.gz
tar -xf openGauss-OM-6.0.0-openEuler20.03-x86_64.tar.gz
Set the permissions of the package directory to 755 and make the current user its owner. For example, my packages are in /data/lh/600:
chmod -R 755 /data/lh/600 && chown -R lh:lh /data/lh/600
After extracting the OM package, the current directory looks like this:
ll
total 471M
-rw-r----- 1 lh lh 44 Oct 31 11:40 hosts
drwx------ 20 lh lh 4.0K Oct 31 11:40 lib
-rw------- 1 lh lh 149M Sep 29 22:27 openGauss-All-6.0.0-openEuler20.03-x86_64.tar.gz
-rw------- 1 lh lh 0 Sep 29 18:59 openGauss-CM-6.0.0-openEuler20.03-x86_64.sha256
-rw------- 1 lh lh 22M Sep 29 18:59 openGauss-CM-6.0.0-openEuler20.03-x86_64.tar.gz
-rw------- 1 lh lh 65 Sep 29 18:57 openGauss-OM-6.0.0-openEuler20.03-x86_64.sha256
-rw------- 1 lh lh 23M Sep 29 18:57 openGauss-OM-6.0.0-openEuler20.03-x86_64.tar.gz
-rw------- 1 lh lh 173M Oct 31 11:41 openGauss-Package-bak_aee4abd5.tar.gz
-rw------- 1 lh lh 65 Sep 29 18:59 openGauss-Server-6.0.0-openEuler20.03-x86_64.sha256
-rw------- 1 lh lh 105M Sep 29 18:59 openGauss-Server-6.0.0-openEuler20.03-x86_64.tar.bz2
drwx------ 11 lh lh 4.0K Sep 29 18:57 script
-rw------- 1 lh lh 65 Sep 29 18:56 upgrade_sql.sha256
-rw------- 1 lh lh 552K Sep 29 18:56 upgrade_sql.tar.gz
-rw-r----- 1 lh lh 47 Sep 29 18:56 version.cfg
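The listing includes .sha256 companion files, so the packages can be checked before going further. This is a minimal sketch that assumes each .sha256 file holds only the hex digest (their 65-byte size fits 64 hex characters plus a newline); `verify_pkg` is a hypothetical helper. Note that the CM checksum file in the listing is 0 bytes, which the sketch reports as an error rather than a pass.

```shell
# Compare a package with its .sha256 file (a sketch; verify_pkg is hypothetical).
# Default checksum name: strip ".tar.*" from the package name and append ".sha256".
# Returns 0 on match, 1 on mismatch, 2 if the checksum file is empty.
verify_pkg() {
  local pkg=$1 sumfile=${2:-${1%.tar.*}.sha256}
  local expected actual
  expected=$(tr -d '[:space:]' < "$sumfile")
  if [[ -z "$expected" ]]; then
    echo "empty checksum file: $sumfile" >&2
    return 2
  fi
  actual=$(sha256sum "$pkg" | awk '{print $1}')
  [[ "$actual" == "$expected" ]]
}

# Example:
#   verify_pkg openGauss-OM-6.0.0-openEuler20.03-x86_64.tar.gz && echo "OM package OK"
#   verify_pkg openGauss-Server-6.0.0-openEuler20.03-x86_64.tar.bz2 && echo "Server package OK"
```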
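4.3. Prepare the XML file
This is a one-primary, one-standby setup. Note that the installation user must have permission to operate on every directory configured in the file.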
<?xml version="1.0" encoding="UTF-8"?>
<ROOT>
  <CLUSTER>
    <PARAM name="clusterName" value="opengauss" />
    <PARAM name="nodeNames" value="yc-0003,yc-0002" />
    <PARAM name="gaussdbAppPath" value="/data/lh/openGauss/app" />
    <PARAM name="gaussdbLogPath" value="/data/lh/openGauss/log/omm" />
    <PARAM name="tmpMppdbPath" value="/data/lh/openGauss/tmp" />
    <PARAM name="gaussdbToolPath" value="/data/lh/openGauss/om" />
    <PARAM name="corePath" value="/data/lh/openGauss/corefile" />
    <PARAM name="backIp1s" value="192.168.0.141,192.168.0.176"/>
  </CLUSTER>
  <DEVICELIST>
    <DEVICE sn="100003">
      <PARAM name="name" value="yc-0003"/>
      <PARAM name="azName" value="AZ1"/>
      <PARAM name="azPriority" value="1"/>
      <PARAM name="backIp1" value="192.168.0.141"/>
      <PARAM name="sshIp1" value="192.168.0.141"/>
      <PARAM name="dataNum" value="1"/>
      <PARAM name="dataPortBase" value="11000"/>
      <PARAM name="dataNode1" value="/data/lh/openGauss/data/dn1,yc-0002,/data/lh/openGauss/data/dn1" />
      <PARAM name="dataNode1_syncNum" value="0"/>
    </DEVICE>
    <DEVICE sn="100002">
      <PARAM name="name" value="yc-0002"/>
      <PARAM name="azName" value="AZ1"/>
      <PARAM name="azPriority" value="1"/>
      <PARAM name="backIp1" value="192.168.0.176"/>
      <PARAM name="sshIp1" value="192.168.0.176"/>
    </DEVICE>
  </DEVICELIST>
</ROOT>
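Because every directory in the XML must be usable by the installation user, a quick local check can catch permission problems before gs_preinstall runs. This is a sketch with hypothetical helper names: it pulls absolute paths out of value="..." attributes with grep and sed (a plain-text approximation, not a real XML parser), then checks that the nearest existing ancestor of each path is writable. It only checks the node it runs on, so run it on the standby as well.

```shell
# List every absolute path found in value="..." attributes of the cluster XML.
# Comma-separated values (such as dataNode1) are split into separate items.
xml_paths() {
  grep -o 'value="[^"]*"' "$1" | sed 's/^value="//; s/"$//' | tr ',' '\n' | grep '^/' | sort -u
}

# Return 0 if the current user could create or modify the path:
# walk up to the nearest existing ancestor and test that it is writable.
path_writable() {
  local p=$1
  while [[ ! -e "$p" ]]; do p=$(dirname "$p"); done
  [[ -w "$p" ]]
}

# Report every configured path the current user cannot write to.
check_xml_paths() {
  local p rc=0
  while read -r p; do
    if ! path_writable "$p"; then
      echo "not writable by $(id -un): $p" >&2
      rc=1
    fi
  done < <(xml_paths "$1")
  return $rc
}

# Example:
#   check_xml_paths /data/lh/om_xml/ins2.xml && echo "all configured paths are usable"
```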
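4.4. Run the pre-installation
-U and -G give the installation user and its group, and -X points to the XML file prepared in 4.3.
./gs_preinstall -U lh -G lh -X /data/lh/om_xml/ins2.xml --sep-env-file=/data/lh/env/env1 ## --sep-env-file enables environment separation: the environment variables are written to this file instead of ~/.bashrc
Console output: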
Parsing the configuration file.
Successfully parsed the configuration file.
Installing the tools on the local node.
Successfully installed the tools on the local node.
Creating SSH trust for [lh] user.
Please enter password for current user[lh].
Password:
Checking network information.
All nodes in the network are Normal.
Successfully checked network information.
Creating SSH trust.
Creating the local key file.
Successfully created the local key files.
Appending local ID to authorized_keys.
Successfully appended local ID to authorized_keys.
Updating the known_hosts file.
Successfully updated the known_hosts file.
Appending authorized_key on the remote node.
Successfully appended authorized_key on all remote node.
Checking common authentication file content.
Successfully checked common authentication content.
Distributing SSH trust file to all node.
Distributing trust keys file to all node successfully.
Successfully distributed SSH trust file to all node.
Verifying SSH trust on all hosts.
Verifying SSH trust on all hosts by ip.
Successfully verified SSH trust on all hosts by ip.
Successfully verified SSH trust on all hosts.
Start set cron for lh
Successfully to set cron for lh
Successfully created SSH trust.
Successfully created SSH trust for [lh] user.
Setting host ip env
Successfully set host ip env.
Distributing package.
Begin to distribute package to tool path.
Successfully distribute package to tool path.
Begin to distribute package to package path.
Successfully distribute package to package path.
Successfully distributed package.
Preparing SSH service.
Successfully prepared SSH service.
Installing the tools in the cluster.
Successfully installed the tools in the cluster.
Checking hostname mapping.
Successfully checked hostname mapping.
Checking OS software.
Successfully check OS software.
Checking OS version.
Successfully checked OS version.
Checking cpu instructions.
Successfully checked cpu instructions.
Creating cluster's path.
Successfully created cluster's path.
Set and check OS parameter.
Set and check OS parameter completed.
Preparing CRON service.
Successfully prepared CRON service.
Setting user environmental variables.
Successfully set user environmental variables.
Setting the dynamic link library.
Successfully set the dynamic link library.
Fixing server package owner.
Setting finish flag.
Successfully set finish flag.
Preinstallation succeeded.
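With --sep-env-file, the environment variables end up in /data/lh/env/env1 rather than ~/.bashrc, and 4.5 sources that file. A minimal sketch for confirming the file sets what you expect: `check_env_file` is a hypothetical helper, and the variable names in the example (GAUSSHOME, GAUSSLOG, GPHOME, PGHOST) are an assumption about what gs_preinstall writes, so inspect your own env file and adjust the list.

```shell
# Source an env file in a subshell and report listed variables that are empty.
# Usage: check_env_file FILE VAR [VAR...]   (hypothetical helper, a sketch)
check_env_file() {
  local envfile=$1; shift
  ( # subshell: sourcing here does not change the caller's environment
    source "$envfile" || exit 2
    missing=0
    for v in "$@"; do
      if [[ -z "${!v:-}" ]]; then
        echo "not set: $v" >&2
        missing=1
      fi
    done
    exit $missing
  )
}

# Example (variable list is an assumption, adjust to your env file):
#   check_env_file /data/lh/env/env1 GAUSSHOME GAUSSLOG GPHOME PGHOST
```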
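4.5. Run the installation
First source the environment variables: if you used environment separation, source the separated env file; otherwise source ~/.bashrc. Then run the installation.
Since I used environment separation during pre-installation, I source the env file: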
source /data/lh/env/env1
gs_install -X /data/lh/om_xml/ins2.xml
Installation output:
Parsing the configuration file.
Successfully checked gs_uninstall on every node.
Check preinstall on every node.
Successfully checked preinstall on every node.
Creating the backup directory.
Successfully created the backup directory.
begin deploy..
Installing the cluster.
begin prepare Install Cluster..
Checking the installation environment on all nodes.
begin install Cluster..
Installing applications on all nodes.
Successfully installed APP.
begin init Instance..
encrypt cipher and rand files for database.
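Please enter password for database: ## enter the database password here; it must be at least 8 characters long and contain at least three of the four character types (uppercase letters, lowercase letters, digits, special characters)
Please repeat for database: ## enter the database password again to confirm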
begin to create CA cert files
The sslcert will be generated in /data/lh/openGauss/app/share/sslcert/om
NO cm_server instance, no need to create CA for CM.
Non-dss_ssl_enable, no need to create CA for DSS
Cluster installation is completed.
Configuring.
Deleting instances from all nodes.
Successfully deleted instances from all nodes.
Checking node configuration on all nodes.
Initializing instances on all nodes.
Updating instance configuration on all nodes.
Check consistence of memCheck and coresCheck on database nodes.
Successfully check consistence of memCheck and coresCheck on all nodes.
Configuring pg_hba on all nodes.
Configuration is completed.
The cluster status is Normal.
Successfully started cluster.
Successfully installed application.
end deploy..
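Since gs_install rejects a password that breaks the rule noted at the password prompt, it can help to check a candidate beforehand. This is a rough local sketch of that rule (at least 8 characters, at least three character types); `pw_complex_enough` is a hypothetical helper, and treating every non-alphanumeric character as "special" is a simplification, since the server's accepted special-character set may be narrower.

```shell
# Return 0 if the password has >= 8 characters and >= 3 of the 4 character
# types: uppercase, lowercase, digit, special (any non-alphanumeric here).
pw_complex_enough() {
  local pw=$1 classes=0
  (( ${#pw} >= 8 )) || return 1
  [[ $pw =~ [[:upper:]] ]] && classes=$((classes + 1))
  [[ $pw =~ [[:lower:]] ]] && classes=$((classes + 1))
  [[ $pw =~ [[:digit:]] ]] && classes=$((classes + 1))
  [[ $pw =~ [^[:alnum:]] ]] && classes=$((classes + 1))
  (( classes >= 3 ))
}

# Example:
#   read -rs -p 'Database password: ' pw; echo
#   pw_complex_enough "$pw" || echo "password does not meet the complexity rule"
```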
4.6. Check the database status
gs_om -t status --detail
Output:
[ Cluster State ]
cluster_state : Normal
redistributing : No
current_az : AZ_ALL
[ Datanode State ]
node node_ip port instance state
----------------------------------------------------------------------------------------------
1 yc-0003 192.168.0.141 11000 6001 /data/lh/openGauss/data/dn1 P Primary Normal
2 yc-0002 192.168.0.176 11000 6002 /data/lh/openGauss/data/dn1 S Standby Normal
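## The columns are: node number, hostname, IP, port, instance ID, data directory, primary/standby role (P/S and Primary/Standby), and whether the node is Normal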
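For a scripted health check, the detailed status output can be summarised with awk. This is a sketch with a hypothetical helper name; it assumes the column layout shown above (role in the second-to-last column, state in the last column), which may differ in other versions or when CM is installed.

```shell
# Summarise `gs_om -t status --detail` output read from stdin.
# Rows with fewer than 2 fields (blank or separator lines) are skipped.
status_summary() {
  awk '
    /^cluster_state/ { state = $NF }
    NF >= 2 && $(NF-1) == "Primary" && $NF == "Normal" { p++ }
    NF >= 2 && $(NF-1) == "Standby" && $NF == "Normal" { s++ }
    END { printf "cluster_state=%s primary=%d standby=%d\n", state, p, s }
  '
}

# Example:
#   gs_om -t status --detail | status_summary
```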
4.7. Scale in (remove a node)
gs_dropnode -U lh -G lh -h 192.168.0.176
Console output:
The target node to be dropped is (['yc-0002'])
Do you want to continue to drop the target node (yes/no)?y
The cluster will have only one standalone node left after the operation!
Do you want to continue to drop the target node (yes/no)? y
Drop node start without CM node.
[gs_dropnode]Start to drop nodes of the cluster.
[gs_dropnode]Start to stop the target node yc-0002.
[gs_dropnode]End of stop the target node yc-0002.
[gs_dropnode]Start to backup parameter config file on yc-0003.
[gs_dropnode]End to backup parameter config file on yc-0003.
[gs_dropnode]The backup file of yc-0003 is /data/lh/openGauss/tmp/gs_dropnode_backup20241031143102/parameter_yc-0003.tar
[gs_dropnode]Start to parse parameter config file on yc-0003.
Command for Checking VIP mode: cm_ctl res --list | awk -F "|" '{print $2}' | grep -w ***
The current cluster does not support VIP.
[gs_dropnode]End to parse parameter config file on yc-0003.
[gs_dropnode]Start to parse backup parameter config file on yc-0003.
[gs_dropnode]End to parse backup parameter config file yc-0003.
[gs_dropnode]Start to set openGauss config file on yc-0003.
[gs_dropnode]End of set openGauss config file on yc-0003.
[gs_dropnode]Start of set pg_hba config file on yc-0003.
[gs_dropnode]End of set pg_hba config file on yc-0003.
[gs_dropnode]Start to set repl slot on yc-0003.
[gs_dropnode]Start to get repl slot on yc-0003.
[gs_dropnode]End of set repl slot on yc-0003.
[gs_dropnode]Start to modify the cluster static conf.
[gs_dropnode]End of modify the cluster static conf.
[gs_dropnode]Remove the dynamic conf.
Only one primary node is left. It is recommended to restart the node.
Do you want to restart the primary node now (yes/no)? y
[gs_dropnode]Start to stop the target node yc-0003.
[gs_dropnode]End of stop the target node yc-0003.
[gs_dropnode]Start to start the target node.
2024-10-31 14:31:14.364 67232432.1 [unknown] 140321508899200 [unknown] 0 dn_6001_6002 01000 0 [BACKEND] WARNING: could not create any HA TCP/IP sockets
2024-10-31 14:31:14.364 67232432.1 [unknown] 140321508899200 [unknown] 0 dn_6001_6002 01000 0 [BACKEND] WARNING: could not create any HA TCP/IP sockets
2024-10-31 14:31:14.366 67232432.1 [unknown] 140321508899200 [unknown] 0 dn_6001_6002 01000 0 [BACKEND] WARNING: Failed to initialize the memory protect for g_instance.attr.attr_storage.cstore_buffers (1024 Mbytes) or shared memory (4482 Mbytes) is larger.
[gs_dropnode]Success to drop the target nodes.
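gs_dropnode is meant for removing standby nodes, so before running it you may want to confirm the target's current role. A sketch, assuming the IP sits in the third column of the data node rows of `gs_om -t status --detail` as in 4.6; `role_of_ip` is a hypothetical helper.

```shell
# Print the role (second-to-last column) of the data node whose IP
# (third column) matches $1, reading gs_om status output from stdin.
role_of_ip() {
  awk -v ip="$1" 'NF >= 3 && $3 == ip { print $(NF-1) }'
}

# Example: only drop 192.168.0.176 if it is currently a standby.
#   if [[ "$(gs_om -t status --detail | role_of_ip 192.168.0.176)" == Standby ]]; then
#     gs_dropnode -U lh -G lh -h 192.168.0.176
#   fi
```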
4.8. Scale out (add a node)
gs_expansion -U lh -G lh -h 192.168.0.176 -X /data/lh/om_xml/ins2.xml -L
Console output:
The cluster no need create ssh trust
Start expansion without cluster manager component.
Database on standby nodes installed finished.
Checking gaussdb and gs_om version.
End to check gaussdb and gs_om version.
Start to establish the relationship.
Start to build standby 192.168.0.176.
Build standby 192.168.0.176 success.
Start to generate and send cluster static file.
End to generate and send cluster static file.
Expansion results:
192.168.0.176: Success
Expansion Finish.
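The XML passed to gs_expansion with -X has to describe the node being added; here the same ins2.xml still lists yc-0002 (192.168.0.176). A small sanity-check sketch, assuming the `name="backIp1" value="..."` attribute order and single-space spacing used in 4.3; `ip_in_xml` is a hypothetical helper.

```shell
# Return 0 if IP ($1) is configured as a backIp1 of some DEVICE in XML ($2).
# "backIp1s" (the cluster-level list) does not match the pattern.
ip_in_xml() {
  grep -o 'name="backIp1" value="[^"]*"' "$2" \
    | sed 's/.*value="//; s/"$//' \
    | grep -Fxq "$1"
}

# Example:
#   ip_in_xml 192.168.0.176 /data/lh/om_xml/ins2.xml \
#     && gs_expansion -U lh -G lh -h 192.168.0.176 -X /data/lh/om_xml/ins2.xml -L
```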
4.9. Uninstall the database
gs_uninstall --delete-data
Console output:
Checking uninstallation.
Successfully checked uninstallation.
Stopping the cluster.
Successfully stopped the cluster.
Successfully deleted instances.
Uninstalling application.
Successfully uninstalled application.
No need to clear dss disk.
Successfully deleted log.
Uninstallation succeeded.
5. Common problems when installing the database with OM
5.1. SSH trust problem: bad interpreter: No such file or directory
Investigation showed that the failing command was: Cmd:echo "kJG6H*:Wl*nf^uxZOXth*U7ZJ*7I05Sj" | /bin/sh /data/lh/latest/script/./local/sshexkey_encrypt_tool.sh sshkeygen /home/lh/.ssh/id_om /home/lh/.ssh/id_om.pub
Error: Generating mutual trust files\n/data/lh/latest/script/./local/sshexkey_encrypt_tool.sh: /home/lh/gauss_om/script/ssh-keygen: /bin/bash^M: bad interpreter: No such file or directory
Content of the sshexkey_encrypt_tool.sh script:
ssh-keygen -t ed25519 -N \"$passwd\" -f ~/.ssh/id_om
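This generates an SSH key pair and writes it to id_om and id_om.pub.
Approach: check whether the related files have the wrong line-ending format.
The ^M in /bin/bash^M is a carriage return, which means the file has CRLF line endings, the Windows format.
After fixing that file and running again, other files turned out to have the same problem.
Go to the directory from the error above, /home/lh/gauss_om, and convert all SSH-related files to Unix (LF) line endings.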
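To find and convert every affected file in one pass rather than one error at a time, something like the following sketch can be used. The helper names are hypothetical; `grep -I` skips binary files, and `sed -i 's/\r$//'` removes the trailing carriage return from each line (GNU sed assumed). `dos2unix`, if installed, does the same job. Back up the directory first.

```shell
# List text files under a directory that contain a carriage return (CRLF endings).
find_crlf() {
  grep -rlI $'\r' "$1"
}

# Strip the trailing CR from every line of each file found by find_crlf.
fix_crlf() {
  local f
  find_crlf "$1" | while IFS= read -r f; do
    sed -i 's/\r$//' "$f"
    echo "converted: $f"
  done
}

# Example:
#   cp -a /home/lh/gauss_om /home/lh/gauss_om.bak
#   fix_crlf /home/lh/gauss_om
```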
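5.2. Read-ahead block problem
Cause: the read-ahead value verified by checkos does not match the expected value. It can be fixed by hand (and only by hand).
Solution:
First check the I/O scheduler of each disk on the machine, then change the scheduler of the disks reported in the error:
cat /sys/block/%s/queue/scheduler
echo "mq-deadline" > /sys/block/%s/queue/scheduler
%s stands for the affected disk names (sdj and sdk in my case).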
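The two commands above can be wrapped so the active scheduler is printed before anything is changed. This is a sketch with hypothetical helper names; the optional sysfs-root argument exists only so the functions can be tried on ordinary files. Writing to the real scheduler file needs root, and the change does not survive a reboot.

```shell
# Print the active I/O scheduler of a block device (the bracketed entry,
# e.g. "bfq" from "mq-deadline kyber [bfq] none").
active_scheduler() {
  local dev=$1 root=${2:-/sys/block}
  sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$root/$dev/queue/scheduler"
}

# Switch a device to the given scheduler if the kernel offers it (root only).
set_scheduler() {
  local dev=$1 sched=${2:-mq-deadline} root=${3:-/sys/block}
  if ! grep -qw -- "$sched" "$root/$dev/queue/scheduler"; then
    echo "$dev does not offer $sched" >&2
    return 1
  fi
  echo "$sched" > "$root/$dev/queue/scheduler"
}

# Example:
#   for dev in sdj sdk; do echo "$dev: $(active_scheduler "$dev")"; done
#   for dev in sdj sdk; do set_scheduler "$dev" mq-deadline; done   # as root
```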
5.3. Transparent huge pages not disabled
Solution:
echo "never" > /sys/kernel/mm/transparent_hugepage/enabled
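5.4. Creating SSH trust fails with: Failed to append local ID to authorized_keys on remote node
How it works: OM actually runs mkdir .ssh, touch .ssh/authorized_keys, touch .ssh/known_hosts and similar commands. After creating these files, it runs echo \"%s #OM\" >> .ssh/authorized_keys && echo ok ok ok % localID, i.e. %s is filled in with the local public key.
Diagnosis: running the commands by hand showed that the disk was full.
Freeing up disk space fixes it.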
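Since the failure came from a full disk, checking free space on the home directory of every node before creating trust can save a round of debugging. A sketch with a hypothetical helper; the 100 MB threshold in the example is arbitrary, not an OM requirement.

```shell
# Return 0 if the filesystem holding PATH ($1) has at least MIN_KB ($2) KiB free.
# df -P keeps each filesystem on one line; column 4 is "Available".
has_free_kb() {
  local path=$1 min_kb=$2 avail
  avail=$(df -Pk "$path" | awk 'NR == 2 { print $4 }')
  [[ -n "$avail" ]] && (( avail >= min_kb ))
}

# Example (threshold is arbitrary):
#   has_free_kb "$HOME" 102400 || echo "less than 100 MB free under $HOME"
```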
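5.5. SSH trust problem: got xxx expected xxx
Cause:
The user had already created SSH trust before; when the trust is created again, the two strings do not match.
Solution:
Delete the previous trust and create it again.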
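One way to delete the old trust is to remove the lines OM appended to authorized_keys. This sketch assumes those lines end in " #OM", as in the command quoted in 5.4; `strip_om_keys` is a hypothetical helper, and it keeps a timestamped backup. Run it on every node, then re-run the step that creates the trust (gs_preinstall in this walkthrough).

```shell
# Remove lines ending in " #OM" from an authorized_keys file, keeping a backup.
# Rewriting via "cat >" preserves the original file's permissions (e.g. 600).
strip_om_keys() {
  local f=${1:-$HOME/.ssh/authorized_keys}
  [[ -f "$f" ]] || return 0
  cp -p "$f" "$f.bak.$(date +%Y%m%d%H%M%S)"
  grep -v ' #OM$' "$f" > "$f.tmp" || true   # grep exits 1 if every line was removed
  cat "$f.tmp" > "$f" && rm -f "$f.tmp"
}

# Example, on each node (the last two lines assume OM regenerates them):
#   strip_om_keys
#   rm -f ~/.ssh/id_om ~/.ssh/id_om.pub
#   ssh-keygen -R 192.168.0.176        # drop the stale known_hosts entry
```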