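Milvus is a high-performance, highly scalable open-source vector database. It stores, indexes, and searches the high-dimensional embedding vectors produced from unstructured data. Milvus is well suited to modern AI applications such as retrieval-augmented generation (RAG), semantic search, multimodal search, and recommendation systems. It runs efficiently in environments ranging from a laptop to large-scale distributed systems. You can use open-source Milvus or the fully managed Milvus service (Zilliz Cloud).
Milvus Backup is a tool for backing up and restoring Milvus data. It provides both a CLI and an API to suit different scenarios. This tutorial walks you through Milvus Backup step by step so that you can handle your own backup needs.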
01.
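Preparation
Before you back up or restore anything, set up the environment:
1. Download the milvus-backup package for your platform from GitHub:
macOS: milvus-backup_Darwin_arm64.tar.gz or milvus-backup_Darwin_x86_64.tar.gz
Linux: milvus-backup_Linux_arm64.tar.gz or milvus-backup_Linux_x86_64.tar.gz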
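2. Download the configuration file (backup.yaml) from GitHub.
3. Extract the tar file into a folder of your choice, and place backup.yaml under configs/ inside that same folder. Make sure the directory structure looks like this: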
├── configs
│   └── backup.yaml
├── milvus-backup
└── README.md
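You can confirm that the extracted folder matches this layout before running any command. Below is a minimal Python sketch. It assumes you call it with the folder where you extracted the tar file. `check_layout` is a hypothetical helper and is not part of milvus-backup.

```python
from pathlib import Path


def check_layout(root: str) -> list[str]:
    """Return the expected milvus-backup files that are missing under root."""
    base = Path(root)
    # Files this tutorial expects after extracting the tar and adding backup.yaml
    expected = [
        base / "configs" / "backup.yaml",
        base / "milvus-backup",
    ]
    # Report paths relative to root so the output is easy to read
    return [str(p.relative_to(base)) for p in expected if not p.is_file()]
```

Run `check_layout(".")` from the extracted folder. An empty list means both files are where the tutorial expects them.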
02.
Command overview
Open a terminal and get familiar with the backup tool's commands:
milvus-backup help
This lists the general commands and flags:
milvus-backup is a backup&restore tool for milvus.

Usage:
  milvus-backup [flags]
  milvus-backup [command]

Available Commands:
  check       check if the connects is right.
  create      create subcommand create a backup.
  delete      delete subcommand delete backup by name.
  get         get subcommand get backup by name.
  help        Help about any command
  list        list subcommand shows all backup in the cluster.
  restore     restore subcommand restore a backup.
  server      server subcommand start milvus-backup RESTAPI server.

Flags:
      --config string   config YAML file of milvus (default "backup.yaml")
  -h, --help            help for milvus-backup

Use "milvus-backup [command] --help" for more information about a command.
milvus-backup create --help
This shows help for the command that creates a backup:
Usage:
  milvus-backup create [flags]

Flags:
  -n, --name string                   backup name, if unset will generate a name automatically
  -c, --colls string                  collectionNames to backup, use ',' to connect multiple collections
  -d, --databases string              databases to backup
  -a, --database_collections string   databases and collections to backup, json format: {"db1":["c1", "c2"],"db2":[]}
  -f, --force                         force backup, will skip flush, should make sure data has been stored into disk when using it
      --meta_only                     only backup collection meta instead of data
  -h, --help                          help for create
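The `-a/--database_collections` flag takes a JSON object that maps database names to lists of collections. Quoting that JSON by hand on a command line is error-prone, so here is a minimal Python sketch that builds the argument list for `milvus-backup create` with `json.dumps`. It assumes the binary is `./milvus-backup` in the current folder, as in this tutorial's examples. `build_create_cmd` is a hypothetical helper, and it uses only the `-n` and `-a` flags from the help output above.

```python
import json


def build_create_cmd(name: str, db_collections: dict[str, list[str]],
                     binary: str = "./milvus-backup") -> list[str]:
    """Build argv for `milvus-backup create -n NAME -a JSON`."""
    # Compact JSON in the same shape as the help text: {"db1":["c1","c2"],"db2":[]}
    payload = json.dumps(db_collections, separators=(",", ":"))
    return [binary, "create", "-n", name, "-a", payload]
```

Pass the resulting list straight to `subprocess.run(..., check=True)`. No shell is involved, so the JSON needs no extra quoting.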
milvus-backup restore --help
This shows help for the restore command:
Usage:
  milvus-backup restore [flags]

Flags:
  -n, --name string                   backup name to restore
  -c, --collections string            collectionNames to restore
  -s, --suffix string                 add a suffix to collection name to restore
  -r, --rename string                 rename collections to new names, format: db1.collection1:db2.collection1_new,db1.collection2:db2.collection2_new
  -d, --databases string              databases to restore, if not set, restore all databases
  -a, --database_collections string   databases and collections to restore, json format: {"db1":["c1", "c2"],"db2":[]}
      --meta_only                     if true, restore meta only
      --restore_index                 if true, restore index
      --use_auto_index                if true, replace vector index with autoindex
      --drop_exist_collection         if true, drop existing target collection before create
      --drop_exist_index              if true, drop existing index of target collection before create
      --skip_create_collection        if true, will skip collection, use when collection exist, restore index or data
  -h, --help                          help for restore
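The `-r/--rename` value uses the format `db1.collection1:db2.collection1_new,...`. Here is a small Python helper that assembles that string from a mapping and rejects names without a database part. It is hypothetical and written only against the format shown in the help text.

```python
def build_rename_arg(renames: dict[str, str]) -> str:
    """Join {"db.src": "db.dst"} pairs into the -r format: src:dst,src:dst."""
    parts = []
    for src, dst in renames.items():
        for name in (src, dst):
            # Each side is expected to look like "<database>.<collection>"
            db, sep, coll = name.partition(".")
            if not sep or not db or not coll:
                raise ValueError(f"expected <db>.<collection>, got {name!r}")
        parts.append(f"{src}:{dst}")
    return ",".join(parts)
```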
03.
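Backup and restore use cases
The milvus-backup tool covers a wide range of scenarios. Which one applies depends on your needs and configuration:
1. Backup and restore within one Milvus instance: copy a collection to a new collection in the same Milvus instance.
2. Backup and restore between two Milvus instances that share one S3 bucket: migrate a collection between Milvus instances that use the same S3 bucket with different root paths.
3. Backup and restore between two Milvus instances that use different buckets in the same S3 service: migrate a collection between S3 buckets within one S3 service.
4. Backup and restore between two Milvus instances on different S3 services: copy a collection between Milvus instances that use different S3 services.
Let's look at each use case in detail.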
Use case 1: Backup and restore within one Milvus instance
Back up a collection and restore it within the same Milvus instance. Suppose we want to back up the collection "coll" and restore it as a new collection named "coll_bak". The same S3 bucket is used throughout.
Configuration:
Milvus uses bucket_A for storage.
MinIO configuration of Milvus:
minio:
  address: localhost # Address of MinIO/S3
  port: 9000 # Port of MinIO/S3
  accessKeyID: minioadmin # accessKeyID of MinIO/S3
  secretAccessKey: minioadmin # MinIO/S3 encryption string
  useSSL: false # Access to MinIO/S3 with SSL
  ssl:
    tlsCACert: /path/to/public.crt # path to your CACert file, ignore when it is empty
  bucketName: bucket_A # Bucket name in MinIO/S3
  rootPath: files # The root path where the message is stored in MinIO/S3
Backup and restore procedure
In backup.yaml, point the tool at the correct Milvus and MinIO locations:
minio:
  # cloudProvider: "minio" # deprecated use storageType instead
  storageType: "minio" # support storage type: local, minio, s3, aws, gcp, ali(aliyun), azure, tc(tencent)
  address: localhost # Address of MinIO/S3
  port: 9000 # Port of MinIO/S3
  accessKeyID: minioadmin # accessKeyID of MinIO/S3
  secretAccessKey: minioadmin # MinIO/S3 encryption string
  useSSL: false # Access to MinIO/S3 with SSL
  useIAM: false
  iamEndpoint: ""
  bucketName: "bucket_A" # Milvus Bucket name in MinIO/S3, make it the same as your milvus instance
  rootPath: "files" # Milvus storage root path in MinIO/S3, make it the same as your milvus instance
  # only for azure
  backupAccessKeyID: minioadmin # accessKeyID of MinIO/S3
  backupSecretAccessKey: minioadmin # MinIO/S3 encryption string
  backupBucketName: "bucket_A" # Bucket name to store backup data. Backup data will store to backupBucketName/backupRootPath
  backupRootPath: "backup" # Rootpath to store backup data. Backup data will store to backupBucketName/backupRootPath
./milvus-backup create -c coll -n my_backup
This command stores the backup files in bucket_A/backup/my_backup.
./milvus-backup restore -c coll -n my_backup -s _bak
This command restores the collection "coll" as a new collection "coll_bak" in the same Milvus instance.
Figure 1: Backup and restore within one Milvus instance
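Both names in this example follow directly from the configuration. The backup is stored under `backupBucketName/backupRootPath/<backup name>`, and `-s` appends a suffix to the original collection name. Here is a tiny Python sketch of that arithmetic. The function names are illustrative only.

```python
def backup_location(minio_cfg: dict, backup_name: str) -> str:
    """Where `create -n backup_name` stores data: bucket/rootPath/name."""
    return "/".join([minio_cfg["backupBucketName"],
                     minio_cfg["backupRootPath"].strip("/"),
                     backup_name])


def restored_name(collection: str, suffix: str) -> str:
    """Name produced by `restore -c collection -s suffix`."""
    return collection + suffix


cfg = {"backupBucketName": "bucket_A", "backupRootPath": "backup"}
print(backup_location(cfg, "my_backup"))  # bucket_A/backup/my_backup
print(restored_name("coll", "_bak"))      # coll_bak
```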
Use case 2: Backup and restore between two Milvus instances that share one S3 bucket
Back up a collection from one Milvus instance and restore it to another instance that uses the same S3 bucket with a different root path. Suppose milvus_A has a collection named "coll". We back it up and restore it into a new collection named "coll_bak" in milvus_B. The two Milvus instances share the S3 bucket "bucket_A" but use different root paths.
Configuration
Milvus A uses files_A as its root path.
Milvus B uses files_B as its root path.
MinIO configuration of Milvus A:
minio:
  address: localhost # Address of MinIO/S3
  port: 9000 # Port of MinIO/S3
  accessKeyID: minioadmin # accessKeyID of MinIO/S3
  secretAccessKey: minioadmin # MinIO/S3 encryption string
  useSSL: false # Access to MinIO/S3 with SSL
  ssl:
    tlsCACert: /path/to/public.crt # path to your CACert file, ignore when it is empty
  bucketName: bucket_A # Bucket name in MinIO/S3
  rootPath: files_A # The root path where the message is stored in MinIO/S3
MinIO configuration of Milvus B:
minio:
  address: localhost # Address of MinIO/S3
  port: 9000 # Port of MinIO/S3
  accessKeyID: minioadmin # accessKeyID of MinIO/S3
  secretAccessKey: minioadmin # MinIO/S3 encryption string
  useSSL: false # Access to MinIO/S3 with SSL
  ssl:
    tlsCACert: /path/to/public.crt # path to your CACert file, ignore when it is empty
  bucketName: bucket_A # Bucket name in MinIO/S3
  rootPath: files_B # The root path where the message is stored in MinIO/S3
Backup and restore procedure
Point backup.yaml at Milvus A:
milvus:
  address: milvus_A
  port: 19530
  authorizationEnabled: false
  # tls mode values [0, 1, 2]
  # 0 is close, 1 is one-way authentication, 2 is two-way authentication.
  tlsMode: 0
  user: "root"
  password: "Milvus"
# Related configuration of minio, which is responsible for data persistence for Milvus.
minio:
  # cloudProvider: "minio" # deprecated use storageType instead
  storageType: "minio" # support storage type: local, minio, s3, aws, gcp, ali(aliyun), azure, tc(tencent)
  address: milvus_A # Address of MinIO/S3
  port: 9000 # Port of MinIO/S3
  accessKeyID: minioadmin # accessKeyID of MinIO/S3
  secretAccessKey: minioadmin # MinIO/S3 encryption string
  useSSL: false # Access to MinIO/S3 with SSL
  useIAM: false
  iamEndpoint: ""
  bucketName: "bucket_A" # Milvus Bucket name in MinIO/S3, make it the same as your milvus instance
  rootPath: "files_A" # Milvus storage root path in MinIO/S3, make it the same as your milvus instance
  # only for azure
  backupAccessKeyID: minioadmin # accessKeyID of MinIO/S3
  backupSecretAccessKey: minioadmin # MinIO/S3 encryption string
  backupBucketName: "bucket_A" # Bucket name to store backup data. Backup data will store to backupBucketName/backupRootPath
  backupRootPath: "backup" # Rootpath to store backup data. Backup data will store to backupBucketName/backupRootPath
./milvus-backup create -c coll -n my_backup
Edit backup.yaml to point at Milvus B and change the MinIO root path:
milvus:
  address: milvus_B
  port: 19530
  authorizationEnabled: false
  # tls mode values [0, 1, 2]
  # 0 is close, 1 is one-way authentication, 2 is two-way authentication.
  tlsMode: 0
  user: "root"
  password: "Milvus"
# Related configuration of minio, which is responsible for data persistence for Milvus.
minio:
  # cloudProvider: "minio" # deprecated use storageType instead
  storageType: "minio" # support storage type: local, minio, s3, aws, gcp, ali(aliyun), azure, tc(tencent)
  address: milvus_B # Address of MinIO/S3
  port: 9000 # Port of MinIO/S3
  accessKeyID: minioadmin # accessKeyID of MinIO/S3
  secretAccessKey: minioadmin # MinIO/S3 encryption string
  useSSL: false # Access to MinIO/S3 with SSL
  useIAM: false
  iamEndpoint: ""
  bucketName: "bucket_A" # Milvus Bucket name in MinIO/S3, make it the same as your milvus instance
  rootPath: "files_B" # Milvus storage root path in MinIO/S3, make it the same as your milvus instance
  # only for azure
  backupAccessKeyID: minioadmin # accessKeyID of MinIO/S3
  backupSecretAccessKey: minioadmin # MinIO/S3 encryption string
  backupBucketName: "bucket_A" # Bucket name to store backup data. Backup data will store to backupBucketName/backupRootPath
  backupRootPath: "backup" # Rootpath to store backup data. Backup data will store to backupBucketName/backupRootPath
./milvus-backup restore -c coll -n my_backup -s _bak
Figure 2: Backup and restore between two Milvus instances that share one S3 bucket
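Between the create step and the restore step, only three values change: the Milvus address, the MinIO address, and rootPath. The backup location (backupBucketName/backupRootPath) stays the same. Here is a Python sketch of that edit. It assumes you have already loaded backup.yaml into a plain dict with a YAML library of your choice, which is not shown. `retarget` is a hypothetical helper.

```python
import copy


def retarget(cfg: dict, milvus_address: str, minio_address: str,
             root_path: str) -> dict:
    """Return a copy of a backup.yaml dict pointed at another Milvus instance.

    backupBucketName/backupRootPath are left untouched so that the restore
    side reads the same backup the create side wrote.
    """
    new_cfg = copy.deepcopy(cfg)  # never mutate the create-side config
    new_cfg["milvus"]["address"] = milvus_address
    new_cfg["minio"]["address"] = minio_address
    new_cfg["minio"]["rootPath"] = root_path
    return new_cfg


# Values from the Milvus A backup.yaml in this use case (abridged)
cfg_a = {
    "milvus": {"address": "milvus_A", "port": 19530},
    "minio": {"address": "milvus_A", "port": 9000,
              "bucketName": "bucket_A", "rootPath": "files_A",
              "backupBucketName": "bucket_A", "backupRootPath": "backup"},
}
cfg_b = retarget(cfg_a, "milvus_B", "milvus_B", "files_B")
```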
Use case 3: Backup and restore between two Milvus instances that use different buckets in the same S3 service
Back up a collection from one Milvus instance (Milvus_A) and restore it to another Milvus instance (Milvus_B) in the same S3 service, where the two instances use different buckets.
Configuration:
Milvus A uses bucket_A for storage, and Milvus B uses bucket_B.
MinIO configuration of Milvus A:
minio:
  address: localhost # Address of MinIO/S3
  port: 9000 # Port of MinIO/S3
  accessKeyID: minioadmin # accessKeyID of MinIO/S3
  secretAccessKey: minioadmin # MinIO/S3 encryption string
  useSSL: false # Access to MinIO/S3 with SSL
  ssl:
    tlsCACert: /path/to/public.crt # path to your CACert file, ignore when it is empty
  bucketName: bucket_A # Bucket name in MinIO/S3
  rootPath: files # The root path where the message is stored in MinIO/S3
MinIO configuration of Milvus B:
minio:
  address: localhost # Address of MinIO/S3
  port: 9000 # Port of MinIO/S3
  accessKeyID: minioadmin # accessKeyID of MinIO/S3
  secretAccessKey: minioadmin # MinIO/S3 encryption string
  useSSL: false # Access to MinIO/S3 with SSL
  ssl:
    tlsCACert: /path/to/public.crt # path to your CACert file, ignore when it is empty
  bucketName: bucket_B # Bucket name in MinIO/S3
  rootPath: files # The root path where the message is stored in MinIO/S3
Backup and restore procedure
Point backup.yaml at Milvus A. backupBucketName is set to bucket_B, so the backup is written to the bucket that Milvus B uses:
milvus:
  address: milvus_A
  port: 19530
  authorizationEnabled: false
  # tls mode values [0, 1, 2]
  # 0 is close, 1 is one-way authentication, 2 is two-way authentication.
  tlsMode: 0
  user: "root"
  password: "Milvus"
# Related configuration of minio, which is responsible for data persistence for Milvus.
minio:
  # cloudProvider: "minio" # deprecated use storageType instead
  storageType: "minio" # support storage type: local, minio, s3, aws, gcp, ali(aliyun), azure, tc(tencent)
  address: localhost # Address of MinIO/S3
  port: 9000 # Port of MinIO/S3
  accessKeyID: minioadmin # accessKeyID of MinIO/S3
  secretAccessKey: minioadmin # MinIO/S3 encryption string
  useSSL: false # Access to MinIO/S3 with SSL
  useIAM: false
  iamEndpoint: ""
  bucketName: "bucket_A" # Milvus Bucket name in MinIO/S3, make it the same as your milvus instance
  rootPath: "files" # Milvus storage root path in MinIO/S3, make it the same as your milvus instance
  # only for azure
  backupAccessKeyID: minioadmin # accessKeyID of MinIO/S3
  backupSecretAccessKey: minioadmin # MinIO/S3 encryption string
  backupBucketName: "bucket_B" # Bucket name to store backup data. Backup data will store to backupBucketName/backupRootPath
  backupRootPath: "backup" # Rootpath to store backup data. Backup data will store to backupBucketName/backupRootPath
./milvus-backup create -c coll -n my_backup
Then point backup.yaml at Milvus B:
milvus:
  address: milvus_B
  port: 19530
  authorizationEnabled: false
  # tls mode values [0, 1, 2]
  # 0 is close, 1 is one-way authentication, 2 is two-way authentication.
  tlsMode: 0
  user: "root"
  password: "Milvus"
# Related configuration of minio, which is responsible for data persistence for Milvus.
minio:
  # cloudProvider: "minio" # deprecated use storageType instead
  storageType: "minio" # support storage type: local, minio, s3, aws, gcp, ali(aliyun), azure, tc(tencent)
  address: localhost # Address of MinIO/S3
  port: 9000 # Port of MinIO/S3
  accessKeyID: minioadmin # accessKeyID of MinIO/S3
  secretAccessKey: minioadmin # MinIO/S3 encryption string
  useSSL: false # Access to MinIO/S3 with SSL
  useIAM: false
  iamEndpoint: ""
  bucketName: "bucket_B" # Milvus Bucket name in MinIO/S3, make it the same as your milvus instance
  rootPath: "files" # Milvus storage root path in MinIO/S3, make it the same as your milvus instance
  # only for azure
  backupAccessKeyID: minioadmin # accessKeyID of MinIO/S3
  backupSecretAccessKey: minioadmin # MinIO/S3 encryption string
  backupBucketName: "bucket_B" # Bucket name to store backup data. Backup data will store to backupBucketName/backupRootPath
  backupRootPath: "backup" # Rootpath to store backup data. Backup data will store to backupBucketName/backupRootPath
./milvus-backup restore -c coll -n my_backup -s _bak
Figure 3: Backup and restore between two Milvus instances that use different buckets in the same S3 service
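When both sides use the same S3 service, as here, the restore can find my_backup only if the two backup.yaml files agree on backupBucketName and backupRootPath. In this use case both are bucket_B/backup, while bucketName differs (bucket_A vs bucket_B). Here is a hypothetical pre-flight check in Python. It assumes both files have already been parsed into dicts.

```python
BACKUP_KEYS = ("backupBucketName", "backupRootPath")


def backup_location_mismatches(create_cfg: dict, restore_cfg: dict) -> list[str]:
    """List backup-location keys whose values differ between the two configs."""
    a, b = create_cfg["minio"], restore_cfg["minio"]
    return [k for k in BACKUP_KEYS if a.get(k) != b.get(k)]


# Abridged minio sections from the two backup.yaml files in this use case
create_cfg = {"minio": {"bucketName": "bucket_A", "rootPath": "files",
                        "backupBucketName": "bucket_B", "backupRootPath": "backup"}}
restore_cfg = {"minio": {"bucketName": "bucket_B", "rootPath": "files",
                         "backupBucketName": "bucket_B", "backupRootPath": "backup"}}
print(backup_location_mismatches(create_cfg, restore_cfg))  # []
```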
Use case 4: Backup and restore between two Milvus instances on different S3 services
Back up the collection "coll" from the Milvus_A instance, which uses one S3 service (MinIO_A), and restore it into the Milvus_B instance, which uses a different S3 service (MinIO_B). The two Milvus instances also use different buckets.
Configuration
MinIO configuration of Milvus A:
minio:
  address: minio_A # Address of MinIO/S3
  port: 9000 # Port of MinIO/S3
  accessKeyID: minioadmin # accessKeyID of MinIO/S3
  secretAccessKey: minioadmin # MinIO/S3 encryption string
  useSSL: false # Access to MinIO/S3 with SSL
  ssl:
    tlsCACert: /path/to/public.crt # path to your CACert file, ignore when it is empty
  bucketName: bucket_A # Bucket name in MinIO/S3
  rootPath: files # The root path where the message is stored in MinIO/S3
MinIO configuration of Milvus B:
minio:
  address: minio_B # Address of MinIO/S3
  port: 9000 # Port of MinIO/S3
  accessKeyID: minioadmin # accessKeyID of MinIO/S3
  secretAccessKey: minioadmin # MinIO/S3 encryption string
  useSSL: false # Access to MinIO/S3 with SSL
  ssl:
    tlsCACert: /path/to/public.crt # path to your CACert file, ignore when it is empty
  bucketName: bucket_B # Bucket name in MinIO/S3
  rootPath: files # The root path where the message is stored in MinIO/S3
Backup and restore procedure
Point backup.yaml at Milvus A and MinIO_A:
milvus:
  address: milvus_A
  port: 19530
  authorizationEnabled: false
  # tls mode values [0, 1, 2]
  # 0 is close, 1 is one-way authentication, 2 is two-way authentication.
  tlsMode: 0
  user: "root"
  password: "Milvus"
# Related configuration of minio, which is responsible for data persistence for Milvus.
minio:
  # cloudProvider: "minio" # deprecated use storageType instead
  storageType: "minio" # support storage type: local, minio, s3, aws, gcp, ali(aliyun), azure, tc(tencent)
  address: minio_A # Address of MinIO/S3
  port: 9000 # Port of MinIO/S3
  accessKeyID: minioadmin # accessKeyID of MinIO/S3
  secretAccessKey: minioadmin # MinIO/S3 encryption string
  useSSL: false # Access to MinIO/S3 with SSL
  useIAM: false
  iamEndpoint: ""
  bucketName: "bucket_A" # Milvus Bucket name in MinIO/S3, make it the same as your milvus instance
  rootPath: "files" # Milvus storage root path in MinIO/S3, make it the same as your milvus instance
  # only for azure
  backupAccessKeyID: minioadmin # accessKeyID of MinIO/S3
  backupSecretAccessKey: minioadmin # MinIO/S3 encryption string
  backupBucketName: "bucket_A" # Bucket name to store backup data. Backup data will store to backupBucketName/backupRootPath
  backupRootPath: "backup" # Rootpath to store backup data. Backup data will store to backupBucketName/backupRootPath
./milvus-backup create -c coll -n my_backup
Using any S3-compatible tool or SDK, manually copy the backup from minio_A:bucket_A/backup/my_backup to minio_B:bucket_B/backup/my_backup.
Then point backup.yaml at Milvus B and MinIO_B:
milvus:
  address: milvus_B
  port: 19530
  authorizationEnabled: false
  # tls mode values [0, 1, 2]
  # 0 is close, 1 is one-way authentication, 2 is two-way authentication.
  tlsMode: 0
  user: "root"
  password: "Milvus"
# Related configuration of minio, which is responsible for data persistence for Milvus.
minio:
  # cloudProvider: "minio" # deprecated use storageType instead
  storageType: "minio" # support storage type: local, minio, s3, aws, gcp, ali(aliyun), azure, tc(tencent)
  address: minio_B # Address of MinIO/S3
  port: 9000 # Port of MinIO/S3
  accessKeyID: minioadmin # accessKeyID of MinIO/S3
  secretAccessKey: minioadmin # MinIO/S3 encryption string
  useSSL: false # Access to MinIO/S3 with SSL
  useIAM: false
  iamEndpoint: ""
  bucketName: "bucket_B" # Milvus Bucket name in MinIO/S3, make it the same as your milvus instance
  rootPath: "files" # Milvus storage root path in MinIO/S3, make it the same as your milvus instance
  # only for azure
  backupAccessKeyID: minioadmin # accessKeyID of MinIO/S3
  backupSecretAccessKey: minioadmin # MinIO/S3 encryption string
  backupBucketName: "bucket_B" # Bucket name to store backup data. Backup data will store to backupBucketName/backupRootPath
  backupRootPath: "backup" # Rootpath to store backup data. Backup data will store to backupBucketName/backupRootPath
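Any S3-compatible tool works for this manual copy. As one option, here is a Python sketch that copies every object under a prefix from one S3 client to another. It relies on the boto3 client methods `get_paginator("list_objects_v2")`, `get_object`, and `put_object`. `copy_prefix` is a hypothetical helper. Each object is read fully into memory, so very large backups would call for a streaming or multipart copy instead.

```python
def copy_prefix(src, src_bucket: str, dst, dst_bucket: str, prefix: str) -> int:
    """Copy all objects under `prefix` from src to dst, keeping the same keys.

    `src` and `dst` are boto3-style S3 clients. Returns the number of objects copied.
    """
    copied = 0
    paginator = src.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=src_bucket, Prefix=prefix):
        # Pages with no matching objects have no "Contents" key
        for obj in page.get("Contents", []):
            key = obj["Key"]
            body = src.get_object(Bucket=src_bucket, Key=key)["Body"].read()
            dst.put_object(Bucket=dst_bucket, Key=key, Body=body)
            copied += 1
    return copied
```

For example, create the source client with `boto3.client("s3", endpoint_url="http://minio_A:9000", aws_access_key_id="minioadmin", aws_secret_access_key="minioadmin")` and the destination client the same way for minio_B. These are the example values from this tutorial. Then call `copy_prefix(src, "bucket_A", dst, "bucket_B", "backup/my_backup/")`. The trailing slash keeps a differently named backup such as my_backup2 from being copied as well.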
./milvus-backup restore -c coll -n my_backup -s _bak
Figure 4: Backup and restore between two Milvus instances on different S3 services
04.
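Configuration file reference
Edit configs/backup.yaml to tailor the backup settings to your environment. The configuration options are described below.
Logging: set the log level and output preferences.
log:
  level: info # Only supports debug, info, warn, error, panic, or fatal. Default 'info'.
  console: true # whether print log to console
  file:
    rootPath: "logs/backup.log"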
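The comment on `level` lists the accepted values and the default. If you generate backup.yaml from scripts, checking the level against that list catches typos before the tool starts. Here is a minimal Python sketch. The function is hypothetical, and the allowed values and the `info` default are copied from the comment above.

```python
ALLOWED_LOG_LEVELS = {"debug", "info", "warn", "error", "panic", "fatal"}


def log_level(cfg: dict) -> str:
    """Return the configured log level, defaulting to 'info' as documented."""
    level = cfg.get("log", {}).get("level", "info")
    if level not in ALLOWED_LOG_LEVELS:
        raise ValueError(f"unsupported log level {level!r}")
    return level
```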
Milvus connection: the details the tool uses to connect to the Milvus instance.
milvus:
  address: localhost
  port: 19530
  authorizationEnabled: false
  # tls mode values [0, 1, 2]
  # 0 is close, 1 is one-way authentication, 2 is two-way authentication.
  tlsMode: 0
  user: "root"
  password: "Milvus"
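`tlsMode` is an integer code whose meaning appears only in the comments. Here is a small Python lookup that turns the milvus section into a readable summary. This can help when a script logs which instance it is about to back up. The helper is hypothetical, and the mode descriptions come solely from the comments above.

```python
# Meanings copied from the tlsMode comments in backup.yaml
TLS_MODES = {
    0: "off",
    1: "one-way authentication",
    2: "two-way authentication",
}


def describe_milvus(cfg: dict) -> str:
    """Summarize the Milvus connection settings of a parsed backup.yaml."""
    m = cfg["milvus"]
    mode = m["tlsMode"]
    if mode not in TLS_MODES:
        raise ValueError(f"tlsMode must be 0, 1 or 2, got {mode!r}")
    auth = "auth on" if m.get("authorizationEnabled") else "auth off"
    return f"{m['address']}:{m['port']} (TLS {TLS_MODES[mode]}, {auth})"
```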
MinIO configuration: how the backup tool interacts with MinIO or other S3-compatible storage.
minio:
  # cloudProvider: "minio" # deprecated use storageType instead
  storageType: "minio" # support storage type: local, minio, s3, aws, gcp, ali(aliyun), azure, tc(tencent)
  address: localhost # Address of MinIO/S3
  port: 9000 # Port of MinIO/S3
  accessKeyID: minioadmin # accessKeyID of MinIO/S3
  secretAccessKey: minioadmin # MinIO/S3 encryption string
  useSSL: false # Access to MinIO/S3 with SSL
  useIAM: false
  iamEndpoint: ""
  bucketName: "a-bucket" # Milvus Bucket name in MinIO/S3, make it the same as your milvus instance
  rootPath: "files" # Milvus storage root path in MinIO/S3, make it the same as your milvus instance
  # only for azure
  backupAccessKeyID: minioadmin # accessKeyID of MinIO/S3
  backupSecretAccessKey: minioadmin # MinIO/S3 encryption string
  backupBucketName: "a-bucket" # Bucket name to store backup data. Backup data will store to backupBucketName/backupRootPath
  backupRootPath: "backup" # Rootpath to store backup data. Backup data will store to backupBucketName/backupRootPath
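A few of these settings can be linted before a run. Here is a hypothetical Python sketch with three checks. It flags the deprecated `cloudProvider` key, checks `storageType` against the list in its comment, and requires both bucket names to be set. Requiring the bucket names is an assumption of this sketch, not documented tool behavior. The names in parentheses (aliyun, tencent) are read here as explanations of `ali` and `tc`, not as separate values.

```python
# Values taken from the storageType comment in backup.yaml
SUPPORTED_STORAGE_TYPES = {"local", "minio", "s3", "aws", "gcp", "ali", "azure", "tc"}


def lint_minio(cfg: dict) -> list[str]:
    """Return human-readable problems found in the minio section."""
    minio = cfg.get("minio", {})
    problems = []
    if "cloudProvider" in minio:
        problems.append("cloudProvider is deprecated; use storageType instead")
    storage = minio.get("storageType")
    if storage is None:
        problems.append("storageType is not set")
    elif storage not in SUPPORTED_STORAGE_TYPES:
        problems.append(f"unsupported storageType {storage!r}")
    # Assumption of this sketch: both buckets must be named explicitly
    for key in ("bucketName", "backupBucketName"):
        if not minio.get(key):
            problems.append(f"{key} is empty")
    return problems
```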
05.
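Summary
Milvus Backup provides a robust way to back up and restore collections, both within a single Milvus instance and across instances. You might manage backups inside one instance, between instances on the same S3 service, or between instances on different S3 services. milvus-backup supports each of these cases flexibly and precisely.
Key takeaways
Versatility: milvus-backup covers a range of scenarios, from a simple backup within one instance to backup and restore across storage services.
Flexible configuration: by editing backup.yaml, you can tailor the backup and restore process to specific use cases, storage setups, and network configurations.
Data security and control: you choose the S3 bucket and path where backups are stored, so you can protect backup data with that bucket's access controls and restrict it to authorized users.
Effective data management is essential to getting the most out of Milvus in your applications. With Milvus Backup, you can keep your data durable and available, even in complex distributed environments. This tutorial aims to help you build sound backup strategies and apply best practices for handling your data.
Whether you are a developer, data engineer, or IT professional, we invite you to try milvus-backup for reliable and efficient data management in your projects.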
About the author
莫毅华 (Mo Yihua)
Senior Software Engineer, Zilliz