China's Open-Source Answer to Sora Gets 4.6k Stars Overnight: A CogVideoX Reproduction Guide

Tech   2024-08-07 21:29   Zhejiang


A CogVideoX reproduction guide: text-to-video and automatic generation of video footage

Please credit the source and author when reposting.

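Hi everyone. Today I'm sharing a walkthrough for reproducing CogVideoX, a project that Zhipu AI open-sourced a few days ago. There are a few pitfalls and some hard requirements: for now the project only supports machines with more than 18 GB of VRAM, such as a 3090 or 4090, or better cards like the A40 and A6000.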

Project repository: https://github.com/THUDM/CogVideo/tree/main

First, I used 想读 AI to summarize the project's key points: CogVideoX is a Transformer-based video generation model that can generate video clips of different scenes.

Next, let's reproduce the project together and see how it goes!

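First you need a machine with more than 18 GB of VRAM, or you can rent a cloud server from a platform such as 矩池云 (2 to 3 yuan per hour, which is a pretty good deal).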
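Before going further, it can help to confirm the card actually clears the 18 GB bar. Here is a minimal sketch, assuming an NVIDIA driver with `nvidia-smi` on the PATH. The helper names (`parse_total_mib`, `meets_requirement`, `query_gpu_memory`, `report`) are my own for illustration and are not part of the CogVideo project.

```python
import shutil
import subprocess

# Threshold taken from this article; adjust it if the project's requirements change.
REQUIRED_GIB = 18


def parse_total_mib(smi_output: str) -> list[int]:
    """Parse one integer (MiB) per non-empty line of nvidia-smi CSV output."""
    return [int(line.strip()) for line in smi_output.splitlines() if line.strip()]


def meets_requirement(total_mib: int, required_gib: float = REQUIRED_GIB) -> bool:
    """Return True if a GPU's total memory is above the required amount."""
    return total_mib / 1024 > required_gib


def query_gpu_memory() -> list[int]:
    """Ask nvidia-smi for each GPU's total memory; empty list if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_total_mib(result.stdout)


def report() -> None:
    """Print one line per GPU saying whether it clears the threshold."""
    for index, mib in enumerate(query_gpu_memory()):
        status = "OK" if meets_requirement(mib) else "too small"
        print(f"GPU {index}: {mib / 1024:.1f} GiB ({status})")
```

Calling `report()` on the target machine prints one line per GPU. Keep in mind that total memory is only an upper bound; whatever else is running on the card also eats into it.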

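With the machine ready, I set up the environment with Python 3.12.4. If you have a 24 GB machine, skip ahead to the section on inference steps for 24 GB machines further down.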


Preparation

  • Install torch
pip install torch torchvision torchaudio
  • Clone the project
git clone https://github.com/THUDM/CogVideo
  • Install the other dependencies
cd CogVideo/
pip install -r requirements.txt
  • Running the demo revealed more missing dependencies, so install those too
pip install gradio spaces imageio moviepy sentencepiece
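To avoid finding missing packages one crash at a time, a quick check can list everything that is not importable yet. This is a minimal sketch using only the standard library. It assumes the importable module names match the pip package names above, and `missing_modules` is a helper name I made up for illustration.

```python
import importlib.util

# Module names assumed to match the pip packages installed in the steps above.
REQUIRED_MODULES = ["torch", "gradio", "spaces", "imageio", "moviepy", "sentencepiece"]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Still missing:", ", ".join(missing))
    else:
        print("All listed modules are importable.")
```

An empty result only means the modules can be located; it says nothing about whether the installed versions are the ones the project expects.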

Running the Program

Run the program with the following command.

python gradio_demo.py 

On the first run it downloads the required models, about 13 GB in total.

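Once the download finishes, the program starts up. It then hits a FileNotFoundError because the output directory does not exist, so just create it by hand.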

mkdir output
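If you would rather not hit that error on a fresh checkout, the same fix can be done from Python in a way that is safe to repeat. This is a small sketch; `ensure_dir` is a hypothetical helper of mine, not something the project provides.

```python
from pathlib import Path


def ensure_dir(path) -> Path:
    """Create the directory (and any parents) if missing; do nothing if it exists."""
    directory = Path(path)
    directory.mkdir(parents=True, exist_ok=True)
    return directory
```

For example, `ensure_dir("output")`, run from the directory where you launch gradio_demo.py, does the same job as the `mkdir output` above, except that it does not complain when the directory is already there.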

Generating a Video

Open the service on port 7860, enter a prompt, and start generating a video.

A cow cat and a white cat were chasing each other under the shade of a tree. Suddenly, the white cat was attracted by the leaves blown by the wind.

The interface looks like this:

Early in inference, VRAM usage had already reached 19 GB,

and then it ran out of memory.

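That's on me for not reading the project notes carefully: the 18 GB figure for inference applies to SAT mode. If, like me, you have a 24 GB machine, carry on with the steps below. If you have 32 GB of VRAM or more, you should not run into this error.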

Inference Steps for 24 GB Machines

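Setting up SAT mode means installing and downloading a few more dependencies. Expect it to take about 30 minutes, mainly because t5-v1_1-xxl is so large, at 44 GB.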
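With a download this large, it is worth checking free disk space first. Here is a minimal sketch with the standard library. `free_gb` and `has_room_for` are helper names I made up, and the 44 GB figure from above is only a lower bound, since the other model archives need room too.

```python
import shutil


def free_gb(path=".") -> float:
    """Free space, in GB (10**9 bytes), on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 1e9


def has_room_for(required_gb: float, path=".") -> bool:
    """Return True if at least `required_gb` GB are free at `path`."""
    return free_gb(path) >= required_gb


if __name__ == "__main__":
    # 44 GB is the size quoted in this article for the largest model file.
    print(f"Free: {free_gb():.1f} GB, room for 44 GB: {has_room_for(44)}")
```

Run it from the directory where the models will be stored, since a cloud machine may mount its data disk separately from the system disk.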

  • Install the SAT dependencies
cd CogVideo/sat
pip install -r requirements.txt
  • Download the required models
mkdir CogVideoX-2b-sat
cd CogVideoX-2b-sat
wget -O vae.zip https://cloud.tsinghua.edu.cn/f/fdba7608a49c463ba754/?dl=1
unzip vae.zip
wget -O transformer.zip https://cloud.tsinghua.edu.cn/f/556a3e1329e74f1bac45/?dl=1
unzip transformer.zip

rm vae.zip transformer.zip
  • Download the largest model
# Watch out: the full model repository is 88 GB.
# We only need pytorch_model.bin (44 GB), so clone the repository's small files first,
# then use wget to fetch the large model file.
git clone https://huggingface.co/google/t5-v1_1-xxl.git
cd t5-v1_1-xxl
wget https://huggingface.co/google/t5-v1_1-xxl/resolve/main/pytorch_model.bin
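A download this big can get cut off partway, so a rough size check on pytorch_model.bin may save a confusing failure later. Here is a minimal sketch. `size_gb` and `looks_complete` are hypothetical helpers of mine, and the expected size is just the roughly 44 GB quoted above, so the tolerance is deliberately loose.

```python
from pathlib import Path


def size_gb(path) -> float:
    """File size in GB (10**9 bytes)."""
    return Path(path).stat().st_size / 1e9


def looks_complete(path, expected_gb: float, tolerance: float = 0.1) -> bool:
    """Return True if the file exists and its size is within `tolerance`
    (as a fraction) of `expected_gb`. This is a rough check, not a checksum."""
    file = Path(path)
    if not file.is_file():
        return False
    return abs(size_gb(file) - expected_gb) <= tolerance * expected_gb
```

For example, `looks_complete("t5-v1_1-xxl/pytorch_model.bin", expected_gb=44)` run from inside CogVideoX-2b-sat. A size check cannot detect corruption. It only catches a download that stopped early.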

With all of that in place, edit the config file and you are ready to run.

Open configs/cogvideox_2b_infer.yaml under the CogVideo/sat directory and change the model paths in it to the actual locations where you stored the downloaded models.
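A typo in one of these paths usually only shows up after the script has started loading, so it can be worth checking them first. Here is a minimal sketch. `missing_paths` is a helper name I made up, and the paths you pass in should be whatever you actually wrote into the config file.

```python
from pathlib import Path


def missing_paths(paths, base="."):
    """Return the entries of `paths` that do not exist relative to `base`."""
    root = Path(base)
    return [str(p) for p in paths if not (root / p).exists()]


if __name__ == "__main__":
    # Directory names from the download steps in this article, relative to
    # CogVideo/sat. Replace them with the paths you put into the config file.
    to_check = ["CogVideoX-2b-sat", "CogVideoX-2b-sat/t5-v1_1-xxl"]
    print("Missing:", missing_paths(to_check))
```

An empty list means every path exists. It does not confirm that each path points at the right model.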

Run:

bash inference.sh

Test Results

Below are the videos generated by CogVideoX-2B in SAT mode on a 4090 with 24 GB of VRAM.

I entered an arbitrary prompt:

A cow cat and a white cat were chasing each other under the shade of a tree. Suddenly, the white cat was attracted by the leaves blown by the wind.

VRAM usage during inference on the 4090 was roughly 16 GB.

The generated files are saved in sat/outputs under the project directory.
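After a few runs the output directory fills up, and a small helper can pick out the latest results. Here is a minimal sketch. `newest_files` is a hypothetical helper of mine. It searches subdirectories in case the outputs are nested, and filtering on `*.mp4` is my assumption about the output format.

```python
from pathlib import Path


def newest_files(directory, pattern="*", limit=5):
    """Return up to `limit` files under `directory` (searched recursively)
    whose names match `pattern`, newest first by modification time."""
    files = [p for p in Path(directory).rglob(pattern) if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:limit]
```

For example, `newest_files("sat/outputs", "*.mp4")` run from the project root. Drop the pattern argument to list every file if the extension turns out to be different.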

I downloaded it locally to see how the video looks:

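Did I get something wrong somewhere...

Time to open an issue and ask.

Here is the prompt after optimizing it along the lines of the official examples: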

A playful cow cat with black and white patches and a pure white cat with blue eyes chase each other energetically under the dappled shade of a large oak tree in a peaceful garden. The cow cat leads the chase, its tail flicking with excitement, while the white cat follows closely, both moving swiftly on the soft, grassy ground. Suddenly, the white cat's attention is captured by the rustling leaves as a gentle breeze stirs them, causing it to pause and gaze upward in fascination. The garden, filled with colorful flowers and a small birdbath, enhances the tranquil and lively scene.

A bit better, but still not quite there. Let's try a different prompt:

A majestic elephant stands in a shallow river, its massive body half-submerged in the cool, flowing water. The elephant uses its trunk to spray water over its back, creating a sparkling arc of droplets that catch the sunlight. Its gray skin glistens with wetness as it enjoys the refreshing bath. Surrounding the river are lush green trees and shrubs, adding to the serene and natural ambiance. Birds can be seen perched on branches and flying above, while the gentle sound of the river flowing adds to the peaceful atmosphere.

Now let's look at what the official prompts produce.

A detailed wooden toy ship with intricately carved masts and sails is seen gliding smoothly over a plush, blue carpet that mimics the waves of the sea. The ship's hull is painted a rich brown, with tiny windows. The carpet, soft and textured, provides a perfect backdrop, resembling an oceanic expanse. Surrounding the ship are various other toys and children's items, hinting at a playful environment. The scene captures the innocence and imagination of childhood, with the toy ship's journey symbolizing endless adventures in a whimsical, indoor setting.

In the haunting backdrop of a war-torn city, where ruins and crumbled walls tell a story of devastation, a poignant close-up frames a young girl. Her face is smudged with ash, a silent testament to the chaos around her. Her eyes glistening with a mix of sorrow and resilience, capturing the raw emotion of a world that has lost its innocence to the ravages of conflict.

The official prompts really are better written.

One more attempt. It really is fast, but I'm almost too embarrassed to share my prompt, so I'll keep studying how to write prompts:

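简说Python
Run by 老表, who is self-taught and shares beginner-friendly Python and SQL tutorials, quality articles on data analysis, data mining, and machine learning, and notes on his own learning experience.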