First Block Cache: An All-Round Accelerator for Model Inference
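🌹Hello everyone, and welcome to the 破狼 official account. Thank you for your support and encouragement. I will be with you all the way as we explore AIGC. If you like this, star and follow the 破狼 account, or scan the QR code at the end of the article to join the discussion group!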
Introduction to First Block Cache
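Today I recommend Comfy-WaveSpeed, an all-in-one inference optimization solution designed for ComfyUI. It aims to make loading and inference of text-to-image and video models in ComfyUI general, flexible, and fast. The plugin introduces two inference optimizations: First Block Cache (dynamic caching) and an enhanced torch.compile.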
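First Block Cache (FBCache) is inspired by TeaCache and other denoising cache algorithms. It uses the residual output of the first Transformer block as the cache indicator. If the difference between the current first-block residual output and the previous one is small enough, the previous final residual output can be reused and all subsequent Transformer blocks can be skipped. This significantly reduces the model's compute cost, giving up to about a 2x speedup while keeping accuracy high. FBCache can be combined with loaded models such as the Flux text-to-image model and Tencent's HunyuanVideo model.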
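The caching decision described above can be illustrated with a small toy sketch. This is not the Comfy-WaveSpeed code: the class name FirstBlockCache, the helper relative_diff, the list-of-floats "hidden states", the choice of a mean-absolute relative difference as the metric, and the exact point where the cached residual is added back are all assumptions made for illustration.

```python
from typing import Callable, List, Optional

Vector = List[float]
Block = Callable[[Vector], Vector]


def relative_diff(current: Vector, previous: Vector) -> float:
    """Mean absolute difference, normalized by the mean magnitude of `previous`.

    Assumed metric for this sketch; the plugin may measure the change differently.
    """
    mean_abs_diff = sum(abs(c - p) for c, p in zip(current, previous)) / len(current)
    mean_abs_prev = sum(abs(p) for p in previous) / len(previous)
    return mean_abs_diff / max(mean_abs_prev, 1e-12)


class FirstBlockCache:
    """Toy model of first-block caching over a stack of Transformer-like blocks."""

    def __init__(self, blocks: List[Block], threshold: float) -> None:
        self.first_block = blocks[0]
        self.remaining_blocks = blocks[1:]
        self.threshold = threshold
        self.cached_first_residual: Optional[Vector] = None
        self.cached_tail_residual: Optional[Vector] = None
        self.skipped_steps = 0

    def forward(self, x: Vector) -> Vector:
        # The first block always runs; its residual is the cache indicator.
        h = self.first_block(x)
        first_residual = [a - b for a, b in zip(h, x)]

        if (
            self.cached_first_residual is not None
            and self.cached_tail_residual is not None
            and relative_diff(first_residual, self.cached_first_residual) < self.threshold
        ):
            # Close enough to the cached step: skip the remaining blocks and
            # reuse the residual they produced last time.
            self.skipped_steps += 1
            return [a + r for a, r in zip(h, self.cached_tail_residual)]

        # Cache miss: run every remaining block and refresh both cached residuals.
        out = h
        for block in self.remaining_blocks:
            out = block(out)
        self.cached_first_residual = first_residual
        self.cached_tail_residual = [a - b for a, b in zip(out, h)]
        return out


# Hypothetical two-block "model": scale by 1.5, then add 1.0.
toy_blocks: List[Block] = [
    lambda v: [1.5 * a for a in v],
    lambda v: [a + 1.0 for a in v],
]
cache = FirstBlockCache(toy_blocks, threshold=0.1)
outputs = [cache.forward(x) for x in ([1.0, 2.0], [1.02, 2.0], [2.0, 4.0])]
print(outputs, "skipped:", cache.skipped_steps)
```

In this toy run the second input is nearly identical to the first, so the remaining block is skipped once. The third input changes the first-block residual too much, so the full stack runs again.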
Because torch.compile may not work well with model offloading, is not officially supported on Windows, and has issues with LoRAs, this article does not focus on it.
Trying First Block Cache in ComfyUI
First, install the Comfy-WaveSpeed plugin by Git URL through the ComfyUI plugin manager.
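• Comfy-WaveSpeed plugin: https://github.com/chengzeyi/Comfy-WaveSpeed.git
• ComfyUI HunyuanVideo: for installation details, see the article "[ComfyUI] Tencent HunyuanVideo: official extreme optimization, runs on 8GB! From 32G down to 8G, accelerating the open-source ecosystem"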
Flux text-to-image & HunyuanVideo workflows
• F.1-绮梦流光-水湄凝香:
https://www.liblib.art/modelinfo/134c6dd95aef48e98a22b24e003e026b
• Text-to-image: Flux text-to-image (PuLID|LORA|Joy|SUPIR) workflow:
https://www.liblib.art/modelinfo/782aacd70f604da39e83368c696a02a8?versionUuid=9c5eceb01fb94d4d93d60fe2c0bd7468
• Text-to-video: Tencent HunyuanVideo, the strongest open-source video model (LORA) workflow:
https://www.liblib.art/modelinfo/35ee21d5f6a94204abb767ad194ab9cd?versionUuid=be674032ffa14e5597a08922556f4da0
Trying the First Block Cache workflow
The First Block Cache workflow has been uploaded to the LIBLIBAI platform, where you can try it: https://www.liblib.art/modelinfo/433fbf0bd2a8484d8e32d9e32258f378?versionUuid=d6aff5cdda1e44f6b52eeaec45bef268
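• First Block Cache: add the wavespeed->Apply First Block Cache node immediately after the Load Diffusion Model node, and adjust the residual_diff_threshold value to suit the loaded model.
• For flux-dev.safetensors with fp8_e4m3fn_fast and 28 steps, set it to 0.07. Expect a 1.5x to 3.0x speedup with an acceptable loss of accuracy.
• First Block Cache also supports FLUX, LTXV (native and non-native), HunyuanVideo (native), SDXL, and other models. This article uses Tencent's HunyuanVideo model as the example. The recommended settings are as follows: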
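• Combining First Block Cache with the HunyuanVideo model gives a significant speedup, at the cost of some quality loss. The usual default for residual_diff_threshold is 0.035. In practice you can tune it between 0 and 0.1: higher values run faster, and if the generated video quality is not good enough, lower the value (generation will be somewhat slower).
• torch.compile needs to be used on Linux together with --gpu-only.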
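Without First Block Cache: a 5-second 848*480 video (121 frames) takes about 4 min 4 s.
With First Block Cache, residual_diff_threshold 0.035: the same 5-second 848*480 video (121 frames) takes about 3 min 21 s, roughly 17.62% less time.
With First Block Cache, residual_diff_threshold 0.07: the same 5-second 848*480 video (121 frames) takes about 2 min 35 s, roughly 36.48% less time.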
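The percentages above are the share of wall-clock time saved relative to the run without caching. As a quick check, the arithmetic can be reproduced as below; the helper names are made up for this sketch, and the equivalent speedup factor (baseline time divided by cached time) is an extra figure derived from the same timings.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a minutes/seconds timing to total seconds."""
    return minutes * 60 + seconds


def time_saved_percent(baseline_s: float, cached_s: float) -> float:
    """Share of the baseline time that the cached run saves, in percent."""
    return (baseline_s - cached_s) / baseline_s * 100


def speedup_factor(baseline_s: float, cached_s: float) -> float:
    """How many times faster the cached run is than the baseline."""
    return baseline_s / cached_s


# Timings reported in this article for a 121-frame 848*480 video.
baseline = to_seconds(4, 4)  # without First Block Cache
cached_runs = {
    0.035: to_seconds(3, 21),  # residual_diff_threshold = 0.035
    0.07: to_seconds(2, 35),   # residual_diff_threshold = 0.07
}

for threshold, elapsed in cached_runs.items():
    saved = time_saved_percent(baseline, elapsed)
    factor = speedup_factor(baseline, elapsed)
    print(f"threshold={threshold}: {saved:.2f}% less time, {factor:.2f}x speedup")
```

Expressed as speedup factors, the two cached runs come out to about 1.21x and 1.57x.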
01. Library
princess zelda sitting at a desk in a library with a stack of books and she’s texting on her iPhone
02. Train
A beautiful Chinese woman with long hair is sitting on a high-speed train, preparing to travel She was wearing a cool and slim white dress, very elegant The sunlight shone through the car window, illuminating her fair skin and gentle smile She quietly watched the scenery passing by outside the window, her hair swaying gently in the wind The camera captured her quiet and beautiful moments from the side The entire screen is presented with high-quality images, giving people a comfortable feeling
03. Heart gesture
This video describes an anime woman HD video, white dress, A woman is wearing a shiny silver outfit. There is a large moon behind her. The woman has long brown hair. , Luxueqi woman raised her hands and broke a heart -shaped hand shape on her chest, and then slowly raised one hand as if in front of her mouth, and blowing out a kiss in the hand shape.