[ComfyUI] CLIPtion: Only 100 MB of extra memory! An efficient, lightweight image captioning model comparable to Florence2

Technology   2024-12-28 19:12   Sichuan

CLIPtion: Only 100 MB of extra memory! An efficient, lightweight image captioning model


Introduction to Comfy-CLIPtion

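Today's topic is an image captioning model: CLIPtion. CLIPtion is a fast, lightweight image captioning extension built on the OpenAI CLIP ViT-L/14 model. When you generate images with text-to-image models such as Stable Diffusion, SDXL, SD3, or FLUX, the workflow has already loaded the ViT-L model, so CLIPtion needs only about 100 MB of additional memory to add caption/prompt generation to the workflow.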

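Larger dedicated caption models and VLMs can produce more accurate descriptions, but CLIPtion is small, fast, reuses what is already loaded, and optionally offers better CLIP alignment. That makes it a fast and efficient trade-off.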

  • GitHub: https://github.com/pharmapsychotic/comfy-cliption

Trying Comfy-CLIPtion in ComfyUI

First, install the Comfy-CLIPtion plugin in ComfyUI through the plugin manager.

  • Comfy-CLIPtion plugin: https://github.com/pharmapsychotic/comfy-cliption

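  • CLIPtion_20241219_fp16.safetensors (also available from the cloud drive mentioned at the end of the article): download the model and place it in the /ComfyUI/custom_nodes/comfy-cliption directory. The plugin can also download it automatically on first run, but that is not recommended, because the file then ends up in the Hugging Face cache directory. Download: https://huggingface.co/pharmapsychotic/CLIPtion/blob/main/CLIPtion_20241219_fp16.safetensors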
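The manual download step can also be scripted. The following is a minimal sketch, not part of the plugin: the helper names (`direct_url`, `model_path`, `ensure_model`) are made up for illustration, and it assumes the usual Hugging Face convention that replacing `/blob/` with `/resolve/` in a file page URL gives the direct download link.

```python
from pathlib import Path
import urllib.request

MODEL_FILE = "CLIPtion_20241219_fp16.safetensors"
# File page URL as given in the article.
BLOB_URL = "https://huggingface.co/pharmapsychotic/CLIPtion/blob/main/" + MODEL_FILE


def direct_url(blob_url: str) -> str:
    """Turn a Hugging Face file page URL into a direct download URL (assumed convention)."""
    return blob_url.replace("/blob/", "/resolve/", 1)


def model_path(comfy_root: str) -> Path:
    """Location the article recommends for the model file."""
    return Path(comfy_root) / "custom_nodes" / "comfy-cliption" / MODEL_FILE


def ensure_model(comfy_root: str) -> Path:
    """Download the model into the plugin directory unless it is already there."""
    target = model_path(comfy_root)
    if target.exists():
        return target  # nothing to do, no network access needed
    target.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(direct_url(BLOB_URL), str(target))
    return target
```

Calling `ensure_model("/ComfyUI")` once before starting ComfyUI would place the file where the article suggests, so the plugin does not fall back to the Hugging Face cache.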

Flux text-to-image workflow

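If you are interested in Flux text-to-image, see the workflow that can be run online on LIBLIB: "FLUX [sequel]: the largest open-source text-to-image model at 12B parameters and 23 GB, with stunning images straight from the Dev version".
The ComfyUI workflows and models covered in this article can all be downloaded from LIBLIBAI or run there online: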

• F.1-绮梦流光-水湄凝香

https://www.liblib.art/modelinfo/134c6dd95aef48e98a22b24e003e026b

• Workflow: Flux text-to-image | image-to-image + LoRA + prompt reverse-inference, with one-click switching

https://www.liblib.art/modelinfo/782aacd70f604da39e83368c696a02a8


Comfy-CLIPtion workflow

The Comfy-CLIPtion workflow has been uploaded to the LIBLIB platform:

https://www.liblib.art/modelinfo/1a05262821ff4f14975307a33e11cdea?versionUuid=bf9a6743b66245bf9fb7f6acfb5ad4c2

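Note

• CLIPtion needs only about 100 MB of additional memory and works as a fast, efficient, and simple captioning option that holds up against Microsoft's Florence2 model (see "Florence2: LLM-assisted AI image generation, one model for prompt reverse-inference, object detection, mask recognition, text recognition, and advice"). More detailed captions call for larger models such as Joy, but those also need more VRAM and longer inference time.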

Comfy-CLIPtion node overview

CLIPtion Generate

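This node creates captions from a single image or a batch of images. Its parameters are:
  • temperature - controls the randomness of generation; higher values give more varied output, lower values give more focused and predictable output

  • best_of - how many captions to generate in parallel; the one with the highest CLIP similarity to the image is selected

  • ramble - forces generation of the full 77 tokens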
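To make the temperature and best_of parameters concrete, here is a small standalone sketch in plain Python. It is an illustration of the general technique only, not the plugin's code: the function names and the toy vectors are assumptions, and the real node works on CLIP embeddings and token logits from the model.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng=random):
    """Draw one token index at random according to the tempered distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pick_best(image_embedding, candidates):
    """best_of idea: keep the caption whose text embedding is closest to the image.

    candidates is a list of (caption, text_embedding) pairs.
    """
    best = max(candidates, key=lambda c: cosine_similarity(image_embedding, c[1]))
    return best[0]
```

With logits `[2.0, 1.0, 0.0]`, a temperature of 0.5 puts most of the probability on the first token, while a temperature of 2.0 spreads it more evenly. That is the "focused versus varied" behaviour described above.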

CLIPtion Beam Search

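This node serves the same purpose but produces more detailed descriptions. Its parameters are:
  • beam_width - how many alternative captions to consider in parallel; higher values explore more possibilities but take longer

  • ramble - forces generation of the full 77 tokens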
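The effect of beam_width is easiest to see on a toy example. The sketch below is a generic beam search over a hand-written probability table, not the plugin's implementation: `toy_model`, the token names, and the way `ramble` is modelled (simply ignoring the end token until the length limit) are all assumptions made for illustration.

```python
import math


def beam_search(step_fn, beam_width, max_len, eos="<eos>", ramble=False):
    """Generic beam search.

    step_fn(prefix) returns a dict mapping each next token to its log-probability.
    With ramble=True the end token is ignored, so sequences run to max_len.
    """
    beams = [((), 0.0)]  # (token tuple, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            for tok, logp in step_fn(tokens).items():
                if tok == eos:
                    if not ramble:
                        finished.append((tokens, score + logp))
                    continue
                candidates.append((tokens + (tok,), score + logp))
        if not candidates:
            break
        # Keep only the beam_width highest-scoring partial captions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    finished.extend(beams)
    return max(finished, key=lambda c: c[1])[0]


def toy_model(prefix):
    """Hypothetical next-token table: 'a' looks best at first, but 'b z' wins overall."""
    table = {
        (): {"a": math.log(0.6), "b": math.log(0.4)},
        ("a",): {"x": math.log(0.5), "y": math.log(0.5)},
        ("b",): {"z": math.log(0.9), "x": math.log(0.1)},
    }
    return table.get(prefix, {"<eos>": 0.0})


greedy = beam_search(toy_model, beam_width=1, max_len=3)
wider = beam_search(toy_model, beam_width=2, max_len=3)
```

Here `greedy` commits to "a" at the first step and ends with a sequence of probability 0.30, while `wider` keeps "b" alive and finds "b z" with probability 0.36. A larger beam_width explores more alternatives at the cost of more model calls per step.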

01.Caption

As the results below show, the CLIPtion Beam Search node produces more detailed captions, so the text-to-image examples that follow use it.

CLIPtion Generate:

a young woman in a casual outfit with headphones and a white shirt sits outdoors on a pathway in a serene park setting.

CLIPtion Beam Search:

a young woman with dark hair and a white shirt with a blue logo is wearing headphones, standing in front of a serene outdoor setting with trees and a stone pathway, wearing a white shirt with a blue collar and a logo on the left side of the shirt, and has a calm and serene expression on her face and upper body, with the headphones facing the camera.

02.Music

the second image is a close - up of a young woman with long, dark hair. she is wearing a white shirt with a blue logo and has a pair of white headphones on her ears. the background features a natural setting with trees and rocks, suggesting a garden or park setting. the image has a vintage feel, with a focus on her face and the headphones. the

03.Tattoo

a woman with a tiger tattoo on her shoulder royalty illustration a picture of a woman with a tiger tattoo on her chest royalty illustration a picture of a woman with a red rose tattoo on her shoulder. the woman has long, dark hair and is looking directly at the camera. she is wearing a white sweater. the background is a plain, light - colored wall. the overall style

04.Christmas cat

an anthropomorphic cat in a pink dress stands amidst a wintry landscape, beside a decorated christmas tree, while an illuminated firework illuminates the night sky above snow - capped mountains and a body of water. a snowman accompanies the scene. the overall mood is lighthearted and festive. the overall mood is lighthearted and festive. the overall mood is lighthear

05.Street scene

a young woman with short brown hair stands on a bustling city street, wearing a white tank top and black choker necklace, looking directly at the camera with a gentle smile. the street is lined with buildings, potted plants, and passersby can be seen in the background. the overall atmosphere is calm and peaceful. the overall mood of the image is calm and peaceful.

CLIPtion model: follow the official account and send the keyword 【ComfyUI插件】; the file is listed under comfy-cliption.

破狼
Focus areas: AIGC, LLMs, image-generation artwork, software engineering, and technical learning. Contact via WeChat: shunshizhiwu.