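Anyone who does a lot of AI image generation will recognise this: the models keep getting more capable, yet describing a picture precisely keeps getting harder.
With SD1.5, you needed elaborate prompts, often with emphasis weights on individual terms, just to get a barely acceptable result;
with SDXL, the model understood natural language much better and could draw what you wanted with almost no deliberate parameter tuning, though it still fell short on fine detail;
with Flux, the model follows natural language with ease, which exposes a new problem: not everyone can describe the picture they want clearly.
Even with a reference image in front of you, it is hard to put every detail of the picture into words.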
So what can you do?
Reverse-engineer the prompt.
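Reverse prompting means letting a large model look at a reference image and write out its details as text. AI tools such as ChatGPT offer a similar feature.
The tool I recommend today, though, is open source: JoyCaption Two.
It uses an open-source LLM to recognise the image, describes every detail of the scene, and exposes a set of configuration options.
Camera settings, image style, composition and so on each have their own toggle.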
For example, this is the image I uploaded.
And this is the prompt it produced:
photo of a young East Asian woman, studio setting, soft lighting, pastel blue background, medium close-up shot, centered composition, rule of thirds, high-key lighting, slight smile, fair skin, long straight black hair, hair partially covering right shoulder, soft makeup, natural look, wearing traditional Chinese qipao dress, white with blue floral patterns, high collar, delicate embroidery, smooth fabric texture, delicate features, slender physique, serene expression, no visible accessories, background blurred, depth of field shallow, likely taken with a DSLR camera, aperture around f/2.8, shutter speed 1/125 sec, ISO 100, elegant and poised, cultural attire, minimalist background, studio photography, portrait style, gentle and calm atmosphere, no visible watermark, soft and clean aesthetic, professional photography, no visible shadows, neutral and calming color palette, focus on the subject's face and upper body, gentle lighting, soft shadows, soft focus, portrait orientation, no visible text or logos, no visible background elements, neutral and calm composition, simple and elegant design, traditional and cultural elements, traditional Chinese fashion, soft and delicate aesthetic, traditional and cultural context, traditional and cultural photography, traditional and cultural portrait, traditional and cultural style, traditional and cultural attire, traditional and cultural dress, traditional and cultural fashion, traditional and cultural elements, traditional and cultural context, traditional and cultural photography, traditional and cultural portrait, traditional and cultural style, traditional and cultural attire, traditional and cultural dress
Translated into Chinese:
一张东亚年轻女性的肖像照片,摄影棚布景,柔和的光线,浅蓝色背景,中景特写镜头,居中构图,三分法则,高光照明,微笑,肤色白皙,长直黑发,头发部分遮住右肩,妆容柔和,自然风格,穿着传统中式旗袍,白色带有蓝色花卉图案,高领,精致刺绣,平滑的面料质地,细腻的五官,苗条身材,宁静的表情,没有可见的配饰,背景虚化,景深较浅,可能是用单反相机拍摄,光圈大约为 f/2.8,快门速度 1/125 秒,ISO 100,优雅而稳重,文化服饰,极简背景,摄影棚摄影,肖像风格,温和宁静的氛围,没有可见的水印,柔和而干净的审美,专业摄影,没有可见的阴影,中性且令人平静的色彩搭配,聚焦于主体的面部和上半身,柔和的光线,柔和的阴影,柔和的焦点,肖像方向,没有可见的文字或标志,没有可见的背景元素,中性而平静的构图,简单而优雅的设计,传统与文化元素,传统中式时尚,柔和而精致的美学,传统与文化背景,传统与文化摄影,传统与文化肖像,传统与文化风格,传统与文化服饰,传统与文化服装,传统与文化时尚,传统与文化元素,传统与文化背景,传统与文化摄影,传统与文化肖像,传统与文化风格,传统与文化服饰,传统与文化服装。
As you can see, the output is very detailed, more than enough to work with.
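The tail of the prompt above repeats the same phrases several times. If you want to tidy it before pasting it into a generation workflow, a small helper like the one below could do it. This is only a sketch and not part of JoyCaption Two: the function name dedupe_prompt is my own, and it assumes the prompt is a plain comma-separated list of phrases.

```python
def dedupe_prompt(prompt: str) -> str:
    """Drop repeated comma-separated phrases, keeping the first occurrence of each."""
    seen = set()
    kept = []
    for phrase in prompt.split(","):
        phrase = phrase.strip()
        key = phrase.lower()  # compare case-insensitively
        if phrase and key not in seen:
            seen.add(key)
            kept.append(phrase)
    return ", ".join(kept)


sample = (
    "soft lighting, cultural attire, traditional and cultural style, "
    "cultural attire, traditional and cultural style"
)
print(dedupe_prompt(sample))
# prints: soft lighting, cultural attire, traditional and cultural style
```

The order of the remaining phrases is preserved, so the subject description still comes first.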
Installing this plugin is quite fiddly, though, so follow the steps below carefully.
1. Install the plugin
https://github.com/EvilBT/ComfyUI_SLK_joy_caption_two
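Using the 秋叶 (Qiuye) launcher as an example: go to 版本 (Version) > 安装新扩展 (Install New Extension), paste the address above into the URL field, and click Install.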
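If you do not use the launcher, custom nodes are commonly installed by cloning the repository into ComfyUI's custom_nodes folder. The sketch below shows that manual route under a few assumptions: git is on your PATH, and comfyui_root points at your own ComfyUI directory. The helper names build_clone_command and install_plugin are mine, and the sketch does not install any Python dependencies the plugin may need.

```python
import subprocess
from pathlib import Path

REPO_URL = "https://github.com/EvilBT/ComfyUI_SLK_joy_caption_two"


def build_clone_command(comfyui_root: str, repo_url: str = REPO_URL) -> list[str]:
    """Build the git command that clones the plugin into custom_nodes."""
    repo_name = repo_url.rstrip("/").split("/")[-1]
    target = Path(comfyui_root) / "custom_nodes" / repo_name
    return ["git", "clone", repo_url, str(target)]


def install_plugin(comfyui_root: str) -> None:
    """Run the clone; raises CalledProcessError if git reports a failure."""
    subprocess.run(build_clone_command(comfyui_root), check=True)


# Example (needs network access, so it is left commented out):
# install_plugin(r"D:\ComfyUI")  # hypothetical install location
```

Restart ComfyUI after cloning so the new nodes are picked up.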
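2. Download and install the models
The plugin needs a lot of models, and installing them one by one is extremely tedious.
To make things easier, I have packaged them all; you only need to extract the archive and copy the result into the models directory.
Example path after extraction:
Example destination path for the copy: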
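The copy step can also be scripted. The sketch below assumes that the extracted folder already mirrors the layout expected under ComfyUI's models directory, which is my assumption and should be checked against the archive itself. The function name merge_into_models and both path arguments are placeholders for illustration.

```python
import shutil
from pathlib import Path


def merge_into_models(extracted_dir: str, comfyui_root: str) -> list[str]:
    """Copy everything under extracted_dir into <comfyui_root>/models.

    Existing files in models are kept unless the archive contains a file
    with the same relative path, in which case it is overwritten.
    Returns the relative paths of the copied files.
    """
    src = Path(extracted_dir)
    dst = Path(comfyui_root) / "models"
    if not src.is_dir():
        raise FileNotFoundError(f"extracted folder not found: {src}")
    if not dst.is_dir():
        raise FileNotFoundError(f"models folder not found: {dst}")
    shutil.copytree(src, dst, dirs_exist_ok=True)  # needs Python 3.8+
    return sorted(p.relative_to(src).as_posix() for p in src.rglob("*") if p.is_file())
```

Checking the returned list against the contents of the archive is a quick way to confirm that nothing was missed.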
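3. Translation
As you can see, my interface is in Chinese; a fresh install is in English.
To switch, you need the translation plugin (it is usually installed by default). Plugin address: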
https://github.com/AIGODLIKE/AIGODLIKE-COMFYUI-TRANSLATION
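Copy the Chinese translation from the plugin's translation folder into the matching language-pack path, then restart ComfyUI to get the Chinese interface.
That is, copy ComfyUI\custom_nodes\ComfyUI_SLK_joy_caption_two\translation\zh-CN\Nodes\Comfyui_SLK_joy_caption_two.json
into the directory ComfyUI\custom_nodes\AIGODLIKE-COMFYUI-TRANSLATION\zh-CN\Nodes
and you are done.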
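The same copy can be done with a few lines of Python. This is a minimal sketch, assuming both plugins are already installed under custom_nodes and that the JSON file sits in a Nodes subfolder of the plugin's zh-CN translation directory. The function name copy_translation is my own.

```python
import shutil
from pathlib import Path


def copy_translation(comfyui_root: str) -> Path:
    """Copy the plugin's zh-CN node translation into the translation plugin."""
    nodes = Path(comfyui_root) / "custom_nodes"
    src = (nodes / "ComfyUI_SLK_joy_caption_two" / "translation" / "zh-CN"
           / "Nodes" / "Comfyui_SLK_joy_caption_two.json")
    dst_dir = nodes / "AIGODLIKE-COMFYUI-TRANSLATION" / "zh-CN" / "Nodes"
    if not src.is_file():
        raise FileNotFoundError(f"translation file not found: {src}")
    if not dst_dir.is_dir():
        raise FileNotFoundError(f"translation plugin folder not found: {dst_dir}")
    # copy2 returns the path of the newly created file inside dst_dir
    return Path(shutil.copy2(src, dst_dir))
```

Both checks fail early with a readable message if either plugin is missing, instead of silently creating folders in the wrong place.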
4. Results
I tried it on my avatar.
Reverse-engineered prompt:
anime style, digital drawing, character: astronaut, space setting, dark starry background, no visible stars, character centered, floating in space, wearing white spacesuit, helmet visor reflective, Japanese text on spacesuit, small triangular symbol on spacesuit, left arm extended, right arm bent, finger pointing upwards, spacesuit details: zippers, pockets, patches, texture: smooth, shiny, reflective, spacesuit color: white, helmet visor color: dark blue, background: deep black, no visible planets or celestial bodies, composition: rule of thirds, character positioned in upper right third, character looking to the right, space suit details: realistic proportions, realistic textures, space suit style: futuristic, sci-fi, anime style: clean lines, sharp edges, smooth shading, no visible brush strokes, no visible camera, no visible depth of field, no visible focus, no visible blur, no visible lens flare, no visible noise, no visible grain, no visible light leaks, no visible vignette, no visible distortion, no visible chromatic aberration, no visible motion blur, no visible artifacts, no visible compression artifacts, no visible artifacts, no visible artifacts, no visible artifacts, no visible artifacts, no visible artifacts, no visible artifacts, no visible artifacts, no visible artifacts, no visible artifacts, no visible artifacts, no visible artifacts, no visible artifacts, no visible artifacts, no visible artifacts, no visible artifacts, no visible artifacts, no visible artifacts, no visible
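Image generated from the reverse-engineered prompt (墨幽 Flux checkpoint):
The reconstruction is impressively close.
However!
The image I fed in was cropped; the real original is the work of a Korean artist.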
Cloud-drive download (workflow included):
Quark:
https://pan.quark.cn/s/19e87d41946e
Baidu:
https://pan.baidu.com/s/1uJskQFGHdMdJga84EtTX_A?pwd=j6na
Extraction code: j6na