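Compiled by 「彩虹之眼」 | Submitted by Flux community member and Zhihu author 「ChubbyPillow」 (worth bookmarking)

A while ago I reviewed the "large cup" of SD3.5 (Large, 8B parameters). Today the SD3.5 "medium cup" (Medium, 2.5B) has finally been released, so here is a test of that one too!

I won't switch to new prompts this time, it's too much hassle... Instead I'll compare the results side by side with SD3.5 Large, so you can quickly see how the two differ.

I won't go into usage in detail: it is almost identical to SD3.5 Large. The only difference is in the two Conditioning Set Timestep Range nodes, where one is now set to 0.2-1 and the other to 0-0.2. Everything else is the same.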
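To make the two timestep ranges a bit more concrete, here is a rough sketch of how a 0-0.2 / 0.2-1 split could divide a sampling run. It assumes each range is a fraction of the sampling progress that maps linearly onto the step index; the real node may map these fractions through the model's noise schedule, so the actual cut-over step can differ. The labels `range_a` and `range_b` are placeholders, since the post does not say which conditioning each range is attached to.

```python
# Hypothetical labels: the post does not say which conditioning each range feeds.
RANGES = {
    "range_a": (0.0, 0.2),  # active for the first 20% of sampling
    "range_b": (0.2, 1.0),  # active for the remaining 80%
}


def active_range(step, total_steps, ranges=RANGES):
    """Return the label of the range covering this step, or None.

    Assumption: progress is simply step / total_steps (a linear mapping).
    """
    progress = step / total_steps
    for label, (start, end) in ranges.items():
        if start <= progress < end:
            return label
    return None


def schedule(total_steps, ranges=RANGES):
    """List the active range label for every step of a run."""
    return [active_range(s, total_steps, ranges) for s in range(total_steps)]


if __name__ == "__main__":
    for step, label in enumerate(schedule(20)):
        print(step, label)
```

Under this simplified mapping, a 20-step run spends its first 4 steps in the 0-0.2 range and the other 16 in the 0.2-1 range.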
Prompt: a cinematic front view photo of a slim white male dryad emerging from a tree, his eyes closed with his head lowered, facing the viewer with his back on the tree. His arms and chest are made of green branches and white flowers, his hair made of brown vines and branches, his body fused with the tree trunk, his skin covered in moss and leaf, his shoulders and collar bone resembling pale human skin. The photo is taken with a 35mm lens capturing the essence of golden hour.
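This subject is hard to call... Large clearly understands my prompt a level better, and its aesthetics and lighting are also better, but I have to admit that Medium's skin texture is really striking. If you zoom in, you can even see a very natural pink showing through from under the skin, along with that distinctive uneven texture, and the shadow on the collarbone looks very natural too. On this point it basically crushes Flux (Flux can't produce this effect even at low guidance...). I quite like this style as well; for portraits in the future I might generate the overall image with Large first and then use Medium to redo the skin or upscale.

When I was playing with Flux a while ago I used a fine-tuned TE (text encoder); its name is shown in the image above. If you're interested, you can download it here: https://huggingface.co/zer0int/CLIP-GmP-ViT-L-14/tree/main

The author has made many versions: some are better at improving text generation, and some can improve prompt understanding (?). After a few tests I found that this version works best for me (I don't do much text). Results generated with this TE seem better than with the original and closer to what I wanted ("body fused with the tree trunk", "his skin covered in moss"). Although in terms of aesthetics I still prefer Large's style overall...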
Prompt: full color portrait photo of a 20yo woman laying underwater, sunlight casting realistic water caustics on her face and body, she's wearing a gauze white dress.
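Heh, I feel Medium actually does better on this subject? It's a pity that its understanding of water caustics isn't as good as Large's, but the visual result is great, with a soft-light feel that is closer to the gentle, fresh underwater portrait look I wanted. Look closely, though, and the dress around the chest seems to have broken... the texture of the dress leaked onto the skin.

Prompt: a top-down close-up photo of a slim attractive woman playing the piano, the camera focusing on her hands, she is wearing a red and black plaid skirt, the piano is shiny and black. The photo is taken in a bright room with soft diffused sunlight.

Medium still has the same old problem: hands fall apart badly if CFG is too low. The image above was already made with CFG=5; with something like 3.5 the hands come out even worse than SD1.5. I'll give you a few examples, along with my reaction at the time.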
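For readers unsure what the CFG number controls, here is a minimal sketch of the usual classifier-free guidance combination, in which the prediction is pushed away from the unconditional result and toward the prompt-conditioned one. It uses plain Python lists in place of real tensors, and the function name `apply_cfg` is made up for illustration. Whether the sampler in this workflow applies exactly this formula is an assumption.

```python
def apply_cfg(uncond, cond, scale):
    """Combine two predictions with classifier-free guidance.

    uncond: prediction without the prompt (or with the negative prompt)
    cond:   prediction with the prompt
    scale:  the CFG value, e.g. 3.5 or 5
    """
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


if __name__ == "__main__":
    uncond = [0.0, 1.0]
    cond = [1.0, 1.0]
    for scale in (1.0, 3.5, 5.0):
        print(scale, apply_cfg(uncond, cond, scale))
```

With a scale of 1 the result is just the conditioned prediction; larger values amplify the difference between the two, which is why a CFG of 5 follows the prompt more forcefully than 3.5.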
Prompt: Amateur phone photo of a slim attractive white man, wearing unbuttoned long-sleeve white dress shirt and black pants. He is sitting on the front edge of bed, hand running through his hair, while looking at viewer. The photo was taken in a bedroom at night, the bedroom is dimly lit with warm color tone, the only light source is the bedside table lamp. The photo was posted to reddit in 2012. The image is grainy jpeg with motion blur and soft focus, a snapshot taken by amateur with deep focus, added digital sharpening and blurry, diffused poor lighting.
Prompt: amateur side view photo of a slim white woman, she is cooking in kitchen and wearing a white apron, underneath the apron is a plain grey tshirt, she is looking at the food in the steel sauce pot, her head lowered, holding a wooden spoon in her right hand. The photo was taken from her left side by an amateur, taken with a smartphone in 2015, in a modern kitchen with soft diffused indoor lighting at night.
The image on the right is not the one I made last time. I generated it today with exactly the same settings as the image on the left, with the same negative prompt "shallow depth of field, bokeh, blurry background" and the same replacement CLIP-L, but SD3.5 Large clearly still struggles to produce a sharp background.

Prompt: a photo of a blue lynx ragdoll cat, it is facing camera, standing on its hind legs on a blue pillow, holding out one paw, wearing a wizard hat and a purple wizard robe, casting spells with its paw, silver sparkles swirling around its paw. It has light colored fur and blue eyes. The photo is taken in a spring garden in the morning, with bright diffused natural lighting.

A heads-up: the prompt I used this time differs very slightly from last time. I replaced seal bicolor ragdoll with blue lynx ragdoll...
Prompt: Photograph of a majestic cake adorned with intricate fondant decorations inspired by ocean waves. The whole cake has a base color of modest dark blue, surrounded by swirls of light blue layers shaped like ocean waves, the layers closing in from bottom to top, forming a curved shape like a blooming rosebud. On the outer side of the cake, pink and purple corals decorating the bottom of ocean wave fondant, resembling a beautiful tiara. The photo is taken in a room with simple dark background.
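These three results look very different, so I'm posting all of them... I think the one with the fine-tuned CLIP-L understands my prompt better, but aesthetically it may not reach the level of the original? Medium's color palette is fairly similar, but I have to admit that Large's colors have more depth, and the layers of cream around the cake look more like waves. With Medium, apart from the waves in the middle, which are quite appealing, the outer ring is a bit boring. The version with the replaced CLIP-L is more inventive in shape and feels closer to what I wanted: layers stacked up like petals.

Prompt: 1980s Retro manga style illustration of a slim young white man, with messy wavy light brown hair and fair skin, his head tilted to the side, his face clean-shaved. The image only portrays his face and chest. He is wearing a long clothing made of white waterlily flowers and green leaves, the thick layers of leaf covering his whole body, contrasting his blue eyes, a laurel leaf wreath on his head, while he looks at viewer. He is in a summer forest at dusk, soft diffused sunlight shining on him.

Medium wins this subject by a wide margin. At least in my view. First, Large has no idea what retro manga means; the style it produces is a very modern illustration style. With Medium, at least the line work on the face is genuinely closer to 90s manga. It's still quite far from real 80-90s manga, but it looks much cleaner and more natural than Large. Large strikes me as the typical "AI look" in the eyes of people who don't work with AI; Medium's details are also a bit messy, but overall it doesn't feel so greasy. Prompt adherence, however, is rather poor on Medium...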
Prompt: Vector clipart of a fluffy orange cat sitting on an office chair, facing a computer moniter, its one paw placed on keyboard, one paw placed on mouse, turning to look at viewer, simple pale pink background, bold line style.
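This one seems... slightly better than Large? At least the cat is actually on the chair and really is looking at the viewer, though it isn't facing the monitor... The lines really do count as bold line, so I have to give it points for that. Other things feel better too: perhaps because of the viewing angle, the monitor and mouse Medium generated look relatively normal. Also, in terms of color, Medium's cat is more evenly colored. It still can't match the flat color Flux generates, but it's at least far flatter than the version on the right, hahaha...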
Prompt: Crayon drawing of a chubby white duck on top of a tubby orange cat on top of a small capybara. All three animals stacked vertically, on the grass of a sunny garden.
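Sadly, Medium seems to have completely lost its knowledge of capybaras (SD3.5 Medium screaming: Why should I know what a capybara is when most of you weebs are gonna generate big booba waifus anyway???), but it still keeps that rough hand-drawn feel, which I quite like. For complex animal subjects like this, though, it often breaks the body structure... (I tried generating just a capybara: its sense of the color is still right, but the mouth looks nothing like one.)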
Prompt: 3D animation movie scene of a grumpy penguin with wings, facing the viewer while holding a large board that says "IT'S PENGUIN, NOT PENGWING", sitting on ice in antarctica, DreamWorks style.
Prompt: A renaissance oil painting of a strange creature that resembles a chimera of fish and cat, the creature's upper body is a white angora cat, its lower body looks like tropical fish with iridescent fish scales. The creature is swimming in the sea, the water is dark blue colored.