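The main contributions of this work are as follows:
A novel style alignment method: a new approach to achieving style consistency across a set of generated images. By introducing minimal attention sharing during the diffusion process, the method keeps the style consistent in text-to-image models without any optimization;
An optimization-free, zero-shot solution: unlike existing methods that require fine-tuning or optimization, StyleAligned is a zero-shot solution that needs no optimization or fine-tuning of any kind, which makes achieving style consistency more efficient and convenient;
Diverse experimental evaluation: extensive experiments across a wide range of styles and text prompts demonstrate the method's effectiveness in both high-quality synthesis and style consistency. The work also shows how well the method generates images whose style matches that of a reference image.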
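The idea of attention sharing mentioned in the first contribution can be illustrated with a small sketch. This is not the authors' implementation: the text above does not spell out the exact operation, so the sketch assumes that a target image attends to its own keys and values concatenated with those of a reference image. The function names `attention` and `shared_attention` are hypothetical, and the sketch leaves out other details of the actual method, such as any normalization applied to queries and keys.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Plain scaled dot-product attention over lists of vectors."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Weighted sum of value vectors, one output vector per query.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


def shared_attention(q_target, k_target, v_target, k_ref, v_ref):
    """Assumed form of attention sharing (hypothetical helper).

    The target's queries attend to the reference image's keys/values
    in addition to the target's own, so reference features can flow
    into the target without any training or fine-tuning.
    """
    return attention(q_target, k_ref + k_target, v_ref + v_target)


# Toy example: one target token and one reference token, 2-d features.
q_t, k_t, v_t = [[1.0, 0.0]], [[0.0, 1.0]], [[0.0, 0.0]]
k_r, v_r = [[1.0, 0.0]], [[1.0, 1.0]]

plain = attention(q_t, k_t, v_t)                    # target only
shared = shared_attention(q_t, k_t, v_t, k_r, v_r)  # target + reference
```

In the toy example the target query matches the reference key more closely than its own key, so the shared output is pulled toward the reference value, whereas the plain output only sees the target's own value. Because the change touches only what each attention layer attends to, no weights are updated, which is consistent with the zero-shot claim above.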
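Figure 1. Shared attention layer
Figure 2. Comparison of images generated by a standard text-to-image model and by the StyleAligned method
Figure 3. Ablation study: qualitative comparison with and without the StyleAligned method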