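The main contributions of this work are as follows:
A multi-view synthesis method built on a 2D diffusion model that, given a single portrait at inference time, generates 3D-consistent novel-view images;
Disentangled control of appearance and camera view, which enables effective camera-view control while preserving identity and expression;
3D view consistency achieved through a cross-view attention module and 3D-aware noise generation.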
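To make the idea of cross-view attention concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: it assumes identity query/key/value projections (a real module would use learned ones), and the function name `cross_view_attention` and the nested-list layout are hypothetical choices made for illustration. It shows only the general mechanism, in which each token of a view attends to the tokens of all views, so that information is shared across views.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_view_attention(views):
    """Toy cross-view attention (illustrative assumption, not the paper's code).

    views: list of views; each view is a list of token vectors (lists of floats).
    Returns a structure of the same shape in which every token is a
    softmax-weighted mix of the tokens from ALL views.
    """
    # Pool the tokens of every view into one shared key/value sequence.
    tokens = [tok for view in views for tok in view]
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)

    out = []
    for view in views:
        out_view = []
        for q in view:
            # Scaled dot-product scores between this token and every pooled token.
            scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
            weights = softmax(scores)
            # Weighted sum of the pooled tokens (values equal keys in this sketch).
            mixed = [sum(w * v[d] for w, v in zip(weights, tokens)) for d in range(dim)]
            out_view.append(mixed)
        out.append(out_view)
    return out


# Two views with one 2-D token each.
views = [[[1.0, 0.0]], [[0.0, 1.0]]]
mixed_views = cross_view_attention(views)
```

Each output token here is a convex combination of tokens drawn from both views, which is the property that lets such a module pull the views toward a shared appearance.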
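Figure 1: Network architecture
Figure 2: Generating 3D-consistent noise with a novel view synthesis (NVS) model
Figure 3: DiffPortrait3D results on portraits from outside the training dataset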