A Survey on Multimodal Face Anti-Spoofing
Zitong Yu    Changsheng Chen
Great Bay University    Shenzhen University
1. Background and Datasets for Multimodal Face Anti-Spoofing
Figure 1. Bona fide faces and presentation attacks captured in multiple modalities (visible RGB, depth, infrared, and thermal) from the WMCA dataset [11]
Table 1. Overview of multimodal face anti-spoofing datasets
Dataset | Year | Bona fide | Spoof | Subjects | Modalities | Setup | Attack types |
3DMAD [2] | 2013 | 170 videos | 85 videos | 17 | RGB, Depth | 3 sessions | Mask (paper, resin) |
3DFS-DB [3] | 2016 | 260 videos | 260 videos | 26 | RGB, Depth | Head movements with varied angles | Mask (plastic) |
BRSU [4] | 2016 | 102 images | 404 images | 137 | RGB, SWIR | Multispectral SWIR with 4 wavebands | Mask (silicone, plastic, resin, latex) |
Msspoof [5] | 2016 | 1470 images | 3024 images | 21 | RGB, NIR | 7 environmental conditions | Black-and-white print (flat) |
MLFP [6] | 2017 | 150 videos | 1200 videos | 10 | RGB, NIR, Thermal | Indoor and outdoor; fixed and random backgrounds | Mask (latex, paper) |
ERPA [7] | 2017 | 86 videos in total | — | 5 | RGB, Depth, NIR, Thermal | Subjects captured close to the two cameras (0.3–0.5 m) | Print (flat), replay (monitor), mask (resin, silicone) |
CSMAD [8] | 2018 | 104 videos | 159 videos | 14 | RGB, Depth, NIR, Thermal | 4 lighting conditions | Mask (custom silicone) |
3DMA [9] | 2019 | 536 videos | 384 videos | 67 | RGB, NIR | 48 masks with different IDs; 2 illuminations and 4 capture distances | Mask (plastic) |
CASIA-SURF [10] | 2019 | 3000 videos | 18000 videos | 1000 | RGB, Depth, NIR | Background removed; eye, nose, or mouth regions randomly cut | Print (flat, warped, cut) |
WMCA [11] | 2019 | 347 videos | 1332 videos | 72 | RGB, Depth, NIR, Thermal | 6 sessions with different backgrounds and illumination | Print (flat), replay (tablet), partial (glasses), mask (plastic, silicone and paper, mannequin) |
CeFA [12] | 2019 | 6300 videos | 27900 videos | 1607 | RGB, Depth, NIR | 3 ethnicities; outdoor and indoor; decorations with wigs and glasses | Print (flat, warped), replay, mask (3D print, silicone) |
HQ-WMCA [13] | 2020 | 555 videos | 2349 videos | 51 | RGB, Depth, NIR, Thermal, SWIR | 4 NIR and 7 SWIR wavelengths; masks and mannequins heated to body temperature | Laser or inkjet print (flat), replay (tablet, phone), mask (plastic, silicone, paper, mannequin), makeup, partial (glasses, wigs, tattoo) |
PADISI-Face [14] | 2021 | 1105 videos | 924 videos | 360 | RGB, Depth, NIR, Thermal, SWIR | Indoor, fixed green background | Print (flat), replay (tablet, phone), partial (glasses, funny eyes), mask (plastic, silicone, transparent, mannequin) |
Echo-FAS [15] | 2022 | 250000 acoustic signal clips in total | — | 30 | RGB, Acoustic | 4 environmental variables: device, distance, ambient noise, and pitch | Print (flat), replay, mask (paper print, warped, half-face) |
Echoface-Spoof [16] | 2024 | 82715 images and acoustic signals | 166637 images and acoustic signals | 30 | RGB, Acoustic | 4 capture devices, 3 capture distances, 3 ambient noise levels | Print (flat), replay |
2. Multimodal Face Anti-Spoofing Methods
2.1 Face Anti-Spoofing Based on Visible-Light Multi-Cue Fusion
Most existing visible-light multi-cue fusion methods rely on manually extracted auxiliary cues (e.g., pseudo depth maps, pseudo reflection maps, rPPG signals, and textual prompt cues), but several problems remain. On the one hand, these cues are themselves susceptible to domain bias and novel attack types. On the other hand, because the cues are heterogeneous, how to align and fuse them effectively is critical. Follow-up work has therefore focused on more robust cue representation and more efficient information fusion.
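To make the cue-fusion idea concrete, the following is a minimal PyTorch-style sketch, an illustrative assumption rather than any specific method cited in this survey: a shared RGB encoder feeds a binary live/spoof head and an auxiliary pseudo-depth head, and the two cues are fused at the score level. All names (`MultiCueFAS`, `depth_head`, etc.) are hypothetical.

```python
import torch
import torch.nn as nn

class MultiCueFAS(nn.Module):
    """Hypothetical RGB network with an auxiliary pseudo-depth cue.

    A shared encoder feeds (1) a pseudo-depth regression head, which would be
    supervised by an externally estimated depth map for live faces and an
    all-zero map for spoofs, and (2) a binary live/spoof classification head;
    the two heterogeneous cues are then fused at the score level.
    """
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(                            # shared feature extractor
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.depth_head = nn.Conv2d(feat_dim, 1, 3, padding=1)   # pseudo-depth cue
        self.cls_head = nn.Sequential(                           # live/spoof cue
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(feat_dim, 2)
        )

    def forward(self, rgb):
        feat = self.encoder(rgb)
        depth = self.depth_head(feat)                 # B x 1 x H/4 x W/4
        logits = self.cls_head(feat)                  # B x 2
        # Score-level fusion of the classification and depth cues.
        live_prob = logits.softmax(dim=1)[:, 1]
        fused = 0.5 * live_prob + 0.5 * depth.mean(dim=(1, 2, 3)).sigmoid()
        return logits, depth, fused

# Toy usage on a random batch.
model = MultiCueFAS()
logits, depth, score = model(torch.randn(4, 3, 256, 256))
print(logits.shape, depth.shape, score.shape)
```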
2.2 Face Anti-Spoofing Based on Multi-Sensor Modality Fusion
3. Summary and Outlook
1) Owing to the limitations of current multimodal deep architectures, supervision schemes, and learning strategies, existing multimodal face spoofing detection models have limited representation capacity. Learning discriminative and generalizable live/spoof features is crucial for multimodal face anti-spoofing. Future work should design novel multimodal operators (e.g., combined convolution and attention modules over high-order, multi-transform domains; a toy fusion-operator sketch is given after this list) and foundation models, introduce more in-the-wild data for self-/semi-supervised multimodal pre-training, and explore automated neural architecture search for efficient and effective fusion.
2) Existing methods are usually evaluated under unrealistic multimodal benchmarks and protocols, which makes it difficult to judge their performance in real-world scenarios. For example, intra-dataset training and testing results on datasets such as WMCA [11] and CASIA-SURF [10] show that performance has saturated on these small-scale and monotonous test sets. On the one hand, benchmarks with multimodal multi-source-domain training and multimodal cross-domain testing [43] are still at an exploratory stage; affected by intrinsic inter-modality bias and modality-specific domain bias, most unimodal domain-generalization algorithms for spoofing detection [44] bring little gain in multimodal face anti-spoofing. On the other hand, since it is difficult in practice to obtain all modalities in both training and testing scenarios, the flexible-modal setting (covering both complete modalities and partially missing modalities at training/testing time) [45, 46] is of greater practical value (a toy modality-dropout sketch also follows this list). Multi-source domain generalization and the flexible-modal setting therefore hold great potential for multimodal face anti-spoofing.
3) Interpretability and privacy issues are insufficiently considered. Most existing multimodal face anti-spoofing studies are devoted to developing new algorithms for state-of-the-art performance but rarely consider the interpretability behind them. Such black-box approaches can hardly support reliable decisions in the real world; introducing multimodal large language models to produce task-related, fine-grained descriptions of the scene and potential attacks and to reason toward a decision is a promising direction for enhancing interpretability. In addition, most existing works train and fine-tune deep face anti-spoofing models with large amounts of stored source-domain face data, ignoring privacy and biometric-sensitivity concerns. Exploring multimodal federated learning and multimodal source-free domain adaptation is therefore promising.
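As referenced in point 1), the sketch below shows one possible form of a multimodal fusion operator, built from standard multi-head cross-attention in which RGB tokens attend to depth and NIR tokens. It is an illustrative assumption, not an operator proposed in the cited works; the class name and token shapes are hypothetical.

```python
import torch
import torch.nn as nn

class CrossModalAttentionFusion(nn.Module):
    """Toy cross-modal operator: RGB tokens query depth/NIR tokens."""
    def __init__(self, dim: int = 128, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, rgb_tokens, depth_tokens, nir_tokens):
        # Concatenate the auxiliary modalities as the key/value sequence.
        kv = torch.cat([depth_tokens, nir_tokens], dim=1)          # B x 2N x C
        fused, _ = self.attn(query=rgb_tokens, key=kv, value=kv)
        return self.norm(rgb_tokens + fused)                       # residual fusion

# Toy usage: 196 tokens (14x14 patches) per modality, 128-d features.
B, N, C = 2, 196, 128
fuse = CrossModalAttentionFusion(dim=C)
out = fuse(torch.randn(B, N, C), torch.randn(B, N, C), torch.randn(B, N, C))
print(out.shape)   # torch.Size([2, 196, 128])
```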
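For the flexible-modal setting referenced in point 2), one simple training-time strategy is to randomly drop modalities so that a late-fusion model learns to cope with missing inputs at test time. The sketch below is a hypothetical illustration under that assumption, not the specific method of [45, 46].

```python
import torch
import torch.nn as nn

class FlexibleModalFusion(nn.Module):
    """Toy late-fusion classifier that tolerates missing modalities.

    Each available modality is encoded separately; missing ones are skipped
    and the remaining features are averaged before classification. During
    training, modalities are randomly dropped to simulate the flexible-modal
    (partial-modality) test condition.
    """
    def __init__(self, feat_dim: int = 64, p_drop: float = 0.3):
        super().__init__()
        self.p_drop = p_drop
        self.encoders = nn.ModuleDict({
            m: nn.Sequential(nn.Conv2d(c, feat_dim, 3, stride=4, padding=1),
                             nn.ReLU(), nn.AdaptiveAvgPool2d(1), nn.Flatten())
            for m, c in {"rgb": 3, "depth": 1, "nir": 1}.items()
        })
        self.classifier = nn.Linear(feat_dim, 2)

    def forward(self, inputs: dict):
        feats = []
        for name, x in inputs.items():
            if x is None:
                continue                      # modality missing at test time
            if self.training and torch.rand(1).item() < self.p_drop:
                continue                      # random modality dropout
            feats.append(self.encoders[name](x))
        if not feats:                         # never drop every modality
            feats = [self.encoders[n](x) for n, x in inputs.items() if x is not None]
        fused = torch.stack(feats, dim=0).mean(dim=0)
        return self.classifier(fused)

# Toy usage: NIR missing at inference time.
model = FlexibleModalFusion().eval()
batch = {"rgb": torch.randn(2, 3, 128, 128),
         "depth": torch.randn(2, 1, 128, 128),
         "nir": None}
print(model(batch).shape)   # torch.Size([2, 2])
```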
References
[1]Yu, Z., Qin, Y., Li, X., Zhao, C., Lei, Z., & Zhao, G. (2022). Deep learning for face anti-spoofing: A survey. IEEE transactions on pattern analysis and machine intelligence, 45(5), 5609-5631.
[2]Erdogmus, N., & Marcel, S. (2014). Spoofing face recognition with 3D masks. IEEE transactions on information forensics and security, 9(7), 1084-1097.
[3]Galbally, J., & Satta, R. (2016). Three‐dimensional and two‐and‐a‐half‐dimensional face recognition spoofing using three‐dimensional printed models. IET Biometrics, 5(2), 83-91.
[4]Steiner, H., Kolb, A., & Jung, N. (2016, June). Reliable face anti-spoofing using multispectral SWIR imaging. In 2016 international conference on biometrics (ICB) (pp. 1-8). IEEE.
[5]Chingovska, I., Erdogmus, N., Anjos, A., & Marcel, S. (2016). Face recognition systems under spoofing attacks. Face Recognition Across the Imaging Spectrum, 165-194.
[6]Agarwal, A., Yadav, D., Kohli, N., Singh, R., Vatsa, M., & Noore, A. (2017). Face presentation attack with latex masks in multispectral videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (pp. 81-89).
[7]Bhattacharjee, S., & Marcel, S. (2017, September). What you can't see can help you-extended-range imaging for 3d-mask presentation attack detection. In 2017 International Conference of the Biometrics Special Interest Group (BIOSIG) (pp. 1-7). IEEE.
[8]Bhattacharjee, S., Mohammadi, A., & Marcel, S. (2018, October). Spoofing deep face recognition with custom silicone masks. In 2018 IEEE 9th international conference on biometrics theory, applications and systems (BTAS) (pp. 1-7). IEEE.
[9]Xiao, J., Tang, Y., Guo, J., Yang, Y., Zhu, X., Lei, Z., & Li, S. Z. (2019, September). 3DMA: A multi-modality 3D mask face anti-spoofing database. In 2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS) (pp. 1-8). IEEE.
[10]Zhang, S., Wang, X., Liu, A., Zhao, C., Wan, J., Escalera, S., ... & Li, S. Z. (2019). A dataset and benchmark for large-scale multi-modal face anti-spoofing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 919-928).
[11]George, A., Mostaani, Z., Geissenbuhler, D., Nikisins, O., Anjos, A., & Marcel, S. (2019). Biometric face presentation attack detection with multi-channel convolutional neural network. IEEE transactions on information forensics and security, 15, 42-55.
[12]Liu, A., Tan, Z., Wan, J., Escalera, S., Guo, G., & Li, S. Z. (2021). Casia-surf cefa: A benchmark for multi-modal cross-ethnicity face anti-spoofing. In Proceedings of the IEEE/CVF winter conference on applications of computer vision (pp. 1179-1187).
[13]Heusch, G., George, A., Geissbühler, D., Mostaani, Z., & Marcel, S. (2020). Deep models and shortwave infrared information to detect face presentation attacks. IEEE Transactions on Biometrics, Behavior, and Identity Science, 2(4), 399-409.
[14]Rostami, M., Spinoulas, L., Hussein, M., Mathai, J., & Abd-Almageed, W. (2021). Detection and continual learning of novel face presentation attacks. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 14851-14860).
[15]Kong, C., Zheng, K., Wang, S., Rocha, A., & Li, H. (2022). Beyond the pixel world: A novel acoustic-based face anti-spoofing system for smartphones. IEEE Transactions on Information Forensics and Security, 17, 3238-3253.
[16]Kong, C., Zheng, K., Liu, Y., Wang, S., Rocha, A., & Li, H. (2024). M3FAS: An Accurate and Robust MultiModal Mobile Face Anti-Spoofing System. IEEE Transactions on Dependable and Secure Computing.
[17]Pan, G., Sun, L., Wu, Z., & Lao, S. (2007, October). Eyeblink-based anti-spoofing in face recognition from a generic webcamera. In 2007 IEEE 11th international conference on computer vision (pp. 1-8). IEEE.
[18]Yu, Z., Wan, J., Qin, Y., Li, X., Li, S. Z., & Zhao, G. (2020). NAS-FAS: Static-dynamic central difference network search for face anti-spoofing. IEEE transactions on pattern analysis and machine intelligence, 43(9), 3005-3023.
[19]Liu, Y., Jourabloo, A., & Liu, X. (2018). Learning deep models for face anti-spoofing: Binary or auxiliary supervision. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 389-398).
[20]Li, X., Komulainen, J., Zhao, G., Yuen, P. C., & Pietikäinen, M. (2016, December). Generalized face anti-spoofing by detecting pulse from face videos. In 2016 23rd International Conference on Pattern Recognition (ICPR) (pp. 4244-4249). IEEE.
[21]Yu, Z., Li, X., Niu, X., Shi, J., & Zhao, G. (2020). Face anti-spoofing with human material perception. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part VII 16 (pp. 557-575). Springer International Publishing.
[22]Yu, Z., Cai, R., Li, Z., Yang, W., Shi, J., & Kot, A. C. (2024). Benchmarking joint face spoofing and forgery detection with visual and physiological cues. IEEE Transactions on Dependable and Secure Computing.
[23]Srivatsan, K., Naseer, M., & Nandakumar, K. (2023). Flip: Cross-domain face anti-spoofing with language guidance. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 19685-19696).
[24]Fang, H., Liu, A., Jiang, N., Lu, Q., Zhao, G., & Wan, J. (2024, April). VL-FAS: Domain Generalization via Vision-Language Model For Face Anti-Spoofing. In ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 4770-4774). IEEE.
[25]Liu, A., Xue, S., Gan, J., Wan, J., Liang, Y., Deng, J., ... & Lei, Z. (2024). CFPL-FAS: Class Free Prompt Learning for Generalizable Face Anti-spoofing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 222-232).
[26]Zhang, S., Liu, A., Wan, J., Liang, Y., Guo, G., Escalera, S., ... & Li, S. Z. (2020). Casia-surf: A large-scale multi-modal benchmark for face anti-spoofing. IEEE Transactions on Biometrics, Behavior, and Identity Science, 2(2), 182-193.
[27]Parkin, A., & Grinchuk, O. (2019). Recognizing multi-modal face spoofing with face recognition networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops (pp. 0-0).
[28]Kuang, H., Ji, R., Liu, H., Zhang, S., Sun, X., Huang, F., & Zhang, B. (2019, October). Multi-modal multi-layer fusion network with average binary center loss for face anti-spoofing. In Proceedings of the 27th ACM International Conference on Multimedia (pp. 48-56).
[29]Shen, T., Huang, Y., & Tong, Z. (2019). FaceBagNet: Bag-of-local-features model for multi-modal face anti-spoofing. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops (pp. 0-0).
[30]George, A., & Marcel, S. (2021). Cross modal focal loss for rgbd face anti-spoofing. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 7882-7891).
[31]Deng, P., Ge, C., Qiao, X., Wei, H., & Sun, Y. (2023). Attention-aware dual-stream network for multimodal face anti-spoofing. IEEE Transactions on Information Forensics and Security, 18, 4258-4271.
[32]Antil, A., & Dhiman, C. (2024). MF2ShrT: multimodal feature fusion using shared layered transformer for face anti-spoofing. ACM Transactions on Multimedia Computing, Communications and Applications, 20(6), 1-21.
[33]Wang, W., Wen, F., Zheng, H., Ying, R., & Liu, P. (2022). Conv-MLP: a convolution and MLP mixed model for multimodal face anti-spoofing. IEEE Transactions on Information Forensics and Security, 17, 2284-2297.
[34]Yu, Z., Cai, R., Cui, Y., Liu, X., Hu, Y., & Kot, A. C. (2024). Rethinking vision transformer and masked autoencoder in multimodal face anti-spoofing. International Journal of Computer Vision, 1-22.
[35]Nikisins, O., George, A., & Marcel, S. (2019, June). Domain adaptation in multi-channel autoencoder based features for robust face anti-spoofing. In 2019 International Conference on Biometrics (ICB) (pp. 1-8). IEEE.
[36]Liu, W., Wei, X., Lei, T., Wang, X., Meng, H., & Nandi, A. K. (2021). Data-fusion-based two-stage cascade framework for multimodality face anti-spoofing. IEEE Transactions on cognitive and developmental systems, 14(2), 672-683.
[37]Yu, Z., Qin, Y., Li, X., Wang, Z., Zhao, C., Lei, Z., & Zhao, G. (2020). Multi-modal face anti-spoofing based on central difference networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (pp. 650-651).
[38]Zhang, P., Zou, F., Wu, Z., Dai, N., Mark, S., Fu, M., ... & Li, K. (2019). FeatherNets: Convolutional neural networks as light as feather for face anti-spoofing. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops (pp. 0-0).
[39]Jiang, F., Liu, P., Shao, X., & Zhou, X. (2020). Face anti-spoofing with generated near-infrared images. Multimedia Tools and Applications, 79, 21299-21323.
[40]Liu, A., Tan, Z., Wan, J., Liang, Y., Lei, Z., Guo, G., & Li, S. Z. (2021). Face anti-spoofing via adversarial cross-modality translation. IEEE Transactions on Information Forensics and Security, 16, 2759-2772.
[41]Li, Z., Li, H., Luo, X., Hu, Y., Lam, K. Y., & Kot, A. C. (2021). Asymmetric modality translation for face presentation attack detection. IEEE Transactions on Multimedia, 25, 62-76.
[42]Mallat, K., & Dugelay, J. L. (2021). Indirect synthetic attack on thermal face biometric systems via visible-to-thermal spectrum conversion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 1435-1443).
[43]Lin, X., Wang, S., Cai, R., Liu, Y., Fu, Y., Tang, W., ... & Kot, A. (2024). Suppress and Rebalance: Towards Generalized Multi-Modal Face Anti-Spoofing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 211-221).
[44]Jia, Y., Zhang, J., Shan, S., & Chen, X. (2020). Single-side domain generalization for face anti-spoofing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 8484-8493).
[45]Yu, Z., Liu, A., Zhao, C., Cheng, K. H., Cheng, X., & Zhao, G. (2023). Flexible-modal face anti-spoofing: A benchmark. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 6346-6351).
[46]Liu, A., Tan, Z., Yu, Z., Zhao, C., Wan, J., Liang, Y., ... & Guo, G. (2023). Fm-vit: Flexible modal vision transformers for face anti-spoofing. IEEE Transactions on Information Forensics and Security, 18, 4775-4786.
Contributed by: Zitong Yu, Great Bay University; Changsheng Chen, Shenzhen University