Source Attribution Techniques for Large-Model-Generated Content
Tianyun Yang, Yuhui Shi, Qiang Sheng, Juan Cao
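I. Introduction
II. Attribution of Generated Text
2.1 Methods Based on Pretraining and Fine-Tuning
Figure 1. Architecture of the T5-Sentinel model
2.2 Methods Based on Stylistic Features
2.3 Methods Based on Rewriting
Figure 2. Overview of the DNA-GPT pipeline
(Note: the figure shows the detection task; to perform attribution, the same process must be repeated for each candidate large model.)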
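The note on Figure 2 says that attribution is obtained by repeating the detection procedure once per candidate model. A minimal sketch of that loop follows. It rests on several assumptions that are not in the original text: each candidate is represented by a hypothetical callable that returns continuations for a prefix (here replaced by fixed stubs so the example runs offline), the detection score is a simplified n-gram overlap between the original tail and the regenerated continuations, and the truncation ratio and n-gram size are illustrative values. It is not the DNA-GPT authors' implementation.

```python
from typing import Callable, Dict, List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams of a token list (empty if too short)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def overlap_score(original_tail: str, regenerated: List[str], n: int = 2) -> float:
    """Average fraction of the tail's n-grams that reappear in each continuation.

    This is a simplified stand-in for a detection score (an assumption).
    """
    reference = set(ngrams(original_tail.split(), n))
    if not reference or not regenerated:
        return 0.0
    total = 0.0
    for continuation in regenerated:
        candidate = set(ngrams(continuation.split(), n))
        total += len(reference & candidate) / len(reference)
    return total / len(regenerated)


def attribute(
    text: str,
    candidates: Dict[str, Callable[[str], List[str]]],
    truncate_ratio: float = 0.5,  # illustrative value, an assumption
    n: int = 2,                   # illustrative value, an assumption
) -> Tuple[str, Dict[str, float]]:
    """Run the same detection step on every candidate and pick the best score."""
    tokens = text.split()
    cut = max(1, int(len(tokens) * truncate_ratio))
    prefix = " ".join(tokens[:cut])
    tail = " ".join(tokens[cut:])
    scores = {
        name: overlap_score(tail, generate(prefix), n)
        for name, generate in candidates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical candidates: fixed stubs standing in for real model calls.
candidates = {
    "model_a": lambda prefix: ["jumps over the lazy dog"],
    "model_b": lambda prefix: ["runs far away from here"],
}
best, scores = attribute("the quick brown fox jumps over the lazy dog", candidates)
print(best, scores)
```

In practice each stub would be replaced by a call to the corresponding candidate model, so the cost grows linearly with the number of candidates. That cost is the practical consequence of the repetition described in the note.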
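2.4 Methods Based on Probability Features
Figure 3. Overview of the Sniffer pipeline
Figure 4. Overview of the POGER pipeline
III. Attribution of Generated Images
3.1 Methods Based on Model Watermarking
Figure 5. Generated-image attribution based on model watermarking
3.2 Methods Based on Model Inversion
Figure 6. Generated-image attribution based on model inversion
3.3 Methods Based on Model Fingerprints
Figure 7. Generated-image attribution based on model fingerprints
IV. Problems and Challenges
1. Interpretability of the attribution process. How can we clearly explain how model fingerprints form and how they help identify the source of generated content?
2. Generalization of attribution in the open world. How can attribution methods stay efficient and accurate when they face newly emerging models that were not seen before?
3. Attribution in human-machine hybrid scenarios. How should authorship of content produced through human-machine collaboration be defined, and how can such content be traced to its source?
4. Robustness of attribution methods to attacks. How can attribution accuracy be maintained under authorship obfuscation attacks such as paraphrasing attacks?
V. Conclusion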
Contributed by: Tianyun Yang, Yuhui Shi