Speech/Audio Processing Academic Digest [10.25]
Digest
2024-10-25 18:00
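Today's paper collection: 11 papers in cs.SD (speech) and 12 papers in eess.AS (audio processing). Reposted with permission from arXiv每日学术速递 (arXiv Daily Academic Digest).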
cs.SD (speech)
【1】 We Augmented Whisper With kNN and You Won't Believe What Came Next
Link: https://arxiv.org/abs/2410.18850
Authors: Maya K. Nachesa, Vlad Niculae
Comments: 6 pages incl. appendix, 2 figures, 6 tables
【2】 Wavetable Synthesis Using CVAE for Timbre Control Based on Semantic Label
Link: https://arxiv.org/abs/2410.18628
Authors: Tsugumasa Yutani, Yuya Yamamoto, Shuyo Nakatani, Hiroko Terasawa
Comments: 6 pages, 4 figures, Accepted at APSIPA ASC 2024
【3】 STTATTS: Unified Speech-To-Text And Text-To-Speech Model
Link: https://arxiv.org/abs/2410.18607
Authors: Hawau Olamide Toyin, Hao Li, Hanan Aldarmaki
Comments: 11 pages, 4 Figures, EMNLP 2024 Findings
【4】 A contrastive-learning approach for auditory attention detection
Link: https://arxiv.org/abs/2410.18395
Authors: Seyed Ali Alavi Bajestan, Mark Pitt, Donald S. Williamson
【5】 A Unimodal Speaker-Level Membership Inference Detector for Contrastive Pretraining
Link: https://arxiv.org/abs/2410.18371
Authors: Ruoxi Cheng, Yizhong Ding, Shuirong Cao, Shitong Shao, Zhiqiang Wang
【6】 Contextual Biasing to Improve Domain-specific Custom Vocabulary Audio Transcription without Explicit Fine-Tuning of Whisper Model
Link: https://arxiv.org/abs/2410.18363
Authors: Vishakha Lall, Yisi Liu
【7】 Unified Microphone Conversion: Many-to-Many Device Mapping via Feature-wise Linear Modulation
Link: https://arxiv.org/abs/2410.18322
Authors: Myeonghoon Ryu, Hongseok Oh, Suji Lee, Han Park
Comments: Currently under review for ICASSP 2025
【8】 Robust and Explainable Depression Identification from Speech Using Vowel-Based Ensemble Learning Approaches
Link: https://arxiv.org/abs/2410.18298
Authors: Kexin Feng, Theodora Chaspari
Comments: Accepted at the IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI 2024)
【9】 Optimizing the role of human evaluation in LLM-based spoken document summarization systems
Link: https://arxiv.org/abs/2410.18218
Authors: Margaret Kroll, Kelsey Kraus
【10】 Melody Construction for Persian lyrics using LSTM recurrent neural networks
Link: https://arxiv.org/abs/2410.18203
Authors: Farshad Jafari, Farzad Didehvar, Amin Gheibi
【11】 Music102: An $D_{12}$-equivariant transformer for chord progression accompaniment
Link: https://arxiv.org/abs/2410.18151
Author: Weiliang Luo
Comments: 10 pages, 3 figures
eess.AS (audio processing)
【1】 A Survey on Speech Large Language Models
Link: https://arxiv.org/abs/2410.18908
Authors: Jing Peng, Yucheng Wang, Yu Xi, Xv Li, Kai Yu
【2】 We Augmented Whisper With kNN and You Won't Believe What Came Next
Link: https://arxiv.org/abs/2410.18850
Authors: Maya K. Nachesa, Vlad Niculae
Comments: 6 pages incl. appendix, 2 figures, 6 tables
【3】 Wavetable Synthesis Using CVAE for Timbre Control Based on Semantic Label
Link: https://arxiv.org/abs/2410.18628
Authors: Tsugumasa Yutani, Yuya Yamamoto, Shuyo Nakatani, Hiroko Terasawa
Comments: 6 pages, 4 figures, Accepted at APSIPA ASC 2024
【4】 STTATTS: Unified Speech-To-Text And Text-To-Speech Model
Link: https://arxiv.org/abs/2410.18607
Authors: Hawau Olamide Toyin, Hao Li, Hanan Aldarmaki
Comments: 11 pages, 4 Figures, EMNLP 2024 Findings
【5】 A contrastive-learning approach for auditory attention detection
Link: https://arxiv.org/abs/2410.18395
Authors: Seyed Ali Alavi Bajestan, Mark Pitt, Donald S. Williamson
【6】 A Unimodal Speaker-Level Membership Inference Detector for Contrastive Pretraining
Link: https://arxiv.org/abs/2410.18371
Authors: Ruoxi Cheng, Yizhong Ding, Shuirong Cao, Shitong Shao, Zhiqiang Wang
【7】 Contextual Biasing to Improve Domain-specific Custom Vocabulary Audio Transcription without Explicit Fine-Tuning of Whisper Model
Link: https://arxiv.org/abs/2410.18363
Authors: Vishakha Lall, Yisi Liu
【8】 Unified Microphone Conversion: Many-to-Many Device Mapping via Feature-wise Linear Modulation
Link: https://arxiv.org/abs/2410.18322
Authors: Myeonghoon Ryu, Hongseok Oh, Suji Lee, Han Park
Comments: Currently under review for ICASSP 2025
【9】 Robust and Explainable Depression Identification from Speech Using Vowel-Based Ensemble Learning Approaches
Link: https://arxiv.org/abs/2410.18298
Authors: Kexin Feng, Theodora Chaspari
Comments: Accepted at the IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI 2024)
【10】 Optimizing the role of human evaluation in LLM-based spoken document summarization systems
Link: https://arxiv.org/abs/2410.18218
Authors: Margaret Kroll, Kelsey Kraus
【11】 Melody Construction for Persian lyrics using LSTM recurrent neural networks
Link: https://arxiv.org/abs/2410.18203
Authors: Farshad Jafari, Farzad Didehvar, Amin Gheibi
【12】 Music102: An $D_{12}$-equivariant transformer for chord progression accompaniment
Link: https://arxiv.org/abs/2410.18151
Author: Weiliang Luo
Comments: 10 pages, 3 figures