Large models have recently attracted strong interest from both industry and academia, and how to make better use of these pre-trained models has become an important research direction. Although current pre-trained models achieve impressive results, reaching satisfactory performance in a target application scenario usually requires further fine-tuning. However, fine-tuning often degrades performance on the original domain, a phenomenon known as catastrophic forgetting. Avoiding catastrophic forgetting has therefore become an important research problem.

Most existing work on continual learning focuses on image recognition; studies in the context of speech recognition are far fewer [1][2]. Most existing approaches directly update the model's hidden layers by gradient descent in a single vector space shared by all tasks. Recent work in natural language processing has proposed a promising alternative [3]: when training on a new task, the loss gradients of new samples are projected onto the direction orthogonal to the gradient subspace of past samples. In over-parameterized neural networks, learning in this orthogonal subspace avoids conflicts with the loss functions of past tasks and thereby circumvents catastrophic forgetting.

Recently, the paper "Towards Rehearsal-Free Multilingual ASR: A LoRA-based Case Study on Whisper" from the Audio, Speech and Language Processing Group of Northwestern Polytechnical University (ASLP@NPU) was accepted by Interspeech 2024, a flagship conference in the speech field. The paper proposes a LoRA-based, rehearsal-free domain-transfer scheme and introduces a learnable rank coefficient to improve training efficiency. In transfer experiments on Uyghur and Tibetan, starting from a Whisper model fine-tuned on Chinese, the method achieves better performance with a smaller number of trainable parameters. This article provides a brief interpretation of the paper.
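The orthogonal-projection idea above can be sketched in a few lines. This is an illustrative example, not the paper's code: it assumes the past-task gradient subspace is summarized by an orthonormal basis matrix `U` (e.g. obtained from stored gradients via QR decomposition), and removes the component of a new gradient that lies inside that subspace.

```python
import numpy as np

def project_orthogonal(grad, U):
    """Project a new-task gradient onto the orthogonal complement of the
    past-task gradient subspace spanned by the (orthonormal) columns of U.
    To first order, an update along the result cannot change past-task losses."""
    # Component of grad lying inside the past-task subspace.
    in_subspace = U @ (U.T @ grad)
    # Remove it, leaving only the direction orthogonal to past gradients.
    return grad - in_subspace

# Toy example: past-task gradients span the x-axis of R^3.
U = np.array([[1.0], [0.0], [0.0]])   # orthonormal basis, shape (3, 1)
g = np.array([2.0, 3.0, -1.0])        # gradient computed on the new task
g_proj = project_orthogonal(g, U)     # x-component is removed: [0, 3, -1]
assert np.allclose(U.T @ g_proj, 0.0) # orthogonal to the past subspace
```

In practice `U` would be maintained per weight matrix and updated after each task; the toy basis here is only for illustration.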
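For readers unfamiliar with LoRA, the adaptation scheme can be sketched as a frozen weight plus a trainable low-rank update. The per-rank scaling vector `s` below is an illustrative guess at what a "learnable rank coefficient" might look like (a trainable weight on each rank-one component), not the paper's exact formulation; all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def lora_forward(x, W, A, B, s):
    """Forward pass of a frozen linear layer W plus a LoRA update.
    Effective weight: W + B @ diag(s) @ A, where only A, B and the
    per-rank coefficients s would be trained (illustrative sketch)."""
    return x @ W.T + (x @ A.T) * s @ B.T

d_in, d_out, rank = 16, 32, 4
W = rng.standard_normal((d_out, d_in))         # frozen pre-trained weight
A = rng.standard_normal((rank, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, rank))                    # trainable up-projection, zero-init
s = np.ones(rank)                              # learnable per-rank coefficients

x = rng.standard_normal((2, d_in))
y = lora_forward(x, W, A, B, s)
# With B zero-initialized, the adapted layer starts identical to the frozen one.
assert np.allclose(y, x @ W.T)
```

Because only `A`, `B`, and `s` are updated, the number of trainable parameters is `rank * (d_in + d_out) + rank`, far smaller than the `d_in * d_out` of full fine-tuning.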
Paper title: Towards Rehearsal-Free Multilingual ASR: A LoRA-based Case Study on Whisper
Authors: Tianyi Xu, Kaixun Huang, Pengcheng Guo, Yu Zhou, Longtao Huang, Hui Xue, Lei Xie
[1] S. Vander Eeckt and H. Van Hamme, “Using adapters to overcome catastrophic forgetting in end-to-end automatic speech recognition,” in ICASSP 2023 – 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2023, pp. 1–5.
[2] M. Yang, I. R. Lane, and S. Watanabe, “Online continual learning of end-to-end speech recognition models,” in Interspeech 2022, 23rd Annual Conference of the International Speech Communication Association, Incheon, Korea, 18–22 September 2022, H. Ko and J. H. L. Hansen, Eds. ISCA, 2022, pp. 2668–2672.
[3] X. Wang, T. Chen, Q. Ge, H. Xia, R. Bao, R. Zheng, Q. Zhang, T. Gui, and X. Huang, “Orthogonal subspace learning for language model continual learning,” in Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore, December 6–10, 2023, H. Bouamor, J. Pino, and K. Bali, Eds. Association for Computational Linguistics, 2023, pp. 10658–10671.
[4] B. Zhang, H. Lv, P. Guo, Q. Shao, C. Yang, L. Xie, X. Xu, H. Bu, X. Chen, C. Zeng et al., “WenetSpeech: A 10000+ hours multi-domain Mandarin corpus for speech recognition,” in ICASSP 2022 – 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2022, pp. 6182–6186.
[5] H. Bu, J. Du, X. Na, B. Wu, and H. Zheng, “AISHELL-1: An open-source Mandarin speech corpus and a speech recognition baseline,” in 2017 20th Conference of the Oriental Chapter of the International Coordinating Committee on Speech Databases and Speech I/O Systems and Assessment (O-COCOSDA). IEEE, 2017, pp. 1–5.
[6] R. Ardila, M. Branson, K. Davis, M. Kohler, J. Meyer, M. Henretty, R. Morais, L. Saunders, F. M. Tyers, and G. Weber, “Common Voice: A massively-multilingual speech corpus,” in Proceedings of the 12th Language Resources and Evaluation Conference, LREC 2020, Marseille, France, May 11–16, 2020, pp. 4218–4222.
[7] A. Roze, S. Yin, Z. Zhang, D. Wang, and A. Hamdulla, “THUYG-20: A free Uyghur speech database,” in NCMMSC’15, 2015.
[8] Y. Zhao, X. Xu, J. Yue, W. Song, X. Li, L. Wu, and Q. Ji, “An open speech resource for Tibetan multi-dialect and multitask recognition,” International Journal of Computational Science and Engineering, vol. 22, no. 2–3, pp. 297–304, 2020.
[9] S. Li and G. Li, “XBMU-AMDO31: An open source of Amdo Tibetan speech database and speech recognition baseline system,” in National Conference on Man-Machine Speech Communication (NCMMSC 2022), 2022.
[10] U. Hermjakob, J. May, and K. Knight, “Out-of-the-box universal Romanization tool uroman,” in Proceedings of ACL 2018, System Demonstrations, F. Liu and T. Solorio, Eds. Melbourne, Australia: Association for Computational Linguistics, Jul. 2018, pp. 13–18.
[11] R. Aljundi, F. Babiloni, M. Elhoseiny, M. Rohrbach, and T. Tuytelaars, “Memory aware synapses: Learning what (not) to forget,” in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 139–154.
[12] S. Vander Eeckt and H. Van Hamme, “Using adapters to overcome catastrophic forgetting in end-to-end automatic speech recognition,” in ICASSP 2023 – 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2023, pp. 1–5.
[13] W. Liu, Y. Qin, Z. Peng, and T. Lee, “Sparsely shared LoRA on Whisper for child speech recognition,” arXiv preprint arXiv:2309.11756, 2023.
[14] M. Yang, I. R. Lane, and S. Watanabe, “Online continual learning of end-to-end speech recognition models,” in Interspeech 2022, 23rd Annual Conference of the International Speech Communication Association, Incheon, Korea, 18–22 September 2022, H. Ko and J. H. L. Hansen, Eds. ISCA, 2022, pp. 2668–2672.