Xmart Student Forum | Haohe Liu (刘濠赫): Latent Diffusion Model as a Versatile Coarse-to-Fine Audio Decoder
Format: Online
Latent diffusion models (LDMs) have demonstrated exceptional generative capabilities across various modalities. This talk will explore LDMs as a coarse-to-fine audio decoder, offering a versatile framework for audio tasks. We will begin by covering the fundamentals of diffusion models and their control over forward and backward processes. Next, we will look into specific applications, including the AudioLDM series for text-to-audio generation, models for audio quality enhancement, and neural audio codecs. The talk will highlight common design principles across these models and include interactive demos. We will conclude by discussing the strengths and limitations of LDMs in audio decoding and potential future research directions.
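Haohe Liu
Haohe Liu is a senior PhD student at the Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey, UK. His research covers audio quality enhancement, generation, source separation, and recognition. He has published multiple papers in leading journals and conferences, including TPAMI, TASLP, JSTSP, ICML, AAAI, ICASSP, and INTERSPEECH. His papers have been cited more than 1,800 times in total, and his open-source GitHub projects have drawn wide attention, collecting more than 8,500 stars. His representative works include AudioLDM, SemantiCodec, and NaturalSpeech. He has also worked as a research intern at Meta, Microsoft, and ByteDance.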
Join via Tencent Meeting
Meeting ID: 409-237-723