School of Information Management and Engineering Lecture | Building Generalizable Sequential Decision-Making...

Education   2024-11-25 16:46   Shanghai

TIME

Tuesday, December 3, 2024, 14:00-15:00

VENUE

Conference Room 308, School of Information Management and Engineering

SPEAKER

Muning Wen (温睦宁) is currently a third-year Ph.D. student at Shanghai Jiao Tong University, supervised by Professor Weinan Zhang. He has extensive theoretical and practical experience in reinforcement learning, multi-agent systems, and LLM agents. His recent work focuses on developing RL and MARL algorithms that strengthen the sequential decision-making capabilities of LLM agents in dynamic environments, and on applying these algorithms in data science, mathematics, and embodied intelligence. Over the past three years, he has published more than ten papers at top-tier academic conferences, including NeurIPS, ICML, and ICLR, and he has served as a reviewer for these conferences since 2023.

PERSONAL HOMEPAGE

https://scholar.google.com/citations?user=Zt1WFtQAAAAJ


TITLE

Building Generalizable Sequential Decision-Making Systems: Multi-Agent Reinforcement Learning in the Era of LLMs


ABSTRACT

In this talk, the speaker will discuss the feasibility of building a sequential decision-making system with strong generalization abilities, drawing on his research in multi-agent reinforcement learning (MARL) and LLM agents. He will first introduce the Multi-Agent Advantage Decomposition Theorem and its application in MARL. The theorem allows the MARL problem to be recast as a sequence modeling problem, which can then be optimized with sequence models such as Transformers. He will then present his latest work on improving the performance of LLM agents, including a framework for LLM agent reinforcement learning, Action Decomposition-based Bellman Update and Policy Optimization (BAD and POAD), which aims to bridge the theoretical gaps between reinforcement learning and language model optimization and to improve learning efficiency. Finally, he will examine the alignment between multi-agent sequence modeling methods and the current generative paradigm of language agents, and discuss the potential and challenges of applying MARL to systems involving multiple language agents.
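
For readers new to the first topic, the Multi-Agent Advantage Decomposition Theorem is usually stated along the following lines. The notation is an assumption borrowed from the MARL literature rather than taken from this announcement: $i_{1:n}$ is an arbitrary ordering of the $n$ agents, $a^{i_m}$ is the action of the $m$-th agent in that ordering, and $Q_\pi^{i_{1:m}}$ is the action-value of the joint policy $\pi$ with the actions of the remaining agents averaged out.

```latex
% Advantage of agents i_{1:m}, given the actions already chosen by agents j_{1:k}
A_\pi^{i_{1:m}}\bigl(s, a^{j_{1:k}}, a^{i_{1:m}}\bigr)
  = Q_\pi^{j_{1:k},\, i_{1:m}}\bigl(s, a^{j_{1:k}}, a^{i_{1:m}}\bigr)
  - Q_\pi^{j_{1:k}}\bigl(s, a^{j_{1:k}}\bigr)

% Decomposition: the joint advantage is a sum of per-agent advantages,
% each conditioned on the actions of the agents earlier in the ordering
A_\pi^{i_{1:n}}\bigl(s, a^{i_{1:n}}\bigr)
  = \sum_{m=1}^{n} A_\pi^{i_m}\bigl(s, a^{i_{1:m-1}}, a^{i_m}\bigr)
```

Read this way, the agents can choose their actions one after another, each conditioning on the actions already chosen, which is what lets the problem be treated as sequence modeling.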

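上财信息: the official new media platform of the School of Information Management and Engineering, Shanghai University of Finance and Economics, used to publish the school's announcements.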