2024-11-28 Paper Sharing | Latest Advances in Agents

Digest   2024-11-28 10:06   Anhui


Paper Sharing | Research Progress on Agents


  1. Safe Multi-Agent Reinforcement Learning with Convergence to Generalized Nash Equilibrium
  2. ShowUI: One Vision-Language-Action Model for GUI Visual Agent
  3. Why the Agent Made that Decision: Explaining Deep Reinforcement Learning with Vision Masks
  4. Agent-Based Modelling Meets Generative AI in Social Network Simulations
  5. Towards Next-Generation Medical Agent: How o1 is Reshaping Decision-Making in Medical Scenarios

1. Safe Multi-Agent Reinforcement Learning with Convergence to Generalized Nash Equilibrium

Authors: Zeyang Li, Navid Azizan



Multi-agent reinforcement learning (MARL) has achieved notable success in cooperative tasks, demonstrating impressive performance and scalability. However, deploying MARL agents in real-world applications presents critical safety challenges. Current safe MARL algorithms are largely based on the constrained Markov decision process (CMDP) framework, which enforces constraints only on discounted cumulative costs and lacks an all-time safety assurance. Moreover, these methods often overlook the feasibility issue (where the system will inevitably violate state constraints within certain regions of the constraint set), resulting in either suboptimal performance or increased constraint violations. To address these challenges, we propose a novel theoretical framework for safe MARL with state-wise constraints, where safety requirements are enforced at every state the agents visit. To resolve the feasibility issue, we leverage a control-theoretic notion of the feasible region, the controlled invariant set (CIS), characterized by the safety value function. We develop a multi-agent method for identifying CISs, ensuring convergence to a Nash equilibrium on the safety value function. By incorporating CIS identification into the learning process, we introduce a multi-agent dual policy iteration algorithm that guarantees convergence to a generalized Nash equilibrium in state-wise constrained cooperative Markov games, achieving an optimal balance between feasibility and performance. Furthermore, for practical deployment in complex high-dimensional systems, we propose Multi-Agent Dual Actor-Critic (MADAC), a safe MARL algorithm that approximates the proposed iteration scheme within the deep RL paradigm. Empirical evaluations on safe MARL benchmarks demonstrate that MADAC consistently outperforms existing methods, delivering much higher rewards while reducing constraint violations.


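This paper on safe multi-agent reinforcement learning (MARL) proposes a new theoretical framework with state-wise constraints, in which safety must hold at every state the agents visit. Building on controlled invariant sets, it derives a multi-agent dual policy iteration algorithm that provably converges to a generalized Nash equilibrium, and then approximates this scheme in the deep RL setting with Multi-Agent Dual Actor-Critic (MADAC). Experiments on safe MARL benchmarks show that MADAC outperforms existing methods, achieving higher rewards with fewer constraint violations.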
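To make the feasibility issue concrete, here is a minimal single-agent, tabular sketch of how a safety value function can characterize a controlled invariant set. It assumes deterministic dynamics and the undiscounted recursion V(s) = max(h(s), min_a V(step(s, a))), where h(s) <= 0 means the state constraint holds; the paper's exact definition, its multi-agent Nash-equilibrium procedure, and MADAC itself are not reproduced here. The toy system (`step`, `h`, `STATES`, `ACTIONS`) is hypothetical.

```python
from typing import Callable, Dict, Hashable, Iterable, Sequence

def safety_value_iteration(
    states: Iterable[Hashable],
    actions: Sequence[Hashable],
    step: Callable[[Hashable, Hashable], Hashable],
    h: Callable[[Hashable], float],
    max_iters: int = 1000,
) -> Dict[Hashable, float]:
    """Iterate V(s) = max(h(s), min_a V(step(s, a))) until a fixed point.

    V(s) is the worst constraint value along the trajectory when the
    controller chooses actions to keep that worst value as low as possible.
    """
    states = list(states)
    V = {s: h(s) for s in states}
    for _ in range(max_iters):
        new_V = {s: max(h(s), min(V[step(s, a)] for a in actions)) for s in states}
        if new_V == V:  # fixed point reached
            break
        V = new_V
    return V

# Hypothetical 1-D toy system: positions 0..9, constraint h(x) <= 0 means x <= 7.
STATES = range(10)
ACTIONS = (-1, 0, 1)

def step(x: int, a: int) -> int:
    # Beyond x = 5 a drift of +2 outweighs any action, so the agent keeps moving right.
    drift = 2 if x >= 5 else 0
    return min(9, max(0, x + drift + a))

def h(x: int) -> float:
    return x - 7

V = safety_value_iteration(STATES, ACTIONS, step, h)
cis = [x for x in STATES if V[x] <= 0]  # controlled invariant set
infeasible = [x for x in STATES if h(x) <= 0 and V[x] > 0]  # safe now, violation inevitable
print(cis, infeasible)  # [0, 1, 2, 3, 4] [5, 6, 7]
```

States 5 to 7 satisfy the constraint yet lie outside the CIS: every action sequence from them eventually violates it. This is exactly the region that a purely constraint-set-based method would treat as safe.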



2. ShowUI: One Vision-Language-Action Model for GUI Visual Agent

Authors: Kevin Qinghong Lin, Linjie Li, Difei Gao, Zhengyuan Yang, Shiwei Wu, Zechen Bai, Weixian Lei, Lijuan Wang, Mike Zheng Shou



Building Graphical User Interface (GUI) assistants holds significant promise for enhancing human workflow productivity. While most agents are language-based, relying on closed-source APIs with text-rich meta-information (e.g., HTML or accessibility trees), they show limitations in perceiving UI visuals as humans do, highlighting the need for GUI visual agents. In this work, we develop a vision-language-action model in the digital world, namely ShowUI, which features the following innovations: (i) UI-Guided Visual Token Selection, which reduces computational costs by formulating screenshots as a UI-connected graph, adaptively identifying their redundant relationships, and using them as the criteria for token selection during self-attention blocks; (ii) Interleaved Vision-Language-Action Streaming, which flexibly unifies diverse needs within GUI tasks, enabling effective management of visual-action history in navigation or pairing multi-turn query-action sequences per screenshot to enhance training efficiency; (iii) Small-scale High-quality GUI Instruction-following Datasets, built through careful data curation and a resampling strategy that addresses significant data type imbalances. With the above components, ShowUI, a lightweight 2B model trained on 256K data, achieves a strong 75.1% accuracy in zero-shot screenshot grounding. Its UI-guided token selection further reduces redundant visual tokens by 33% during training and yields a 1.4× speedup. Navigation experiments across web [12], mobile [36], and online [40] environments further underscore the effectiveness and potential of our model in advancing GUI visual agents. The models are available at https://github.com/showlab/ShowUI.
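The core idea of UI-guided token selection, treating screenshot patches as graph nodes and grouping redundant ones, can be sketched with union-find. This is a simplified illustration under stated assumptions: each patch is reduced to a single hashable "color" value, two 4-neighbour patches are connected when their values are identical, and one representative token is kept per connected component. ShowUI's actual selection rule inside its self-attention blocks may differ; `ui_components` and `select_tokens` are hypothetical names.

```python
from typing import Hashable, List, Sequence

def find(parent: List[int], i: int) -> int:
    while parent[i] != i:
        parent[i] = parent[parent[i]]  # path halving keeps trees shallow
        i = parent[i]
    return i

def union(parent: List[int], a: int, b: int) -> None:
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[rb] = ra

def ui_components(patches: Sequence[Sequence[Hashable]]) -> List[int]:
    """Return, for each patch in row-major order, the root id of its component."""
    rows, cols = len(patches), len(patches[0])
    parent = list(range(rows * cols))
    for r in range(rows):
        for c in range(cols):
            idx = r * cols + c
            # Connect right and lower neighbours that look identical (assumed redundancy test).
            if c + 1 < cols and patches[r][c] == patches[r][c + 1]:
                union(parent, idx, idx + 1)
            if r + 1 < rows and patches[r][c] == patches[r + 1][c]:
                union(parent, idx, idx + cols)
    return [find(parent, i) for i in range(rows * cols)]

def select_tokens(patches: Sequence[Sequence[Hashable]]) -> List[int]:
    """Keep the first patch (row-major) of every component as its representative token."""
    seen, keep = set(), []
    for i, label in enumerate(ui_components(patches)):
        if label not in seen:
            seen.add(label)
            keep.append(i)
    return keep

# Toy 3x4 screenshot: white background (W), a blue button (B), a red icon (R).
grid = [list("WWWW"), list("WBBW"), list("WWWR")]
print(select_tokens(grid))  # [0, 5, 11]: 12 patches reduced to 3 tokens
```

Large uniform areas such as backgrounds collapse into a single component, which is where the token savings on real screenshots would come from.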





3. Why the Agent Made that Decision: Explaining Deep Reinforcement Learning with Vision Masks

Authors: Rui Zuo, Zifan Wang, Simon Khan, Garrett Ethan Katz, Qinru Qiu



Due to the inherent lack of transparency in deep neural networks, it is challenging for deep reinforcement learning (DRL) agents to gain trust and acceptance from users, especially in safety-critical applications such as medical diagnosis and military operations. Existing methods for explaining an agent's decision either require retraining the agent using models that support explanation generation or rely on perturbation-based techniques to reveal the significance of different input features in the decision-making process. However, retraining the agent may compromise its integrity and performance, while perturbation-based methods have limited performance and lack knowledge accumulation or learning capabilities. Moreover, since each perturbation is performed independently, the joint state of the perturbed inputs may not be physically meaningful. To address these challenges, we introduce VisionMask, a standalone explanation model trained end-to-end to identify the most critical regions in the agent's visual input that can explain its actions. VisionMask is trained in a self-supervised manner without relying on human-generated labels. Importantly, its training does not alter the agent model, hence preserving the agent's performance and integrity. We evaluate VisionMask on Super Mario Bros (SMB) and three Atari games. Compared to existing methods, VisionMask achieves a 14.9% higher insertion accuracy and a 30.08% higher F1-Score in reproducing original actions from the selected visual explanations. We also present examples illustrating how VisionMask can be used for counterfactual analysis.


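This work studies how to identify the critical regions in the visual input of deep reinforcement learning (DRL) agents. It proposes VisionMask, a standalone, self-supervised explanation model that makes DRL agents' decision-making more transparent and trustworthy without modifying the agent model. In comparative experiments on Super Mario Bros and three Atari games, VisionMask outperformed existing explanation methods, achieving higher insertion accuracy and F1-score when the agent's original actions are reproduced from the selected visual regions. The work targets a key problem in DRL, agent interpretability, and shows the potential of self-supervised learning for addressing it; the authors also illustrate how VisionMask can support counterfactual analysis.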
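The evaluation idea of "reproducing original actions from the selected visual explanations" can be sketched as a fidelity check. Keep only the masked pixels, feed the result to the frozen agent, and compare its actions with the actions it took on the full input. This is an illustration only: the toy policy, the flat observations, and the exact accuracy and macro-F1 definitions below are assumptions, and the paper's insertion protocol may differ (for example by sweeping the fraction of inserted pixels).

```python
from typing import Callable, List, Sequence, Tuple

Obs = List[float]

def apply_mask(obs: Obs, mask: Sequence[int]) -> Obs:
    """Keep pixels selected by the explanation mask, zero out the rest."""
    return [x if m else 0.0 for x, m in zip(obs, mask)]

def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    labels = set(y_true) | set(y_pred)
    scores = []
    for lab in labels:
        tp = sum(t == lab and p == lab for t, p in zip(y_true, y_pred))
        fp = sum(t != lab and p == lab for t, p in zip(y_true, y_pred))
        fn = sum(t == lab and p != lab for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

def fidelity(policy: Callable[[Obs], str], observations: Sequence[Obs],
             masks: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Return (action accuracy, macro-F1) of the frozen policy on masked vs. full inputs."""
    original = [policy(o) for o in observations]
    masked = [policy(apply_mask(o, m)) for o, m in zip(observations, masks)]
    accuracy = sum(a == b for a, b in zip(original, masked)) / len(original)
    return accuracy, macro_f1(original, masked)

# Hypothetical agent that only looks at pixel 0.
def toy_policy(obs: Obs) -> str:
    return "jump" if obs[0] > 0.5 else "right"

OBS = [[0.9, 0.1, 0.2], [0.1, 0.8, 0.3], [0.7, 0.6, 0.0], [0.2, 0.9, 0.9]]
good = fidelity(toy_policy, OBS, [[1, 0, 0]] * 4)  # mask covers the pixel the agent uses
bad = fidelity(toy_policy, OBS, [[0, 1, 0]] * 4)   # mask misses it
print(good, bad)
```

A faithful explanation (the mask on pixel 0) reproduces every action, while a misleading one drops to chance-level agreement.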

4. Agent-Based Modelling Meets Generative AI in Social Network Simulations

Authors: Antonino Ferraro, Antonio Galli, Valerio La Gatta, Marco Postiglione, Gian Marco Orlando, Diego Russo, Giuseppe Riccio, Antonio Romano, Vincenzo Moscato



Agent-Based Modelling (ABM) has emerged as an essential tool for simulating social networks, encompassing diverse phenomena such as information dissemination, influence dynamics, and community formation. However, manually configuring varied agent interactions and information flow dynamics poses challenges, often resulting in oversimplified models that lack real-world generalizability. Integrating modern Large Language Models (LLMs) with ABM presents a promising avenue to address these challenges and enhance simulation fidelity, leveraging LLMs’ human-like capabilities in sensing, reasoning, and behavior. In this paper, we propose a novel framework utilizing LLM-empowered agents to simulate social network users based on their interests and personality traits. The framework allows for customizable agent interactions resembling various social network platforms, including mechanisms for content resharing and personalized recommendations. We validate our framework using a comprehensive Twitter dataset from the 2020 US election, demonstrating that LLM-agents accurately replicate real users’ behaviors, including linguistic patterns and political inclinations. These agents form homogeneous ideological clusters and retain the main themes of their community. Notably, preference-based recommendations significantly influence agent behavior, promoting increased engagement, network homophily and the formation of echo chambers. Overall, our findings underscore the potential of LLM-agents in advancing social media simulations and unraveling intricate online dynamics.


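This paper proposes a framework that combines large language models (LLMs) with agent-based modeling (ABM) to improve the fidelity of social network simulations. LLM-powered agents simulate users according to their interests and personality traits, and interaction mechanisms such as content resharing and personalized recommendations can be configured to resemble different platforms. Validated on a Twitter dataset from the 2020 US election, the agents reproduce real users' linguistic patterns and political inclinations and form ideologically homogeneous clusters; preference-based recommendations further increase engagement, network homophily, and the formation of echo chambers.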
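The finding that preference-based recommendations increase network homophily can be illustrated with a deliberately tiny agent-based sketch. It compares a chronological feed with a feed that ranks same-leaning authors first. Everything here is a hypothetical stand-in: the LLM agent's decision is replaced by a stub that follows every author it is shown, leanings are assigned by id parity, and each agent posts once.

```python
from typing import Callable, List, Tuple

Post = Tuple[int, int]  # (timestamp, author_id)

def leaning(agent: int) -> str:
    # Hypothetical assignment: even ids lean "L", odd ids lean "R".
    return "L" if agent % 2 == 0 else "R"

def chronological(viewer: int, posts: List[Post], k: int) -> List[Post]:
    candidates = [p for p in posts if p[1] != viewer]
    return sorted(candidates, key=lambda p: p[0], reverse=True)[:k]

def preference_based(viewer: int, posts: List[Post], k: int) -> List[Post]:
    candidates = [p for p in posts if p[1] != viewer]
    # Same-leaning authors first (True sorts above False), then most recent.
    key = lambda p: (leaning(p[1]) == leaning(viewer), p[0])
    return sorted(candidates, key=key, reverse=True)[:k]

def simulate(n_agents: int,
             recommender: Callable[[int, List[Post], int], List[Post]],
             k: int = 2) -> List[Tuple[int, int]]:
    posts = [(agent, agent) for agent in range(n_agents)]  # timestamp == author id
    follows = []
    for viewer in range(n_agents):
        for _, author in recommender(viewer, posts, k):
            # Stub for the LLM decision: the agent follows every author it is shown.
            follows.append((viewer, author))
    return follows

def homophily(follows: List[Tuple[int, int]]) -> float:
    """Fraction of follow edges that connect agents with the same leaning."""
    same = sum(leaning(a) == leaning(b) for a, b in follows)
    return same / len(follows)

print(homophily(simulate(6, chronological)), homophily(simulate(6, preference_based)))
```

With six agents, the chronological feed yields 5 same-leaning edges out of 12, while the preference-based feed makes every edge same-leaning. This is an extreme, fully deterministic version of the echo-chamber effect described in the abstract.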


5. Towards Next-Generation Medical Agent: How o1 is Reshaping Decision-Making in Medical Scenarios

Authors: Shaochen Xu, Yifan Zhou, Zhengliang Liu, Zihao Wu, Tianyang Zhong, Huaqin Zhao, Yiwei Li, Hanqi Jiang, Yi Pan, Junhao Chen, Jin Lu, Wei Zhang, Tuo Zhang, Lu Zhang, Dajiang Zhu, Xiang Li, Wei Liu, Quanzheng Li, Andrea Sikora, Xiaoming Zhai, Zhen Xiang, Tianming Liu



Artificial Intelligence (AI) has become essential in modern healthcare, with large language models (LLMs) offering promising advances in clinical decision-making. Traditional model-based approaches, including those leveraging in-context demonstrations and those with specialized medical fine-tuning, have demonstrated strong performance in medical language processing but struggle with real-time adaptability, multi-step reasoning, and handling complex medical tasks. Agent-based AI systems address these limitations by incorporating reasoning traces, tool selection based on context, knowledge retrieval, and both short- and long-term memory. These additional features enable the medical AI agent to handle complex medical scenarios where decision-making should be built on real-time interaction with the environment. Therefore, unlike conventional model-based approaches that treat medical queries as isolated questions, medical AI agents approach them as complex tasks and behave more like human doctors. In this paper, we study the choice of the backbone LLM for medical AI agents, which is the foundation for the agent’s overall reasoning and action generation. In particular, we consider the emergent o1 model and examine its impact on agents' reasoning, tool-use adaptability, and real-time information retrieval across diverse clinical scenarios, including high-stakes settings such as intensive care units (ICUs). Our findings demonstrate o1’s ability to enhance diagnostic accuracy and consistency, paving the way for smarter, more responsive AI tools that support better patient outcomes and decision-making efficacy in clinical practice.
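The paper's framing, that the backbone LLM drives the agent's reasoning and tool selection while the agent loop supplies tools and memory, can be sketched as an architecture with a swappable backbone. This is a minimal sketch under stated assumptions: the `Backbone` protocol, `MedicalAgent`, the tool name `retrieve_guideline`, and `RuleBasedBackbone` are all hypothetical. No real model (o1 or otherwise) is called, and the paper's actual agent design is not reproduced.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Protocol, Tuple

class Backbone(Protocol):
    def decide(self, question: str, memory: List[str]) -> Tuple[str, str]:
        """Return (action, argument); action is a tool name or "answer"."""
        ...

@dataclass
class MedicalAgent:
    backbone: Backbone                        # swapping this is the design choice studied
    tools: Dict[str, Callable[[str], str]]    # context-dependent tool selection
    memory: List[str] = field(default_factory=list)  # trace of tool calls and observations
    max_steps: int = 5

    def run(self, question: str) -> str:
        for _ in range(self.max_steps):
            action, arg = self.backbone.decide(question, self.memory)
            if action == "answer":
                return arg
            observation = self.tools[action](arg)
            self.memory.append(f"{action}({arg}) -> {observation}")
        return "no answer within step budget"

class RuleBasedBackbone:
    """Hypothetical stand-in for an LLM backbone: retrieve once, then answer."""
    def decide(self, question: str, memory: List[str]) -> Tuple[str, str]:
        if not memory:
            return "retrieve_guideline", "sepsis"
        return "answer", f"final answer based on: {memory[-1]}"

agent = MedicalAgent(RuleBasedBackbone(), {"retrieve_guideline": lambda q: f"guideline for {q}"})
print(agent.run("toy ICU question"))
```

Because the loop only depends on the `decide` interface, comparing backbones amounts to running the same agent and tools with a different `Backbone` implementation.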




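  • Feel free to point out shortcomings in the paper summaries in this post!
  • Feel free to share recent papers you think are more worth recommending, along with your reasons!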



