2024-12-26 Paper Sharing | Latest Advances in Agents

Digest   2024-12-26 10:34   Anhui


Paper Sharing | Research Progress on Agents

From the 48 papers published between 2024-12-20 and 2024-12-26, we selected five outstanding works to share with readers.

  1. Defining and Detecting the Defects of the Large Language Model-based Autonomous Agents
  2. LegalAgentBench: Evaluating LLM Agents in Legal Domain
  3. Tacit Learning with Adaptive Information Selection for Cooperative Multi-Agent Reinforcement Learning
  4. KG4Diagnosis: A Hierarchical Multi-Agent LLM Framework with Knowledge Graph Enhancement for Medical Diagnosis
  5. Multi-Agent Path Finding in Continuous Spaces with Projected Diffusion Models

1.Defining and Detecting the Defects of the Large Language Model-based Autonomous Agents

Authors: Kaiwen Ning, Jiachi Chen, Jingwen Zhang, Wei Lia, Zexu Wang, Yuming Feng, Weizhe Zhang, Zibin Zheng

https://arxiv.org/abs/2412.18371

Abstract

Artificial intelligence (AI) agents are systems capable of perceiving their environment, autonomously planning and executing tasks. Recent advancements in Large Language Models (LLMs) have introduced a transformative paradigm for AI agents, enabling them to interact with external resources and tools through prompt techniques. This advancement has significantly extended the capabilities of LLMs, positioning LLM-based AI Agents as an important research area. In such agents, the workflow integrates developer-written code, which manages framework construction and logic control, with LLM-generated natural language that enhances dynamic decision-making and interaction. However, discrepancies between developer-implemented logic and the dynamically generated content of LLMs in terms of behavior and expected outcomes can lead to defects, such as tool invocation failures and task execution errors. These issues introduce specific risks, leading to various defects in LLM-based AI Agents, including service interruptions and incorrect output. Despite the importance of these issues, there is a lack of systematic work that focuses on analyzing LLM-based AI Agents to uncover defects in their code. To address this gap, we present the first study focused on identifying and detecting defects in LLM Agents. We collected and analyzed 6,854 relevant posts from StackOverflow to define and classify eight types of agent defects. For each defect type, we provided detailed descriptions and illustrated with an example. Then, we designed a static analysis tool, named Agentable, to detect the defined defects. Agentable leverages Code Property Graphs (CPGs) and LLMs to analyze Agent workflows by efficiently identifying specific code patterns and analyzing natural language descriptions. To evaluate Agentable, we constructed two datasets: AgentSet, which consists of 84 real-world Agent projects, and AgentTest, which contains 78 Agent projects specifically designed to include various types of defects. Our results show that Agentable achieved an overall accuracy of 88.79% and a recall rate of 91.03%. Furthermore, our analysis reveals the defects of the Agent projects in the real-world dataset, highlighting the prevalence of these defects.
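To give a feel for what "identifying specific code patterns" can look like, here is a minimal sketch. It is not Agentable: it uses Python's built-in ast module rather than a Code Property Graph, and the pattern it checks (an LLM call that is not wrapped in a try block) is a hypothetical example, not one of the paper's eight defect types. The function name call_llm is likewise an assumption made for illustration.

```python
import ast

# Hypothetical names of functions that invoke an LLM (assumption for this sketch).
LLM_CALL_NAMES = {"call_llm"}


def unguarded_llm_calls(source: str) -> list[int]:
    """Return line numbers of LLM calls that are not inside the body of a try block."""
    findings = []

    def visit(node, guarded):
        if isinstance(node, ast.Call):
            func = node.func
            # Handle both plain names (call_llm) and attributes (client.call_llm).
            name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
            if name in LLM_CALL_NAMES and not guarded:
                findings.append(node.lineno)
        if isinstance(node, ast.Try):
            # Only the try body counts as guarded; handlers, else, and finally
            # inherit the guard status of the enclosing code.
            for child in node.body:
                visit(child, True)
            for part in (node.handlers, node.orelse, node.finalbody):
                for child in part:
                    visit(child, guarded)
            return
        for child in ast.iter_child_nodes(node):
            visit(child, guarded)

    visit(ast.parse(source), False)
    return findings


SAMPLE = '''
def run(prompt):
    reply = call_llm(prompt)
    try:
        tool_args = call_llm("extract args: " + reply)
    except Exception:
        tool_args = None
    return tool_args
'''

print(unguarded_llm_calls(SAMPLE))  # only the first call is outside a try block
```

A real tool would need data-flow and control-flow information, which is what a CPG adds on top of the syntax tree used here.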

Brief Review

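This paper on code defects in LLM-based agents proposes a systematic taxonomy of defect types, derived from an analysis of 6,854 StackOverflow posts. Its main highlight is Agentable, a static analysis tool that uses Code Property Graphs and LLMs to detect these defects and reaches an accuracy of 88.79% and a recall of 91.03%. Overall, the paper contributes on several fronts: it fills a gap in the understanding of code defects in LLM-based agents, strengthens the reliability of its findings through empirical analysis, and offers a new solution for defect detection. It delivers valuable results and shows the authors' solid grasp of the problems in this area.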

2.LegalAgentBench: Evaluating LLM Agents in Legal Domain

Authors: Haitao Li, Junjie Chen, Jingli Yang, Qingyao Ai, Wei Jia, Youfeng Liu, Kai Lin, Yueyue Wu, Guozhi Yuan, Yiran Hu, Wuyue Wang, Yiqun Liu, Minlie Huang

https://arxiv.org/abs/2412.17259

Abstract

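With the increasing intelligence and autonomy of LLM agents, their potential applications in the legal domain are becoming increasingly apparent. However, existing general-domain benchmarks cannot fully capture the complexity and subtle nuances of real-world judicial cognition and decision-making. Therefore, we propose LegalAgentBench, a comprehensive benchmark specifically designed to evaluate LLM agents in the Chinese legal domain. LegalAgentBench includes 17 corpora from real-world legal scenarios and provides 37 tools for interacting with external knowledge. We designed a scalable task construction framework and carefully annotated 300 tasks. These tasks span various types, including multi-hop reasoning and writing, and range across different difficulty levels, effectively reflecting the complexity of real-world legal scenarios. Moreover, beyond evaluating the final success rate, LegalAgentBench incorporates keyword analysis during intermediate processes to calculate progress rates, enabling more fine-grained evaluation. We evaluated eight popular LLMs, highlighting the strengths, limitations, and potential areas for improvement of existing models and methods. LegalAgentBench sets a new benchmark for the practical application of LLMs in the legal domain, with its code and data available at https://github.com/CSHaitao/LegalAgentBench.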
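The abstract does not spell out how the progress rate is computed. The following is a minimal sketch, assuming it is the share of annotated keywords that appear anywhere in the agent's intermediate outputs. The function name, the sample steps, and the keywords are all hypothetical and are not taken from the LegalAgentBench code.

```python
def progress_rate(intermediate_outputs: list[str], keywords: list[str]) -> float:
    """Fraction of expected keywords found in the agent's intermediate outputs."""
    if not keywords:
        return 0.0
    text = "\n".join(intermediate_outputs)
    hits = sum(1 for kw in keywords if kw in text)
    return hits / len(keywords)


# Hypothetical trace of a three-hop task in which the agent stopped after two hops.
steps = [
    "Tool call: company_lookup -> registered name: Acme Ltd",
    "Tool call: case_search -> case number: (2020) No. 123",
]
expected = ["Acme Ltd", "(2020) No. 123", "Beijing Intermediate Court"]

print(f"progress rate = {progress_rate(steps, expected):.2f}")  # progress rate = 0.67
```

Under this assumption, a task that fails at the final step still earns partial credit, which is what distinguishes a progress rate from a plain success rate.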

Brief Review

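This paper addresses a gap in how language models are evaluated in the legal domain. It proposes a benchmark named LegalAgentBench for assessing models used in Chinese legal scenarios. The benchmark contains 17 corpora drawn from real-world legal scenarios and 37 tools for interaction, along with 300 carefully annotated tasks that reflect the demands of complex legal reasoning.

Three strengths of the paper stand out:

First, it fills a gap in the evaluation of language models in the legal domain, with a particular focus on how models handle complex legal problems. Second, the benchmark offers diverse task types and difficulty levels, which makes the evaluation more comprehensive and effective. Third, it introduces fine-grained evaluation metrics that give deeper insight into model behavior.

Overall, LegalAgentBench is a valuable contribution. It provides a sound framework for developing language models for the legal sector and a useful reference for model developers in other industries. Benchmarks like this help clarify how language models perform in different legal settings and help drive further technical progress.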

3.Tacit Learning with Adaptive Information Selection for Cooperative Multi-Agent Reinforcement Learning

Authors: Lunjun Liu, Weilai Jiang, Yaonan Wang

https://arxiv.org/abs/2412.15639

Abstract

In multi-agent reinforcement learning (MARL), the centralized training with decentralized execution (CTDE) framework has gained widespread adoption due to its strong performance. However, the further development of CTDE faces two key challenges. First, agents struggle to autonomously assess the relevance of input information for cooperative tasks, impairing their decision-making abilities. Second, in communication-limited scenarios with partial observability, agents are unable to access global information, restricting their ability to collaborate effectively from a global perspective. To address these challenges, we introduce a novel cooperative MARL framework based on information selection and tacit learning. In this framework, agents gradually develop implicit coordination during training, enabling them to infer the cooperative behavior of others in a discrete space without communication, relying solely on local information. Moreover, we integrate gating and selection mechanisms, allowing agents to adaptively filter information based on environmental changes, thereby enhancing their decision-making capabilities. Experiments on popular MARL benchmarks show that our framework can be seamlessly integrated with state-of-the-art algorithms, leading to significant performance improvements.
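The abstract mentions gating and selection mechanisms without giving details. As a rough illustration of the general idea only (this is not the paper's architecture), a soft gate scales each input feature by a sigmoid weight, while a hard selection keeps the top-k features and zeroes the rest. In a trained agent the gate logits would come from a learned network; here they are fixed numbers chosen for the example.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gate_features(features: list[float], gate_logits: list[float]) -> list[float]:
    """Soft gating: scale each feature by a weight in (0, 1)."""
    if len(features) != len(gate_logits):
        raise ValueError("features and gate_logits must have the same length")
    return [f * sigmoid(g) for f, g in zip(features, gate_logits)]


def select_top_k(features: list[float], scores: list[float], k: int) -> list[float]:
    """Hard selection: keep the k highest-scoring features, zero out the others."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(ranked[:k])
    return [f if i in keep else 0.0 for i, f in enumerate(features)]


# Hypothetical local observation and relevance logits.
obs = [0.5, -1.2, 3.0, 0.1]
logits = [4.0, -4.0, 0.0, 2.0]

print(gate_features(obs, logits))
print(select_top_k(obs, logits, k=2))  # [0.5, 0.0, 0.0, 0.1]
```

The soft gate stays differentiable and suits end-to-end training, whereas the hard selection gives a sparse input at the cost of a non-differentiable step.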

Brief Review

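This paper introduces the Selective Implicit Collaboration Algorithm (SICA), a novel multi-agent reinforcement learning framework. It targets the challenges of the centralized training with decentralized execution (CTDE) paradigm by letting agents autonomously filter and select relevant information, which improves their decision-making. SICA combines a Selection Block, a Communication Block, and a Regeneration Block to provide a gradual transition from centralized to decentralized decision-making. Experiments show that SICA achieves significant performance gains over conventional CTDE methods and explicit-communication methods across a range of benchmarks, demonstrating its effectiveness in different environments. Overall, SICA offers an effective solution for cooperative multi-agent reinforcement learning and addresses several key problems of the CTDE paradigm.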

4.KG4Diagnosis: A Hierarchical Multi-Agent LLM Framework with Knowledge Graph Enhancement for Medical Diagnosis

Authors: Kaiwen Zuo, Yirui Jiang, Fan Mo, Pietro Lio

https://arxiv.org/abs/2412.16833

Abstract

Integrating Large Language Models (LLMs) in healthcare diagnosis demands systematic frameworks that can handle complex medical scenarios while maintaining specialized expertise. We present KG4Diagnosis, a novel hierarchical multi-agent framework that combines LLMs with automated knowledge graph construction, encompassing 362 common diseases across medical specialties. Our framework mirrors real-world medical systems through a two-tier architecture: a general practitioner (GP) agent for initial assessment and triage, coordinating with specialized agents for in-depth diagnosis in specific domains. The core innovation lies in our end-to-end knowledge graph generation methodology, incorporating: (1) semantic-driven entity and relation extraction optimized for medical terminology, (2) multi-dimensional decision relationship reconstruction from unstructured medical texts, and (3) human-guided reasoning for knowledge expansion. KG4Diagnosis serves as an extensible foundation for specialized medical diagnosis systems, with capabilities to incorporate new diseases and medical knowledge. The framework's modular design enables seamless integration of domain-specific enhancements, making it valuable for developing targeted medical diagnosis systems. We provide architectural guidelines and protocols to facilitate adoption across medical contexts.
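The two-tier flow described above (a GP agent triages, then hands off to a specialist agent) can be sketched as a simple router. This is an illustration under strong assumptions: the keyword table stands in for an LLM-based GP agent, the specialist is a stub where an LLM backed by a knowledge graph would sit, and every name below is hypothetical rather than part of KG4Diagnosis.

```python
from dataclasses import dataclass


@dataclass
class Referral:
    specialty: str
    reason: str


# Hypothetical keyword table standing in for the GP agent's LLM-based triage.
TRIAGE_KEYWORDS = {
    "cardiology": ("chest pain", "palpitations"),
    "dermatology": ("rash", "itching"),
}


def gp_triage(symptoms: str) -> Referral:
    """Tier 1: initial assessment that picks a specialty for the case."""
    text = symptoms.lower()
    for specialty, keywords in TRIAGE_KEYWORDS.items():
        for kw in keywords:
            if kw in text:
                return Referral(specialty, f"matched '{kw}'")
    return Referral("general", "no specialist keyword matched")


def specialist_agent(specialty: str, symptoms: str) -> str:
    """Tier 2: stub for a domain-specific agent (LLM plus knowledge graph)."""
    return f"[{specialty}] in-depth assessment requested for: {symptoms}"


def diagnose(symptoms: str) -> str:
    referral = gp_triage(symptoms)
    return specialist_agent(referral.specialty, symptoms)


print(diagnose("Itching and a red rash on the forearm"))
```

Adding a specialty in this sketch means adding one table entry and one specialist, which mirrors the modularity the abstract emphasizes.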

Brief Review

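This paper builds KG4Diagnosis, a multi-agent architecture that integrates large language models (LLMs) with automated knowledge graph construction for medical diagnosis. Through a structured approach, it mirrors real-world medical systems to improve diagnostic accuracy and decision-making. According to the paper, the framework's effectiveness rests on its highly modular design, which makes it easy to integrate new medical knowledge and domains. The architecture also addresses challenges facing medical AI, particularly the handling of unstructured data. In short, the paper offers valuable research results for future deep-learning-based medical diagnosis.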

5.Multi-Agent Path Finding in Continuous Spaces with Projected Diffusion Models

Authors: Jinhao Liang, Jacob K. Christopher, Sven Koenig, Ferdinando Fioretto

https://arxiv.org/abs/2412.17993

Abstract

Multi-Agent Path Finding (MAPF) is a fundamental problem in robotics, requiring the computation of collision-free paths for multiple agents moving from their respective start to goal positions. Coordinating multiple agents in a shared environment poses significant challenges, especially in continuous spaces where traditional optimization algorithms struggle with scalability. Moreover, these algorithms often depend on discretized representations of the environment, which can be impractical in image-based or high-dimensional settings. Recently, diffusion models have shown promise in single-agent path planning, capturing complex trajectory distributions and generating smooth paths that navigate continuous, high-dimensional spaces. However, directly extending diffusion models to MAPF introduces new challenges since these models struggle to ensure constraint feasibility, such as inter-agent collision avoidance. To overcome this limitation, this work proposes a novel approach that integrates constrained optimization with diffusion models for MAPF in continuous spaces. This unique combination directly produces feasible multi-agent trajectories that respect collision avoidance and kinematic constraints. The effectiveness of our approach is demonstrated across various challenging simulated scenarios of varying dimensionality.
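To make the idea of enforcing collision avoidance on generated trajectories concrete, here is a minimal sketch of a projection step for a single timestep. It is an assumption-laden simplification, not the paper's constrained optimization: it repeatedly pushes any pair of agents that is closer than min_dist apart along the line joining them, and it ignores kinematic constraints and obstacles. In a projected diffusion sampler, a step of this kind would presumably be applied to the sample after each denoising update.

```python
import math

Point = tuple[float, float]


def project_pair(p: Point, q: Point, min_dist: float) -> tuple[Point, Point]:
    """Move two points apart symmetrically until they are min_dist apart."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    dist = math.hypot(dx, dy)
    if dist >= min_dist:
        return p, q
    # Coincident points have no defined direction; pick the x-axis arbitrarily.
    ux, uy = (1.0, 0.0) if dist == 0.0 else (dx / dist, dy / dist)
    shift = (min_dist - dist) / 2.0
    return (p[0] - ux * shift, p[1] - uy * shift), (q[0] + ux * shift, q[1] + uy * shift)


def project_positions(positions: list[Point], min_dist: float,
                      max_iters: int = 100, tol: float = 1e-9) -> list[Point]:
    """Heuristic pairwise projection; stops once no pair violates the constraint."""
    pts = list(positions)
    for _ in range(max_iters):
        changed = False
        for i in range(len(pts)):
            for j in range(i + 1, len(pts)):
                if math.dist(pts[i], pts[j]) < min_dist - tol:
                    pts[i], pts[j] = project_pair(pts[i], pts[j], min_dist)
                    changed = True
        if not changed:
            break
    return pts


# Two agents that a raw sample placed too close together.
a, b = project_positions([(0.0, 0.0), (0.4, 0.0)], min_dist=1.0)
print(a, b, math.dist(a, b))
```

For two agents this is the exact Euclidean projection onto the separation constraint; with more agents the pairwise loop is only a heuristic and is not guaranteed to converge within max_iters.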

Brief Review

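This paper proposes a new approach to Multi-Agent Path Finding (MAPF) that combines diffusion models with constrained optimization to generate feasible trajectories in continuous spaces while respecting collision avoidance and kinematic constraints. Its effectiveness is shown across a variety of simulated scenarios.

The key points are as follows. First, the paper tackles multi-agent path finding in continuous spaces and overcomes the limitations of traditional discretization-based methods. Second, the authors explain in detail why they chose diffusion models and why they consider them necessary for the challenges that arise in practical applications. Third, the experiments show that, compared with standard diffusion models and guided diffusion models, the method improves path efficiency while ensuring feasibility.

Overall, the paper offers an innovative solution that improves on current multi-agent path finding methods and better meets task requirements in complex environments. Its results support both the need for and the effectiveness of this new approach.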


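We welcome your suggestions in the comments, including but not limited to:

  • Pointing out shortcomings in the brief reviews in this post.
  • Sharing recent papers that you think are more worth recommending, with your reasons.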


