2024-12-18 Paper Digest | Latest Advances in Agents

Digest   2024-12-18 10:58   Anhui


Paper Digest | Research Progress on Agents

From the 45 papers published between 2024-12-13 and 2024-12-18, we selected five outstanding works to share with readers.

  1. Can Modern LLMs Act as Agent Cores in Radiology Environments?
  2. Achieving Collective Welfare in Multi-Agent Reinforcement Learning via Suggestion Sharing
  3. A systematic review of norm emergence in multi-agent systems
  4. Agent-based Video Trimming
  5. GROOT-2: Weakly Supervised Multi-Modal Instruction Following Agents

1.Can Modern LLMs Act as Agent Cores in Radiology Environments?

Authors: Qiaoyu Zheng, Chaoyi Wu, Pengcheng Qiu, Lisong Dai, Ya Zhang, Yanfeng Wang, Weidi Xie

https://arxiv.org/abs/2412.09529

Abstract

Advancements in large language models (LLMs) have paved the way for LLM-based agent systems that offer enhanced accuracy and interpretability across various domains. Radiology, with its complex analytical requirements, is an ideal field for the application of these agents. This paper aims to investigate the pre-requisite question for building concrete radiology agents which is, ‘Can modern LLMs act as agent cores in radiology environments?’ To investigate it, we introduce RadABench with three-fold contributions: First, we present RadABench-Data, a comprehensive synthetic evaluation dataset for LLM-based agents, generated from an extensive taxonomy encompassing 6 anatomies, 5 imaging modalities, 10 tool categories, and 11 radiology tasks. Second, we propose RadABench-EvalPlat, a novel evaluation platform for agents featuring a prompt-driven workflow and the capability to simulate a wide range of radiology toolsets. Third, we assess the performance of 7 leading LLMs on our benchmark from 5 perspectives with multiple metrics. Our findings indicate that while current LLMs demonstrate strong capabilities in many areas, they are still not sufficiently advanced to serve as the central agent core in a fully operational radiology agent system. Additionally, we identify key factors influencing the performance of LLM-based agent cores, offering insights for clinicians on how to apply agent systems in real-world radiology practices effectively. All of our code and data are open-sourced in https://github.com/MAGIC-AI4Med/RadABench.

Brief Review

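This paper evaluates whether modern large language models (LLMs) can serve as agent cores in radiology environments. It introduces RadABench, a benchmark whose dataset covers complex radiology tasks, together with a new evaluation platform, RadABench-EvalPlat. By evaluating 7 leading LLMs, the paper reveals their strengths and weaknesses on specific radiology tasks. The key points are as follows. First, RadABench-Data provides comprehensive, structured data for evaluating LLM capabilities in radiology environments, which is essential for understanding how LLMs perform in this field. Second, RadABench-EvalPlat simulates realistic clinical scenarios and offers a detailed framework for assessing how an LLM decomposes tasks, selects tools, and generates responses. Finally, the systematic performance analysis across multiple LLMs gives valuable insight into current limitations and room for improvement in support of clinical use. In summary, the paper examines the potential of modern LLMs in medical imaging in depth and offers useful guidance for future research and practice.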

2.Achieving Collective Welfare in Multi-Agent Reinforcement Learning via Suggestion Sharing

Authors: Yue Jin, Shuangqing Wei, Giovanni Montana

https://arxiv.org/abs/2412.12326

Abstract

In human society, the conflict between self-interest and collective well-being often obstructs efforts to achieve shared welfare. Related concepts like the Tragedy of the Commons and Social Dilemmas frequently manifest in our daily lives. As artificial agents increasingly serve as autonomous proxies for humans, we propose using multi-agent reinforcement learning (MARL) to address this issue—learning policies to maximise collective returns even when individual agents' interests conflict with the collective one. Traditional MARL solutions involve sharing rewards, values, and policies or designing intrinsic rewards to encourage agents to learn collectively optimal policies. We introduce a novel MARL approach based on Suggestion Sharing (SS), where agents exchange only action suggestions. This method enables effective cooperation without the need to design intrinsic rewards, achieving strong performance while revealing less private information compared to sharing rewards, values, or policies. Our theoretical analysis establishes a bound on the discrepancy between collective and individual objectives, demonstrating how sharing suggestions can align agents' behaviours with the collective objective. Experimental results demonstrate that SS performs competitively with baselines that rely on value or policy sharing or intrinsic rewards.

Brief Review

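Suggestion Sharing (SS) is a novel approach in multi-agent reinforcement learning (MARL) that aims to maximize collective welfare by aligning the interests of individual agents. The authors point out the drawback of traditional methods, which require sharing rewards, values, or policies, and propose exchanging only action suggestions instead. Theoretical analysis and experiments support the effectiveness of SS: it performs competitively with the baselines while revealing less private information. Overall, the paper presents an innovative way to promote cooperation without sharing sensitive information, which makes it a meaningful contribution to the MARL field.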

3.A systematic review of norm emergence in multi-agent systems

Authors: Carmengelys Cordova, Joaquin Taverner, Elena Del Val, Estefania Argente

https://arxiv.org/abs/2412.10609

Abstract

Multi-agent systems (MAS) have gained relevance in the field of artificial intelligence by offering tools for modelling complex environments where autonomous agents interact to achieve common or individual goals. In these systems, norms emerge as a fundamental component to regulate the behaviour of agents, promoting cooperation, coordination and conflict resolution. This article presents a systematic review, following the PRISMA method, on the emergence of norms in MAS, exploring the main mechanisms and factors that influence this process. Sociological, structural, emotional and cognitive aspects that facilitate the creation, propagation and reinforcement of norms are addressed. The findings highlight the crucial role of social network topology, as well as the importance of emotions and shared values in the adoption and maintenance of norms. Furthermore, opportunities are identified for future research that more explicitly integrates emotional and ethical dynamics in the design of adaptive normative systems. This work provides a comprehensive overview of the current state of research on norm emergence in MAS, serving as a basis for advancing the development of more efficient and flexible systems in artificial and real-world contexts.

Brief Review

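This paper is a systematic review of norm emergence in multi-agent systems (MAS). Following the PRISMA method, the authors examine the mechanisms and factors that influence this process, covering sociological, structural, emotional, and cognitive aspects. The review also highlights the importance of social network topology and shared values for the adoption and maintenance of norms. Through a comprehensive survey of the literature, it provides valuable information for researchers in related fields and deepens our understanding of how norms emerge. Overall, the article is a significant contribution that fills a gap in this area and gives other researchers a solid reference framework.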

4.Agent-based Video Trimming

Authors: Lingfeng Yang, Zhenyuan Chen, Xiang Li, Peiyang Jia, Liangqu Long, Jian Yang

https://arxiv.org/abs/2412.09513

Abstract

As information becomes more accessible, user-generated videos are increasing in length, placing a burden on viewers to sift through vast content for valuable insights. This trend underscores the need for an algorithm to extract key video information efficiently. Despite significant advancements in highlight detection, moment retrieval, and video summarization, current approaches primarily focus on selecting specific time intervals, often overlooking the relevance between segments and the potential for segment arranging. In this paper, we introduce a novel task called Video Trimming (VT), which focuses on detecting wasted footage, selecting valuable segments, and composing them into a final video with a coherent story. To address this task, we propose Agent-based Video Trimming (AVT), structured into three phases: Video Structuring, Clip Filtering, and Story Composition. Specifically, we employ a Video Captioning Agent to convert video slices into structured textual descriptions, a Filtering Module to dynamically discard low-quality footage based on the structured information of each clip, and a Video Arrangement Agent to select and compile valid clips into a coherent final narrative. For evaluation, we develop a Video Evaluation Agent to assess trimmed videos, conducting assessments in parallel with human evaluations. Additionally, we curate a new benchmark dataset for video trimming using raw user videos from the internet. As a result, AVT received more favorable evaluations in user studies and demonstrated superior mAP and precision on the YouTube Highlights, TVSum, and our own dataset for the highlight detection task. The code and models are available at https://ylingfeng.github.io/AVT.

Brief Review

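This paper addresses a novel task, Video Trimming (VT), and proposes an agent-based framework for it, Agent-based Video Trimming (AVT), which consists of three phases: Video Structuring, Clip Filtering, and Story Composition. The goal is to make video editing more efficient by selecting high-value segments and discarding redundant footage while keeping the storyline coherent. The authors demonstrate AVT's effectiveness by comparing it with existing highlight detection methods.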

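The paper's main strengths are its focus on a current problem in video processing and its innovative use of agents to tackle the challenges of video editing. In addition, both the user study and the quantitative evaluation show that AVT compares favorably with existing techniques. Overall, the paper offers a new perspective for the video editing field and has both theoretical value and practical significance.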
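
To make the three phases named in the abstract more concrete, here is a minimal Python sketch of a structure, filter, compose pipeline. It is an illustration only and not the authors' implementation: the `Clip` fields, the `quality` and `relevance` scores, the thresholds, and all function names are hypothetical, and plain rules stand in for the Video Captioning Agent, Filtering Module, and Video Arrangement Agent, which the abstract describes only at a high level.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Clip:
    start: float      # clip start time in seconds
    end: float        # clip end time in seconds
    caption: str      # stand-in for the captioning agent's structured description
    quality: float    # hypothetical 0-1 footage quality score
    relevance: float  # hypothetical 0-1 relevance to the intended story


def structure_video(raw_slices: List[Dict]) -> List[Clip]:
    """Phase 1 (Video Structuring): turn raw slices into structured clips.

    Assumption: the descriptions are already computed and passed in as dicts.
    """
    return [Clip(**s) for s in raw_slices]


def filter_clips(clips: List[Clip], min_quality: float = 0.5) -> List[Clip]:
    """Phase 2 (Clip Filtering): drop footage below a quality threshold."""
    return [c for c in clips if c.quality >= min_quality]


def compose_story(clips: List[Clip], max_clips: int = 3) -> List[Clip]:
    """Phase 3 (Story Composition): keep the most relevant clips.

    Assumption: chronological order is used as a simple proxy for a
    coherent narrative.
    """
    chosen = sorted(clips, key=lambda c: c.relevance, reverse=True)[:max_clips]
    return sorted(chosen, key=lambda c: c.start)


def trim(raw_slices: List[Dict], min_quality: float = 0.5,
         max_clips: int = 3) -> List[Clip]:
    """Run the three phases in sequence."""
    clips = structure_video(raw_slices)
    kept = filter_clips(clips, min_quality)
    return compose_story(kept, max_clips)
```

Calling `trim(slices, max_clips=2)` on a list of slice dicts returns at most two clips that passed the quality threshold, ordered by start time. In the actual system each rule would presumably be replaced by a model-driven agent.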

5.GROOT-2: Weakly Supervised Multi-Modal Instruction Following Agents

Authors: Shaofei Cai, Bowei Zhang, Zihao Wang, Haowei Lin, Xiaojian Ma, Anji Liu, Yitao Liang

https://arxiv.org/abs/2412.10410

Abstract

Developing agents that can follow multimodal instructions remains a fundamental challenge in robotics and AI. Although large-scale pre-training on unlabeled datasets (no language instruction) has enabled agents to learn diverse behaviors, these agents often struggle with following instructions. While augmenting the dataset with instruction labels can mitigate this issue, acquiring such high-quality annotations at scale is impractical. To address this issue, we frame the problem as a semi-supervised learning task and introduce GROOT-2, a multimodal instructable agent trained using a novel approach that combines weak supervision with latent variable models. Our method consists of two key components: constrained self-imitating, which utilizes large amounts of unlabeled demonstrations to enable the policy to learn diverse behaviors, and human intention alignment, which uses a smaller set of labeled demonstrations to ensure the latent space reflects human intentions. GROOT-2’s effectiveness is validated across four diverse environments, ranging from video games to robotic manipulation, demonstrating its robust multimodal instruction-following capabilities.

Brief Review

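GROOT-2 is an innovative multimodal instruction-following agent designed to address the difficulty of acquiring high-quality instruction annotations at scale. Its main contribution is the combination of constrained self-imitation with human intention alignment to improve the agent's performance. By exploiting large amounts of unlabeled demonstrations, the proposed method improves both the efficiency and the effectiveness of agent training. The experiments show that GROOT-2 applies broadly across diverse scenarios, ranging from video games to robotic manipulation, which points to its potential impact on future AI and robotics research. Overall, the paper gives a thorough account of the challenges in multimodal instruction following and offers a promising solution in GROOT-2.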


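We welcome your suggestions in the comments, including but not limited to:

  • Pointing out shortcomings in the brief reviews in this post!
  • Sharing recent papers that you think are more worth recommending, with your reasons!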

END
