Paper Digest | Research Progress on Agents
1. Learning Multi-Agent Collaborative Manipulation for Long-Horizon Quadrupedal Pushing
2. Towards Low-Resource Harmful Meme Detection with LMM Agents
3. Enabling Adaptive Agent Training in Open-Ended Simulators by Targeting Diversity
4. Magentic-One: A Generalist Multi-Agent System for Solving Complex Tasks
5. Mr.Steve: Instruction-Following Agents in Minecraft with What-Where-When Memory
1. Learning Multi-Agent Collaborative Manipulation for Long-Horizon Quadrupedal Pushing
Authors: Chuye Hong, Yuming Feng, Yaru Niu, Shiqi Liu, Yuxiang Yang, Wenhao Yu, Tingnan Zhang, Jie Tan, Ding Zhao
https://arxiv.org/abs/2411.07104
Abstract
Recently, quadrupedal locomotion has achieved significant success, but their manipulation capabilities, particularly in handling large objects, remain limited, restricting their usefulness in demanding real-world applications such as search and rescue, construction, industrial automation, and room organization. This paper tackles the task of obstacle-aware, long-horizon pushing by multiple quadrupedal robots. We propose a hierarchical multi-agent reinforcement learning framework with three levels of control. The high-level controller integrates an RRT planner and a centralized adaptive policy to generate subgoals, while the mid-level controller uses a decentralized goal-conditioned policy to guide the robots toward these subgoals. A pre-trained low-level locomotion policy executes the movement commands. We evaluate our method against several baselines in simulation, demonstrating significant improvements over baseline approaches, with 36.0% higher success rates and 24.5% reduction in completion time than the best baseline. Our framework successfully enables long-horizon, obstacle-aware manipulation tasks like Push-Cuboid and Push-T on Go1 robots in the real world.
Brief Review
This paper on a multi-agent reinforcement learning framework offers an innovative solution to an important challenge in robotics: improving quadrupedal robots' long-horizon manipulation capabilities in complex environments. The method combines a high-level RRT planner and a centralized adaptive policy to generate subgoals, which decentralized mid-level controllers then track, enabling better-coordinated multi-agent collaboration.
Experimental results show that, compared with baseline methods, the proposed framework significantly improves success rates and completion times in both simulation and real-world deployment. This advance not only demonstrates improved obstacle-aware manipulation for quadrupedal robots, but also opens up potential research directions for other problems in robotics. Overall, the paper presents a novel and promising multi-agent reinforcement learning framework and empirically validates its effectiveness across different scenarios.
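To make the three-level hierarchy from the abstract concrete, below is a minimal Python sketch of one control step. The planner and policy objects, observation keys, and method names here are all hypothetical placeholders for illustration, not the authors' actual API.

```python
# A minimal sketch of the three-level control hierarchy described in the
# abstract. All object interfaces are hypothetical placeholders.

class HierarchicalPushingController:
    def __init__(self, rrt_planner, adaptive_policy, goal_policies, locomotion_policies):
        self.rrt_planner = rrt_planner                  # high level: global path around obstacles
        self.adaptive_policy = adaptive_policy          # high level: centralized subgoal generator
        self.goal_policies = goal_policies              # mid level: one decentralized policy per robot
        self.locomotion_policies = locomotion_policies  # low level: pre-trained gait controllers

    def step(self, obs):
        # High level: plan a coarse object path around obstacles, then adapt
        # the waypoints into one subgoal per robot.
        waypoints = self.rrt_planner.plan(obs["object_pose"], obs["goal_pose"], obs["obstacles"])
        subgoals = self.adaptive_policy(obs, waypoints)

        joint_commands = []
        for i, robot_obs in enumerate(obs["robots"]):
            # Mid level: each robot tracks its own subgoal using only its
            # local observation, which keeps this level decentralized.
            velocity_cmd = self.goal_policies[i](robot_obs, subgoals[i])
            # Low level: turn velocity commands into executable motion.
            joint_commands.append(self.locomotion_policies[i](robot_obs, velocity_cmd))
        return joint_commands
```

The structural point the sketch tries to capture is that only the high level reasons globally about the object and obstacles; each robot's mid-level policy conditions only on its own observation and subgoal.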
2. Towards Low-Resource Harmful Meme Detection with LMM Agents
Authors: Jianzhao Huang, Hongzhan Lin, Ziyan Liu, Ziyang Luo, Guang Chen, Jing Ma
https://arxiv.org/abs/2411.05383
Abstract
The proliferation of Internet memes in the age of social media necessitates effective identification of harmful ones. Due to the dynamic nature of memes, existing data-driven models may struggle in low-resource scenarios where only a few labeled examples are available. In this paper, we propose an agency-driven framework for low-resource harmful meme detection, employing both outward and inward analysis with few-shot annotated samples. Inspired by the powerful capacity of Large Multimodal Models (LMMs) on multimodal reasoning, we first retrieve relevant memes with annotations to leverage label information as auxiliary signals for the LMM agent. Then, we elicit knowledge-revising behavior within the LMM agent to derive well-generalized insights into meme harmfulness. By combining these strategies, our approach enables dialectical reasoning over intricate and implicit harm-indicative patterns. Extensive experiments conducted on three meme datasets demonstrate that our proposed approach achieves superior performance compared to state-of-the-art methods on the low-resource harmful meme detection task.
Brief Review
This paper proposes a framework for low-resource harmful meme detection built on Large Multimodal Models (LMMs). By combining outward and inward analysis, it learns efficiently from only a few annotated samples, improving the adaptability and effectiveness of harmful meme detection and showing clear advantages in low-resource scenarios. Evaluations on three meme datasets demonstrate that it outperforms existing methods. The proposed framework carries both theoretical and practical value for harmful meme detection and offers an effective approach to a pressing problem on social platforms.
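As a rough illustration of the outward/inward analysis the abstract describes, here is a hedged Python sketch assuming a generic retriever and a chat-style LMM client; `retriever.search`, `lmm.generate`, and the record fields are assumptions, not the paper's implementation.

```python
# A rough sketch of the agent loop: outward analysis retrieves annotated
# neighbors, inward analysis conditions on revisable insights. All
# interfaces below are assumed, not the authors' code.

def detect_harmful_meme(meme_text, meme_image, retriever, lmm, insights):
    # Outward analysis: retrieve similar annotated memes so their labels
    # act as auxiliary few-shot signals for the LMM agent.
    neighbors = retriever.search(meme_text, meme_image, k=4)
    examples = "\n".join(f"Meme: {n.text} -> Label: {n.label}" for n in neighbors)

    # Inward analysis: condition on previously revised, well-generalized
    # insights about what makes a meme harmful.
    prompt = (
        f"Known insights about meme harmfulness:\n{insights}\n\n"
        f"Labeled examples:\n{examples}\n\n"
        f"Classify this meme as harmful or harmless.\nText: {meme_text}"
    )
    return lmm.generate(prompt, images=[meme_image])


def revise_insights(lmm, insights, meme, predicted, gold):
    # Knowledge-revising behavior: when a prediction disagrees with a
    # few-shot annotation, ask the LMM to update its generalized insights.
    if predicted == gold:
        return insights
    prompt = (
        f"Current insights:\n{insights}\n\n"
        f"The meme '{meme.text}' is annotated {gold} but was predicted "
        f"{predicted}. Revise the insights so they explain this case."
    )
    return lmm.generate(prompt, images=[meme.image])
```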
3. Enabling Adaptive Agent Training in Open-Ended Simulators by Targeting Diversity
Authors: Robby Costales, Stefanos Nikolaidis
https://arxiv.org/abs/2411.04466
Abstract
The wider application of end-to-end learning methods to embodied decision-making domains remains bottlenecked by their reliance on a superabundance of training data representative of the target domain. Meta-reinforcement learning (meta-RL) approaches abandon the aim of zero-shot generalization—the goal of standard reinforcement learning (RL)—in favor of few-shot adaptation, and thus hold promise for bridging larger generalization gaps. While learning this meta-level adaptive behavior still requires substantial data, efficient environment simulators approaching real-world complexity are growing in prevalence. Even so, hand-designing sufficiently diverse and numerous simulated training tasks for these complex domains is prohibitively labor-intensive. Domain randomization (DR) and procedural generation (PG), offered as solutions to this problem, require simulators to possess carefully-defined parameters which directly translate to meaningful task diversity—a similarly prohibitive assumption. In this work, we present DIVA, an evolutionary approach for generating diverse training tasks in such complex, open-ended simulators. Like unsupervised environment design (UED) methods, DIVA can be applied to arbitrary parameterizations, but can additionally incorporate realistically-available domain knowledge—thus inheriting the flexibility and generality of UED, and the supervised structure embedded in well-designed simulators exploited by DR and PG. Our empirical results showcase DIVA's unique ability to overcome complex parameterizations and successfully train adaptive agent behavior, far outperforming competitive baselines from prior literature. These findings highlight the potential of such semi-supervised environment design (SSED) approaches, of which DIVA is the first humble constituent, to enable training in realistic simulated domains, and produce more robust and capable adaptive agents. Our code is available at https://github.com/robbycostales/diva.
Brief Review
This paper examines the problem of generating diverse training tasks in complex, open-ended simulators to support adaptive agent training, and proposes DIVA, a novel evolutionary approach, to address it. The method targets diversity in task generation while incorporating ideas from environment design, aiming to increase both task diversity and agent capability. Across a series of experiments, DIVA shows a clear advantage over competitive baselines, demonstrating its effectiveness for training adaptive agents. Overall, DIVA provides a new perspective and a new solution for this class of problems and is a meaningful step for related research.
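Since the abstract describes DIVA as evolving diverse tasks over a simulator's parameterization, a generic quality-diversity loop (in the style of MAP-Elites) conveys the flavor. The sketch below is an illustrative assumption, not DIVA's actual algorithm; `sample_params`, `mutate`, and `feature_descriptor` are hypothetical hooks into a simulator.

```python
import random

# A generic diversity-targeting evolutionary loop: an archive keeps at most
# one task per feature cell, pressuring the population toward coverage of
# the feature space. This is a sketch, not the authors' algorithm.

def evolve_task_archive(sample_params, mutate, feature_descriptor, n_iters=1000):
    archive = {}  # maps a discretized feature cell -> task parameters

    for _ in range(n_iters):
        # Mutate an existing elite if the archive is non-empty; otherwise
        # sample fresh parameters from the simulator's parameter space.
        parent = random.choice(list(archive.values())) if archive else sample_params()
        child = mutate(parent)

        # Bin the candidate by its semantic features (e.g. maze size,
        # object count). One entry per cell enforces diversity along
        # exactly the features the designer can realistically specify.
        cell = feature_descriptor(child)
        if cell not in archive:
            archive[cell] = child

    return list(archive.values())  # a diverse set of training tasks
```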
4. Magentic-One: A Generalist Multi-Agent System for Solving Complex Tasks
Authors: Adam Fourney, Gagan Bansal, Hussein Mozannar, Cheng Tan, Eduardo Salinas, Erkang Zhu, Friederike Niedtner, Grace Proebsting, Griffin Bassman, Jack Gerrits, Jacob Alber, Peter Chang, Ricky Loynd, Robert West, Victor Dibia, Ahmed Awadallah, Ece Kamar, Rafah Hosn, Saleema Amershi
https://arxiv.org/abs/2411.04468
Abstract
Modern AI agents, driven by advances in large foundation models, promise to enhance our productivity and transform our lives by augmenting our knowledge and capabilities. To achieve this vision, AI agents must effectively plan, perform multi-step reasoning and actions, respond to novel observations, and recover from errors, to successfully complete complex tasks across a wide range of scenarios. In this work, we introduce Magentic-One, a high-performing open-source agentic system for solving such tasks. Magentic-One uses a multi-agent architecture where a lead agent, the Orchestrator, plans, tracks progress, and re-plans to recover from errors. Throughout task execution, the Orchestrator also directs other specialized agents to perform tasks as needed, such as operating a web browser, navigating local files, or writing and executing Python code. Our experiments show that Magentic-One achieves statistically competitive performance to the state-of-the-art on three diverse and challenging agentic benchmarks: GAIA, AssistantBench, and WebArena. Notably, Magentic-One achieves these results without modification to core agent capabilities or to how they collaborate, demonstrating progress towards the vision of generalist agentic systems. Moreover, Magentic-One's modular design allows agents to be added or removed from the team without additional prompt tuning or training, easing development and making it extensible to future scenarios. We provide an open-source implementation of Magentic-One, and we include AutoGenBench, a standalone tool for agentic evaluation. AutoGenBench provides built-in controls for repetition and isolation to run agentic benchmarks in a rigorous and contained manner—which is important when agents' actions have side-effects. Magentic-One, AutoGenBench and detailed empirical performance evaluations of Magentic-One, including ablations and error analysis, are available at https://aka.ms/magentic-one.
Brief Review
This paper introduces Magentic-One, a multi-agent system with a modular architecture coordinated by a lead Orchestrator agent for solving complex tasks. The system performs strongly across several benchmarks, and the authors provide an open-source implementation together with AutoGenBench, a standalone tool for evaluating agentic systems. With AutoGenBench, researchers can evaluate agent systems more rigorously and better understand their performance and limitations. Overall, Magentic-One is an innovative and practical multi-agent system, notable both for its strong results and for its sound methodology for evaluating agentic systems.
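The abstract's description of the Orchestrator (plan, delegate, track progress, re-plan on errors) maps onto a simple control loop. The Python sketch below is a minimal approximation under assumed interfaces; the planner and agent objects and their method names are hypothetical, not the Magentic-One API.

```python
# A minimal Orchestrator-style loop: plan, delegate steps to specialized
# agents, and re-plan with failure context when a step goes wrong.
# All interfaces here are assumptions for illustration.

def run_orchestrator(task, planner, agents, max_replans=3):
    plan = planner.make_plan(task)  # ordered list of (agent_name, subtask)

    for _ in range(max_replans):
        progress = []
        for agent_name, subtask in plan:
            # Delegate each step to a specialized agent (e.g. a web surfer,
            # file navigator, or coder) and record the outcome.
            result = agents[agent_name].execute(subtask)
            progress.append((subtask, result))
            if result.failed:
                # Error recovery: re-plan with the failure context and retry.
                plan = planner.make_plan(task, history=progress)
                break
        else:
            # Every step succeeded: synthesize a final answer.
            return planner.summarize(task, progress)

    raise RuntimeError("task not completed within the re-planning budget")
```

The modularity the paper emphasizes shows up here as the `agents` mapping: adding or removing a specialist only changes the dictionary the planner can delegate to, not the loop itself.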
5. Mr.Steve: Instruction-Following Agents in Minecraft with What-Where-When Memory
Authors: Junyeong Park, Junmo Cho, Sungjin Ahn
https://arxiv.org/abs/2411.06736
Abstract
Significant advances have been made in developing general-purpose embodied AI in environments like Minecraft through the adoption of LLM-augmented hierarchical approaches. While these approaches, which combine high-level planners with low-level controllers, show promise, low-level controllers frequently become performance bottlenecks due to repeated failures. In this paper, we argue that the primary cause of failure in many low-level controllers is the absence of an episodic memory system. To address this, we introduce MR.STEVE (Memory Recall STEVE-1), a novel low-level controller equipped with Place Event Memory (PEM), a form of episodic memory that captures what, where, and when information from episodes. This directly addresses the main limitation of the popular low-level controller, STEVE-1. Unlike previous models that rely on short-term memory, PEM organizes spatial and event-based data, enabling efficient recall and navigation in long-horizon tasks. Additionally, we propose an Exploration Strategy and a Memory-Augmented Task Solving Framework, allowing agents to alternate between exploration and task-solving based on recalled events. Our approach significantly improves task-solving and exploration efficiency compared to existing methods. We will release our code and demos on the project page: https://sites.google.com/view/mr-steve.
Brief Review
This paper gives a comprehensive and detailed account of Mr.Steve. The work equips a low-level controller with Place Event Memory (PEM) to address a key limitation of existing low-level controllers: their lack of episodic memory. With PEM, the agent achieves efficient recall and navigation, which significantly improves task-solving efficiency. Experiments show that, compared with existing methods, Mr.Steve clearly improves both exploration efficiency and task-solving ability. The paper also analyzes the main challenges facing low-level controllers and argues that Mr.Steve is an effective remedy. Overall, this work offers valuable insights for the development of low-level controllers and has practical significance for improving in-game agent experience.
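A what-where-when episodic memory can be pictured as a store of (event, position, timestamp) records queried by event or by place. The sketch below is a deliberately simplified stand-in for Place Event Memory; the flat record list and the `place_radius` heuristic are assumptions, not the paper's data structure.

```python
import math
import time

# A toy what-where-when memory: each record answers "what happened",
# "where", and "when". PEM itself is more structured; this is only a
# simplified illustration of the idea.

class PlaceEventMemory:
    def __init__(self, place_radius=8.0):
        self.entries = []                # list of (what, where, when) records
        self.place_radius = place_radius # how far counts as the same "place"

    def store(self, event, position):
        # "What": event label; "where": (x, y, z); "when": wall-clock time.
        self.entries.append((event, position, time.time()))

    def recall(self, event_query):
        # Retrieve the most recent place a matching event was observed,
        # enabling navigation back to it during long-horizon tasks.
        matches = [e for e in self.entries if e[0] == event_query]
        return max(matches, key=lambda e: e[2]) if matches else None

    def nearby_events(self, position):
        # Spatial recall: everything remembered within one "place" radius.
        return [e for e in self.entries
                if math.dist(e[1], position) <= self.place_radius]
```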
We welcome your suggestions in the comments section, including but not limited to:
Pointing out shortcomings of the paper reviews in this post!
Sharing recent papers you find more worth recommending, along with your reasons!