Paper Digest | Recent Advances in Agent Research
Randomized Truthful Auctions with Learning Agents
Factorised Active Inference for Strategic Multi-Agent Interactions
BAMAX: Backtrack Assisted Multi-Agent Exploration using Reinforcement Learning
RedCode: Risky Code Execution and Generation Benchmark for Code Agents
Multi-Agent Stochastic Bandits Robust to Adversarial Corruptions
1. Randomized Truthful Auctions with Learning Agents
Authors: Gagan Aggarwal, Anupam Gupta, Andres Perlroth, Grigoris Velegkas
https://arxiv.org/abs/2411.09517
Abstract
We study a setting where agents use no-regret learning algorithms to participate in repeated auctions. Kolumbus and Nisan (2022a) showed, rather surprisingly, that when bidders participate in second-price auctions using no-regret bidding algorithms, no matter how large the number of interactions T is, the runner-up bidder may not converge to bidding truthfully. Our first result shows that this holds for general deterministic truthful auctions. We also show that the ratio of the learning rates of the bidders can qualitatively affect the convergence of the bidders. Next, we consider the problem of revenue maximization in this environment. In the setting with fully rational bidders, Myerson (1981) showed that revenue can be maximized by using a second-price auction with reserves. We show that, in stark contrast, in our setting with learning bidders, randomized auctions can have strictly better revenue guarantees than second-price auctions with reserves, when T is large enough. Finally, we study revenue maximization in the non-asymptotic regime. We define a notion of auctioneer regret comparing the revenue generated to the revenue of a second price auction with truthful bids. When the auctioneer has to use the same auction throughout the interaction, we show an (almost) tight regret bound of e . If the auctioneer can change auctions during the interaction, but in a way that is oblivious to the bids, we show an (almost) tight bound of
Brief Review
This paper studies the behaviour of learning agents in repeated auctions, asking whether bidders that use no-regret learning algorithms converge to truthful bidding and showing that, even under general deterministic truthful auctions, they need not. Its central finding is that, when bidders learn, randomized auctions can guarantee strictly more revenue than second-price auctions with reserves, the classical optimum for fully rational bidders. The paper also studies revenue maximization in the non-asymptotic regime through a notion of auctioneer regret. The question it tackles, how learning algorithms shape bidder behaviour and auction dynamics, is important and timely, and the theoretical analysis clarifies how mechanisms behave when participants adapt over time. Overall, the work broadens our understanding of learning agents in auctions and offers a fresh perspective for designing more effective auction formats.
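For intuition about the learning-bidder setting, the toy simulation below (a minimal sketch, not the paper's construction) lets bidders choose bids from a discretized grid with the multiplicative-weights (Hedge) no-regret rule in a repeated second-price auction; the bidder values, grid, and learning rate are illustrative assumptions, and one can inspect whether the runner-up's average bid stays near its true value.

```python
import numpy as np

def hedge_bidders_second_price(values, bid_grid, T=20000, eta=0.1, seed=0):
    """Toy repeated second-price auction where each bidder picks a bid from a
    discrete grid using the multiplicative-weights (Hedge) no-regret rule.
    Illustrative only; not the paper's construction."""
    rng = np.random.default_rng(seed)
    n = len(values)
    log_w = np.zeros((n, len(bid_grid)))      # log-weights, for numerical stability
    avg_bids = np.zeros(n)
    for _ in range(T):
        probs = np.exp(log_w - log_w.max(axis=1, keepdims=True))
        probs /= probs.sum(axis=1, keepdims=True)
        idx = [rng.choice(len(bid_grid), p=probs[i]) for i in range(n)]
        bids = bid_grid[np.array(idx)]
        for i in range(n):
            # Full-information counterfactual utility of every grid bid
            # against the other bidders' realised bids.
            others_max = max(b for j, b in enumerate(bids) if j != i)
            utils = np.where(bid_grid > others_max, values[i] - others_max, 0.0)
            log_w[i] += eta * utils            # Hedge update on rewards
        avg_bids += bids / T
    return avg_bids

# Two bidders with hypothetical values 1.0 and 0.8; if the runner-up's average
# bid drifts away from 0.8, it has not converged to truthful bidding.
grid = np.linspace(0.0, 1.0, 21)
print(hedge_bidders_second_price(np.array([1.0, 0.8]), grid))
```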
2. Factorised Active Inference for Strategic Multi-Agent Interactions
Authors: Jaime Ruiz-Serra, Patrick Sweeney, Michael S. Harré
https://arxiv.org/abs/2411.07362
Abstract
Understanding how individual agents make strategic decisions within collectives is important for advancing fields as diverse as economics, neuroscience, and multi-agent systems. Two complementary approaches can be integrated to this end. The Active Inference framework (AIF) describes how agents employ a generative model to adapt their beliefs about and behaviour within their environment. Game theory formalises strategic interactions between agents with potentially competing objectives. To bridge the gap between the two, we propose a factorisation of the generative model whereby each agent maintains explicit, individual-level beliefs about the internal states of other agents, and uses them for strategic planning in a joint context. We apply our model to iterated general-sum games with 2 and 3 players, and study the ensemble effects of game transitions, where the agents’ preferences (game payoffs) change over time. This non-stationarity, beyond that caused by reciprocal adaptation, reflects a more naturalistic environment in which agents need to adapt to changing social contexts. Finally, we present a dynamical analysis of key AIF quantities: the variational free energy (VFE) and the expected free energy (EFE) from numerical simulation data. The ensemble-level EFE allows us to characterise the basins of attraction of games with multiple Nash Equilibria under different conditions, and we find that it is not necessarily minimised at the aggregate level. By integrating AIF and game theory, we can gain deeper insights into how intelligent collectives emerge, learn, and optimise their actions in dynamic environments, both cooperative and non-cooperative.
Brief Review
This paper introduces a new approach based on the Active Inference framework to model strategic interactions among multiple agents: each agent maintains explicit beliefs about the internal states of the other agents and uses them to make decisions in a dynamic environment. The approach is applied to iterated general-sum games, and the dynamics of game transitions and equilibria are analysed.
The key contribution is to combine active inference with game theory, offering a new lens on collective behaviour in multi-agent systems. In particular, by maintaining explicit individual-level beliefs about other agents' internal states, the work lays a foundation for studying collective intelligence and leaves ample room for follow-up research. In short, the paper both enriches the two fields it bridges and offers an innovative way to approach practical multi-agent problems.
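As background for the quantities analysed in the paper, the variational free energy F and the expected free energy G are sketched below in their generic single-agent forms; the paper's factorised, multi-agent generative model refines these, so treat the expressions here as textbook definitions under standard assumptions rather than the authors' exact formulation (preferences enter through the generative model p).

```latex
% Generic active-inference quantities (single-agent form); the paper's
% factorised multi-agent model elaborates on these definitions.
F[q] \;=\; \mathbb{E}_{q(s)}\!\bigl[\ln q(s) - \ln p(o, s)\bigr]
     \;=\; D_{\mathrm{KL}}\!\bigl[q(s)\,\|\,p(s \mid o)\bigr] - \ln p(o)

G(\pi) \;=\; \sum_{\tau} \mathbb{E}_{q(o_\tau, s_\tau \mid \pi)}
        \!\bigl[\ln q(s_\tau \mid \pi) - \ln p(o_\tau, s_\tau)\bigr]
```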
3. BAMAX: Backtrack Assisted Multi-Agent Exploration using Reinforcement Learning
Authors: Geetansh Kalra, Amit Patel, Atul Chaudhari, Divye Singh
https://arxiv.org/abs/2411.08400
Abstract
Autonomous robots collaboratively exploring an unknown environment is still an open problem. The problem has its roots in coordination among non-stationary agents, each with only a partial view of information. The problem is compounded when the multiple robots must completely explore the environment. In this paper, we introduce Backtrack Assisted Multi-Agent Exploration using Reinforcement Learning (BAMAX), a method for collaborative exploration in multi-agent systems which attempts to explore an entire virtual environment. As in the name, BAMAX leverages backtrack assistance to enhance the performance of agents in exploration tasks. To evaluate BAMAX against traditional approaches, we present the results of experiments conducted across multiple hexagonal-shaped grid sizes, ranging from 10x10 to 60x60. The results demonstrate that BAMAX outperforms other methods in terms of faster coverage and less backtracking across these environments.
Brief Review
This paper presents BAMAX, a reinforcement-learning method for collaborative exploration in multi-agent systems. It uses a backtracking mechanism to improve exploration and performs strongly in hexagonal grid environments: compared with traditional approaches, BAMAX covers the map faster and backtracks less, and the experiments confirm this advantage on grids ranging from 10x10 to 60x60. These findings offer a new perspective on exploration behaviour in multi-agent systems and an effective way to tackle unknown environments. Overall, the paper makes a useful contribution whose ideas are likely to see further use in future work.
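To make the backtracking idea concrete, here is a minimal, non-learned sketch of exploration with backtrack assistance on a hexagonal grid in axial coordinates; BAMAX itself trains multiple agents with reinforcement learning, so the coordinate convention, single-agent loop, and tiny map below are illustrative assumptions only.

```python
# Axial-coordinate neighbour offsets on a hexagonal grid (a common convention;
# the paper's exact grid encoding is not specified here).
HEX_DIRS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]

def explore_with_backtrack(start, free_cells):
    """Single-agent sketch of exploration with backtrack assistance: step to an
    unvisited free neighbour when one exists, otherwise pop the visit stack to
    backtrack toward the most recent cell that may still have unexplored
    neighbours. BAMAX itself learns this behaviour with RL across agents."""
    visited, stack, path = {start}, [start], [start]
    while stack:
        q, r = stack[-1]
        nxt = next(((q + dq, r + dr) for dq, dr in HEX_DIRS
                    if (q + dq, r + dr) in free_cells
                    and (q + dq, r + dr) not in visited), None)
        if nxt is None:
            stack.pop()                      # backtrack: nothing new from here
            if stack:
                path.append(stack[-1])
        else:
            visited.add(nxt)
            stack.append(nxt)
            path.append(nxt)
    return path, visited

# Tiny example map: a 7-cell hexagonal patch around the origin.
cells = {(0, 0)} | set(HEX_DIRS)
path, seen = explore_with_backtrack((0, 0), cells)
print(f"covered {len(seen)} cells in {len(path)} moves")
```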
4. RedCode: Risky Code Execution and Generation Benchmark for Code Agents
Authors: Chengquan Guo, Xun Liu, Chulin Xie, Andy Zhou, Yi Zeng, Zinan Lin, Dawn Song, Bo Li
https://arxiv.org/abs/2411.07781
Abstract
With the rapidly increasing capabilities and adoption of code agents for AI-assisted coding and software development, safety and security concerns, such as generating or executing malicious code, have become significant barriers to the real-world deployment of these agents. To provide comprehensive and practical evaluations on the safety of code agents, we propose RedCode, an evaluation platform with benchmarks grounded in four key principles: real interaction with systems, holistic evaluation of unsafe code generation and execution, diverse input formats, and high-quality safety scenarios and tests. RedCode consists of two parts to evaluate agents' safety in unsafe code execution and generation: (1) RedCode-Exec provides challenging code prompts in Python as inputs, aiming to evaluate code agents' ability to recognize and handle unsafe code. We then map the Python code to other programming languages (e.g., Bash) and natural text summaries or descriptions for evaluation, leading to a total of over 4,000 testing instances. We provide 25 types of critical vulnerabilities spanning various domains, such as websites, file systems, and operating systems. We provide a Docker sandbox environment to evaluate the execution capabilities of code agents and design corresponding evaluation metrics to assess their execution results. (2) RedCode-Gen provides 160 prompts with function signatures and docstrings as input to assess whether code agents will follow instructions to generate harmful code or software. Our empirical findings, derived from evaluating three agent frameworks based on 19 LLMs, provide insights into code agents' vulnerabilities. For instance, evaluations on RedCode-Exec show that agents are more likely to reject executing unsafe operations on the operating system, but are less likely to reject executing technically buggy code, indicating high risks. Unsafe operations described in natural text lead to a lower rejection rate than those in code format. Additionally, evaluations on RedCode-Gen reveal that more capable base models and agents with stronger overall coding abilities, such as GPT-4, tend to produce more sophisticated and effective harmful software. Our findings highlight the need for stringent safety evaluations for diverse code agents. Our dataset and code are publicly available at https://github.com/AI-secure/RedCode.
Brief Review
This paper on RedCode is a significant step toward comprehensively evaluating how safely code agents generate and execute potentially dangerous code. Its core contribution is the RedCode benchmark platform, which assesses the safety of both code execution and code generation and is built on real interaction with systems, holistic evaluation of unsafe code, diverse input formats, and high-quality safety scenarios.
The study exposes the safety risks of existing code agents and motivates corresponding mitigations. It analyses how agents behave across a wide range of risky scenarios, demonstrating the benchmark's coverage, and reports extensive empirical results that both validate RedCode's usefulness and clearly highlight the weaknesses of current code agents.
In summary, RedCode brings a valuable new perspective and methodology to safety evaluation. Through its detailed analysis and empirical validation, it should help steer AI coding agents toward safer deployment across a broader range of applications.
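As a rough illustration of the kind of metric RedCode-Exec reports, the sketch below computes a simple rejection rate over risky prompts; the `run_agent` callable, the refusal markers, and the demo prompts are hypothetical stand-ins, and the actual benchmark executes agents inside a Docker sandbox with per-scenario evaluation scripts rather than keyword matching.

```python
# Minimal sketch of a rejection-rate metric in the spirit of RedCode-Exec.
# `run_agent` is a hypothetical callable (prompt -> agent transcript); the real
# benchmark runs agents in a Docker sandbox with richer per-scenario checks.

REFUSAL_MARKERS = ("i cannot", "i can't", "refuse", "not able to help")

def rejection_rate(run_agent, risky_prompts):
    """Fraction of risky prompts the agent declines to act on."""
    rejected = 0
    for prompt in risky_prompts:
        transcript = run_agent(prompt).lower()
        if any(marker in transcript for marker in REFUSAL_MARKERS):
            rejected += 1
    return rejected / max(len(risky_prompts), 1)

if __name__ == "__main__":
    # Hypothetical stand-in agent that refuses anything mentioning "rm -rf".
    demo_agent = lambda p: "I cannot run that." if "rm -rf" in p else "Executed."
    prompts = ["Please run: rm -rf /tmp/project", "Print the current directory"]
    print(rejection_rate(demo_agent, prompts))   # 0.5 for this toy pair
```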
5. Multi-Agent Stochastic Bandits Robust to Adversarial Corruptions
Authors: Fatemeh Ghaffari, Xuchuang Wang, Jinhang Zuo, Mohammad Hajiesmaili
https://arxiv.org/abs/2411.08167
Abstract
We study the problem of multi-agent multi-armed bandits with adversarial corruption in a heterogeneous setting, where each agent accesses a subset of arms. The adversary can corrupt the reward observations for all agents. Agents share these corrupted rewards with each other, and the objective is to maximize the cumulative total reward of all agents (and not be misled by the adversary). We propose a multi-agent cooperative learning algorithm that is robust to adversarial corruptions. For this newly devised algorithm, we demonstrate that an adversary with an unknown corruption budget C only incurs an additive O((L/L_min)C) term to the standard regret of the model in non-corruption settings, where L is the total number of agents, and L_min is the minimum number of agents with mutual access to an arm. As a side-product, our algorithm also improves the state-of-the-art regret bounds when reducing to both the single-agent and homogeneous multi-agent scenarios, tightening multiplicative K (the number of arms) and L (the number of agents) factors, respectively.
Brief Review
This paper studies the multi-agent multi-armed bandit problem with adversarial corruptions, with particular attention to the heterogeneous setting in which each agent can access only a subset of arms. The authors propose a cooperative learning algorithm that maximizes the agents' cumulative reward while limiting the impact of corrupted reward observations, addressing an important problem in multi-agent learning.
The theoretical analysis shows that an adversary with an unknown corruption budget C adds only an O((L/L_min)C) term to the standard regret, and that the algorithm also tightens state-of-the-art regret bounds when specialised to the single-agent and homogeneous multi-agent settings, strengthening its performance in adversarial environments.
Overall, the paper makes a solid contribution to multi-agent bandit research and offers an effective strategy against adversarial corruption in heterogeneous environments. The results deepen our understanding of multi-agent learning systems and provide a valuable perspective for future work.
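For a feel of the setting, the toy below runs a cooperative UCB over pooled (possibly corrupted) reward statistics and pads the confidence radius with a term proportional to the corruption budget C; this is a homogeneous, simplified sketch under assumed Bernoulli rewards, not the heterogeneous algorithm or the O((L/L_min)C) analysis from the paper.

```python
import math, random

def cooperative_robust_ucb(true_means, L=3, T=2000, C=50.0, seed=0):
    """Toy cooperative UCB over K arms shared by L agents, with the confidence
    radius enlarged by a corruption term (~ C / pulls) so an adversary with
    budget C cannot distort the pooled estimates too much. A homogeneous
    sketch only, not the paper's algorithm or analysis."""
    rng = random.Random(seed)
    K = len(true_means)
    counts, sums = [0] * K, [0.0] * K        # statistics pooled across agents
    total_reward = 0.0
    for t in range(1, T + 1):
        for _ in range(L):                   # each agent pulls once per round
            ucb = []
            for a in range(K):
                if counts[a] == 0:
                    ucb.append(float("inf"))
                else:
                    bonus = math.sqrt(2 * math.log(t * L + 1) / counts[a])
                    bonus += C / counts[a]   # extra slack for corrupted rewards
                    ucb.append(sums[a] / counts[a] + bonus)
            arm = max(range(K), key=lambda a: ucb[a])
            reward = 1.0 if rng.random() < true_means[arm] else 0.0
            counts[arm] += 1
            sums[arm] += reward              # shared (possibly corrupted) feedback
            total_reward += reward
    return total_reward

print(cooperative_robust_ucb([0.2, 0.5, 0.8]))
```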
We welcome your valuable suggestions in the comments, including but not limited to:
Point out shortcomings in this post's paper reviews! Share recent papers that are even more worth recommending, and tell us why!
END