[Top Conference Express] RLC 2024: Roundup of 128 Accepted Papers

Digest   2024-08-28 08:40   Singapore

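[Overview] RLC, a new conference in the reinforcement learning field, recently held its first edition at the University of Massachusetts in the United States. It presented outstanding paper awards in seven categories and stated in a public blog post that it had made many innovations and improvements to the review process.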

Aug 10, Oral Track 1: Evaluation - Room 168

  1. D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning

    Rafael Rafailov, Kyle Beltran Hatch, Anikait Singh, Aviral Kumar, Laura Smith, Ilya Kostrikov, Philippe Hansen-Estruch, Victor Kolev, Philip J. Ball, Jiajun Wu, Sergey Levine, Chelsea Finn

  2. Harnessing Discrete Representations for Continual Reinforcement Learning

    Edan Jacob Meyer, Adam White, Marlos C. Machado

  3. Aquatic Navigation: A Challenging Benchmark for Deep Reinforcement Learning

    Davide Corsi, Davide Camponogara, Alessandro Farinelli

  4. Investigating the Interplay of Prioritized Replay and Generalization

    Parham Mohammad Panahi, Andrew Patterson, Martha White, Adam White

  5. ICU-Sepsis: A Benchmark MDP Built from Real Medical Data

    Kartik Choudhary, Dhawal Gupta, Philip S. Thomas

  6. Resource Usage Evaluation of Discrete Model-Free Deep Reinforcement Learning Algorithms

    Olivia P. Dizon-Paradis, Stephen E. Wormald, Daniel E. Capecci, Avanti Bhandarkar, Damon L. Woodard

  7. OCAtari: Object-Centric Atari 2600 Reinforcement Learning Environments

    Quentin Delfosse, Jannis Blüml, Bjarne Gregori, Sebastian Sztwiertnia, Kristian Kersting

  8. The Cross-environment Hyperparameter Setting Benchmark for Reinforcement Learning

    Andrew Patterson, Samuel Neumann, Raksha Kumaraswamy, Martha White, Adam White

  9. An Open-Loop Baseline for Reinforcement Learning Locomotion Tasks

    Antonin Raffin, Olivier Sigaud, Jens Kober, Alin Albu-Schaeffer, João Silvério, Freek Stulp

  10. Combining Automated Optimisation of Hyperparameters and Reward Shape

    Julian Dierkes, Emma Cramer, Holger Hoos, Sebastian Trimpe

  11. Robotic Manipulation Datasets for Offline Compositional Reinforcement Learning

    Marcel Hussing, Jorge Mendez-Mendez, Anisha Singrodia, Cassandra Kent, Eric Eaton

  12. Stable-Baselines3: Reliable Reinforcement Learning Implementations

    Antonin Raffin, Ashley Hill, Adam Gleave, Anssi Kanervisto, Maximilian Ernestus, Noah Dormann

Aug 10, Oral Track 2: Theoretical RL and bandit algorithms - Room 165/169

  1. A Provably Efficient Option-Based Algorithm for both High-Level and Low-Level Learning

    Gianluca Drappo, Alberto Maria Metelli, Marcello Restelli

  2. Bandits with Multimodal Structure

    Hassan SABER, Odalric-Ambrym Maillard

  3. Policy Gradient with Active Importance Sampling

    Matteo Papini, Giorgio Manganini, Alberto Maria Metelli, Marcello Restelli

  4. Sample Complexity of Offline Distributionally Robust Linear Markov Decision Processes

    He Wang, Laixi Shi, Yuejie Chi

  5. Improving Thompson Sampling via Information Relaxation for Budgeted Multi-armed Bandits

    Woojin Jeong, Seungki Min

  6. A Batch Sequential Halving Algorithm without Performance Degradation

    Sotetsu Koyamada, Soichiro Nishimori, Shin Ishii

  7. Graph Neural Thompson Sampling

    Shuang Wu, Arash A. Amini

  8. A Tighter Convergence Proof of Reverse Experience Replay

    Nan Jiang, Jinzhao Li, Yexiang Xue

  9. Cost Aware Best Arm Identification

    Kellen Kanarios, Qining Zhang, Lei Ying

  10. Towards Principled, Practical Policy Gradient for Bandits and Tabular MDPs

    Michael Lu, Matin Aghaei, Anant Raj, Sharan Vaswani

  11. Non-stationary Bandits and Meta-Learning with a Small Set of Optimal Arms

    Javad Azizi, Thang Duong, Yasin Abbasi-Yadkori, András György, Claire Vernade, Mohammad Ghavamzadeh

  12. Causal Contextual Bandits with Adaptive Context

    Rahul Madhavan, Aurghya Maiti, Gaurav Sinha, Siddharth Barman

  13. Global Optimality and Finite Sample Analysis of Softmax Off-Policy Actor Critic under State Distribution Mismatch

    Shangtong Zhang, Remi Tachet des Combes, Romain Laroche

Aug 10, Oral Track 3: Multi-agent RL and planning algorithms - Room 174/176

  1. Co-Learning Empirical Games & World Models

    Max Olan Smith, Michael P. Wellman

  2. Best Response Shaping

    Milad Aghajohari, Tim Cooijmans, Juan Agustin Duque, Shunichi Akatsuka, Aaron Courville

  3. Shield Decomposition for Safe Reinforcement Learning in General Partially Observable Multi-Agent Environments

    Daniel Melcer, Christopher Amato, Stavros Tripakis

  4. Quantifying Interaction Level Between Agents Helps Cost-efficient Generalization in Multi-agent Reinforcement Learning

    Yuxin Chen, Chen Tang, Thomas Tian, Chenran Li, Jinning Li, Masayoshi Tomizuka, Wei Zhan

  5. Reinforcement Learning from Delayed Observations via World Models

    Armin Karamzade, Kyungmin Kim, Montek Kalsi, Roy Fox

  6. Cyclicity-Regularized Coordination Graphs

    Oliver Järnefelt, Mahdi Kallel, Carlo D'Eramo

  7. Assigning Credit with Partial Reward Decoupling in Multi-Agent Proximal Policy Optimization

    Aditya Kapoor, Benjamin Freed, Jeff Schneider, Howie Choset

  8. Trust-based Consensus in Multi-Agent Reinforcement Learning Systems

    Ho Long Fung, Victor-Alexandru Darvariu, Stephen Hailes, Mirco Musolesi

  9. Inception: Efficiently Computable Misinformation Attacks on Markov Games

    Jeremy McMahan, Young Wu, Yudong Chen, Jerry Zhu, Qiaomin Xie

  10. Human-compatible driving agents through data-regularized self-play reinforcement learning

    Daphne Cornelisse, Eugene Vinitsky

  11. On Welfare-Centric Fair Reinforcement Learning

    Cyrus Cousins, Kavosh Asadi, Elita Lobo, Michael Littman

  12. BetaZero: Belief-State Planning for Long-Horizon POMDPs using Learned Approximations

    Robert J. Moss, Anthony Corso, Jef Caers, Mykel Kochenderfer

Aug 10, Oral Track 4: Deep reinforcement learning - Room 163

  1. Dissecting Deep RL with High Update Ratios: Combatting Value Divergence

    Marcel Hussing, Claas A Voelcker, Igor Gilitschenski, Amir-massoud Farahmand, Eric Eaton

  2. Mixture of Experts in a Mixture of RL settings

    Timon Willi, Johan Samir Obando Ceron, Jakob Nicolaus Foerster, Gintare Karolina Dziugaite, Pablo Samuel Castro

  3. Light-weight Probing of Unsupervised Representations for Reinforcement Learning

    Wancong Zhang, Anthony GX-Chen, Vlad Sobal, Yann LeCun, Nicolas Carion

  4. Zero-shot cross-modal transfer of Reinforcement Learning policies through a Global Workspace

    Léopold Maytié, Benjamin Devillers, Alexandre Arnold, Rufin VanRullen

  5. PASTA: Pretrained Action-State Transformer Agents

    Raphael Boige, Yannis Flet-Berliac, Lars C.P.M. Quaedvlieg, Arthur Flajolet, Guillaume Richard, Thomas PIERROT

  6. Combining Reconstruction and Contrastive Methods for Multimodal Representations in RL

    Philipp Becker, Sebastian Mossburger, Fabian Otto, Gerhard Neumann

  7. A Recipe for Unbounded Data Augmentation in Visual Reinforcement Learning

    Abdulaziz Almuzairee, Nicklas Hansen, Henrik I Christensen

  8. On the consistency of hyper-parameter selection in value-based deep reinforcement learning

    Johan Samir Obando Ceron, João Guilherme Madeira Araújo, Aaron Courville, Pablo Samuel Castro

  9. Policy-Guided Diffusion

    Matthew Thomas Jackson, Michael Matthews, Cong Lu, Benjamin Ellis, Shimon Whiteson, Jakob Nicolaus Foerster

  10. SplAgger: Split Aggregation for Meta-Reinforcement Learning

    Jacob Beck, Matthew Thomas Jackson, Risto Vuorio, Zheng Xiong, Shimon Whiteson

  11. Learning to Optimize for Reinforcement Learning

    Qingfeng Lan, A. Rupam Mahmood, Shuicheng YAN, Zhongwen Xu

  12. Investigating the properties of neural network representations in reinforcement learning

    Han Wang, Erfan Miahi, Martha White, Marlos C. Machado, Zaheer Abbas, Raksha Kumaraswamy, Vincent Liu, Adam White

  13. Cheap and Deterministic Inference for Deep State-Space Models of Interacting Dynamical Systems

    Andreas Look, Barbara Rakitsch, Melih Kandemir, Jan Peters

Aug 11, Oral Track 1: RL from human feedback and imitation learning - Room 168

  1. Learning Action-based Representations Using Invariance

    Max Rudolph, Caleb Chuck, Kevin Black, Misha Lvovsky, Scott Niekum, Amy Zhang

  2. Representation Alignment from Human Feedback for Cross-Embodiment Reward Learning from Mixed-Quality Demonstrations

    Connor Mattson, Anurag Sidharth Aribandi, Daniel S. Brown

  3. Offline Reinforcement Learning from Datasets with Structured Non-Stationarity

    Johannes Ackermann, Takayuki Osa, Masashi Sugiyama

  4. Offline Diversity Maximization under Imitation Constraints

    Marin Vlastelica, Jin Cheng, Georg Martius, Pavel Kolev

  5. Imitation Learning from Observation through Optimal Transport

    Wei-Di Chang, Scott Fujimoto, David Meger, Gregory Dudek

  6. ROIL: Robust Offline Imitation Learning without Trajectories

    Gersi Doko, Guang Yang, Daniel S. Brown, Marek Petrik

  7. Agent-Centric Human Demonstrations Train World Models

    James Staley, Elaine Short, Shivam Goel, Yash Shukla

  8. Inverse Reinforcement Learning with Multiple Planning Horizons

    Jiayu Yao, Weiwei Pan, Finale Doshi-Velez, Barbara E Engelhardt

  9. Semi-Supervised One Shot Imitation Learning

    Philipp Wu, Kourosh Hakhamaneshi, Yuqing Du, Igor Mordatch, Aravind Rajeswaran, Pieter Abbeel

  10. Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis

    Qining Zhang, Honghao Wei, Lei Ying

  11. Guided Data Augmentation for Offline Reinforcement Learning and Imitation Learning

    Nicholas E. Corrado, Yuxiao Qu, John U. Balis, Adam Labiosa, Josiah P. Hanna

  12. Reward (Mis)design for autonomous driving

    W. Bradley Knox, Alessandro Allievi, Holger Banzhaf, Felix Schmitt, Peter Stone

  13. Models of human preference for learning reward functions

    W. Bradley Knox, Stephane Hatgis-Kessell, Serena Booth, Scott Niekum, Peter Stone, Alessandro G Allievi

Aug 11, Oral Track 2: Foundations - Room 165/169

  1. The Cliff of Overcommitment with Policy Gradient Step Sizes

    Scott M. Jordan, Samuel Neumann, James E. Kostas, Adam White, Philip S. Thomas

  2. Demystifying the Recency Heuristic in Temporal-Difference Learning

    Brett Daley, Marlos C. Machado, Martha White

  3. When does Self-Prediction help? Understanding Auxiliary Tasks in Reinforcement Learning

    Claas A Voelcker, Tyler Kastner, Igor Gilitschenski, Amir-massoud Farahmand

  4. A Natural Extension To Online Algorithms For Hybrid RL With Limited Coverage

    Kevin Tan, Ziping Xu

  5. States as goal-directed concepts: an epistemic approach to state-representation learning

    Nadav Amir, Yael Niv, Angela J Langdon

  6. Tiered Reward: Designing Rewards for Specification and Fast Learning of Desired Behavior

    Zhiyuan Zhou, Shreyas Sundara Raman, Henry Sowerby, Michael Littman

  7. Unifying Model-Based and Model-Free Reinforcement Learning with Equivalent Policy Sets

    Benjamin Freed, Thomas Wei, Roberto Calandra, Jeff Schneider, Howie Choset

  8. Multistep Inverse Is Not All You Need

    Alexander Levine, Peter Stone, Amy Zhang

  9. An Idiosyncrasy of Time-discretization in Reinforcement Learning

    Kris De Asis, Richard S. Sutton

  10. Bad Habits: Policy Confounding and Out-of-Trajectory Generalization in RL

    Miguel Suau, Matthijs T. J. Spaan, Frans A Oliehoek

  11. Mitigating the Curse of Horizon in Monte-Carlo Returns

    Alex Ayoub, David Szepesvari, Francesco Zanini, Bryan Chan, Dhawal Gupta, Bruno Castro da Silva, Dale Schuurmans

  12. Structure in Deep Reinforcement Learning: A Survey and Open Problems

    Aditya Mohan, Amy Zhang, Marius Lindauer

Aug 11, Oral Track 3: Applied reinforcement learning - Room 174/176

  1. Sequential Decision-Making for Inline Text Autocomplete

    Rohan Chitnis, Shentao Yang, Alborz Geramifard

  2. A Super-human Vision-based Reinforcement Learning Agent for Autonomous Racing in Gran Turismo

    Miguel Vasco, Takuma Seno, Kenta Kawamoto, Kaushik Subramanian, Peter R. Wurman, Peter Stone

  3. Towards General Negotiation Strategies with End-to-End Reinforcement Learning

    Bram M. Renting, Thomas M. Moerland, Holger Hoos, Catholijn M Jonker

  4. JoinGym: An Efficient Join Order Selection Environment

    Junxiong Wang, Kaiwen Wang, Yueying Li, Nathan Kallus, Immanuel Trummer, Wen Sun

  5. Policy Architectures for Compositional Generalization in Control

    Allan Zhou, Vikash Kumar, Chelsea Finn, Aravind Rajeswaran

  6. Verification-Guided Shielding for Deep Reinforcement Learning

    Davide Corsi, Guy Amir, Andoni Rodríguez, Guy Katz, César Sánchez, Roy Fox

  7. Physics-Informed Model and Hybrid Planning for Efficient Dyna-Style Reinforcement Learning

    Zakariae EL ASRI, Olivier Sigaud, Nicolas THOME

  8. Bidirectional-Reachable Hierarchical Reinforcement Learning with Mutually Responsive Policies

    Yu Luo, Fuchun Sun, Tianying Ji, Xianyuan Zhan

  9. RL for Consistency Models: Reward Guided Text-to-Image Generation with Fast Inference

    Owen Oertell, Jonathan Daniel Chang, Yiyi Zhang, Kianté Brantley, Wen Sun

  10. Learning to Navigate in Mazes with Novel Layouts using Abstract Top-down Maps

    Linfeng Zhao, Lawson L.S. Wong

  11. Revisiting Sparse Rewards for Goal-Reaching Reinforcement Learning

    Gautham Vasan, Yan Wang, Fahim Shahriar, James Bergstra, Martin Jägersand, A. Rupam Mahmood

  12. Multi-view Disentanglement for Reinforcement Learning with Multiple Cameras

    Mhairi Dunion, Stefano V Albrecht

  13. Emergent behaviour and neural dynamics in artificial agents tracking odour plumes

    Satpreet H. Singh, Floris van Breugel, Rajesh P. N. Rao, Bingni W. Brunton

  14. GVFs in the real world: making predictions online for water treatment

    Muhammad Kamran Janjua, Haseeb Shah, Martha White, Erfan Miahi, Marlos C. Machado, Adam White

Aug 11, Oral Track 4: RL algorithms - Room 163

  1. Weight Clipping for Deep Continual and Reinforcement Learning

    Mohamed Elsayed, Qingfeng Lan, Clare Lyle, A. Rupam Mahmood

  2. Policy Gradient Algorithms with Monte Carlo Tree Learning for Non-Markov Decision Processes

    Tetsuro Morimura, Kazuhiro Ota, Kenshi Abe, Peinan Zhang

  3. ROER: Regularized Optimal Experience Replay

    Changling Li, Zhang-Wei Hong, Pulkit Agrawal, Divyansh Garg, Joni Pajarinen

  4. Learning Discrete World Models for Heuristic Search

    Forest Agostinelli, Misagh Soltani

  5. Boosting Soft Q-Learning by Bounding

    Jacob Adamczyk, Volodymyr Makarenko, Stas Tiomkin, Rahul V Kulkarni

  6. Reward Centering

    Abhishek Naik, Yi Wan, Manan Tomar, Richard S. Sutton

  7. Stabilizing Extreme Q-learning by Maclaurin Expansion

    Motoki Omura, Takayuki Osa, YUSUKE Mukuta, Tatsuya Harada

  8. Contextualized Hybrid Ensemble Q-learning: Learning Fast with Control Priors

    Emma Cramer, Bernd Frauenknecht, Ramil Sabirov, Sebastian Trimpe

  9. A Simple Mixture Policy Parameterization for Improving Sample Efficiency of CVaR Optimization

    Yudong Luo, Yangchen Pan, Han Wang, Philip Torr, Pascal Poupart

  10. PID Accelerated Temporal Difference Algorithms

    Mark Bedaywi, Amin Rakhsha, Amir-massoud Farahmand

  11. SwiftTD: A Fast and Robust Algorithm for Temporal Difference Learning

    Khurram Javed, Arsalan Sharifnassab, Richard S. Sutton

  12. Posterior Sampling for Continuing Environments

    Wanqiao Xu, Shi Dong, Benjamin Van Roy

  13. Off-Policy Actor-Critic with Emphatic Weightings

    Eric Graves, Ehsan Imani, Raksha Kumaraswamy, Martha White

Aug 12, Oral Track 1: Social and economic aspects - Room 168

  1. Value Internalization: Learning and Generalizing from Social Reward

    Frieda Rong, Max Kleiman-Weiner

  2. Can Differentiable Decision Trees Enable Interpretable Reward Learning from Human Feedback?

    Akansha Kalra, Daniel S. Brown

  3. Three Dogmas of Reinforcement Learning

    David Abel, Mark K Ho, Anna Harutyunyan

  4. MultiHyRL: Robust Hybrid RL for Obstacle Avoidance against Adversarial Attacks on the Observation Space

    Jan de Priester, Zachary Bell, Prashant Ganesh, Ricardo Sanfelice

  5. Enabling Intelligent Interactions between an Agent and an LLM: A Reinforcement Learning Approach

    Bin Hu, Chenyang Zhao, Pu Zhang, Zihao Zhou, Yuanhang Yang, Zenglin Xu, Bin Liu

  6. Risk Sensitive Dead-end Identification in Safety-Critical Offline Reinforcement Learning

    Taylor W. Killian, Sonali Parbhoo, Marzyeh Ghassemi

Aug 12, Oral Track 2: Theoretical RL - Room 165/169

  1. The Role of Inherent Bellman Error in Offline Reinforcement Learning with Linear Function Approximation

    Noah Golowich, Ankur Moitra

  2. Optimizing Rewards while meeting ω-regular Constraints

    Christopher Zeitler, Kristina Miller, Sayan Mitra, John Schierman, Mahesh Viswanathan

  3. Distributionally Robust Constrained Reinforcement Learning under Strong Duality

    Zhengfei Zhang, Kishan Panaganti, Laixi Shi, Yanan Sui, Adam Wierman, Yisong Yue

  4. Non-adaptive Online Finetuning for Offline Reinforcement Learning

    Audrey Huang, Mohammad Ghavamzadeh, Nan Jiang, Marek Petrik

  5. Constant Stepsize Q-learning: Distributional Convergence, Bias and Extrapolation

    Yixuan Zhang, Qiaomin Xie

  6. An Optimal Tightness Bound for the Simulation Lemma

    Sam Lobel, Ronald Parr

  7. Temporal Difference Learning with Compressed Updates: Error-Feedback meets Reinforcement Learning

    Aritra Mitra, George J. Pappas, Hamed Hassani

Aug 12, Oral Track 3: Exploration - Room 174/176

  1. The Limits of Pure Exploration in POMDPs: When the Observation Entropy is Enough

    Riccardo Zamboni, Duilio Cirino, Marcello Restelli, Mirco Mutti

  2. Surprise-Adaptive Intrinsic Motivation for Unsupervised Reinforcement Learning

    Adriana Hugessen, Roger Creus Castanyer, Faisal Mohamed, Glen Berseth

  3. Exploring Uncertainty in Distributional Reinforcement Learning

    Georgy Antonov, Peter Dayan

  4. More Efficient Randomized Exploration for Reinforcement Learning via Approximate Sampling

    Haque Ishfaq, Yixin Tan, Yu Yang, Qingfeng Lan, Jianfeng Lu, A. Rupam Mahmood, Doina Precup, Pan Xu

  5. Planning to Go Out-of-Distribution in Offline-to-Online Reinforcement Learning

    Trevor McInroe, Adam Jelley, Stefano V Albrecht, Amos Storkey

  6. Action Noise in Off-Policy Deep Reinforcement Learning: Impact on Exploration and Performance

    Jakob Hollenstein, Sayantan Auddy, Matteo Saveriano, Erwan Renaudo, Justus Piater

Aug 12, Oral Track 4: Hierarchical RL and planning algorithms - Room 163

  1. Online Planning in POMDPs with State-Requests

    Raphaël Avalos, Eugenio Bargiacchi, Ann Nowe, Diederik Roijers, Frans A Oliehoek

  2. Informed POMDP: Leveraging Additional Information in Model-Based RL

    Gaspard Lambrechts, Adrien Bolland, Damien Ernst

  3. Bounding-Box Inference for Error-Aware Model-Based Reinforcement Learning

    Erin J Talvitie, Zilei Shao, Huiying Li, Jinghan Hu, Jacob Boerma, Rory Zhao, Xintong Wang

  4. Dreaming of Many Worlds: Learning Contextual World Models aids Zero-Shot Generalization

    Sai Prasanna, Karim Farid, Raghu Rajan, André Biedenkapp

  5. Learning Abstract World Models for Value-preserving Planning with Options

    Rafael Rodriguez-Sanchez, George Konidaris

  6. Granger Causal Interaction Skill Chains

    Caleb Chuck, Kevin Black, Aditya Arjun, Yuke Zhu, Scott Niekum

  7. On Uncertainty in Deep State Space Models for Model-Based Reinforcement Learning

    Philipp Becker, Gerhard Neumann

  8. Mitigating Value Hallucination in Dyna-Style Planning via Multistep Predecessor Models

    Farzane Aminmansour, Taher Jafferjee, Ehsan Imani, Erin J. Talvitie, Michael Bowling, Martha White



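Deep Reinforcement Learning Lab
[Open source and open, sharing and advancing together] A reinforcement learning community & lab that shares and promotes the real-world adoption of DeepRL technology and community development. Community: deeprlhub.com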