PAMI 2024 | A Roundup of Mainstream Approaches to End-to-End Autonomous Driving (1)

Project: https://github.com/OpenDriveLab/End-to-end-Autonomous-Driving
arXiv: https://arxiv.org/abs/2306.16927
Overview
Hello everyone, good morning! Another day of syncing up and aligning!
PAMI (IEEE Transactions on Pattern Analysis and Machine Intelligence) is one of IEEE's top journals. In this issue we share a survey paper from PAMI 2024: End-to-end Autonomous Driving: Challenges and Frontiers!
Because the paper is packed with content, this issue focuses only on the method families of end-to-end autonomous driving. Tomorrow, Li Xiaomao will cover the key challenges in end-to-end autonomous driving; follow [End-to-End Autonomous Driving] so you don't lose your way!
E2E? The Final Answer?
Classical Approach: a traditional autonomous driving system typically adopts a modular design, split into independent modules such as Perception, Prediction, and Planning. End-to-end Paradigm: all functional modules are fused into a single model and optimized jointly; perception, prediction, and planning are trained within the same model, and backpropagation is used to optimize the entire system.
Classical Approach: a modular design in which each module independently handles a task such as perception, prediction, or planning.
  • Advantages: strong interpretability; each module has a clearly defined function and can be debugged independently.
  • Limitations: (1) Each module has its own optimization objective, e.g., the perception module may optimize mean average precision (mAP) while the planning module focuses on driving safety and comfort, so the system-level objective cannot be optimized in a unified, coordinated way. (2) Because the modules run sequentially, errors accumulate from module to module, information is gradually lost, the computational burden grows, and compute resources may be used suboptimally.
End-to-end Paradigm: integrates perception, prediction, and planning into a single model.
  • Advantages: (1) A simpler system structure: jointly training the whole model gives a seamless path from perception to control output and improves overall performance. (2) Higher computational efficiency: a shared backbone reduces redundant computation, and data-driven optimization means performance can scale with training resources. It also reduces error accumulation between modules, uses compute more efficiently, and improves robustness and generalization. A minimal sketch of this joint-optimization idea follows below.
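To make the joint-optimization idea concrete, here is a minimal, hypothetical PyTorch sketch (not the survey's actual architecture): a shared backbone feeds perception, prediction, and planning heads, and a single combined loss is backpropagated through all of them. The module shapes, head names, and equal loss weights are illustrative assumptions.

```python
import torch
import torch.nn as nn

class E2EDriver(nn.Module):
    """Hypothetical end-to-end model: one shared backbone, three task heads."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.backbone = nn.Sequential(                   # shared feature extractor (camera -> features)
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim), nn.ReLU(),
        )
        self.perception_head = nn.Linear(feat_dim, 10)   # e.g., semantic / detection logits
        self.prediction_head = nn.Linear(feat_dim, 20)   # e.g., future motion of other agents
        self.planning_head = nn.Linear(feat_dim, 2)      # e.g., steering + acceleration

    def forward(self, image):
        feat = self.backbone(image)
        return self.perception_head(feat), self.prediction_head(feat), self.planning_head(feat)

model = E2EDriver()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# One joint training step with dummy data: all heads share gradients through the backbone.
image = torch.randn(8, 3, 128, 128)
percep_gt, pred_gt, plan_gt = torch.randn(8, 10), torch.randn(8, 20), torch.randn(8, 2)
percep, pred, plan = model(image)
loss = (nn.functional.mse_loss(percep, percep_gt)        # auxiliary perception loss
        + nn.functional.mse_loss(pred, pred_gt)          # auxiliary prediction loss
        + nn.functional.mse_loss(plan, plan_gt))         # planning loss on the final output
optimizer.zero_grad()
loss.backward()                                          # backpropagation optimizes the whole system
optimizer.step()
```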
Two Camps?

Two key approaches dominate end-to-end autonomous driving: Imitation Learning and Reinforcement Learning. The former includes Behavior Cloning and Inverse Optimal Control.
(1) Imitation Learning: train an agent by imitating expert behavior. The goal is for the planned trajectories or control signals output by the agent's policy π to match the expert policy πβ as closely as possible. Imitation learning requires a dataset D = {ξi} of trajectories (i.e., sequences of state-action pairs) generated by the expert policy.
Imitation Learning - Behavior Cloning (BC) trains the agent's policy by minimizing the loss

$$\arg\min_{\pi}\ \mathbb{E}_{(s,a)\sim D}\big[\ell\big(\pi(s),\,a\big)\big]$$

where ℓ(·, ·) measures the distance between the agent's action and the expert's action. Behavior cloning takes each state s as input and uses a neural network policy π to predict the corresponding action π(s). By minimizing this loss, the agent learns to make decisions similar to the expert's in any given state.

In behavior cloning, data are generated by the expert policy πβ and stored in a buffer of expert state-action pairs; a policy π is then learned by supervised learning to imitate the expert's behavior (see the minimal training-loop sketch below).
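Below is a minimal sketch of the behavior-cloning objective above, written in PyTorch under simplifying assumptions: a toy MLP policy, a randomly generated buffer standing in for the expert state-action pairs, and the L2 distance as ℓ. It is only meant to show that BC reduces to plain supervised learning.

```python
import torch
import torch.nn as nn

# Hypothetical expert buffer D = {(s, a)}: states and the expert's actions in those states.
states = torch.randn(1024, 64)     # e.g., encoded sensor features
actions = torch.randn(1024, 2)     # e.g., expert steering + acceleration

policy = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 2))  # agent policy pi(s)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

for epoch in range(10):
    perm = torch.randperm(states.size(0))               # shuffle the buffer each epoch
    for i in range(0, states.size(0), 128):
        idx = perm[i:i + 128]
        s, a_expert = states[idx], actions[idx]
        a_pred = policy(s)                               # pi(s): the agent's action
        loss = nn.functional.mse_loss(a_pred, a_expert)  # l(pi(s), a): distance to the expert action
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```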
Representative works: end-to-end neural networks that generate control signals from camera input [3, 8, 51], multi-sensor input [6, 52], auxiliary tasks [16, 28], improved expert design (multi-modal fusion) [21], the covariate shift problem of BC [26, 53, 54, 55], and the causal confusion problem of BC [5, 10, 25, 56].

[3]: D. A. Pomerleau, "Alvinn: An autonomous land vehicle in a neural network," in NeurIPS, 1988.

[5]: S. Casas, W. Luo, A. Sadat, and R. Urtasun, "MP3: A unified model to map, perceive, predict and plan," in CVPR, 2021.

[6]: A. Prakash, K. Chitta, and A. Geiger, "Multi-modal fusion transformer for end-to-end autonomous driving," in CVPR, 2021.

[8]: M. Bojarski, D. Del Testa, D. Dworakowski, B. Firner, B. Flepp, P. Goyal, L. D. Jackel, M. Monfort, U. Muller, and J. Zhang, "End to end learning for self-driving cars," arXiv preprint arXiv:1604.07316, 2016.

[10]: M. Müller, “End-to-end imitation learning with conditional adversarial networks,” arXiv preprint arXiv:1805.01987, 2018.

[16]: A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, and V. Koltun, "CARLA: An open urban driving simulator," in CoRL, 2017.

[21]: A. Sadat, S. Casas, M. Ren, X. Wu, P. Dhawan, and R. Urtasun, "Perceive, predict, and plan: Safe motion planning through interpretable semantic representations," in ECCV, 2020.

[25]: A. Codevilla, E. Santana, A. M. Lopez, and A. Gaidon, "Exploring the limitations of behavior cloning for autonomous driving," in ICCV, 2019.

[28]: A. Prakash, K. Chitta, and A. Geiger, "Multi-modal fusion transformer for end-to-end autonomous driving," in CVPR, 2021.

[39]: A. Sadat, S. Casas, M. Ren, X. Wu, P. Dhawan, and R. Urtasun, "Perceive, predict, and plan: Safe motion planning through interpretable semantic representations," in ECCV, 2020.

[52]: W. Zeng, W. Luo, S. Suo, A. Sadat, B. Yang, S. Casas, and R. Urtasun, "End-to-end interpretable neural motion planner," in CVPR, 2019.

[56]: L. I. Kunze, F. Landgraf, T. Ruhkopf, D. Gill, and K. Dietmayer, “Meta-learning with non-iid data for class-incremental continual learning,” in CVPR Workshops, 2021.

[58]: J. Wen, Y. Li, T. Luo, H. Wang, and W. Li, "DRIFT: A framework for improving the generalization of object detection models under distribution shift," in NeurIPS, 2020.

This is the approach used by most end-to-end models actually deployed on vehicles. Li Xiaomao adds a few pointers here:

Winner of the CVPR autonomous driving challenge! Hydra-MDP: End-to-End Multimodal Planning with Multi-Target Hydra Distillation

IROS 2024 | ParkingE2E: An End-to-End Autonomous Parking Model

Imitation Learning - Inverse Optimal Control (IOC): derives a reward function that explains the expert's behavior from expert demonstrations, and then optimizes the agent's policy with respect to that reward. In continuous, high-dimensional autonomous driving scenarios, the reward is only implicitly defined and hard to optimize.

In inverse optimal control, data are collected via the expert policy πβ and contain the actions the expert takes in different states. A cost function c is learned, and a trajectory set T is searched to find the optimal trajectory τ* with minimal cost, which is finally used to fit the policy π (see the minimal sketch below).
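A minimal sketch of the IOC-style planning step described above, under illustrative assumptions: a small network predicts a BEV cost map, a fixed candidate trajectory set T is sampled (here, simple straight lines at different lateral offsets), each trajectory is scored by summing the learned cost of the cells it crosses, and the minimum-cost trajectory τ* is selected. The network and trajectory sampler are placeholders, not any specific method from the survey.

```python
import torch
import torch.nn as nn

class CostVolumeNet(nn.Module):
    """Hypothetical network: BEV features -> per-cell cost map (H x W)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(32, 1, 1))

    def forward(self, bev_feat):
        return self.net(bev_feat).squeeze(1)            # (B, H, W) cost map

def trajectory_cost(cost_map, traj):
    """Sum the learned cost over the BEV cells visited by one trajectory.
    traj: (T, 2) integer (row, col) indices into the cost map."""
    return cost_map[traj[:, 0], traj[:, 1]].sum()

model = CostVolumeNet()
bev_feat = torch.randn(1, 16, 100, 100)                 # dummy BEV feature map
cost_map = model(bev_feat)[0]                           # learned cost c over the BEV grid

# Candidate trajectory set T: straight lines at different lateral offsets (toy sampler).
candidates = [torch.stack([torch.arange(20),
                           torch.full((20,), 40 + 5 * k, dtype=torch.long)], dim=1)
              for k in range(5)]

costs = torch.stack([trajectory_cost(cost_map, traj) for traj in candidates])
best = candidates[int(costs.argmin())]                  # tau*: the minimum-cost trajectory
print("selected trajectory cost:", costs.min().item())
```

In practice the cost network would be trained so that expert demonstrations score lower than sampled alternatives; the sketch above only shows the inference-time trajectory selection.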

Representative works: trajectory costs over trajectories sampled around fixed expert trajectories [1], learning a cost volume in bird's-eye view (BEV) [32], cost volumes combined with kinematic models [32, 39, 70], joint cost volumes computed from the future motion of other agents [69], cost volumes computed from probabilistic semantic occupancy or freespace layers [39, 70, 71], and Generative Adversarial Imitation Learning (GAIL) [65, 66, 67].

[1]: S. Casas, A. Sadat, and R. Urtasun, "MP3: A unified model to map, perceive, predict and plan," in CVPR, 2021.

[32]: W. Zeng, W. Luo, S. Suo, A. Sadat, B. Yang, S. Casas, and R. Urtasun, "End-to-end interpretable neural motion planner," in CVPR, 2019.

[39]: A. Sadat, S. Casas, M. Ren, X. Wu, P. Dhawan, and R. Urtasun, "Perceive, predict, and plan: Safe motion planning through interpretable semantic representations," in ECCV, 2020.

[65]: J. Ho and S. Ermon, "Generative adversarial imitation learning," in NeurIPS, 2016.

[66]: Y. Li, J. Song, and S. Ermon, "Infogail: Interpretable imitation learning from visual demonstrations," in NeurIPS, 2017.

[67]: G. Lee, D. Kim, W. Oh, K. Lee, and S. Oh, "Mixgail: Autonomous driving using demonstrations with mixed qualities," in IROS, 2020.

[69]: H. Wang, P. Cai, R. Fan, Y. Sun, and M. Liu, "End-to-end interactive prediction and planning with optical flow distillation for autonomous driving," in CVPR Workshops, 2021.

[70]: P. Hu, A. Huang, J. Dolan, D. Held, and D. Ramanan, "Safe local motion planning with self-supervised freespace forecasting," in CVPR, 2021.

[71]: T. Khurana, P. Hu, A. Dave, J. Ziglar, D. Held, and D. Ramanan, "Differentiable raycasting for self-supervised occupancy forecasting," in ECCV, 2022.

(2) Reinforcement Learning (RL): learns a policy by trial and error, executing sequences of actions in an environment to discover the best policy. RL is well suited to complex problems whose objective cannot be specified directly; in autonomous driving, its applications are mostly confined to simulation environments.

In reinforcement learning, the system learns through repeated interaction with the environment: the current policy π_k is applied to the environment, the interaction generates new data, and the policy is then updated to π_{k+1} (see the minimal loop sketch below).
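A minimal, hypothetical sketch of that interaction loop, using Gymnasium's CartPole as a stand-in environment and a toy REINFORCE-style update. Nothing here is a driving setup; it only illustrates the cycle of rolling out π_k, collecting new data, and updating to π_{k+1}.

```python
import torch
import torch.nn as nn
import gymnasium as gym

env = gym.make("CartPole-v1")                 # placeholder environment, not a driving simulator
policy = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)

for iteration in range(50):                   # each iteration: pi_k -> collect data -> pi_{k+1}
    obs, _ = env.reset()
    log_probs, rewards = [], []
    done = False
    while not done:
        logits = policy(torch.as_tensor(obs, dtype=torch.float32))
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()                # apply the current policy pi_k to the environment
        obs, reward, terminated, truncated, _ = env.step(int(action))
        done = terminated or truncated
        log_probs.append(dist.log_prob(action))
        rewards.append(reward)

    # REINFORCE update: the freshly collected rollout moves pi_k toward pi_{k+1}.
    returns = torch.tensor([sum(rewards[t:]) for t in range(len(rewards))], dtype=torch.float32)
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```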

Representative works: lane keeping on empty streets [4], combining RL with supervised learning (SL) [18, 19], fine-tuning networks pre-trained with imitation learning (IL) [17, 79], training an RL agent on privileged BEV semantic maps and using that policy to automatically collect a dataset for training a downstream IL agent [21], using a Q-function and tabular dynamic programming to generate additional or improved labels for a static dataset [20], and RL networks with access to simulator information [80, 81].

[4]: A. Kendall, J. Hawke, D. Janz, et al., “Learning to drive in a day,” in ICRA, 2019.

[17]: X. Liang, T. Wang, L. Yang, and E. Xing, “CIRL: Controllable imitative reinforcement learning for vision-based self-driving,” in ECCV, 2018.

[18]: M. Toromanoff, E. Wirbel, and F. Moutarde, “End-to-end model-free reinforcement learning for urban driving using implicit affordances,” in CVPR, 2020.

[19]: R. Chekroun, M. Toromanoff, S. Hornauer, and F. Moutarde, "GRI: General reinforced imitation and its application to vision-based autonomous driving," Robotics, 2023.

[20]: D. Chen, V. Koltun, and P. Krähenbühl, “Learning to drive from a world on rails,” in ICCV, 2021.

[21]: Z. Zhang, A. Liniger, D. Dai, F. Yu, and L. Van Gool, “End-to-end urban driving by imitating a reinforcement learning coach,” in ICCV, 2021.

[79]: E. Ohn-Bar, A. Prakash, A. Behl, K. Chitta, and A. Geiger, “Learning situational driving,” in CVPR, 2020.

[80]: W. B. Knox, A. Allievi, H. Banzhaf, F. Schmitt, and P. Stone, “Reward (mis)design for autonomous driving,” AI, 2023.

[81]: C. Zhang, R. Guo, W. Zeng, et al., “Rethinking closed-loop training for autonomous driving,” in ECCV, 2022.

Closing Remarks
In Li Xiaomao's view, this survey was likely written in 2023, so a few methods did not make it into the taxonomy. One example is the popular UAD method, which takes an unsupervised approach:
End-to-End Autonomous Driving Without Modular Design or Manual 3D Annotation: An Analysis of the UAD Framework (1)
Large language models (LLMs) also fall outside the taxonomy above:
CVPR 2024 | LMDrive: Closed-Loop End-to-End Autonomous Driving with Large Language Models (Pipeline Walkthrough)
CVPR 2024 | LMDrive: Closed-Loop End-to-End Autonomous Driving with Large Language Models (Module Implementation)
HKU & Huawei Noah's Ark | DriveGPT4: Interpretable End-to-End Autonomous Driving!
That's all for now; see you next time!
If this helps your development or research, please follow us; we will keep sharing quality content on end-to-end autonomous driving research!

