Highly recommended, worth bookmarking! A survey of research on autonomous driving solutions that use large language models

Digest   2024-08-16 09:34   Shanghai

Project: https://github.com/Thinklab-SJTU/Awesome-LLM4AD

arXiv: https://arxiv.org/pdf/2311.01043

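Overview of this issue

Hello everyone, and happy Friday! I'm delighted to share some experience and spend some time with you all.

Remember the end-to-end autonomous driving survey introduced last issue? Li Xiaomao felt that it seemed to have been written in the first half of 2023, so many of the papers we have shared were not included. Today we study a more recent paper: LLM4Drive: A Survey of Large Language Models for Autonomous Driving. LLM4Drive is likewise a survey, and it focuses on autonomous driving systems that use large language models (LLMs). The paper was updated this week (August 12, 2024) and gathers a very comprehensive set of related work. Enjoy~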

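In the figure, the left and right sides show the two traditional ways of improving driving ability: through simulators and through offline datasets. Current methods rely on data to improve system performance, yet still cannot fully cover corner cases. Large language models can help close this gap by incorporating common sense (blue solid arrows).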

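Large Language Models for Autonomous Driving

The pipeline for applying large language models (LLMs) in autonomous driving (AD) systems. The whole system is divided into three layers: inputs, models, and tasks. Inputs are split into sensors and tokens. Models are split into LLMs paired with a visual network, and multi-modal models. Tasks include planning and control, perception, and question answering and generation. Planning and control covers behavior, path, and intention planning, and relies on both the visual network and the LLMs. Perception covers detection, segmentation, tracking, motion prediction, and trajectory prediction; it is supported by sensor data processed by the visual network, with LLMs providing assistance. Question answering and generation depend directly on the LLMs: question answering responds to natural-language questions, while generation produces driving scenes or predicted videos.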
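
The three-layer breakdown above can be written down as a small data structure, which makes it easy to ask which task families lean on which kind of model. This is only a sketch of the description in this post: the class name `Task`, the field names, and the helper `tasks_relying_on` are made up for illustration and do not come from the survey.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """One task family from the pipeline figure (names are illustrative)."""
    name: str
    subtasks: list[str]
    relies_on: list[str]  # model components named in the description


# Layer 1: inputs
INPUTS = ["sensors", "tokens"]

# Layer 2: models
MODELS = ["LLM + visual network", "multi-modal model"]

# Layer 3: tasks, with the dependencies stated in the text
TASKS = [
    Task("planning and control",
         ["behavior", "path", "intention"],
         ["visual network", "LLM"]),
    Task("perception",
         ["detection", "segmentation", "tracking",
          "motion prediction", "trajectory prediction"],
         ["visual network", "LLM (assistive)"]),
    Task("question answering",
         ["natural-language questions"],
         ["LLM"]),
    Task("generation",
         ["driving scenes", "predicted videos"],
         ["LLM"]),
]


def tasks_relying_on(component: str) -> list[str]:
    """Return task names whose dependencies mention the given component."""
    return [t.name for t in TASKS
            if any(component in dep for dep in t.relies_on)]


if __name__ == "__main__":
    print(tasks_relying_on("visual network"))
    print(tasks_relying_on("LLM"))
```

Querying for "visual network" returns only planning and control and perception, while every task family mentions the LLM, which matches the description of question answering and generation as depending on the LLM directly.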

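(1) Prediction tasks
Uses language prompts as semantic cues to combine LLMs with 3D detection and tracking tasks. [2309.04379] Language Prompt for Autonomous Driving (arxiv.org)

Introduces high-resolution information into multimodal large language models for the risk object localization and intention and suggestion prediction (ROLISP) task. [2309.05186] HiLM-D: Towards High-Resolution Understanding in Multimodal Large Language Models for Autonomous Driving (arxiv.org)

Integrates pre-trained language models as text input encoders into the trajectory prediction task for autonomous driving. [2309.05282] Can you text what is happening? Integrating pre-trained language encoders into trajectory prediction models for autonomous driving (arxiv.org)

Uses the LLM's ability to understand complex scenes to improve prediction performance, and provides interpretable predictions by generating explanations of lane-change intentions and trajectories. [2403.18344] LC-LLM: Explainable Lane-Change Intention and Trajectory Predictions with Large Language Models (arxiv.org)

Combines automatic data querying and labeling by a VLM with continual learning from pseudo-labels. [2403.17373] AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving (arxiv.org)

Designs and implements prompt engineering so that GPT4-V can understand complex traffic scenes. [2403.11057] Large Language Models Powered Context-aware Motion Prediction in Autonomous Driving (arxiv.org)

A multi-task decision-making model that makes decisions for autonomous driving at unsignalized intersections. [2307.16118] MTD-GPT: A Multi-Task Decision-Making GPT Model for Autonomous Driving at Unsignalized Intersections (arxiv.org)

A language-enhanced, goal-oriented, closed-loop end-to-end autonomous driving solution. [2403.20116] LeGo-Drive: Language-enhanced Goal-oriented Closed-Loop End-to-End Autonomous Driving (arxiv.org)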
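
Most entries in this roundup carry an arXiv identifier in square brackets followed by the title and "(arxiv.org)". For readers who want clickable links, here is a minimal sketch that pulls the identifier and title out of such a line and builds the standard `https://arxiv.org/abs/<id>` address. The function name `parse_entry` and the returned dictionary layout are assumptions made for this example.

```python
import re

# Matches "[2309.04379] Some Title (arxiv.org)" anywhere in a line.
ENTRY_RE = re.compile(r"\[(\d{4}\.\d{4,5})\]\s*(.+?)\s*\(arxiv\.org\)")


def parse_entry(line: str):
    """Extract the arXiv id, title and abstract URL from one list entry.

    Returns None when the line has no "[id] title (arxiv.org)" part,
    for example the entries that point to IEEE Xplore instead.
    """
    match = ENTRY_RE.search(line)
    if match is None:
        return None
    arxiv_id, title = match.groups()
    return {
        "id": arxiv_id,
        "title": title,
        "url": f"https://arxiv.org/abs/{arxiv_id}",
    }


if __name__ == "__main__":
    sample = ("Uses language prompts as semantic cues. "
              "[2309.04379] Language Prompt for Autonomous Driving (arxiv.org)")
    print(parse_entry(sample))
```

Entries without a bracketed identifier are simply skipped, so the same helper can be run over the whole list.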

(2) Planning and control tasks
Applies LLMs to safe decision-making in autonomous driving. [2312.00812] Empowering Autonomous Driving with Large Language Models: A Safety Perspective (arxiv.org)
Validates the practicality of LLMs in autonomous driving. [2312.09397] Personalized Autonomous Driving with Large Language Models: Field Experiments (arxiv.org)
ChatGPT as an intelligent co-pilot in autonomous driving. ChatGPT as Your Vehicle Co-Pilot: An Initial Attempt | IEEE Journals & Magazine | IEEE Xplore
An LLM that emulates human driving reactions. [2310.08034] Receive, Reason, and React: Drive as You Say with Large Language Models in Autonomous Vehicles (arxiv.org)
Trajectory planning for autonomous driving in dynamic environments via LLMs. [2310.03026] LanguageMPC: Large Language Models as Decision Makers for Autonomous Driving (arxiv.org)
Generates bird's-eye-view maps with language enhancement. [2310.02251] Talk2BEV: Language-enhanced Bird’s-eye View Maps for Autonomous Driving (arxiv.org)
An LLM-based virtual driver framework. [2309.13193] SurrealDriver: Designing LLM-powered Generative Driver Agent Framework based on Human Drivers’ Driving-thinking Data (arxiv.org)
LLMs enable human-like spoken interaction in autonomous driving. [2309.10228] Drive as You Speak: Enabling Human-Like Interaction with Large Language Models in Autonomous Vehicles (arxiv.org)
An LLM that can understand and handle complex traffic scenarios, improving traffic management efficiency. [2309.06719] TrafficGPT: Viewing, Processing and Interacting with Traffic Foundation Models (arxiv.org)
An end-to-end autonomous driving model that imitates human driving behavior. [2307.07162] Drive Like a Human: Rethinking Autonomous Driving with Large Language Models (arxiv.org)
An LLM approach that incorporates cognitive reasoning. [2309.16292] DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models (arxiv.org)
LLMs optimize traffic signal control in complex urban environments. [2403.08337] LLM-Assisted Light: Leveraging Large Language Model Capabilities for Human-Mimetic Traffic Signal Control in Complex Urban Environments (arxiv.org)
A multi-modal large model for real-time traffic accident analysis and prevention. [2312.13156] AccidentGPT: Accident Analysis and Prevention from V2X Environmental Perception with Multi-modal Large Model (arxiv.org)
A closed-loop planning tool based on language reasoning. [2401.00125] LLM-Assist: Enhancing Closed-Loop Planning with Language-Based Reasoning (arxiv.org)
A training-free adaptive driving policy. [2402.05932] Driving Everywhere with Large Language Model Policy Adaptation (arxiv.org)
Methods that fine-tune pre-trained models:
Multi-modal alignment that ties behavioral planning states to LLMs. [2312.09245] DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving (arxiv.org)
Proposes a closed-loop end-to-end driving model. [2312.07488] LMDrive: Closed-Loop End-to-End Driving with Large Language Models (arxiv.org)
Brings language agents into autonomous driving. [2311.10813] A Language Agent for Autonomous Driving (arxiv.org)
Uses a GPT model to learn driving behavior. [2310.01415] GPT-Driver: Learning to Drive with GPT (arxiv.org)
Combines graph visual question answering with driving models for deeper scene understanding and decision optimization. [2312.14150] DriveLM: Driving with Graph Visual Question Answering (arxiv.org)
An interpretable end-to-end driving model that uses an LLM to interpret driving scenes and decision paths. DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model (arxiv.org)
Explores the potential of LLMs as autonomous driving agents and shows their advantages in multi-task handling. Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving | IEEE Conference Publication | IEEE Xplore
A multi-task GPT model for decision-making at unsignalized intersections. [2307.16118] MTD-GPT: A Multi-Task Decision-Making GPT Model for Autonomous Driving at Unsignalized Intersections (arxiv.org)
A knowledge-driven multi-agent framework that applies LLMs to cooperative autonomous driving. [2407.14239] KoMA: Knowledge-driven Multi-agent Framework for Autonomous Driving with Large Language Models (arxiv.org)
Uses an asynchronous LLM to improve concurrent task handling in autonomous driving and reduce latency. [2406.14556] Asynchronous Large Language Model Enhanced Planner for Autonomous Driving (arxiv.org)
Introduces a planning-aware LLM to optimize path planning and the decision flow. [2406.01587] PlanAgent: A Multi-modal Large Language Agent for Closed-loop Vehicle Motion Planning (arxiv.org)
Proposes a collaborative driving framework that uses LLMs to strengthen multi-driver cooperation and lifelong learning. [2404.06345] AgentsCoDriver: Large Language Model Empowered Collaborative Driving with Lifelong Learning (arxiv.org)
Fuses vision and language models. [2402.12289] DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models (arxiv.org)
Uses retrieval-augmented in-context learning for more interpretable and more generalizable driving decisions. [2402.10828] RAG-Driver: Generalisable Driving Explanations with Retrieval-Augmented In-Context Learning in Multi-Modal Large Language Model (arxiv.org)
Integrates vision-language planning into autonomous driving. [2401.05577] VLP: Vision Language Planning for Autonomous Driving (arxiv.org)
Combines human decision logic with 3D scene perception to optimize decision-making in dynamic environments. [2401.03641] DME-Driver: Integrating Human Decision Logic and 3D Scene Perception in Autonomous Driving (arxiv.org)
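
Many of the planning and control entries above put an LLM in the planner or decision-maker role, which in practice means describing the scene in text and reading a plan back out of the model's text reply. The sketch below shows only that text round trip, under loudly stated assumptions: it is not taken from any of the listed papers, the `SceneObject` fields, the coordinate convention, and the prompt wording are invented for illustration, and the "reply" is a hard-coded string rather than a real model call.

```python
import re
from dataclasses import dataclass


@dataclass
class SceneObject:
    """A detected object; fields and units are hypothetical."""
    kind: str
    x: float      # metres ahead of the ego vehicle
    y: float      # metres to the left of the ego vehicle
    speed: float  # metres per second


def build_prompt(ego_speed: float, objects: list[SceneObject]) -> str:
    """Turn a structured scene into a plain-text prompt for an LLM planner."""
    lines = [f"Ego vehicle speed: {ego_speed:.1f} m/s.", "Surrounding objects:"]
    for obj in objects:
        lines.append(
            f"- {obj.kind} at ({obj.x:.1f}, {obj.y:.1f}) m, "
            f"speed {obj.speed:.1f} m/s"
        )
    lines.append("Reply with the next waypoints as (x, y) pairs in metres.")
    return "\n".join(lines)


# Matches "(1.0, -0.2)" style coordinate pairs in free text.
WAYPOINT_RE = re.compile(r"\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)")


def parse_waypoints(reply: str) -> list[tuple[float, float]]:
    """Read (x, y) waypoints back out of a text reply."""
    return [(float(x), float(y)) for x, y in WAYPOINT_RE.findall(reply)]


if __name__ == "__main__":
    prompt = build_prompt(5.0, [SceneObject("pedestrian", 12.0, -1.5, 1.2)])
    print(prompt)
    # Stand-in for a model response; no LLM is called in this sketch.
    fake_reply = "Slow down. Waypoints: (1.0, 0.0), (2.1, 0.1), (3.3, -0.2)"
    print(parse_waypoints(fake_reply))
```

A real system would also need to validate the parsed waypoints (range, smoothness, collision checks) before handing them to a controller, since free-text replies can be malformed.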
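(3) Generation tasks

A general world model that improves generalization. ADriver-I: Generalizable World Models with Action-Rational Decision-Making for Autonomous Driving.

Introduces latent diffusion models to generate high-quality, multi-view driving scene videos. DrivingDiffusion: Harnessing Latent Diffusion Models for Video Generation in Autonomous Driving.

Adaptive, high-quality multi-view video generation. DriveDreamer: World Model-based Autonomous Driving Video Generation.

Scene-level diffusion that improves the generation quality of traffic simulation. CTG++: Towards Generalizable Cross-Task Generation with Pre-trained Models.

Controllable driving scene generation. GAIA-1: Generalizable Autonomous Driving World Models for Real-time Adaptive Driving.

Generates street views under 3D geometry control. MagicDrive: Controllable Scene Generation via 3D Geometric Manipulation for Autonomous Driving.

Applies a multi-view world model to autonomous driving planning; a view-consistent video generation method for end-to-end driving. Driving into the Future: View-consistent Video Generation for Autonomous Driving with World Models.

Uses an LLM to generate and simulate challenging safety-critical scenarios. ChatScene: Generative Safety-critical Autonomous Driving Simulation with Large Language Models.

Uses GPT-4 to generate and optimize reward functions, giving reinforcement learning for autonomous driving a scoring mechanism close to human driving standards. REvolve: Reinforcement Learning Reward Functions Generation using Large Language Models.

Builds on a large-scale video prediction model and uses a temporal reasoning module to improve generalization across diverse driving scenes. GenAD: Generative Video Prediction for Autonomous Driving.

Adds an LLM on top of DriveDreamer to generate customized, high-quality multi-view driving videos. DriveDreamer-2: Leveraging LLMs for Customized Autonomous Driving Video Generation.

Editable, high-quality 3D driving scene simulation via natural-language commands. ChatSim: High-fidelity 3D Autonomous Driving Scene Simulation via Natural Language Commands.

Integrates human-mimetic reasoning to optimize signal control in complex traffic scenarios. LLM-Assisted Light: Leveraging Large Language Model Capabilities for Human-Mimetic Traffic Signal Control in Complex Urban Environments.

Uses reinforcement learning to optimize LLM-generated code, enabling code generation and optimization for autonomous driving. LangProp: Code Generation and Optimization in Autonomous Driving via Reinforcement Learning and Large Language Models.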

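(4) Question-answering tasks
Visual question answering:

Aligns multi-modal LLMs with behavioral planning states. DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving.

Integrates graph visual question answering into driving tasks. DriveLM: Driving with Graph Visual Question Answering.

Introduces chain-based reasoning and interpretability mechanisms, making the decision process more transparent and verifiable. Reason2Drive: Towards Interpretable and Chain-Based Reasoning for Autonomous Driving.

Uses video question answering for real-time analysis of and feedback on dynamic scenes during autonomous driving. LingoQA: Video Question Answering for Autonomous Driving.

Uses a multimodal language model and strengthens its reasoning through "chain of thought". Dolphins: Multimodal Language Model for Driving.

Uses an LLM to explain driving decisions in real time. DriveGPT4: Interpretable End-to-End Autonomous Driving via Large Language Model.

Integrates a safety assessment module and alignment scenarios to improve system safety and model alignment. A Superalignment Framework: A Framework for Aligning Models in Autonomous Driving.

An efficient, lightweight multi-frame vision-language model. EM-VLM4AD: Multi-Frame, Lightweight & Efficient Vision-Language Models for Question Answering in Autonomous Driving.

Uses a multi-modal generative pre-trained model to improve the handling of complex tasks in the transportation domain. TransGPT: Multi-modal Generative Pre-trained Transformer for Transportation.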

Traditional question answering:

Distills domain knowledge with LLMs, studied empirically in the autonomous driving domain. Domain Knowledge Distillation: An Empirical Study in the Autonomous Driving Domain.

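Strengthens the understanding of and reasoning about user commands via LLMs. Human-Centric Autonomous Systems: With LLMs for User Command Reasoning.

Builds and validates safety requirements for autonomous driving systems with LLMs. Engineering Safety: Requirements for Autonomous Driving with Large Language Models.

An approach that combines LLMs with hybrid reasoning. Hybrid Reasoning: Based on Large Language Models for Autonomous Car Driving.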

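Closing remarks
This survey is rather blunt: it goes straight into cataloguing existing papers, with almost no summarizing discussion. It also has later sections on evaluation metrics and datasets. Li Xiaomao is personally not very interested in that part, so anyone who needs a write-up of it is welcome to send us a private message through the account backend!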