Project: https://github.com/Thinklab-SJTU/Awesome-LLM4AD
arXiv: https://arxiv.org/pdf/2311.01043
Overview of This Issue
Hello everyone, happy Friday! I'm delighted to spend this time sharing with you!
Remember the end-to-end autonomous driving survey from the last issue? Li Xiaomao felt it seemed to have been written in the first half of 2023, since many of the papers we have covered were not included. Today we study a more recent paper: LLM4Drive: A Survey of Large Language Models for Autonomous Driving. LLM4Drive is also a survey, focusing on autonomous driving systems that use large language models (LLMs). It was updated this week (August 12, 2024) and compiles essentially the most complete collection of related work. Enjoy!
The left and right sides of the figure show the two traditional routes for improving driving capability: simulators and offline datasets. Current methods rely on data to improve system performance, but still cannot fully cover extreme corner cases. Large language models can compensate for this gap by incorporating common sense (blue solid arrow).
Large Language Models for Autonomous Driving
The pipeline for applying large language models (LLMs) to autonomous driving (AD) systems. The system is organized into three layers: Inputs, Model ("Modal" in the figure), and Tasks. Inputs are divided into sensor data and tokens. The model layer comprises LLMs, visual networks, and multi-modal models. Tasks include planning and control, perception, question answering, and generation. Planning and control covers behavior, path, and intention planning, relying on outputs from both the visual network and the LLMs. Perception includes detection, segmentation, tracking, motion prediction, and trajectory prediction, supported by sensor data processed by the visual network, with LLMs providing assistance. Question answering and generation depend directly on the LLMs: QA answers natural-language questions, while generation produces driving scenes or predicted videos.
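The three-layer taxonomy above can be sketched as plain data structures. This is a minimal illustration of the figure's Inputs → Model → Tasks routing; the enum names and the routing table are my own labels, not an API from the survey:

```python
# Minimal sketch of the survey's LLM4AD pipeline taxonomy (Inputs -> Model -> Tasks).
# Names and the routing table are illustrative, not from the paper.
from enum import Enum, auto

class Input(Enum):
    SENSOR = auto()   # camera/LiDAR streams, consumed by the visual network
    TOKEN = auto()    # text tokens, consumed directly by the LLM

class Model(Enum):
    LLM = auto()
    VISUAL_NETWORK = auto()
    MULTI_MODAL = auto()

class Task(Enum):
    PLANNING_CONTROL = auto()  # behavior, path, and intention planning
    PERCEPTION = auto()        # detection, segmentation, tracking, prediction
    QA = auto()                # natural-language question answering
    GENERATION = auto()        # driving-scene / video generation

# Which model components each task mainly relies on, following the figure's arrows:
# planning and perception draw on the visual network with LLM assistance,
# while QA and generation depend directly on the LLM.
TASK_BACKBONES = {
    Task.PLANNING_CONTROL: {Model.VISUAL_NETWORK, Model.LLM},
    Task.PERCEPTION: {Model.VISUAL_NETWORK, Model.LLM},
    Task.QA: {Model.LLM},
    Task.GENERATION: {Model.LLM},
}

def backbones(task: Task) -> set:
    """Return the model components a task relies on."""
    return TASK_BACKBONES[task]
```

Reading the papers below against this routing table makes it easier to see which layer each contribution targets.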
Introduces high-resolution information into multi-modal large language models for the Risk Object Localization and Intention and Suggestion Prediction (ROLISP) task. [2309.05186] HiLM-D: Towards High-Resolution Understanding in Multimodal Large Language Models for Autonomous Driving (arxiv.org)
Integrates pre-trained language models as text-input encoders into trajectory prediction for autonomous driving. [2309.05282] Can you text what is happening? Integrating pre-trained language encoders into trajectory prediction models for autonomous driving (arxiv.org)
Leverages an LLM's ability to understand complex scenes to improve prediction performance, and provides interpretable predictions by generating explanations for lane-change intentions and trajectories. [2403.18344] LC-LLM: Explainable Lane-Change Intention and Trajectory Predictions with Large Language Models (arxiv.org)
Combines VLM-based automatic data querying and annotation with continual learning from pseudo-labels. [2403.17373] AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving (arxiv.org)
Designs and applies prompt engineering so that GPT-4V can understand complex traffic scenes. [2403.11057] Large Language Models Powered Context-aware Motion Prediction in Autonomous Driving (arxiv.org)
A multi-task decision-making model for autonomous driving at unsignalized intersections. [2307.16118] MTD-GPT: A Multi-Task Decision-Making GPT Model for Autonomous Driving at Unsignalized Intersections (arxiv.org)
A language-enhanced, goal-oriented, closed-loop end-to-end autonomous driving solution. [2403.20116] LeGo-Drive: Language-enhanced Goal-oriented Closed-Loop End-to-End Autonomous Driving (arxiv.org)
A general world model that improves generalization. ADriver-I: A General World Model for Autonomous Driving.
Introduces latent diffusion models to generate high-quality, multi-view driving-scene videos. DrivingDiffusion: Layout-Guided Multi-View Driving Scene Video Generation with Latent Diffusion Model.
Adaptive, high-quality multi-view video generation. DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving.
Scene-level diffusion that improves the generation quality of traffic simulation. CTG++: Language-Guided Traffic Simulation via Scene-Level Diffusion.
Controllable driving-scene generation. GAIA-1: A Generative World Model for Autonomous Driving.
Generates street views with diverse 3D geometric control. MagicDrive: Street View Generation with Diverse 3D Geometry Control.
Applies a multi-view world model to autonomous driving planning, with view-consistent video generation for end-to-end driving. Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving.
Uses LLMs to generate and simulate challenging safety-critical scenarios. ChatScene: Knowledge-Enabled Safety-Critical Scenario Generation for Autonomous Vehicles.
Uses GPT-4 to generate and refine reward functions, providing reinforcement learning for autonomous driving with a scoring mechanism that approaches human driving standards. REvolve: Reinforcement Learning Reward Functions Generation using Large Language Models.
Builds on a large-scale video prediction model, using a temporal reasoning module to improve generalization across diverse driving scenarios. GenAD: Generalized Predictive Model for Autonomous Driving.
Extends DriveDreamer with an LLM to generate customized, high-quality multi-view driving videos. DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation.
Enables editable, high-quality 3D driving-scene simulation through natural-language commands. ChatSim: High-fidelity 3D Autonomous Driving Scene Simulation via Natural Language Commands.
Integrates human-like reasoning to optimize traffic-signal control in complex traffic scenarios. LLM-Assisted Light: Human-like Traffic Signal Control via Large Language Models.
Uses reinforcement learning to optimize LLM-generated code, achieving code generation and optimization for autonomous driving. LangProp: Code Generation and Optimization in Autonomous Driving via Reinforcement Learning and Large Language Models.
Aligns multi-modal large language models with behavioral planning states. DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving.
Integrates graph visual question answering into driving tasks. DriveLM: Driving with Graph Visual Question Answering.
Introduces chain-based reasoning and interpretability mechanisms, making the decision process more transparent and verifiable. Reason2Drive: Towards Interpretable and Chain-Based Reasoning for Autonomous Driving.
Enables real-time analysis of and feedback on dynamic scenes during autonomous driving through video question answering. LingoQA: Video Question Answering for Autonomous Driving.
A multi-modal language model whose reasoning is strengthened through chain-of-thought. Dolphins: Multimodal Language Model for Driving.
Combines a large language model to explain driving decisions in real time. DriveGPT4: Interpretable End-to-End Autonomous Driving via Large Language Model.
Integrates a safety-assessment module and alignment scenarios to improve system safety and model alignment. A Superalignment Framework: A Framework for Aligning Models in Autonomous Driving.
An efficient, lightweight multi-frame vision-language model. EM-VLM4AD: Multi-Frame, Lightweight & Efficient Vision-Language Models for Question Answering in Autonomous Driving.
Improves the handling of complex transportation tasks via a multi-modal generative pre-trained model. TransGPT: Multi-modal Generative Pre-trained Transformer for Transportation.
An empirical study of distilling domain knowledge from large language models for autonomous driving. Domain Knowledge Distillation: An Empirical Study in the Autonomous Driving Domain.
Strengthens the understanding of and reasoning over user commands with large language models. Human-Centric Autonomous Systems: With LLMs for User Command Reasoning.
Constructs and validates safety requirements for autonomous driving systems using large language models. Engineering Safety Requirements for Autonomous Driving with Large Language Models.
Combines large language models with hybrid reasoning. Hybrid Reasoning Based on Large Language Models for Autonomous Car Driving.