Autonomous driving paper index
Sce2DriveX: A Generalized MLLM Framework for Scene-to-Drive Learning
One-line summary
We propose Sce2DriveX, a human-like chain-of-thought (CoT) driving reasoning MLLM framework designed to learn the driving process progressively, from multi-view scene understanding through behavior analysis and motion planning to vehicle control.
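The progression in the summary is an ordered chain in which each stage's answer becomes context for the next. The sketch below models that ordering as a small data structure, assuming each stage is one question/answer pair. The stage labels, class names, and example answers are hypothetical illustrations, not the paper's data format.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical stage labels following the progression described in the summary.
STAGES = ("scene_understanding", "behavior_analysis", "motion_planning", "vehicle_control")


@dataclass
class CoTStep:
    stage: str
    question: str
    answer: str


@dataclass
class DrivingCoT:
    steps: List[CoTStep] = field(default_factory=list)

    def add(self, question: str, answer: str) -> None:
        # The stage is implied by position, so the chain can only grow in order.
        if len(self.steps) >= len(STAGES):
            raise ValueError("chain already ends at vehicle control")
        self.steps.append(CoTStep(STAGES[len(self.steps)], question, answer))

    def context(self) -> str:
        # Earlier Q/A pairs are joined so a later stage can be conditioned on them.
        return "\n".join(f"[{s.stage}] Q: {s.question} A: {s.answer}" for s in self.steps)

    def is_complete(self) -> bool:
        return len(self.steps) == len(STAGES)


# Usage with made-up answers:
chain = DrivingCoT()
chain.add("What is ahead?", "A pedestrian is crossing at the crosswalk.")
chain.add("What should the ego vehicle do?", "Yield and stop before the crosswalk.")
chain.add("Plan the next 2 s.", "Decelerate to a stop before the crosswalk.")
chain.add("Control command?", "throttle=0.0, brake=0.6, steer=0.0")
print(chain.is_complete())  # True
```

Deriving the stage from position, rather than accepting it as an argument, makes an out-of-order chain (for example, a control command before any scene description) impossible to build.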
Engineering notes
Extensive experiments demonstrate that Sce2DriveX achieves state-of-the-art performance across tasks from scene understanding to end-to-end driving, as well as robust generalization in handling diverse driving scenes on the CARLA Bench2Drive benchmark.
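Chinese explanation
Chinese explanation pending: this site prioritizes adding Chinese-language explanations for high-value papers on end-to-end autonomous driving, BEV perception, 3D object detection, trajectory prediction, path planning, and LiDAR perception.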
Original abstract
End-to-end autonomous driving, which directly maps raw sensor inputs to low-level vehicle controls, is a crucial part of Embodied AI. Despite successes in applying Multimodal Large Language Models (MLLMs) for high-level traffic scene semantic understanding, it remains challenging to effectively translate this conceptual semantic understanding into low-level motion control commands and achieve cross-scene driving generalization and consensus. We propose Sce2DriveX, a human-like chain-of-thought (CoT) driving reasoning MLLM framework, designed to achieve progressive learning from multi-view scene understanding to behavior analysis, motion planning, and vehicle control driving process. Sce2DriveX utilizes multimodal joint learning of local scene videos and global Bird’s Eye View (BEV) maps to deeply understand long-range spatiotemporal relationships and road topology, enhancing its 3D dynamic/static scene perception and reasoning capabilities and achieving cross-scene generalization. Meanwhile, it reconstructs the implicit cognitive chain inherent in human driving, further enhancing the consensus between autonomous driving and human thought. To improve model performance, we construct the first comprehensive Visual Question Answering (VQA) driving instruction dataset tailored for 3D spatial understanding and long-axis task reasoning, and introduce a task-oriented three-stage training pipeline to support supervised fine-tuning. Extensive experiments demonstrate that Sce2DriveX achieves state-of-the-art performance across tasks from scene understanding to end-to-end driving, as well as robust generalization in handling diverse driving scenes on the CARLA Bench2Drive benchmark.
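The abstract describes joint learning over local scene videos and a global BEV map, supervised through a VQA-style instruction dataset. The record below is a minimal sketch of what one such training sample could look like. It assumes a LLaVA-style layout in which placeholder tokens mark where visual encoder features are spliced into the text sequence. The field names, the `<video>` and `<bev>` tokens, and the camera names are hypothetical and are not the paper's released schema.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical placeholder tokens: a vision encoder's output embeddings would be
# inserted into the LLM input sequence at these positions.
VIDEO_TOKEN = "<video>"
BEV_TOKEN = "<bev>"


@dataclass
class DrivingVQASample:
    camera_clips: Dict[str, List[str]]  # camera name -> ordered frame paths (local video)
    bev_map: str                        # path to a rasterized global BEV map (read by a loader)
    question: str
    answer: str

    def validate(self) -> None:
        if not self.camera_clips:
            raise ValueError("at least one camera view is required")
        # Multi-view clips are assumed to be time-aligned, so lengths must match.
        if len({len(frames) for frames in self.camera_clips.values()}) != 1:
            raise ValueError("all camera views must have the same number of frames")

    def to_prompt(self) -> str:
        # One video placeholder per camera (sorted for a stable order),
        # then one BEV placeholder, then the natural-language question.
        self.validate()
        views = " ".join(f"{cam}: {VIDEO_TOKEN}" for cam in sorted(self.camera_clips))
        return f"{views}\nBEV: {BEV_TOKEN}\n{self.question}"


sample = DrivingVQASample(
    camera_clips={"front": ["f0.png", "f1.png"], "back": ["b0.png", "b1.png"]},
    bev_map="bev.png",
    question="Which lane should the ego vehicle take at the next junction?",
    answer="Keep the current lane and turn left at the junction.",
)
print(sample.to_prompt())
```

Keeping the video and BEV slots as separate placeholders lets the local, time-varying views and the single global map go through different encoders before they meet in one token sequence.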