Autonomous driving paper index

DriveWorld: 4D Pre-Trained Scene Understanding via World Models for Autonomous Driving

2024-05-07 · Computer Vision and Pattern Recognition · arXiv: 2405.04390

autonomous driving, 3d object detection, occupancy prediction, occupancy, multi-object tracking, object tracking, object detection, prediction, planning

One-line summary

DriveWorld is a world model-based 4D representation learning framework for autonomous driving. It pretrains on multi-camera driving videos in a spatiotemporal fashion, targeting the temporal characteristics of driving scenes that 2D and 3D pretext tasks overlook.

Engineering notes

The authors report that, when pretrained on the OpenScene dataset, DriveWorld achieves a 7.5% increase in mAP for 3D object detection, a 3.0% increase in IoU for online mapping, a 5.0% increase in AMOTA for multi-object tracking, a 0.1 m decrease in minADE for motion forecasting, a 3.0% increase in IoU for occupancy prediction, and a 0.34 m reduction in average L2 error for planning.
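To make the units in these numbers concrete, here is a minimal sketch of three of the metrics named above, using their commonly cited definitions. It is not the paper's evaluation code: the function names are hypothetical, trajectories are assumed to be 2D waypoints in metres, and occupancy grids are assumed to be flattened binary lists. Benchmark protocols differ in details such as time horizons and class averaging, and mAP and AMOTA are omitted because they need matching and thresholding logic that does not fit a short example.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]
Trajectory = Sequence[Point]


def average_displacement_error(pred: Trajectory, gt: Trajectory) -> float:
    """Mean L2 distance between matching waypoints, in the inputs' units.

    The same quantity is often reported as the average L2 error of a plan.
    """
    if len(pred) != len(gt) or not gt:
        raise ValueError("trajectories must be non-empty and equally long")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


def min_ade(candidates: Sequence[Trajectory], gt: Trajectory) -> float:
    """minADE: the ADE of the best of K candidate trajectories."""
    return min(average_displacement_error(c, gt) for c in candidates)


def occupancy_iou(pred: Sequence[bool], gt: Sequence[bool]) -> float:
    """Intersection over union of two flattened binary occupancy grids."""
    if len(pred) != len(gt):
        raise ValueError("grids must have the same number of cells")
    intersection = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    # Two empty grids agree perfectly; treating that as IoU 1.0 is a convention.
    return intersection / union if union else 1.0


if __name__ == "__main__":
    # Toy data, made up for illustration only.
    ground_truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    forecasts = [
        [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)],  # offset by 1 m at every step
        [(0.0, 0.0), (1.0, 0.3), (2.0, 0.0)],  # off by 0.3 m at one step
    ]
    print(min_ade(forecasts, ground_truth))  # approximately 0.1
    print(occupancy_iou([True, True, False, False],
                        [True, False, True, False]))  # 1 cell shared, 3 occupied
```

Read this way, a 0.1 m decrease in minADE means the best forecast lies, on average, 0.1 m closer to the ground-truth path at each waypoint.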

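Chinese explanation

Chinese explanation pending: this site prioritizes adding Chinese-language notes for high-value papers on end-to-end autonomous driving, BEV perception, 3D object detection, trajectory prediction, path planning, LiDAR perception, and related topics.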

Original abstract

Vision-centric autonomous driving has recently raised wide attention due to its lower cost. Pretraining is essential for extracting a universal representation. However, current vision-centric pretraining typically relies on either 2D or 3D pre-text tasks, overlooking the temporal characteristics of autonomous driving as a 4D scene understanding task. In this paper, we address this challenge by introducing a world model-based autonomous driving 4D representation learning framework, dubbed DriveWorld, which is capable of pretraining from multi-camera driving videos in a spatiotemporal fashion. Specifically, we propose a Memory State-Space Model for spatiotemporal modelling, which consists of a Dynamic Memory Bank module for learning temporal-aware latent dynamics to predict future changes and a Static Scene Propagation module for learning spatial-aware latent statics to offer comprehensive scene contexts. We additionally introduce a Task Prompt to decouple task-aware features for various downstream tasks. The experiments demonstrate that DriveWorld delivers promising results on various autonomous driving tasks. When pretrained with the OpenScene dataset, DriveWorld achieves a 7.5% increase in mAP for 3D object detection, a 3.0% increase in IoU for online mapping, a 5.0% increase in AMOTA for multi-object tracking, a 0.1m decrease in minADE for motion forecasting, a 3.0% increase in IoU for occupancy prediction, and a 0.34m reduction in average L2 error for planning.

Engineering value: 5.0
Research novelty: 7.0
Business relevance: 5.0

