Autonomous driving paper index

V-Fusion: 2D Detection-enhanced Multimodal 3D BEV Object Detection

2025-04-06 · IEEE International Conference on Acoustics, Speech, and Signal Processing

autonomous driving · autonomous vehicle · bev · 3d object detection · object detection · nuscenes · perception

One-line summary

The paper proposes V-Fusion, a high-quality 2D detection-enhanced multimodal BEV object detection method.

Engineering notes

Current multimodal 3D object detection methods focus on unifying modalities into a bird’s-eye view (BEV) representation. This overlooks an inherent strength of the camera perspective view (PV), where 2D detection performance significantly surpasses that of state-of-the-art 3D detectors. V-Fusion achieves 74.1 NDS on the nuScenes dataset, outperforming SparseFusion by 1.0 NDS at comparable inference speed.
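For readers unfamiliar with the metric, the nuScenes detection score (NDS) combines mAP with five mean true-positive error terms (translation, scale, orientation, velocity, attribute), each clipped to 1 and turned into a score. The sketch below shows that weighting. The input values are made up for illustration and are not V-Fusion's results; the function name `nds` is hypothetical and this is not the official devkit code.

```python
from typing import Dict


def nds(mean_ap: float, tp_errors: Dict[str, float]) -> float:
    """nuScenes detection score from mAP and the five mean TP errors.

    NDS = (5 * mAP + sum(1 - min(1, err) for each TP error)) / 10
    """
    expected = {"mATE", "mASE", "mAOE", "mAVE", "mAAE"}
    if set(tp_errors) != expected:
        raise ValueError(f"expected TP errors {sorted(expected)}")
    # Errors above 1 are clipped so that each term contributes a score in [0, 1].
    tp_scores = [1.0 - min(1.0, err) for err in tp_errors.values()]
    return (5.0 * mean_ap + sum(tp_scores)) / 10.0


# Illustrative inputs only (assumed values, not taken from the paper).
example_errors = {"mATE": 0.30, "mASE": 0.25, "mAOE": 0.30, "mAVE": 0.25, "mAAE": 0.15}
example_nds = nds(mean_ap=0.70, tp_errors=example_errors)
print(f"NDS = {100 * example_nds:.1f}")
```

Papers usually report NDS as a percentage, so a reported 74.1 NDS corresponds to 0.741 on this 0 to 1 scale. Because mAP carries half of the total weight, a 1.0 NDS gain can come from detection precision, from lower localization and attribute errors, or from both.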

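Chinese explanation

A Chinese-language explanation is pending. This site prioritizes adding Chinese explanations for high-value papers on end-to-end autonomous driving, BEV perception, 3D object detection, trajectory prediction, path planning, and LiDAR perception.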

Original abstract

Integrating information from multiple sensors enhances the performance of autonomous vehicle perception systems. However, current multimodal 3D object detection methods focus on unifying modalities into a bird’s-eye view (BEV) representation, which overlooks the inherent characteristics of camera perspective view (PV), where 2D detection performance significantly surpasses that of state-of-the-art 3D detectors. In this paper, we propose V-Fusion, a high-quality 2D detection-enhanced multimodal BEV object detection method. By leveraging the 2D priors of PV, we construct 3D query proposals that complement BEV 3D queries. To address the modal discrepancy in generating 3D queries from 2D priors, we propose a depth-robust 2D-to-3D query generation strategy. Additionally, we introduce a novel geometry-constrained self-attention mechanism to enhance the interaction of BEV 3D queries and employ an additional set of learnable 3D queries to account for potentially missed objects. Notably, V-Fusion achieves 74.1 NDS performance on the challenging nuScenes dataset, outperforming SparseFusion in 1.0 NDS and offering comparable inference speed.
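The abstract does not spell out how 3D query proposals are built from 2D priors. As a rough illustration of the general idea, the sketch below back-projects the center of a 2D box through a pinhole camera model at several candidate depths, which is one simple way to avoid committing to a single depth estimate. This is an assumption for illustration, not the paper's depth-robust strategy; `Intrinsics`, `box_center`, and `lift_box_to_3d` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point3D = Tuple[float, float, float]
Box2D = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


@dataclass
class Intrinsics:
    """Pinhole camera intrinsics in pixels (hypothetical container)."""
    fx: float
    fy: float
    cx: float
    cy: float


def box_center(box: Box2D) -> Tuple[float, float]:
    """Return the pixel center (u, v) of a 2D box."""
    x_min, y_min, x_max, y_max = box
    return (x_min + x_max) / 2.0, (y_min + y_max) / 2.0


def lift_box_to_3d(box: Box2D, cam: Intrinsics, depths: List[float]) -> List[Point3D]:
    """Back-project the box center at each candidate depth.

    Returns one 3D point per depth, expressed in the camera frame
    (x right, y down, z forward).
    """
    u, v = box_center(box)
    points = []
    for d in depths:
        x = (u - cam.cx) * d / cam.fx
        y = (v - cam.cy) * d / cam.fy
        points.append((x, y, d))
    return points


# Assumed camera and detection, chosen only to make the arithmetic easy to follow.
cam = Intrinsics(fx=1000.0, fy=1000.0, cx=800.0, cy=450.0)
candidates = lift_box_to_3d((900.0, 400.0, 1100.0, 600.0), cam, depths=[10.0, 20.0, 40.0])
for point in candidates:
    print(point)
```

All candidates lie on the same camera ray, so they differ only in how far along that ray the object is assumed to be. In a full detector these camera-frame points would still need to be transformed into the ego or LiDAR frame with the camera extrinsics before they could seed queries alongside BEV features.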

Engineering value: 5.5
Research novelty: 8.0
Business relevance: 5.0
