Autonomous driving paper index

C2FMFusion: a multimodal BEV feature fusion method for autonomous driving

2026-07-04 · Journal of Electronic Imaging

BEV Perception · 3D Object Detection · LiDAR Perception · Sensor Fusion

autonomous driving system · autonomous driving · bev · 3d object detection · object detection · lidar · nuscenes · waymo · prediction

One-line summary

The paper proposes a multimodal, multiscale feature fusion framework for 3D object detection and map segmentation in autonomous driving.

Engineering notes

The authors report experiments on the nuScenes and Waymo datasets, with up to a 0.5% improvement in detection accuracy and a 2.4% increase in intersection over union (IoU) for map segmentation compared with state-of-the-art methods.
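For readers unfamiliar with the map segmentation metric quoted above, here is a minimal sketch of intersection over union on a binary BEV mask. The function name `binary_iou`, the nested-list grid layout, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration; they are not taken from the paper or from any benchmark toolkit.

```python
def binary_iou(pred, target):
    """Intersection over union of two binary BEV masks.

    pred, target: equally sized 2D lists of 0/1 values (hypothetical layout).
    """
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1  # cell marked in both masks
            if p or t:
                union += 1         # cell marked in at least one mask
    # Assumed convention: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
print(binary_iou(pred, target))  # 2 shared cells out of 4 marked cells
```

Benchmarks typically compute this per map class and average the results, so a reported gain usually refers to that averaged figure.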

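Chinese explanation

Chinese explanation pending: this site prioritizes adding Chinese explanations for high-value papers on end-to-end autonomous driving, BEV perception, 3D object detection, trajectory prediction, path planning, and LiDAR perception.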

Original abstract

Cameras and LiDAR are currently the two most commonly used sensors in autonomous driving. The effectiveness of feature fusion between camera and LiDAR modalities directly impacts the stability and safety of autonomous driving systems. Existing fusion methods typically concatenate bird’s eye view (BEV) features from both modalities under a unified resolution, followed by convolutional or self-attention operations to integrate them. However, these approaches suffer from limited receptive fields and insufficient feature alignment. In this paper, we propose a multimodal and multiscale feature fusion framework tailored for 3D object detection and map segmentation tasks in autonomous driving. First, we design an uncertainty-aware prediction branch combined with a global alignment module to enable adaptive weighting and accurate alignment of LiDAR and camera BEV features. Then, we introduce an inter-head feature interaction strategy integrated into an efficient attention mechanism to facilitate information complementation and conflict correction among attention heads, enhancing global context awareness. Finally, a coarse-to-fine multimodal fusion strategy is presented, leveraging efficient attention to reduce the computational burden of the fusion process. Extensive experiments on the nuScenes and Waymo datasets demonstrate the superiority of our approach, achieving up to a 0.5% improvement in detection accuracy and a 2.4% increase in intersection over union for map segmentation compared with state-of-the-art methods.
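The abstract does not say how the uncertainty-aware branch turns its predictions into fusion weights. As one possible reading, the sketch below fuses a LiDAR and a camera BEV feature vector for a single grid cell with inverse-variance weights, so the modality with lower predicted uncertainty contributes more. Everything here is an assumption made for illustration: the function names, the per-cell log-variance inputs, and the weighting rule itself. It is not the authors' method.

```python
import math


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def fuse_bev_cell(lidar_feat, cam_feat, lidar_logvar, cam_logvar):
    """Fuse two feature vectors for one BEV cell (hypothetical scheme).

    The LiDAR weight is exp(-lidar_logvar) / (exp(-lidar_logvar) + exp(-cam_logvar)),
    which equals sigmoid(cam_logvar - lidar_logvar).
    """
    w_lidar = _sigmoid(cam_logvar - lidar_logvar)
    w_cam = 1.0 - w_lidar
    return [w_lidar * l + w_cam * c for l, c in zip(lidar_feat, cam_feat)]


def fuse_bev_maps(lidar_bev, cam_bev, lidar_logvar, cam_logvar):
    """Apply fuse_bev_cell over H x W grids of C-dimensional features."""
    return [
        [
            fuse_bev_cell(l_cell, c_cell, l_lv, c_lv)
            for l_cell, c_cell, l_lv, c_lv in zip(l_row, c_row, l_lv_row, c_lv_row)
        ]
        for l_row, c_row, l_lv_row, c_lv_row in zip(
            lidar_bev, cam_bev, lidar_logvar, cam_logvar
        )
    ]


# Equal uncertainty gives a plain average of the two modalities.
print(fuse_bev_cell([2.0, 4.0], [0.0, 0.0], 0.0, 0.0))
```

With equal log-variances the result is a simple average; as one modality's predicted variance grows, its weight shrinks toward zero. A learned network would normally predict the log-variances per cell, and the paper's global alignment module would operate before or alongside this weighting, neither of which this sketch models.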

Engineering value: 5.5

Research novelty: 8.0

Business relevance: 6.0
