Autonomous driving paper index

3D Object Detection Method Based on Image-Point Cloud Interactive Fusion

2025-06-20 · 2025 5th International Conference on Artificial Intelligence, Big Data and Algorithms (CAIBDA)

BEV Perception · 3D Object Detection · LiDAR Perception · Sensor Fusion

autonomous driving, bev, 3d object detection, object detection, lidar, point cloud, kitti, radar, perception

One-line summary

To fuse LiDAR and camera data effectively, this paper proposes a 3D object detection framework based on the interactive fusion of image and point-cloud features.

Engineering notes

The authors report that, in experiments on the KITTI and Dual Radar datasets, the method significantly improves detection performance for cars, pedestrians, and cyclists. The abstract gives no numerical results.

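Chinese explanation

Chinese explanation pending: this site prioritizes adding Chinese explanations for high-value papers on end-to-end autonomous driving, BEV perception, 3D object detection, trajectory prediction, path planning, and LiDAR perception.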

Original abstract

The rapid development of artificial intelligence and computer vision has propelled autonomous driving to a crucial frontier in the research of intelligent transportation systems. LiDAR and cameras, commonly used as environmental perception sensors in autonomous driving, each have inherent limitations in their single-modality forms. To effectively fuse these modalities, this paper proposes a 3D object detection framework based on the interactive fusion of image and point-cloud features. In the camera branch, the Orthographic Feature Attention Module is used to transform 2D image features into image BEV features. During the fusion stage, a flow-field alignment technique is introduced to address the spatial misalignment issue, ensuring the precise correspondence of cross-modal features. Meanwhile, an interactive attention module is designed to deeply explore and learn the complementary information between different modalities during the fusion process, further enhancing the fusion effect. Numerous experiments on the KITTI and Dual Radar datasets show that our method significantly improves the detection performance for cars, pedestrians, and cyclists. Analyses across different distance ranges highlight the advantages of multimodal fusion in long-range object detection and the complementary role of visual information in enhancing sparse LiDAR point clouds.
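The abstract does not say how the flow field is predicted or applied, so the following is only a minimal sketch of the general idea behind flow-based alignment: each cell of a camera BEV map is resampled from a shifted location so that it lines up with the LiDAR BEV grid. The function name `warp_bev`, the per-cell `(dx, dy)` offset layout, and the hand-written constant flow are all assumptions for illustration; in a real model the offsets would presumably come from a small learned network and operate on multi-channel tensors.

```python
import math


def warp_bev(feature, flow):
    """Bilinearly resample one BEV channel at positions shifted by a flow field.

    feature: H x W nested list of floats (a single camera BEV channel).
    flow:    H x W nested list of (dx, dy) offsets in grid cells. This is a
             hypothetical stand-in for a learned offset prediction.
    """
    h, w = len(feature), len(feature[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            # Source position to sample from, clamped to stay inside the grid.
            sx = min(max(x + dx, 0.0), w - 1.0)
            sy = min(max(y + dy, 0.0), h - 1.0)
            x0, y0 = int(math.floor(sx)), int(math.floor(sy))
            x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
            wx, wy = sx - x0, sy - y0
            # Standard bilinear interpolation over the four neighbours.
            top = feature[y0][x0] * (1 - wx) + feature[y0][x1] * wx
            bottom = feature[y1][x0] * (1 - wx) + feature[y1][x1] * wx
            out[y][x] = top * (1 - wy) + bottom * wy
    return out


# Toy example: the camera response sits one cell to the right of where the
# LiDAR grid expects it, so a constant flow of (+1, 0) pulls it back.
camera = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
flow = [[(1.0, 0.0)] * 4 for _ in range(3)]
aligned = warp_bev(camera, flow)
```

After the warp, the response in the toy example moves from column 2 to column 1 of the middle row. Fractional offsets blend neighbouring cells, which is what makes this kind of resampling usable with gradient-based training in frameworks that provide a differentiable equivalent.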

5.0 Engineering value

7.0 Research novelty

5.0 Business relevance

Links and sources

Official / arXiv page

