Autonomous driving paper index

Multimodal Multi-Sensor Camera-LiDAR Fusion for 3D Object Detection in Autonomous Vehicles

2025-11-12 · 2025 20th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP)

3D Object Detection · LiDAR Perception · Sensor Fusion

autonomous driving, autonomous vehicle, 3d object detection, object detection, lidar, point cloud, camera-lidar fusion, perception

One-line summary

This paper presents Middle-Shortcut Fusion, a novel mid-level multimodal fusion framework that effectively integrates synchronized LiDAR point clouds and RGB images using a two-stream neural architecture with cross-modal residual shortcut pathways.
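The idea of a cross-modal shortcut can be illustrated with a small sketch. This is not the paper's implementation: the function names, the weight matrices, and the choice to add a linearly projected copy of the other stream's features before a ReLU are all assumptions made for illustration, and plain Python lists stand in for feature maps.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply matrix w (rows x cols) by vector x (length cols)."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def relu(x: Vector) -> Vector:
    """Element-wise ReLU."""
    return [max(0.0, v) for v in x]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum of two equal-length vectors."""
    return [ai + bi for ai, bi in zip(a, b)]


def shortcut_fusion_stage(
    cam: Vector,
    lidar: Vector,
    w_cam: Matrix,
    w_lidar: Matrix,
    w_lidar_to_cam: Matrix,
    w_cam_to_lidar: Matrix,
) -> Tuple[Vector, Vector]:
    """One hypothetical mid-level stage of a two-stream network.

    Each stream transforms its own features, then adds a projected copy
    of the other stream's features (the assumed cross-modal shortcut).
    """
    cam_out = relu(add(matvec(w_cam, cam), matvec(w_lidar_to_cam, lidar)))
    lidar_out = relu(add(matvec(w_lidar, lidar), matvec(w_cam_to_lidar, cam)))
    return cam_out, lidar_out


# Toy example: a 2-d camera feature and a 3-d LiDAR feature.
cam_feat = [1.0, 2.0]
lidar_feat = [0.5, -1.0, 3.0]
identity2 = [[1.0, 0.0], [0.0, 1.0]]
identity3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
lidar_to_cam = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # projects 3-d to 2-d
cam_to_lidar = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # projects 2-d to 3-d

cam_out, lidar_out = shortcut_fusion_stage(
    cam_feat, lidar_feat, identity2, identity3, lidar_to_cam, cam_to_lidar
)
```

Both streams keep their own dimensionality after the stage, so the shortcut can be repeated at several depths before a shared detection head.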

Engineering notes

The authors report that fusing camera and LiDAR features at the mid level improves substantially on single-sensor detection. In their vehicle detection experiments, the approach consistently outperforms conventional early- and late-fusion baselines, with higher detection precision and faster inference.
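For orientation, the two baseline families named above differ in where the modalities meet. The sketch below is a generic illustration, not the baselines evaluated in the paper: `early_fusion`, `late_fusion`, and the `cam_weight` parameter are hypothetical names, and a weighted average is only one of several common ways to combine per-sensor scores.

```python
from typing import List


def early_fusion(cam: List[float], lidar: List[float]) -> List[float]:
    """Early fusion: concatenate the inputs before any shared processing."""
    return cam + lidar


def late_fusion(cam_score: float, lidar_score: float,
                cam_weight: float = 0.5) -> float:
    """Late fusion: combine per-sensor confidence scores after each
    sensor's detector has run independently (weighted average here)."""
    return cam_weight * cam_score + (1.0 - cam_weight) * lidar_score


fused_input = early_fusion([1.0], [2.0, 3.0])   # one joint input vector
fused_score = late_fusion(0.8, 0.4)             # one joint confidence
```

Mid-level fusion sits between these two: each sensor keeps its own feature extractor, but features are exchanged before the detection decisions are made.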

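Chinese explanation

Chinese explanation pending: this site prioritizes adding Chinese-language notes for high-value papers on end-to-end autonomous driving, BEV perception, 3D object detection, trajectory prediction, path planning, and LiDAR perception.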

Original abstract

Accurate and real-time 3D object detection is essential for reliable autonomous vehicle perception. This paper presents Middle-Shortcut Fusion, a novel mid-level multimodal fusion framework that effectively integrates synchronized LiDAR point clouds and RGB images using a two-stream neural architecture with cross-modal residual shortcut pathways. These shortcuts enable efficient feature propagation and deep intermodal interaction, substantially enhancing detection accuracy and computational efficiency. The incorporation of the mid-level fusion technique, when combining camera and LiDAR modalities, demonstrates remarkable improvements over individual sensor-based detection methods, highlighting its superior capability for autonomous perception. To validate the framework, a high-resolution dataset capturing real-world semi-urban driving conditions was curated. Experimental results show that the proposed approach consistently surpasses conventional early and late fusion baselines, achieving superior detection precision and faster inference in vehicle detection tasks. By uniting the semantic richness of visual cues with the geometric precision of LiDAR through learnable shortcut pathways, the proposed fusion method establishes a new benchmark for efficient and robust multimodal perception in dynamic urban environments.

5.5 Engineering value

8.0 Research novelty

5.5 Business relevance

Links and sources

Official / arXiv page

