Autonomous driving paper index

MSFNet3D: Monocular 3D Object Detection via Dual-Branch Depth-Consistent Fusion and Semantic-Guided Point Cloud Refinement

2025-03-14 · World Electric Vehicle Journal

autonomous driving, 3d object detection, 3d detection, instance segmentation, object detection, lidar, point cloud, kitti, perception

One-line summary

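MSFNet3D improves pseudo-lidar monocular 3D object detection with three components: a dual-branch depth-optimization network with a multi-scale channel spatial attention module (MS_CBAM), a consistency-weighted fusion of image and depth features, and a semantic-guided pseudo-point cloud enhancement step. It reports an average precision of 18.87% for 3D car detection on KITTI.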

Engineering notes

Key topics: autonomous driving, 3D object detection, 3D detection, instance segmentation, object detection, lidar, point cloud, KITTI, perception. See the paper for implementation details and experimental results.

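Chinese explanation

Chinese explanation pending: this site prioritizes adding Chinese explanations for high-value papers on end-to-end autonomous driving, BEV perception, 3D object detection, trajectory prediction, path planning, and LiDAR perception.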

Original abstract

The rapid development of autonomous driving has underscored the pivotal role of 3D perception. Monocular 3D object detection, as a cost-effective alternative to expensive lidar systems, is garnering increasing attention. However, existing pseudo-lidar methods encounter challenges such as coarse quality and insufficient semantic information when generating 3D point clouds from monocular images. To address these issues, this paper introduces MSFNet3D, which aims to overcome the quality limitations of pseudo-lidar point cloud. Our contributions are threefold: (1) We introduce a dual-branch network to optimize depth maps and propose a multi-scale channel spatial attention module (MS_CBAM). This module captures multi-scale geometric features through a hierarchical feature pyramid and an adaptive weight allocation mechanism, thereby addressing the scale sensitivity inherent in traditional attention mechanisms. (2) We propose a consistency-weighted fusion strategy that employs local gradient consistency analysis and differentiable weighted optimization to achieve a pixel-level fusion of image and depth features. This approach reduces feature conflicts within the dual-branch network and enhances the model’s robustness in complex scenes. (3) We introduce a semantic-guided pseudo-point cloud enhancement method that leverages an instance segmentation network to extract object-specific semantic regions and generate high-confidence point cloud, consequently improving the accuracy of object detection. Experiments on the KITTI dataset show that the proposed method performs excellently under various detection challenges, achieving an average precision of 18.87% in the 3D detection of car objects, which is a 1.67% improvement over the original model. The method also shows good performance in detecting pedestrians and cyclists. The proposed framework can provide economical and reliable 3D perception for mass-produced electric vehicles.
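The semantic-guided pseudo-point cloud idea in contribution (3) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a standard pinhole camera model, a per-pixel metric depth map, and a boolean instance mask, and it keeps only the masked pixels when back-projecting to 3D. The function name `backproject_masked_depth`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the `max_depth` cutoff are all hypothetical names chosen for illustration. Plain Python lists are used for clarity; a real pipeline would likely vectorize this step.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def backproject_masked_depth(
    depth: Sequence[Sequence[float]],
    mask: Sequence[Sequence[bool]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    max_depth: float = 80.0,  # assumed cutoff in meters, not from the paper
) -> List[Point]:
    """Back-project masked pixels into camera-frame 3D points.

    Uses the pinhole model with x pointing right, y pointing down,
    and z pointing forward. Pixels outside the mask, or with a
    non-positive or too-large depth, are skipped.
    """
    points: List[Point] = []
    for v, (depth_row, mask_row) in enumerate(zip(depth, mask)):
        for u, (z, keep) in enumerate(zip(depth_row, mask_row)):
            if not keep or z <= 0.0 or z > max_depth:
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Toy 2x3 depth map (meters) and instance mask, with made-up intrinsics.
toy_depth = [[10.0, 10.0, 0.0], [20.0, 20.0, 90.0]]
toy_mask = [[False, True, True], [True, False, True]]
toy_points = backproject_masked_depth(
    toy_depth, toy_mask, fx=2.0, fy=2.0, cx=1.0, cy=0.5
)
print(toy_points)  # [(0.0, -2.5, 10.0), (-10.0, 5.0, 20.0)]
```

Of the four masked pixels in the toy input, two are dropped: one has zero depth and one exceeds the assumed 80 m cutoff. Restricting back-projection to segmented object regions is one simple way to obtain a point cloud concentrated on objects; how the paper scores or selects "high-confidence" points is not specified in the abstract.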

Engineering value: 5.0

Research novelty: 7.0

Business relevance: 5.0

