Autonomous driving paper index
MDFusion: Multi-Dimension Semantic–Spatial Feature Fusion for LiDAR–Camera 3D Object Detection
One-line summary
MDFusion is a multi-dimension semantic–spatial feature fusion method that combines LiDAR and image features in both 2D and 3D space, targeting the misalignment between sparse LiDAR BEV features and dense camera semantics.
Engineering notes
The authors report that, in experiments on the KITTI and ONCE datasets, MDFusion achieves competitive performance in complex scenes, improving multi-modal fusion quality and detection accuracy while maintaining computational efficiency. The abstract does not quote specific metrics.
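Chinese explanation
Chinese explanation pending: this site prioritizes adding Chinese-language explanations for high-value papers on end-to-end autonomous driving, BEV perception, 3D object detection, trajectory prediction, path planning, and LiDAR perception.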
Original abstract
Accurate 3D object detection is becoming increasingly vital for the development of robust perception systems, particularly in applications such as autonomous driving vehicles and robotic systems. Many existing approaches rely on bird’s eye view (BEV) feature maps to facilitate multi-modal interaction, as BEV representations enable efficient operations. However, the inherent sparsity of LiDAR BEV features often leads to misalignment with the dense semantic information in camera images, resulting in suboptimal fusion quality and degraded detection performance, especially in complex and dynamic environments. To mitigate these issues, this paper proposes a novel multi-dimension semantic–spatial feature fusion (MDFusion) method that combines LiDAR and image features in 2D and 3D spaces. Specifically, image semantic features are extracted using the DeepLabV3 segmentation network, which captures rich contextual information and is aligned with LiDAR point cloud voxel features through a summation operation to achieve precise semantic fusion. Additionally, LiDAR BEV features are fused with downsampled image features in 2D space via concatenation and spatially adaptive dilated convolution. The mechanism dynamically adjusts to the spatial characteristics of the data, ensuring robust feature integration. Extensive experiments on the KITTI and ONCE datasets demonstrate that our method achieves competitive performance in complex scenes, significantly improving the multi-modal fusion quality and detection accuracy while maintaining computational efficiency.
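The two fusion steps named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. Everything specific here is an assumption made for illustration: the function names, the pinhole intrinsics (fx, fy, cx, cy), voxel centers already expressed in the camera frame, semantic and voxel features sharing the same channel count, image features already resampled to the BEV grid, and a single fixed dilation rate standing in for the paper's spatially adaptive mechanism, which the abstract does not specify.

```python
from typing import List, Optional, Sequence, Tuple

Plane = List[List[float]]  # one H x W feature channel


def project_to_pixel(center: Sequence[float], fx: float, fy: float,
                     cx: float, cy: float) -> Optional[Tuple[int, int]]:
    """Pinhole projection of a camera-frame point to (u, v); None if behind the camera."""
    x, y, z = center
    if z <= 0:
        return None
    return int(round(fx * x / z + cx)), int(round(fy * y / z + cy))


def fuse_voxels_by_summation(centers, voxel_feats, sem_map, fx, fy, cx, cy):
    """3D step (assumed form): add the semantic feature at each voxel's pixel to its feature."""
    height, width = len(sem_map), len(sem_map[0])
    fused = []
    for center, feat in zip(centers, voxel_feats):
        pixel = project_to_pixel(center, fx, fy, cx, cy)
        if pixel is None or not (0 <= pixel[0] < width and 0 <= pixel[1] < height):
            fused.append(list(feat))  # no image evidence: keep the LiDAR feature unchanged
            continue
        u, v = pixel
        fused.append([a + b for a, b in zip(feat, sem_map[v][u])])
    return fused


def dilated_conv3x3(x: List[Plane], weights, dilation: int) -> List[Plane]:
    """Zero-padded 3x3 convolution with a fixed dilation; weights[c_out][c_in][3][3]."""
    height, width = len(x[0]), len(x[0][0])
    out = []
    for kernel in weights:  # one kernel per output channel
        plane = [[0.0] * width for _ in range(height)]
        for i in range(height):
            for j in range(width):
                acc = 0.0
                for c, channel in enumerate(x):
                    for ki in range(3):
                        for kj in range(3):
                            ii = i + (ki - 1) * dilation
                            jj = j + (kj - 1) * dilation
                            if 0 <= ii < height and 0 <= jj < width:
                                acc += kernel[c][ki][kj] * channel[ii][jj]
                plane[i][j] = acc
        out.append(plane)
    return out


def fuse_bev(lidar_bev: List[Plane], image_bev: List[Plane], weights,
             dilation: int = 2) -> List[Plane]:
    """2D step (assumed form): channel concatenation followed by a dilated convolution."""
    stacked = lidar_bev + image_bev  # concatenation along the channel axis
    return dilated_conv3x3(stacked, weights, dilation)
```

In a real pipeline these loops would be tensor operations in a deep learning framework and the convolution weights would be learned. The nested lists are used here only to keep each index explicit.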