Autonomous driving paper index
FA-BEVFusion: LiDAR-Camera Fusion for 3D Roadside Perception
One-line summary
FA-BEVFusion is a bird’s-eye view (BEV) perception model for roadside 3D object detection that fuses LiDAR and camera data through a frequency-spatial domain enhanced feature extraction module and a depth-guided dual-modal cross-attention mechanism.
Engineering notes
Compared to the baseline, the proposed model achieves a 1.5 percentage point increase in mean AP on the DAIR-V2X-I roadside dataset.
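Chinese explanation
Chinese explanation pending: this site prioritizes adding Chinese-language notes for high-value papers on end-to-end autonomous driving, BEV perception, 3D object detection, trajectory prediction, path planning, and LiDAR perception.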
Original abstract
With the advancement of smart city infrastructure and vehicle-road collaboration technologies, roadside perception systems have become a critical focus in autonomous driving, leveraging the inherent advantage of a top-down perspective. However, in roadside scenarios, the image features of distant small targets in 3D object detection tasks are inherently weak, and point cloud data tends to become extremely sparse as the distance from the sensor grows. Conventional multi-modal fusion algorithms struggle to effectively model the inter-modal correlations between heterogeneous data modalities, leading to constrained detection accuracy in roadside 3D object detection. To address these challenges, we propose a bird’s-eye view (BEV) perception model named FA-BEVFusion, which integrates a frequency-spatial domain enhanced feature extraction module and a depth-guided dual-modal cross-attention mechanism. Specifically, a multi-scale feature fusion framework enhanced by a frequency-spatial domain feature extraction module is employed to aggregate image features, thereby improving the model’s ability to represent weak features. A depth-guided dual-modal spatial-channel attention feature fusion module is designed to achieve effective integration of heterogeneous cross-modal features. Compared to the baseline, the proposed model achieves a 1.5 percentage point increase in mean AP on the DAIR-V2X-I roadside dataset. The results demonstrate the effectiveness of multi-scale frequency-spatial enhancement and depth-guided attention mechanisms in roadside fusion perception tasks.
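Illustrative sketch
The abstract does not specify how depth guides the cross-attention, so the following is only a minimal sketch under stated assumptions, not the authors' module. It assumes camera BEV cells act as queries, LiDAR BEV cells act as keys and values, and depth guidance is a penalty on the attention logits proportional to the gap between a camera cell's estimated depth and a LiDAR cell's measured range. The function name, the `depth_penalty` parameter, and the residual fusion step are all hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def depth_guided_cross_attention(cam_feats, cam_depths,
                                 lidar_feats, lidar_depths,
                                 depth_penalty=1.0):
    """Hypothetical depth-guided cross-attention (not the paper's design).

    cam_feats:    list of camera feature vectors (queries)
    cam_depths:   estimated depth per camera feature, in metres
    lidar_feats:  list of LiDAR feature vectors (keys and values)
    lidar_depths: measured range per LiDAR feature, in metres
    Returns (fused_feats, attention_weights).
    """
    dim = len(cam_feats[0])
    scale = 1.0 / math.sqrt(dim)
    fused, all_weights = [], []
    for query, q_depth in zip(cam_feats, cam_depths):
        # Scaled dot-product similarity, minus an assumed depth-mismatch penalty.
        logits = [
            scale * sum(a * b for a, b in zip(query, key))
            - depth_penalty * abs(q_depth - k_depth)
            for key, k_depth in zip(lidar_feats, lidar_depths)
        ]
        weights = softmax(logits)
        # Weighted sum of LiDAR values for this camera query.
        attended = [
            sum(w * value[c] for w, value in zip(weights, lidar_feats))
            for c in range(dim)
        ]
        # Residual fusion: keep the camera feature and add the attended LiDAR feature.
        fused.append([q + a for q, a in zip(query, attended)])
        all_weights.append(weights)
    return fused, all_weights


# Toy inputs: one camera cell at ~10 m, two identical LiDAR cells at 10 m and 50 m.
fused, weights = depth_guided_cross_attention(
    cam_feats=[[1.0, 0.0]],
    cam_depths=[10.0],
    lidar_feats=[[1.0, 0.0], [1.0, 0.0]],
    lidar_depths=[10.0, 50.0],
)
```

In the toy call, both LiDAR cells carry the same feature vector, so the only thing separating them is the depth term: the cell whose range matches the camera depth receives nearly all of the attention weight. A learned model would typically replace the fixed penalty with projections and a learned depth embedding; that choice is outside what the abstract states.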