Autonomous driving paper index

FA-BEVFusion: LiDAR-Camera Fusion for 3D Roadside Perception

2026-01-23 · 2026 International Conference on Robotics, Automation and Intelligent Transportation Systems (RAITS)

BEV Perception · 3D Object Detection · LiDAR Perception · Sensor Fusion

autonomous driving · bev · 3d object detection · object detection · lidar · point cloud · perception

One-line summary

FA-BEVFusion is a bird’s-eye view (BEV) perception model for roadside 3D object detection that integrates a frequency-spatial domain enhanced feature extraction module and a depth-guided dual-modal cross-attention mechanism to cope with weak image features of distant small targets and sparse long-range point clouds.

Engineering notes

The paper reports a 1.5 percentage point (absolute) increase in mean AP over its baseline on the DAIR-V2X-I roadside dataset.

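Chinese explanation

Chinese explanation pending: this site prioritizes adding Chinese-language notes for high-value papers on end-to-end autonomous driving, BEV perception, 3D object detection, trajectory prediction, path planning, and LiDAR perception.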

Original abstract

With the advancement of smart city infrastructure and vehicle-road collaboration technologies, roadside perception systems have become a critical focus in autonomous driving, leveraging the inherent advantage of a top-down perspective. However, in roadside scenarios, the image features of distant small targets in 3D object detection tasks are inherently weak, and point cloud data tends to become extremely sparse as the distance from the sensor grows. Conventional multi-modal fusion algorithms struggle to effectively model the inter-modal correlations between heterogeneous data modalities, leading to constrained detection accuracy in roadside 3D object detection. To address these challenges, we propose a bird’s-eye view (BEV) perception model named FA-BEVFusion, which integrates a frequency-spatial domain enhanced feature extraction module and a depth-guided dual-modal cross-attention mechanism. Specifically, a multi-scale feature fusion framework enhanced by a frequency-spatial domain feature extraction module is employed to aggregate image features, thereby improving the model’s ability to represent weak features. A depth-guided dual-modal spatial-channel attention feature fusion module is designed to achieve effective integration of heterogeneous cross-modal features. Compared to the baseline, the proposed model achieves a 1.5 percentage point increase in mean AP on the DAIR-V2X-I roadside dataset. The results demonstrate the effectiveness of multi-scale frequency-spatial enhancement and depth-guided attention mechanisms in roadside fusion perception tasks.
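The abstract names a depth-guided cross-attention mechanism but does not spell out its equations, so the following is only a minimal sketch of the general idea, not the authors' module. It assumes LiDAR BEV features act as queries, camera features act as keys and values, and each camera feature carries a depth confidence in (0, 1] whose logarithm is added to the attention logit. The function name, the argument layout, and the residual sum are all hypothetical choices made for illustration. A real model would use a tensor library and learned projections.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def depth_guided_cross_attention(lidar_feats, cam_feats, depth_conf):
    """Illustrative sketch only (hypothetical, not the paper's design).

    lidar_feats: N query vectors (one per LiDAR BEV cell).
    cam_feats:   M key/value vectors (one per camera feature).
    depth_conf:  M depth confidences in (0, 1].
    Returns the fused features and the attention weights.
    """
    dim = len(lidar_feats[0])
    scale = 1.0 / math.sqrt(dim)
    fused, weights = [], []
    for query in lidar_feats:
        # Scaled dot-product logit plus a log-confidence bias, so that
        # camera features with unreliable depth receive less attention.
        logits = [
            scale * sum(q * k for q, k in zip(query, key)) + math.log(conf)
            for key, conf in zip(cam_feats, depth_conf)
        ]
        attn = softmax(logits)
        attended = [
            sum(w * value[j] for w, value in zip(attn, cam_feats))
            for j in range(dim)
        ]
        # Residual sum keeps the LiDAR feature and adds camera context.
        fused.append([q + a for q, a in zip(query, attended)])
        weights.append(attn)
    return fused, weights
```

With two identical camera features and confidences 0.9 and 0.1, the dot-product terms cancel in the softmax and the attention weights reduce to the normalized confidences, which is a quick way to check that the depth bias behaves as intended.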

Engineering value: 5.0

Research novelty: 7.0

Business relevance: 5.0

Links and sources

Official / arXiv page

