Autonomous driving paper index

Research on Multi-Source Heterogeneous Collaborative Perception System Based on Unmanned Aerial Vehicle and Unmanned Ground Vehicle

2026-06-19 · Drones

autonomous drivingbev3d object detectionobject detectionperception

One-line summary

Complex urban scenarios impose high demands on the environmental perception capabilities of unmanned systems, which serve as a prerequisite for executing autonomous missions such as disaster response, infrastructure inspection, and smart city operations.

Engineering notes

Key topics: autonomous driving, bev, 3d object detection, object detection, perception. See the paper for implementation details and experimental results.

Chinese explanation / 中文解读

中文解读待补充:本站会优先为端到端自动驾驶、BEV感知、3D目标检测、轨迹预测、路径规划、LiDAR感知等高价值论文补充中文说明。

Original abstract

Complex urban scenarios impose high demands on the environmental perception capabilities of unmanned systems, which serve as a prerequisite for executing autonomous missions such as disaster response, infrastructure inspection, and smart city operations. UAVs, leveraging their high mobility, can provide accurate prior maps and wide-area aerial observation for unmanned ground vehicles. However, their long-range perception accuracy is limited. Conversely, UGVs can achieve high-precision environmental perception along their navigation paths using prior maps, but suffer from a constrained field of view. The collaboration between the two platforms complements their respective strengths, thereby enhancing 3D object perception and mapping accuracy in complex scenarios. To address the aforementioned challenges, this study proposes a cross-platform feature fusion method for 3D object perception and an incremental map updating approach for UAVs and UGVs. First, a dynamic SLAM method that integrates an optimized YOLOv8 with ORB-SLAM3 is employed to mitigate map blurring caused by dynamic noise, providing prior map information for UGVs. Second, a multimodal fusion perception model is constructed for UGVs, utilizing attention mechanisms to achieve deep fusion of multimodal Bird’s-Eye-View (BEV) features. This overcomes issues such as diminishing complementarity between modalities and weak temporal feature associations. Finally, an air ground fusion model based on a cross-attention mechanism is developed to fuse aerial view features with ground-based fused BEV features across platforms, yielding a unified feature representation for 3D object detection and generating a fused high-precision map. Experimental results demonstrate that under complex occlusion scenarios in a simulated dataset, the proposed collaborative perception system improves the mean Average Precision (mAP) by 12.7% and 15.7% compared to using a single UAV or a single UGV, respectively, while increasing the map accuracy F1-score by 0.21. This study provides technical support for achieving real-time and accurate air ground collaborative perception in complex dynamic environments.

5.0Engineering value
7.0Research novelty
5.0Business relevance

Links and sources

Need this topic turned into a technical roadmap?

Full Self Driving can prepare a custom autonomous driving literature review, code map, dataset map, and B2B technology assessment.

Request B2B research

Comments

No comments yet. Be the first to share your thoughts on this paper.
Login or register to leave a comment