Autonomous driving paper index

Achieving Speed-Accuracy Balance in Vision-based 3D Occupancy Prediction via Geometric-Semantic Disentanglement

2025-04-11 · AAAI Conference on Artificial Intelligence

autonomous driving · bev · occupancy prediction · occupancy · nuscenes · perception · prediction

One-line summary

The paper shifts the focus of vision-based 3D occupancy prediction from accuracy alone to both accuracy and efficiency, reporting 39.4% mIoU at 20 FPS on Occ3D-nuScenes.

Engineering notes

The authors identify the strong coupling between geometry and semantics as the core obstacle to balancing accuracy and efficiency: the predicted geometric structure (e.g., depth) guides the projection of 2D image features into 3D voxel space, which significantly affects feature discriminability and subsequent semantic learning. The method achieves 39.4% mIoU at 20 FPS on Occ3D-nuScenes, which the authors describe as a state-of-the-art balance between accuracy and efficiency.
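To make the coupling concrete, here is a minimal sketch of depth-guided lifting for a single pixel. It assumes a Lift-Splat-style formulation in which the pixel feature is weighted by a predicted depth distribution; the abstract does not state the exact projection used, so the function name `lift_pixel_feature` and the toy values are hypothetical.

```python
def lift_pixel_feature(feature, depth_probs):
    """Spread one pixel's feature vector across depth bins.

    Assumed Lift-Splat-style lifting: the feature placed at depth bin d
    is the 2D feature scaled by the predicted probability of bin d.
    Returns a list of D vectors, each with C channels.
    """
    return [[p * f for f in feature] for p in depth_probs]


feature = [2.0, -1.0]              # C = 2 channels for one image pixel
sharp_depth = [0.0, 1.0, 0.0]      # confident depth: all mass in bin 1
blurry_depth = [0.25, 0.5, 0.25]   # uncertain depth: mass spread over 3 bins

sharp = lift_pixel_feature(feature, sharp_depth)
blurry = lift_pixel_feature(feature, blurry_depth)

# With a sharp depth estimate the feature lands intact in one voxel;
# with a blurry one it is diluted across neighbouring voxels.
print(sharp[1])   # feature at the correct depth bin, full strength
print(blurry[1])  # same bin, but only half strength
```

Under this assumption, any error in the depth distribution directly changes which voxels receive the semantic feature and how strongly, which is one way to read the coupling the paper describes.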

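Chinese explanation

Chinese explanation pending: this site prioritizes adding Chinese-language notes for high-value papers on end-to-end autonomous driving, BEV perception, 3D object detection, trajectory prediction, path planning, and LiDAR perception.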

Original abstract

Occupancy prediction plays a pivotal role in autonomous driving (AD) due to its capabilities of fine-grained 3D perception and general object recognition. However, existing methods often incur high computational costs, which conflict with AD's real-time demand. To this end, we redirect the focus from accuracy only to both accuracy and efficiency. By conducting a head-to-head comparison of existing methods, we find it challenging to balance accuracy and efficiency. We identify a core issue for this challenge: the strong coupling between geometry and semantics. Specifically, the predicted geometric structure (e.g., depth) guides the projection of 2D image features into 3D voxel space, which significantly affects feature discriminability and subsequent semantic learning. To address this issue, we focus on two key aspects: model design and learning strategies. 1) For model design, we propose a dual-branch network that disentangles the representation of geometry and semantics. The voxel branch utilizes a novel re-parameterized large-kernel 3D convolution to refine geometric structure efficiently, while the BEV branch employs temporal fusion and BEV encoding for efficient semantic learning. 2) For learning strategies, we propose to separate geometric learning from semantic learning by the mixup of ground-truth and predicted depths. Our method achieves 39.4% mIoU at 20 FPS on Occ3D-nuScenes, showcasing a state-of-the-art balance between accuracy and efficiency.
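Two ideas in the abstract can be illustrated with a short sketch. The first is kernel re-parameterization: a small-kernel branch trained in parallel with a large-kernel branch can be folded into a single large kernel for inference. The example uses a 1D analogue of the paper's 3D convolution to stay short. The second is the mixup of ground-truth and predicted depths; the abstract does not give the blending rule, so a linear blend with a hypothetical weight `gt_weight` is assumed. All function names here are illustrative, not the authors' code.

```python
def conv1d_same(signal, kernel):
    """Zero-padded 'same' cross-correlation with an odd-length kernel."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]


def merge_kernels(large, small):
    """Fold a small kernel into a large one by zero-padding it to equal size."""
    pad = (len(large) - len(small)) // 2
    padded_small = [0.0] * pad + list(small) + [0.0] * pad
    return [a + b for a, b in zip(large, padded_small)]


def mix_depth(gt_depth, pred_depth, gt_weight):
    """Linearly blend depths (assumed rule); gt_weight=1.0 uses ground truth only."""
    return [gt_weight * g + (1.0 - gt_weight) * p
            for g, p in zip(gt_depth, pred_depth)]


signal = [1.0, 2.0, 0.0, -1.0, 3.0, 1.0]
large = [1.0, 0.0, -1.0, 2.0, 1.0]   # 5-tap "large" kernel
small = [0.5, 1.0, -0.5]             # 3-tap parallel branch

# Training-time view: two branches whose outputs are summed.
two_branch = [a + b for a, b in zip(conv1d_same(signal, large),
                                    conv1d_same(signal, small))]
# Inference-time view: one convolution with the merged kernel.
merged_kernel = merge_kernels(large, small)
single_branch = conv1d_same(signal, merged_kernel)

# Depth mixup: lean mostly on ground truth (hypothetical weight of 0.75).
mixed = mix_depth([10.0, 20.0], [14.0, 12.0], gt_weight=0.75)
```

Because convolution is linear, `two_branch` and `single_branch` agree, so the extra branch costs nothing at inference time. For the depth mixup, a plausible use is to start with a high `gt_weight` so the semantic branch sees reliable geometry and to lower it over training, though that schedule is an assumption rather than something the abstract states.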

5.5 Engineering value

8.0 Research novelty

5.0 Business relevance
