Autonomous driving paper index
TBBOcc: A Lightweight Twin‐Branch Binarized Network for Efficient 3D Semantic Occupancy Prediction in Autonomous Driving
One-line summary
The paper proposes TBBOcc, a lightweight twin-branch binarized network that co-optimizes several techniques to ease the efficiency-accuracy trade-off in 3D semantic occupancy prediction.
Engineering notes
On the Occ3D-nuScenes validation set, TBBOcc reaches 39.1% mean intersection over union (mIoU) with 32.8 M parameters and 164.8 GFLOPs. Compared with the FlashOcc baseline, this is 26.6% fewer parameters, 33.7% less computation, and a 3.3% accuracy gain.
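The index does not list FlashOcc's own figures, so the sketch below back-calculates them from TBBOcc's reported numbers. It assumes "reduces by X%" means a relative reduction of the FlashOcc value; because the abstract does not say whether the 3.3% accuracy gain is in percentage points or relative, both readings are computed. The helper names are hypothetical, and the results should be checked against the paper's comparison table.

```python
# Back-calculate the FlashOcc baseline implied by TBBOcc's reported figures.
# Assumption: "reduces by X%" means (baseline - tbbocc) / baseline = X / 100.

TBBOCC_PARAMS_M = 32.8   # millions of parameters (reported)
TBBOCC_FLOPS_G = 164.8   # GFLOPs (reported)
TBBOCC_MIOU = 39.1       # % mIoU on Occ3D-nuScenes val (reported)


def baseline_from_reduction(value: float, reduction_pct: float) -> float:
    """Return the baseline b such that value = b * (1 - reduction_pct / 100)."""
    return value / (1.0 - reduction_pct / 100.0)


def implied_baseline_miou(miou: float, gain_pct: float, absolute: bool) -> float:
    """Baseline mIoU under two readings of "improves the accuracy by 3.3%".

    absolute=True  -> gain in percentage points: baseline = miou - gain
    absolute=False -> relative gain:             baseline = miou / (1 + gain / 100)
    """
    if absolute:
        return miou - gain_pct
    return miou / (1.0 + gain_pct / 100.0)


if __name__ == "__main__":
    params = baseline_from_reduction(TBBOCC_PARAMS_M, 26.6)
    flops = baseline_from_reduction(TBBOCC_FLOPS_G, 33.7)
    print(f"implied FlashOcc params: {params:.1f} M")  # 44.7 M
    print(f"implied FlashOcc FLOPs:  {flops:.1f} G")   # 248.6 G
    print(f"baseline mIoU if +3.3 points:    {implied_baseline_miou(TBBOCC_MIOU, 3.3, True):.1f}")
    print(f"baseline mIoU if +3.3% relative: {implied_baseline_miou(TBBOCC_MIOU, 3.3, False):.1f}")
```

Under the relative reading the implied baseline is roughly 44.7 M parameters and 248.6 GFLOPs; the two mIoU readings give a FlashOcc baseline of about 35.8 or 37.9.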
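For readers comparing the 39.1% figure, here is a minimal sketch of how mean intersection over union is computed from a per-voxel confusion matrix: IoU per class is TP / (TP + FP + FN), averaged over classes. This is the generic metric, not the official Occ3D-nuScenes evaluation script, which typically also restricts scoring to camera-visible voxels and excludes the free-space class; `ignore_label` here is a hypothetical stand-in for such exclusions.

```python
from typing import List, Optional, Sequence


def confusion_matrix(gt: Sequence[int], pred: Sequence[int], num_classes: int,
                     ignore_label: Optional[int] = None) -> List[List[int]]:
    """cm[g][p] = number of voxels with ground-truth class g predicted as class p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        if ignore_label is not None and g == ignore_label:
            continue  # voxel excluded from evaluation
        cm[g][p] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Mean over classes of TP / (TP + FP + FN).

    Classes that appear in neither ground truth nor predictions are skipped
    rather than counted as 0 or 1.
    """
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # ground truth c, predicted otherwise
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, ground truth otherwise
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    cm = confusion_matrix([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], num_classes=3)
    print(mean_iou(cm))  # per-class IoU 1/3, 2/3, 1/2 -> 0.5
```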
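Chinese explanation
Chinese explanation pending: this site prioritizes adding Chinese-language explanations for high-value papers on end-to-end autonomous driving, BEV perception, 3D object detection, trajectory prediction, path planning, and LiDAR perception.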
Original abstract
The safety decisions of autonomous driving systems rely on an accurate understanding of 3D scenes, yet existing 3D occupancy prediction (OCC) models struggle to meet in-vehicle deployment requirements because of their high computational complexity and large parameter counts. Traditional methods (e.g., OccWorld, FlashOcc) rely on full-precision floating-point operations and dense 3D convolution, resulting in hundreds of millions of model parameters. In this paper, we propose TBBOcc, a lightweight two-branch binarized network that eases the efficiency-accuracy trade-off through the co-optimization of multiple techniques. First, we design a two-branch binarized feature extractor that uses channel compression and a hyperbolic tangent (tanh) relaxed activation function to alleviate vanishing gradients under binarization, reducing computation while retaining key geometric information. Second, we improve the EfficientViM module by integrating state space modeling with a two-dimensional normalization strategy, which strengthens global temporal feature modeling. Finally, we introduce a dynamic temporal fusion mechanism that combines binocular depth estimation with deformable BEV pooling to capture how the scene evolves over space and time. Experiments show that TBBOcc achieves 39.1% mean intersection over union (mIoU) on the Occ3D-nuScenes validation set with 32.8 M parameters and 164.8 GFLOPs; compared with the baseline model FlashOcc, it reduces parameters by 26.6% and computation by 33.7% while improving accuracy by 3.3%. It performs especially well on dynamic obstacles (e.g., pedestrians, traffic cones) and in complex scenes. In this paper, binarized computation is introduced into the 3D OCC task for the first time, providing an efficient and reliable technical path toward real-time environment sensing for autonomous driving.
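The abstract's first step uses a tanh relaxation to fight vanishing gradients in binarization. The sketch below shows one common reading of that idea, not the paper's exact formulation: the forward pass binarizes with sign, whose true derivative is zero almost everywhere, so training substitutes a surrogate. A clipped straight-through estimator (STE) passes gradient only inside [-1, 1]; the derivative of tanh(k*x) is smooth and nonzero everywhere. The sharpness `k` and all function names are hypothetical.

```python
import math


def binarize(x: float) -> float:
    """Forward pass: sign binarization to {-1, +1}; 0 maps to +1 by convention here."""
    return 1.0 if x >= 0 else -1.0


def ste_grad(x: float) -> float:
    """Clipped straight-through estimator: gradient 1 inside [-1, 1], otherwise 0."""
    return 1.0 if abs(x) <= 1.0 else 0.0


def tanh_relaxed_grad(x: float, k: float = 2.0) -> float:
    """Surrogate gradient d/dx tanh(k*x) = k * (1 - tanh(k*x)**2).

    Nonzero for every x, so inputs outside [-1, 1] still receive a (small) signal.
    Larger k makes tanh(k*x) a closer approximation of sign(x).
    """
    t = math.tanh(k * x)
    return k * (1.0 - t * t)


def backward(upstream: float, x: float, surrogate) -> float:
    """Chain rule through the binarizer, with the surrogate standing in for d sign(x) / dx."""
    return upstream * surrogate(x)


if __name__ == "__main__":
    for x in (-2.0, -0.5, 0.0, 0.3, 1.5):
        print(x, binarize(x), ste_grad(x), round(tanh_relaxed_grad(x), 4))
```

At x = 1.5 the clipped STE returns 0 while the tanh surrogate still returns a small positive value, which is the sense in which a relaxation can keep gradients from vanishing.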
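The efficiency argument for binarized computation rests on a standard identity from the binary neural network literature (XNOR-Net style): for two {-1, +1} vectors of length n packed as bits, the dot product equals 2 * popcount(XNOR(a, b)) - n, so multiply-accumulates become bitwise operations. The abstract does not describe TBBOcc's kernels; this is a generic illustration with hypothetical helper names.

```python
from typing import Sequence


def pack_bits(v: Sequence[int]) -> int:
    """Pack a {-1, +1} vector into an int: bit i is 1 where v[i] == +1."""
    bits = 0
    for i, s in enumerate(v):
        if s == 1:
            bits |= 1 << i
    return bits


def binary_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two packed {-1, +1} vectors of length n.

    Agreeing signs contribute +1 and disagreeing signs -1, so
    dot = matches - (n - matches) = 2 * popcount(XNOR(a, b)) - n.
    """
    mask = (1 << n) - 1
    xnor = ~(a_bits ^ b_bits) & mask  # 1 where the signs agree
    matches = bin(xnor).count("1")    # popcount
    return 2 * matches - n


if __name__ == "__main__":
    a = [1, -1, 1, 1]
    b = [1, 1, -1, 1]
    print(binary_dot(pack_bits(a), pack_bits(b), len(a)))  # same as sum(x * y) = 0
```

On hardware, one machine word holds 64 such signs, so a 64-element dot product reduces to one XNOR and one popcount instead of 64 floating-point multiply-adds.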
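The second step builds on state space modeling inside EfficientViM. As a rough intuition only, the sketch below runs a generic diagonal linear state space recurrence over a sequence (for example, features from successive frames): a hidden state carries information from every earlier step at cost linear in sequence length. EfficientViM's actual block and the paper's two-dimensional normalization are not reproduced; the parameters `a`, `b`, `c` are hypothetical fixed values rather than learned ones.

```python
from typing import List, Sequence


def ssm_scan(x: Sequence[float], a: Sequence[float], b: Sequence[float],
             c: Sequence[float]) -> List[float]:
    """Diagonal linear state space model over a 1-D input sequence.

    h_t = a * h_{t-1} + b * x_t   (elementwise over N state channels)
    y_t = sum_i c_i * h_t[i]
    Cost is O(T * N): linear in sequence length T.
    """
    n = len(a)
    h = [0.0] * n
    ys = []
    for xt in x:
        h = [a[i] * h[i] + b[i] * xt for i in range(n)]
        ys.append(sum(c[i] * h[i] for i in range(n)))
    return ys


if __name__ == "__main__":
    # An impulse at t = 0 decays geometrically through the state: 1.0, 0.5, 0.25
    print(ssm_scan([1.0, 0.0, 0.0], a=[0.5], b=[1.0], c=[1.0]))
```

The decay factor `a` controls how long past inputs influence the output, which is the mechanism that lets such models aggregate long-range temporal context without pairwise attention.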
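The third step uses binocular depth estimation. For a rectified pinhole stereo pair, depth follows Z = f * B / d (focal length f in pixels, baseline B in metres, disparity d in pixels), and a one-pixel disparity error shifts depth by about Z^2 / (f * B). The abstract does not say how TBBOcc forms its image pair or whether it regresses disparity directly, so this is only the underlying geometry; the camera values are hypothetical.

```python
def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Rectified pinhole stereo: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px


def depth_error_per_pixel(depth_m: float, focal_px: float, baseline_m: float) -> float:
    """First-order depth change for a 1-pixel disparity error: |dZ/dd| = Z**2 / (f * B)."""
    return depth_m ** 2 / (focal_px * baseline_m)


if __name__ == "__main__":
    f, b = 1000.0, 0.5  # hypothetical focal length (px) and baseline (m)
    z = depth_from_disparity(25.0, f, b)
    print(z, depth_error_per_pixel(z, f, b))  # 20 m depth, about 0.8 m per pixel of error
```

Because the error grows with the square of depth, stereo cues are most reliable for nearby structure, which is one reason depth estimates are usually fused with other cues before lifting to BEV.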
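The abstract does not define its dynamic temporal fusion mechanism. One common pattern, swapped in here purely for illustration, is a learned per-cell gate that blends current and previous BEV features: g = sigmoid(w_curr · f_curr + w_prev · f_prev + bias), output = g * f_curr + (1 - g) * f_prev. The sketch assumes the previous frame is already warped into the current ego frame and omits deformable BEV pooling entirely; the weights and bias are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def sigmoid(z: float) -> float:
    """Logistic function mapping a real score to a gate in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def gated_temporal_fusion(curr: Sequence[Vector], prev: Sequence[Vector],
                          w_curr: Sequence[float], w_prev: Sequence[float],
                          bias: float) -> List[Vector]:
    """Blend current and previous BEV features cell by cell.

    curr, prev: per-cell feature vectors (prev assumed already aligned to the
    current ego frame). The gate depends on both inputs, so the mix changes
    from cell to cell: out = g * curr + (1 - g) * prev.
    """
    fused = []
    for fc, fp in zip(curr, prev):
        score = (sum(w * v for w, v in zip(w_curr, fc))
                 + sum(w * v for w, v in zip(w_prev, fp)) + bias)
        g = sigmoid(score)
        fused.append([g * c + (1.0 - g) * p for c, p in zip(fc, fp)])
    return fused


if __name__ == "__main__":
    # Zero weights and bias give g = 0.5, i.e. a plain average of the two frames.
    print(gated_temporal_fusion([[2.0, 4.0]], [[0.0, 0.0]], [0.0, 0.0], [0.0, 0.0], 0.0))
```

Compared with a fixed average, a data-dependent gate can favor the current frame where the scene is changing (e.g., around moving pedestrians) and lean on history where it is static.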
Links and sources