LapDINO: A DINOv3 and Laplacian Pyramid-Based Approach for Outdoor Terrain Segmentation
One-line summary
LapDINO pairs DINOv3, which extracts global semantic features as a “semantic map”, with a Laplacian pyramid decomposition that supplies multi-scale high-frequency details as “structural contours”, and fuses the two for outdoor terrain segmentation.
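The “structural contours” path rests on a Laplacian pyramid. The following is a minimal pure-Python sketch of a generic pyramid on a single-channel image. It is not LapDINO's code: the paper's filter, number of levels, and how the bands feed the encoder are not stated here. As assumptions, a 2×2 box filter stands in for the usual Gaussian blur and upsampling is nearest-neighbour. All function names are hypothetical.

```python
from typing import List

Image = List[List[float]]  # single-channel image as rows of pixel values


def downsample(img: Image) -> Image:
    """Halve resolution by averaging each 2x2 block (box filter; an assumed stand-in for Gaussian blur)."""
    h, w = len(img), len(img[0])
    return [
        [
            (img[2 * i][2 * j] + img[2 * i][2 * j + 1]
             + img[2 * i + 1][2 * j] + img[2 * i + 1][2 * j + 1]) / 4.0
            for j in range(w // 2)
        ]
        for i in range(h // 2)
    ]


def upsample(img: Image) -> Image:
    """Double resolution by repeating each pixel in both directions (nearest neighbour)."""
    out: Image = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def laplacian_pyramid(img: Image, levels: int) -> List[Image]:
    """Return `levels` high-frequency bands (finest first) plus the final low-pass image.

    Assumes height and width are divisible by 2**levels.
    """
    pyramid: List[Image] = []
    current = img
    for _ in range(levels):
        low = downsample(current)
        up = upsample(low)
        # High-frequency band: the detail lost by the down/up round trip (edges, texture).
        band = [[c - u for c, u in zip(cr, ur)] for cr, ur in zip(current, up)]
        pyramid.append(band)
        current = low
    pyramid.append(current)  # coarsest low-pass residual
    return pyramid


def reconstruct(pyramid: List[Image]) -> Image:
    """Invert the decomposition: upsample the low-pass image and add bands back, coarse to fine."""
    current = pyramid[-1]
    for band in reversed(pyramid[:-1]):
        up = upsample(current)
        current = [[u + b for u, b in zip(ur, br)] for ur, br in zip(up, band)]
    return current


if __name__ == "__main__":
    image = [[float((i * 7 + j * 3) % 5) for j in range(8)] for i in range(8)]
    bands = laplacian_pyramid(image, levels=2)
    print([(len(b), len(b[0])) for b in bands])  # [(8, 8), (4, 4), (2, 2)]
```

Flat regions produce near-zero bands, and edges produce large ones. This is why the bands can serve as contour cues that complement a semantic backbone.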
Engineering notes
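The DINOv3 backbone stays frozen, and lightweight visual adapters handle domain shift and make fine-tuning efficient. The authors report state-of-the-art performance with a strong balance between accuracy and computational efficiency. They present the method as a robust, efficient engineering solution for terrain perception in off-road environments, and they release two off-road terrain segmentation datasets, VOTD and VOCD.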
Chinese explanation / 中文解读
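Chinese explanation pending: this site prioritizes adding Chinese explanations for high-value papers on end-to-end autonomous driving, BEV perception, 3D object detection, trajectory prediction, path planning, LiDAR perception, and related topics.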
Original abstract
As autonomous driving technology expands from structured urban roads to unstructured outdoor environments, precise understanding of complex terrain has become a critical requirement for ensuring safe vehicle navigation. However, outdoor environments are characterized by high dynamics, drastic illumination variations, ambiguous category boundaries, and prohibitive annotation costs, making traditional supervised learning methods that rely on large amounts of pixel-level annotations difficult to generalize. In this paper, we propose a novel dual-path bidirectional interactive encoder, termed LapDINO, that effectively combines the strong semantic generalization capability of the self-supervised foundation model DINOv3 with the multi-scale frequency analysis capacity of the Laplacian pyramid. Specifically, we leverage DINOv3 to extract global semantic features as a “semantic map”, while simultaneously obtaining multi-scale high-frequency details through Laplacian pyramid decomposition as “structural contours”. Building upon this, we design a bidirectional cross-attention fusion mechanism that enables dynamic interaction and mutual refinement between semantic information and geometric details. Furthermore, we introduce a multi-branch attention enhancement module that extracts pyramid features from three complementary perspectives. To address domain shift, we design lightweight visual adapters that enable efficient fine-tuning of the frozen DINOv3 backbone. Finally, we construct two off-road terrain segmentation datasets, VOTD and VOCD, to facilitate research in this domain. Experimental results demonstrate that the proposed method achieves state-of-the-art performance, striking an optimal balance between accuracy and computational efficiency, thereby providing a robust and efficient engineering solution for terrain perception in off-road environments.
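The abstract describes a bidirectional cross-attention fusion in which semantic features and geometric details refine each other. It does not give the number of heads, the projections, or where fusion happens. The following is therefore a minimal single-head sketch in pure Python under these assumptions:
- Query/key/value projections are identity.
- Both streams share a feature width `d`.
- Both directions are computed from the original inputs in parallel, with residual addition.

The names are hypothetical. This is not the authors' implementation.

```python
import math
from typing import List, Tuple

Tokens = List[List[float]]  # a sequence of d-dimensional feature vectors


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def cross_attention(queries: Tokens, context: Tokens) -> Tokens:
    """Single-head scaled dot-product attention: each query attends over `context`.

    Identity Q/K/V projections are assumed for brevity; a trained module would learn them.
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    out: Tokens = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in context]
        weights = softmax(scores)
        # Output is a convex combination of the context vectors.
        out.append([sum(w * v[c] for w, v in zip(weights, context)) for c in range(d)])
    return out


def bidirectional_fusion(semantic: Tokens, structural: Tokens) -> Tuple[Tokens, Tokens]:
    """Each stream queries the other; updates are added back residually.

    Both directions read the unmodified inputs (a parallel, symmetric update).
    """
    sem_update = cross_attention(semantic, structural)     # semantics gather geometric detail
    struct_update = cross_attention(structural, semantic)  # details gather semantic context
    new_sem = [[a + b for a, b in zip(s, u)] for s, u in zip(semantic, sem_update)]
    new_struct = [[a + b for a, b in zip(s, u)] for s, u in zip(structural, struct_update)]
    return new_sem, new_struct


if __name__ == "__main__":
    sem = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]
    struct = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
    fused_sem, fused_struct = bidirectional_fusion(sem, struct)
    print(len(fused_sem), len(fused_struct))  # 2 3
```

The two streams may have different token counts, and each keeps its own length after fusion. In a real model, the tokens would plausibly be DINOv3 patch embeddings and flattened pyramid-band features projected to a shared width, but that mapping is an assumption. Updating the streams sequentially, with one refined stream feeding the other, is an equally valid alternative design.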
Links and sources