Autonomous driving paper index
DiffPlanner: Constraint-Guided Diffusion Trajectory Planning with Object-Level Scene Encoding and Transformer Context Modeling for Autonomous Driving
One-line summary
Trajectory planning is a safety-critical task in autonomous driving that demands real-time generation of accurate, physically feasible, and traffic-compliant driving paths.
Engineering notes
DiffPlanner is an end-to-end, constraint-aware trajectory planning framework that integrates three complementary modules: (1) an object-level scene feature encoding module that replaces pixel-level inputs with compact instance-aware descriptors, reducing input dimensionality while preserving scene semantics; (2) a Transformer-based spatiotemporal context modeling module that uses multi-head self-attention to capture global cross-entity interactions, including vehicle–vehicle, vehicle–pedestrian, and vehicle–road element relationships; and (3) a constraint-guided diffusion trajectory generation module that formulates planning as conditional iterative denoising, with differentiable safety and kinematic guidance functions embedded in the reverse process. On the Argoverse 2 benchmark, the authors report an 11.9% reduction in minFDE6 and a 55.6% reduction in collision rate relative to the strongest baseline, while preserving real-time inference efficiency.
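To make the idea of guidance inside the reverse process concrete, here is a minimal, self-contained sketch. It is not the authors' implementation: every function name and parameter is hypothetical, the learned denoiser is replaced by the analytic posterior mean of a Gaussian prior around a straight-line path, the update is a deterministic DDIM-style step, and guidance is applied to the predicted clean trajectory (one common variant, which the source does not confirm for DiffPlanner).

```python
import math
import random


def gaussian_denoiser(x_t, prior, abar_t, s0):
    """Toy stand-in for a learned denoiser: E[x0 | x_t] if clean paths ~ N(prior, s0^2)."""
    root = math.sqrt(abar_t)
    gain = root * s0 ** 2 / (abar_t * s0 ** 2 + 1.0 - abar_t)
    return [(p[0] + gain * (x[0] - root * p[0]), p[1] + gain * (x[1] - root * p[1]))
            for x, p in zip(x_t, prior)]


def kinematic_grad(traj, max_step):
    """Gradient of sum_i 0.5 * max(0, |p[i+1] - p[i]| - max_step)^2."""
    grads = [[0.0, 0.0] for _ in traj]
    for i in range(len(traj) - 1):
        dx = traj[i + 1][0] - traj[i][0]
        dy = traj[i + 1][1] - traj[i][1]
        seg = math.hypot(dx, dy)
        if seg > max_step:  # only over-long segments are penalised
            g = (seg - max_step) / seg
            grads[i + 1][0] += g * dx
            grads[i + 1][1] += g * dy
            grads[i][0] -= g * dx
            grads[i][1] -= g * dy
    return grads


def safety_grad(traj, obstacle, radius):
    """Gradient of sum_i 0.5 * max(0, radius - |p[i] - obstacle|)^2."""
    grads = []
    for x, y in traj:
        dx, dy = x - obstacle[0], y - obstacle[1]
        d = math.hypot(dx, dy)
        if d >= radius:
            grads.append((0.0, 0.0))
        elif d < 1e-9:
            grads.append((0.0, -radius))  # arbitrary direction at the obstacle centre
        else:
            grads.append((-(radius - d) * dx / d, -(radius - d) * dy / d))
    return grads


def gradient_step(traj, grads, scale):
    return [(p[0] - scale * g[0], p[1] - scale * g[1]) for p, g in zip(traj, grads)]


def guide(x0, start, obstacle, radius, max_step, kin_scale, safe_scale):
    """Apply kinematic guidance, then safety guidance, to the predicted clean path."""
    x0 = gradient_step(x0, kinematic_grad(x0, max_step), kin_scale)
    x0 = gradient_step(x0, safety_grad(x0, obstacle, radius), safe_scale)
    x0[0] = (float(start[0]), float(start[1]))  # the ego position is known, so pin it
    return x0


def sample_trajectory(start, goal, obstacle, n=11, steps=20, radius=1.5,
                      max_step=1.5, s0=0.3, kin_scale=0.5, safe_scale=1.0, seed=0):
    rng = random.Random(seed)
    prior = [(start[0] + (goal[0] - start[0]) * i / (n - 1),
              start[1] + (goal[1] - start[1]) * i / (n - 1)) for i in range(n)]
    # Cumulative signal level: abar[0] = 1 (clean), abar[steps] = 0.02 (almost pure noise).
    abar = [1.0 - 0.98 * t / steps for t in range(steps + 1)]
    # Start from standard normal noise (an approximation of the forward marginal).
    x = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    for t in range(steps, 0, -1):
        a_t, a_prev = abar[t], abar[t - 1]
        x0 = gaussian_denoiser(x, prior, a_t, s0)
        x0 = guide(x0, start, obstacle, radius, max_step, kin_scale, safe_scale)
        # Deterministic DDIM-style update using the guided clean estimate.
        c_eps = math.sqrt(1.0 - a_prev) / math.sqrt(1.0 - a_t)
        x = [(math.sqrt(a_prev) * p[0] + c_eps * (q[0] - math.sqrt(a_t) * p[0]),
              math.sqrt(a_prev) * p[1] + c_eps * (q[1] - math.sqrt(a_t) * p[1]))
             for p, q in zip(x0, x)]
    return x


path = sample_trajectory((0.0, 0.0), (10.0, 0.0), obstacle=(5.0, -0.5))
print(min(math.hypot(x - 5.0, y + 0.5) for x, y in path))  # clearance to the obstacle
```

In this sketch a safety step size of 1.0 moves any violating waypoint onto the obstacle boundary, which keeps the toy easy to check. A real planner would tune the guidance scales and would likely trade off the safety and kinematic terms jointly rather than applying them in sequence.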
Original abstract
Trajectory planning is a safety-critical task in autonomous driving that demands real-time generation of accurate, physically feasible, and traffic-compliant driving paths. Existing approaches exhibit three key limitations: (1) dense rasterized scene representations introduce substantial computational redundancy that hinders real-time deployment; (2) pairwise interaction modeling is insufficient for capturing global spatiotemporal dependencies among heterogeneous traffic participants and map elements; and (3) deterministic or single-mode generation paradigms cannot produce diverse, multimodal trajectories while simultaneously enforcing safety and kinematic constraints. To address these challenges, we propose DiffPlanner, an end-to-end constraint-aware trajectory planning framework that deeply integrates three complementary modules: (1) an object-level scene feature encoding module that replaces pixel-level inputs with compact instance-aware descriptors, significantly reducing input dimensionality while preserving complete scene semantics; (2) a Transformer-based spatiotemporal context modeling module that leverages multi-head self-attention to capture global cross-entity interactions, including vehicle–vehicle, vehicle–pedestrian, and vehicle–road element relationships; and (3) a constraint-guided diffusion trajectory generation module that formulates planning as conditional iterative denoising with differentiable safety and kinematic guidance functions embedded in the reverse process. Extensive experiments on the Argoverse 2 benchmark demonstrate that DiffPlanner achieves state-of-the-art performance, reducing minFDE6 by 11.9% and collision rate by 55.6% compared with the strongest baseline, while preserving real-time inference efficiency.
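The headline numbers are relative reductions in two metrics. As a rough illustration, the sketch below assumes that minFDE6 means the minimum final displacement error over K = 6 candidate trajectories, which is the usual reading on Argoverse-style benchmarks; the abstract itself does not define it. The function names and the sample values are hypothetical and are not taken from the paper.

```python
import math


def min_fde(candidates, ground_truth):
    """Smallest endpoint distance between any candidate trajectory and the ground truth."""
    gx, gy = ground_truth[-1]
    return min(math.hypot(traj[-1][0] - gx, traj[-1][1] - gy) for traj in candidates)


def relative_reduction(baseline, ours):
    """Percentage by which `ours` improves on `baseline` (lower is better for both)."""
    return 100.0 * (baseline - ours) / baseline


# Two made-up candidate paths scored against a made-up ground-truth path.
ground_truth = [(1.0, 1.0), (0.0, 0.0)]
candidates = [[(1.0, 1.0), (3.0, 4.0)], [(1.0, 1.0), (6.0, 8.0)]]
print(min_fde(candidates, ground_truth))
print(relative_reduction(2.0, 1.5))
```

Because only the best of the K candidates counts, minFDE rewards diverse multimodal outputs, which is why it is typically reported alongside a safety metric such as collision rate.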