Autonomous driving paper index

DiffPlanner: Constraint-Guided Diffusion Trajectory Planning with Object-Level Scene Encoding and Transformer Context Modeling for Autonomous Driving

2026-04-10 · 2026 International Conference on Image, Signal Processing and Pattern Recognition (ISPP)

End-to-End Autonomous Driving Path Planning

autonomous driving, end-to-end, trajectory planning, deployment, planning

One-line summary

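DiffPlanner combines object-level scene encoding, Transformer-based context modeling, and constraint-guided diffusion to generate driving trajectories; on Argoverse 2 the authors report an 11.9% lower minFDE6 and a 55.6% lower collision rate than the strongest baseline while keeping inference real-time.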

Engineering notes

DiffPlanner is an end-to-end, constraint-aware trajectory planning framework built from three complementary modules:

(1) Object-level scene feature encoding: replaces pixel-level (rasterized) inputs with compact instance-aware descriptors, reducing input dimensionality while preserving scene semantics.
(2) Transformer-based spatiotemporal context modeling: uses multi-head self-attention to capture global cross-entity interactions, including vehicle–vehicle, vehicle–pedestrian, and vehicle–road element relationships.
(3) Constraint-guided diffusion trajectory generation: formulates planning as conditional iterative denoising, with differentiable safety and kinematic guidance functions embedded in the reverse process.

Reported results: on the Argoverse 2 benchmark, the authors report an 11.9% reduction in minFDE6 and a 55.6% reduction in collision rate relative to the strongest baseline, while preserving real-time inference efficiency.
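The idea of embedding differentiable guidance in the reverse process can be illustrated with a toy sketch. This is not the paper's implementation: the stand-in "denoiser" simply pulls waypoints toward a straight reference line (a real model would be a trained network conditioned on scene context), and the circular obstacle, the quadratic penalty forms, the noise schedule, and all names (safety_grad, kinematic_grad, guided_denoise, pull, guide_scale) are assumptions made for illustration only.

```python
import math
import random

def safety_grad(traj, obstacle, radius):
    """Gradient of sum_i 0.5 * max(0, radius - dist_i)**2 for a circular obstacle."""
    ox, oy = obstacle
    grads = []
    for x, y in traj:
        dx, dy = x - ox, y - oy
        dist = math.hypot(dx, dy)
        if dist >= radius or dist == 0.0:
            grads.append((0.0, 0.0))  # outside the keep-out zone (or degenerate)
        else:
            g = -(radius - dist) / dist  # gradient points toward the obstacle
            grads.append((g * dx, g * dy))
    return grads

def kinematic_grad(traj, max_step):
    """Gradient of sum_i 0.5 * max(0, len_i - max_step)**2 over consecutive segments."""
    grads = [[0.0, 0.0] for _ in traj]
    for i in range(len(traj) - 1):
        dx = traj[i + 1][0] - traj[i][0]
        dy = traj[i + 1][1] - traj[i][1]
        seg = math.hypot(dx, dy)
        if seg > max_step:  # segment implies a speed above the assumed limit
            g = (seg - max_step) / seg
            grads[i + 1][0] += g * dx
            grads[i + 1][1] += g * dy
            grads[i][0] -= g * dx
            grads[i][1] -= g * dy
    return grads

def guided_denoise(reference, obstacle, radius, max_step,
                   steps=50, pull=0.3, guide_scale=0.5, seed=0):
    """Toy reverse process: denoise, apply guidance gradients, re-inject noise."""
    rng = random.Random(seed)
    traj = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in reference]
    for t in range(steps, 0, -1):
        # Hypothetical stand-in for a learned denoiser: move toward the reference.
        traj = [(x + pull * (rx - x), y + pull * (ry - y))
                for (x, y), (rx, ry) in zip(traj, reference)]
        # Guidance: descend the safety and kinematic penalties.
        gs = safety_grad(traj, obstacle, radius)
        gk = kinematic_grad(traj, max_step)
        traj = [(x - guide_scale * (sx + kx), y - guide_scale * (sy + ky))
                for (x, y), (sx, sy), (kx, ky) in zip(traj, gs, gk)]
        if t > 1:  # no noise on the final step
            sigma = 0.1 * t / steps
            traj = [(x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma))
                    for x, y in traj]
    return traj

def min_clearance(traj, obstacle):
    """Smallest waypoint-to-obstacle-center distance along the trajectory."""
    return min(math.hypot(x - obstacle[0], y - obstacle[1]) for x, y in traj)

reference = [(float(i), 0.0) for i in range(11)]
obstacle, radius = (5.0, 0.3), 1.5
guided = guided_denoise(reference, obstacle, radius, max_step=2.0)
unguided = guided_denoise(reference, obstacle, radius, max_step=2.0, guide_scale=0.0)
print(min_clearance(guided, obstacle), min_clearance(unguided, obstacle))
```

With guide_scale=0.0 the sampler collapses onto the reference line and passes close to the obstacle; with guidance enabled the waypoints near the obstacle are pushed outward. The guidance acts as a soft penalty here, so it trades off against the denoiser's pull rather than guaranteeing full clearance.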

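Chinese explanation

Chinese explanation pending: this site prioritizes adding Chinese-language commentary for high-value papers on end-to-end autonomous driving, BEV perception, 3D object detection, trajectory prediction, path planning, and LiDAR perception.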

Original abstract

Trajectory planning is a safety-critical task in autonomous driving that demands real-time generation of accurate, physically feasible, and traffic-compliant driving paths. Existing approaches exhibit three key limitations: (1) dense rasterized scene representations introduce substantial computational redundancy that hinders real-time deployment; (2) pairwise interaction modeling is insufficient for capturing global spatiotemporal dependencies among heterogeneous traffic participants and map elements; and (3) deterministic or single-mode generation paradigms cannot produce diverse, multimodal trajectories while simultaneously enforcing safety and kinematic constraints. To address these challenges, we propose DiffPlanner, an end-to-end constraint-aware trajectory planning framework that deeply integrates three complementary modules: (1) an object-level scene feature encoding module that replaces pixel-level inputs with compact instance-aware descriptors, significantly reducing input dimensionality while preserving complete scene semantics; (2) a Transformer-based spatiotemporal context modeling module that leverages multi-head self-attention to capture global cross-entity interactions, including vehicle–vehicle, vehicle–pedestrian, and vehicle–road element relationships; and (3) a constraint-guided diffusion trajectory generation module that formulates planning as conditional iterative denoising with differentiable safety and kinematic guidance functions embedded in the reverse process. Extensive experiments on the Argoverse 2 benchmark demonstrate that DiffPlanner achieves state-of-the-art performance, reducing minFDE6 by 11.9% and collision rate by 55.6% compared with the strongest baseline, while preserving real-time inference efficiency.
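For readers unfamiliar with the headline metric, the following is a minimal sketch of how minFDE over K candidate trajectories is commonly computed, assuming "minFDE6" means K = 6 candidates and that the quoted 11.9% is a relative reduction (the abstract does not spell out either). The function names and the numbers are illustrative only and are not taken from the paper.

```python
import math

def min_fde(candidates, ground_truth):
    """Smallest final-displacement error over K candidate trajectories.

    Each trajectory is a list of (x, y) waypoints; only the last waypoint
    of each candidate is compared against the last ground-truth waypoint.
    """
    gx, gy = ground_truth[-1]
    return min(math.hypot(traj[-1][0] - gx, traj[-1][1] - gy)
               for traj in candidates)

def relative_reduction(baseline, ours):
    """Percentage by which `ours` is lower than `baseline`."""
    return 100.0 * (baseline - ours) / baseline

# Toy data: two candidates and one ground-truth trajectory.
ground_truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
candidates = [
    [(0.0, 0.0), (1.0, 1.0), (5.0, 4.0)],  # endpoint is 5.0 away
    [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)],  # endpoint is 1.0 away
]
best = min_fde(candidates, ground_truth)
reduction = relative_reduction(2.0, 1.5)
print(best, reduction)
```

Because only the best of the K candidates counts, the metric rewards multimodal planners that cover the true outcome with at least one sample.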

Engineering value: 6.0

Research novelty: 8.0

Business relevance: 6.0

Links and sources

Official / arXiv page
