Autonomous driving paper index

DiffPlanner: Constraint-Guided Diffusion Trajectory Planning with Object-Level Scene Encoding and Transformer Context Modeling for Autonomous Driving

2026-04-10 · 2026 International Conference on Image, Signal Processing and Pattern Recognition (ISPP)

autonomous driving · end-to-end · trajectory planning · deployment · planning

One-line summary

Trajectory planning is a safety-critical task in autonomous driving that demands real-time generation of accurate, physically feasible, and traffic-compliant driving paths.

Engineering notes

DiffPlanner is an end-to-end, constraint-aware trajectory planning framework built from three complementary modules: (1) an object-level scene feature encoding module that replaces pixel-level inputs with compact instance-aware descriptors, reducing input dimensionality while preserving scene semantics; (2) a Transformer-based spatiotemporal context modeling module that uses multi-head self-attention to capture global cross-entity interactions, including vehicle–vehicle, vehicle–pedestrian, and vehicle–road element relationships; and (3) a constraint-guided diffusion trajectory generation module that formulates planning as conditional iterative denoising, with differentiable safety and kinematic guidance functions embedded in the reverse process. On the Argoverse 2 benchmark, the authors report that DiffPlanner reduces minFDE6 by 11.9% and collision rate by 55.6% compared with the strongest baseline while preserving real-time inference. The abstract gives no absolute metric values or latency figures.
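To make the third module concrete, here is a minimal sketch of guidance inside a reverse denoising loop. It is not the paper's implementation: the abstract does not specify the denoiser, noise schedule, or guidance functions, so `toy_denoiser`, `safety_grad`, `guided_sample`, the linear noise schedule, and the 0.2 blending factor are all hypothetical stand-ins. Only a safety term (a hinge cost around one circular obstacle) is shown, and the kinematic term is omitted.

```python
import math
import random

def toy_denoiser(traj, goal):
    """Hypothetical stand-in for a learned denoiser: a straight line to the goal."""
    n = len(traj)
    return [(goal[0] * (i + 1) / n, goal[1] * (i + 1) / n) for i in range(n)]

def safety_grad(traj, obstacle, radius):
    """Gradient of the hinge cost 0.5 * max(0, radius - d)^2 for each waypoint,
    where d is the waypoint's distance to the obstacle centre."""
    grads = []
    for x, y in traj:
        dx, dy = x - obstacle[0], y - obstacle[1]
        d = math.hypot(dx, dy)
        if d >= radius or d == 0.0:
            grads.append((0.0, 0.0))  # outside the obstacle, or direction undefined
        else:
            g = -(radius - d) / d     # cost decreases as d grows
            grads.append((g * dx, g * dy))
    return grads

def guided_sample(goal, obstacle, radius, n_points=8, steps=50,
                  guide_scale=1.0, seed=0):
    """Start from Gaussian noise and alternate denoising with a guidance step."""
    rng = random.Random(seed)
    traj = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_points)]
    for t in range(steps, 0, -1):
        # Denoising step: move part of the way toward the denoiser's estimate.
        clean = toy_denoiser(traj, goal)
        traj = [(x + 0.2 * (cx - x), y + 0.2 * (cy - y))
                for (x, y), (cx, cy) in zip(traj, clean)]
        # Guidance step: descend the differentiable safety cost.
        grads = safety_grad(traj, obstacle, radius)
        traj = [(x - guide_scale * gx, y - guide_scale * gy)
                for (x, y), (gx, gy) in zip(traj, grads)]
        # Re-inject a shrinking amount of noise on every step except the last.
        if t > 1:
            sigma = 0.05 * t / steps  # assumed linear schedule
            traj = [(x + rng.gauss(0, sigma), y + rng.gauss(0, sigma))
                    for x, y in traj]
    return traj

# Goal at (8, 0); an obstacle of radius 1 sits almost on the straight path.
plan = guided_sample(goal=(8.0, 0.0), obstacle=(4.0, 0.3), radius=1.0)
print(min(math.hypot(x - 4.0, y - 0.3) for x, y in plan))
```

With `guide_scale=1.0` and this particular hinge cost, each guidance step moves an offending waypoint exactly onto the obstacle boundary. Setting `guide_scale=0.0` recovers the unguided sampler, whose path cuts through the obstacle.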

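Chinese explanation

Chinese explanation pending: this site prioritizes adding Chinese-language notes for high-value papers on end-to-end autonomous driving, BEV perception, 3D object detection, trajectory prediction, path planning, and LiDAR perception.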

Original abstract

Trajectory planning is a safety-critical task in autonomous driving that demands real-time generation of accurate, physically feasible, and traffic-compliant driving paths. Existing approaches exhibit three key limitations: (1) dense rasterized scene representations introduce substantial computational redundancy that hinders real-time deployment; (2) pairwise interaction modeling is insufficient for capturing global spatiotemporal dependencies among heterogeneous traffic participants and map elements; and (3) deterministic or single-mode generation paradigms cannot produce diverse, multimodal trajectories while simultaneously enforcing safety and kinematic constraints. To address these challenges, we propose DiffPlanner, an end-to-end constraint-aware trajectory planning framework that deeply integrates three complementary modules: (1) an object-level scene feature encoding module that replaces pixel-level inputs with compact instance-aware descriptors, significantly reducing input dimensionality while preserving complete scene semantics; (2) a Transformer-based spatiotemporal context modeling module that leverages multi-head self-attention to capture global cross-entity interactions, including vehicle–vehicle, vehicle–pedestrian, and vehicle–road element relationships; and (3) a constraint-guided diffusion trajectory generation module that formulates planning as conditional iterative denoising with differentiable safety and kinematic guidance functions embedded in the reverse process. Extensive experiments on the Argoverse 2 benchmark demonstrate that DiffPlanner achieves state-of-the-art performance, reducing minFDE6 by 11.9% and collision rate by 55.6% compared with the strongest baseline, while preserving real-time inference efficiency.
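For readers unfamiliar with the headline metric: minFDE6 conventionally denotes the minimum final displacement error over six candidate trajectories, that is, the distance between the ground-truth endpoint and the closest candidate endpoint. The sketch below assumes that conventional definition and assumes the reported percentages are relative reductions. The abstract states neither, and the function names and numbers are illustrative only.

```python
import math

def min_fde(candidates, ground_truth):
    """Smallest endpoint error among K candidate trajectories.

    candidates:   list of K trajectories, each a list of (x, y) waypoints
    ground_truth: one trajectory, a list of (x, y) waypoints
    """
    gx, gy = ground_truth[-1]
    return min(math.hypot(traj[-1][0] - gx, traj[-1][1] - gy)
               for traj in candidates)

def relative_reduction(baseline, ours):
    """Fractional improvement of a lower-is-better metric over a baseline."""
    return (baseline - ours) / baseline

# Three made-up candidates (K = 3 for brevity; the benchmark metric uses K = 6).
candidates = [
    [(0.0, 0.0), (10.0, 3.0)],   # endpoint error 3
    [(0.0, 0.0), (13.0, 4.0)],   # endpoint error 5
    [(0.0, 0.0), (9.0, 0.0)],    # endpoint error 1
]
ground_truth = [(0.0, 0.0), (10.0, 0.0)]
print(min_fde(candidates, ground_truth))   # 1.0
print(relative_reduction(2.0, 1.5))        # 0.25
```

Because only the best of the K candidates is scored, the metric rewards multimodal planners that cover the ground-truth outcome with at least one sample.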

Engineering value: 6.0
Research novelty: 8.0
Business relevance: 6.0
