Autonomous driving paper index

Vision Transformer for Sequence-to-Action Networks in Autonomous Driving Control

2025-10-14 · Information and Communication Technology Convergence

end-to-end autonomous driving, autonomous driving, end-to-end, vision transformer, prediction, control

One-line summary

End-to-end autonomous driving requires robust mapping from sequential visual inputs to control actions.

Engineering notes

To validate the feasibility of the proposed framework, the authors construct a synthetic sequence-to-action benchmark in which object trajectories correspond to left, straight, or right movements. Comparative experiments show that ViT-S2A consistently outperforms a CNN-LSTM baseline in both convergence speed and prediction accuracy, which highlights the effectiveness of global attention in modeling spatiotemporal dependencies.
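The abstract does not specify how the synthetic benchmark is generated, so the following is only a minimal sketch of one way such a dataset could be built. Everything here is an assumption for illustration: the names `make_sequence` and `make_dataset`, the frame size, the sequence length, and the choice of a single bright pixel that advances one row per frame while drifting one column left, not at all, or right.

```python
import random

# The three action classes named in the abstract; the label is the index in this tuple.
ACTIONS = ("left", "straight", "right")


def make_sequence(action, num_frames=8, size=32, rng=None):
    """Return num_frames frames (size x size nested lists) with one moving pixel.

    Hypothetical encoding: the object advances one row per frame and drifts
    -1, 0, or +1 columns per frame for left, straight, or right.
    """
    if size < 2 * num_frames - 1:
        raise ValueError("frame too small for the requested sequence length")
    rng = rng or random.Random()
    dx = {"left": -1, "straight": 0, "right": 1}[action]
    # Choose a start position that keeps the object inside the frame for every step.
    x0 = rng.randint(num_frames - 1, size - num_frames)
    y0 = rng.randint(0, size - num_frames)
    frames = []
    for t in range(num_frames):
        frame = [[0.0] * size for _ in range(size)]
        frame[y0 + t][x0 + dx * t] = 1.0
        frames.append(frame)
    return frames


def make_dataset(n_per_class=100, seed=0, **kwargs):
    """Return a shuffled list of (frames, label) pairs, balanced across classes."""
    rng = random.Random(seed)
    data = [
        (make_sequence(action, rng=rng, **kwargs), label)
        for label, action in enumerate(ACTIONS)
        for _ in range(n_per_class)
    ]
    rng.shuffle(data)
    return data
```

Calling `make_dataset(n_per_class=5, seed=1)` yields 15 labeled sequences. A task this simple mainly tests whether a model can read motion direction from a frame sequence, which is consistent with the feasibility-check framing in the abstract.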

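Chinese explanation

Chinese explanation pending: this site prioritizes adding Chinese explanations for high-value papers on end-to-end autonomous driving, BEV perception, 3D object detection, trajectory prediction, path planning, LiDAR perception, and related topics.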

Original abstract

End-to-end autonomous driving requires robust mapping from sequential visual inputs to control actions. While convolutional neural network (CNN)-based encoders combined with temporal models such as LSTMs have been widely adopted, they are limited in capturing long-range spatial dependencies and global context within visual sequences. This paper introduces a lightweight vision transformer-based sequence-to-action (ViT-S2A) network that integrates a compact ViT encoder with a long short-term memory (LSTM)-based temporal aggregation module to directly predict discrete driving actions. To validate the feasibility of the proposed framework, we construct a synthetic sequence-to-action benchmark, where object trajectories correspond to left, straight, or right movements. Comparative experiments demonstrate that ViT-S2A consistently outperforms a CNN-LSTM baseline in both convergence speed and prediction accuracy, which highlights the effectiveness of global attention in modeling spatiotemporal dependencies. These results indicate that transformer-based architectures offer a promising direction for scalable, data-efficient autonomous driving control models.
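The data flow described in the abstract (per-frame ViT encoding, LSTM aggregation over time, a three-way action output) can be traced with a small dependency-free forward pass. This is a rough sketch under stated assumptions, not the authors' model: the weights are random and untrained, attention is single-head with the tokens used directly as queries, keys, and values, there is one attention layer with no MLP or normalization, and all names (`patchify`, `encode_frame`, `lstm_step`, `vit_s2a_forward`) and sizes are hypothetical. A real implementation would normally use a deep learning framework.

```python
import math
import random


def patchify(frame, patch):
    """Split an H x W frame (nested lists) into flattened patch x patch vectors."""
    h, w = len(frame), len(frame[0])
    return [[frame[r + i][c + j] for i in range(patch) for j in range(patch)]
            for r in range(0, h, patch) for c in range(0, w, patch)]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rand_matrix(rows, cols, rng):
    s = 1.0 / math.sqrt(cols)
    return [[rng.uniform(-s, s) for _ in range(cols)] for _ in range(rows)]


def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    z = sum(es)
    return [e / z for e in es]


def self_attention(tokens):
    """Global attention: every token attends to every other token in the frame."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        w = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens])
        out.append([sum(wi * v[j] for wi, v in zip(w, tokens)) for j in range(d)])
    return out


def encode_frame(frame, w_embed, pos, patch):
    """Embed patches, add positional vectors, attend, then mean-pool to one vector."""
    tokens = [[e + p for e, p in zip(matvec(w_embed, x), pe)]
              for x, pe in zip(patchify(frame, patch), pos)]
    attended = self_attention(tokens)
    return [sum(t[j] for t in attended) / len(attended) for j in range(len(attended[0]))]


def lstm_step(x, h, c, w):
    """One LSTM step; w maps concat(x, h) to input, forget, output, candidate gates."""
    n = len(h)
    z = matvec(w, x + h)
    sig = lambda v: 1.0 / (1.0 + math.exp(-v))
    i = [sig(v) for v in z[:n]]
    f = [sig(v) for v in z[n:2 * n]]
    o = [sig(v) for v in z[2 * n:3 * n]]
    g = [math.tanh(v) for v in z[3 * n:]]
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


def vit_s2a_forward(frames, patch=4, dim=8, hidden=8, seed=0):
    """Return probabilities over (left, straight, right) for one frame sequence."""
    rng = random.Random(seed)
    n_tokens = (len(frames[0]) // patch) * (len(frames[0][0]) // patch)
    w_embed = rand_matrix(dim, patch * patch, rng)
    pos = rand_matrix(n_tokens, dim, rng)
    w_lstm = rand_matrix(4 * hidden, dim + hidden, rng)
    w_out = rand_matrix(3, hidden, rng)
    h, c = [0.0] * hidden, [0.0] * hidden
    for frame in frames:  # the LSTM aggregates the per-frame embeddings over time
        h, c = lstm_step(encode_frame(frame, w_embed, pos, patch), h, c, w_lstm)
    return softmax(matvec(w_out, h))
```

The positional vectors matter in this sketch: without them, mean-pooled self-attention is invariant to patch order, so the encoder could not tell where in the frame the object sits. Frame height and width are assumed to be multiples of `patch`.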

Engineering value: 5.5

Research novelty: 8.0

Business relevance: 5.0

Links and sources

Official / arXiv page
