Autonomous driving paper index

Vision Transformer for Sequence-to-Action Networks in Autonomous Driving Control

2025-10-14 · Information and Communication Technology Convergence

end-to-end autonomous driving · autonomous driving · end-to-end · vision transformer · prediction · control

One-line summary

End-to-end autonomous driving requires robust mapping from sequential visual inputs to control actions.

Engineering notes

To validate the feasibility of the proposed framework, a vision transformer-based sequence-to-action (ViT-S2A) network, the authors construct a synthetic sequence-to-action benchmark in which object trajectories correspond to left, straight, or right movements. Comparative experiments show that ViT-S2A consistently outperforms a CNN-LSTM baseline in both convergence speed and prediction accuracy, which highlights the effectiveness of global attention in modeling spatiotemporal dependencies.
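The abstract does not say how the synthetic benchmark is generated, so the following is only a minimal sketch of one way such data could be produced. It assumes a single bright pixel drifting across binary frames; the function names, frame size, sequence length, and the vertical motion shared by all classes are hypothetical choices, not the authors' setup.

```python
import random

# Action labels named in the abstract; everything else here is an assumption.
ACTIONS = ("left", "straight", "right")


def make_sequence(action, num_frames=8, size=32, rng=None):
    """Return num_frames frames (size x size nested lists) with one moving object."""
    rng = rng or random.Random()
    dx = {"left": -1, "straight": 0, "right": 1}[action]
    margin = num_frames - 1  # maximum total displacement of the object
    # Pick a start position that keeps the object inside the frame throughout.
    x = rng.randint(margin, size - 1 - margin)
    y = rng.randint(0, size - 1 - margin)
    frames = []
    for t in range(num_frames):
        frame = [[0.0] * size for _ in range(size)]
        # The object moves one row down per frame and drifts sideways by dx.
        frame[y + t][x + dx * t] = 1.0
        frames.append(frame)
    return frames


def make_dataset(num_samples, seed=0):
    """Return a list of (frames, label_index) pairs with random action labels."""
    rng = random.Random(seed)
    data = []
    for _ in range(num_samples):
        label = rng.randrange(len(ACTIONS))
        data.append((make_sequence(ACTIONS[label], rng=rng), label))
    return data
```

Each sample is a short clip whose label depends only on the direction of horizontal drift, so a model has to compare frames over time rather than classify any single frame.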

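Chinese explanation

Chinese explanation pending: this site prioritizes adding Chinese-language notes for high-value papers on end-to-end autonomous driving, BEV perception, 3D object detection, trajectory prediction, path planning, and LiDAR perception.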

Original abstract

End-to-end autonomous driving requires robust mapping from sequential visual inputs to control actions. While convolutional neural network (CNN)-based encoders combined with temporal models such as LSTMs have been widely adopted, they are limited in capturing long-range spatial dependencies and global context within visual sequences. This paper introduces a lightweight vision transformer-based sequence-to-action (ViT-S2A) network that integrates a compact ViT encoder with a long short-term memory (LSTM)-based temporal aggregation module to directly predict discrete driving actions. To validate the feasibility of the proposed framework, we construct a synthetic sequence-to-action benchmark, where object trajectories correspond to left, straight, or right movements. Comparative experiments demonstrate that ViT-S2A consistently outperforms a CNN-LSTM baseline in both convergence speed and prediction accuracy, which highlights the effectiveness of global attention in modeling spatiotemporal dependencies. These results indicate that transformer-based architectures offer a promising direction for scalable, data-efficient autonomous driving control models.
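The architecture described in the abstract (a compact ViT encoder applied per frame, followed by LSTM-based temporal aggregation and a discrete action head) can be sketched in PyTorch roughly as follows. The abstract gives no hyperparameters, so the class name `ViTS2ASketch`, the patch size, embedding width, depth, head count, single-channel input, and the use of a class token are all assumptions for illustration, not the authors' configuration.

```python
import torch
import torch.nn as nn


class ViTS2ASketch(nn.Module):
    """Hypothetical per-frame ViT encoder + LSTM + 3-way action classifier."""

    def __init__(self, image_size=32, patch_size=8, dim=64, depth=2,
                 heads=4, lstm_hidden=64, num_actions=3):
        super().__init__()
        num_patches = (image_size // patch_size) ** 2
        # A strided convolution cuts each frame into non-overlapping patches
        # and projects every patch to a dim-sized token.
        self.patch_embed = nn.Conv2d(1, dim, kernel_size=patch_size,
                                     stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=2 * dim,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        # The LSTM aggregates one feature vector per frame over time.
        self.lstm = nn.LSTM(dim, lstm_hidden, batch_first=True)
        self.head = nn.Linear(lstm_hidden, num_actions)

    def forward(self, frames):
        # frames: (batch, time, 1, height, width)
        b, t = frames.shape[:2]
        x = self.patch_embed(frames.flatten(0, 1))    # (b*t, dim, h', w')
        x = x.flatten(2).transpose(1, 2)              # (b*t, patches, dim)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos_embed
        x = self.encoder(x)[:, 0]                     # class token per frame
        _, (h_n, _) = self.lstm(x.reshape(b, t, -1))  # last hidden state
        return self.head(h_n[-1])                     # (b, num_actions) logits


model = ViTS2ASketch()
logits = model(torch.zeros(2, 8, 1, 32, 32))  # 2 clips of 8 frames each
```

Self-attention inside the encoder lets every patch attend to every other patch in the same frame, which is the global-context property the abstract contrasts with CNN encoders. Temporal ordering is handled only by the LSTM in this sketch.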

Engineering value: 5.5
Research novelty: 8.0
Business relevance: 5.0
