Vision Transformer for Sequence-to-Action Networks in Autonomous Driving Control
One-line summary
End-to-end autonomous driving requires robust mapping from sequential visual inputs to control actions.
Engineering notes
To validate the feasibility of the proposed framework, the authors construct a synthetic sequence-to-action benchmark in which object trajectories correspond to left, straight, or right movements. Comparative experiments show that ViT-S2A (the proposed vision transformer-based sequence-to-action network) consistently outperforms a CNN-LSTM baseline in both convergence speed and prediction accuracy, which highlights the effectiveness of global attention in modeling spatiotemporal dependencies.
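The paper does not specify how its synthetic benchmark is rendered, so the following is only a minimal sketch of one plausible way to build such data, assuming a single 3x3 bright square that moves upward across `num_frames` grayscale frames while drifting left, staying centred, or drifting right. The names `ACTIONS`, `make_sequence` and `make_dataset`, the frame size, and the step sizes are all hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical label order; the source only names the three movement classes.
ACTIONS = ("left", "straight", "right")


def make_sequence(label, num_frames=8, size=32, rng=None):
    """Render one (num_frames, size, size) clip of a 3x3 square.

    Assumed convention: the square starts near the bottom centre (with a
    small random column jitter), moves up 2 rows per frame, and shifts
    -1, 0 or +1 columns per frame for label 0, 1 or 2 respectively.
    """
    rng = np.random.default_rng() if rng is None else rng
    frames = np.zeros((num_frames, size, size), dtype=np.float32)
    dx = (-1, 0, 1)[label]                      # column step per frame
    x0 = size // 2 + int(rng.integers(-2, 3))   # jittered start column in [-2, 2]
    y0 = size - 4                               # start near the bottom edge
    for t in range(num_frames):
        # Clip so the 3x3 square always stays inside the frame.
        x = int(np.clip(x0 + dx * t, 1, size - 2))
        y = int(np.clip(y0 - 2 * t, 1, size - 2))
        frames[t, y - 1:y + 2, x - 1:x + 2] = 1.0
    return frames


def make_dataset(n, num_frames=8, size=32, seed=0):
    """Return clips of shape (n, T, H, W) and integer action labels of shape (n,)."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, len(ACTIONS), size=n)
    clips = np.stack([make_sequence(int(lab), num_frames, size, rng) for lab in labels])
    return clips, labels
```

With the default sizes the square never touches the clipping bounds, so the horizontal drift of its centre over a clip is exactly -7, 0 or +7 columns, which makes the label recoverable only from the temporal sequence rather than from any single frame.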
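Chinese explanation
Chinese explanation pending: this site prioritizes adding Chinese-language explanations for high-value papers on end-to-end autonomous driving, BEV perception, 3D object detection, trajectory prediction, path planning, and LiDAR perception.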
Original abstract
End-to-end autonomous driving requires robust mapping from sequential visual inputs to control actions. While convolutional neural network (CNN)-based encoders combined with temporal models such as LSTMs have been widely adopted, they are limited in capturing long-range spatial dependencies and global context within visual sequences. This paper introduces a lightweight vision transformer-based sequence-to-action (ViT-S2A) network that integrates a compact ViT encoder with a long short-term memory (LSTM)-based temporal aggregation module to directly predict discrete driving actions. To validate the feasibility of the proposed framework, we construct a synthetic sequence-to-action benchmark, where object trajectories correspond to left, straight, or right movements. Comparative experiments demonstrate that ViT-S2A consistently outperforms a CNN-LSTM baseline in both convergence speed and prediction accuracy, which highlights the effectiveness of global attention in modeling spatiotemporal dependencies. These results indicate that transformer-based architectures offer a promising direction for scalable, data-efficient autonomous driving control models.
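To make the described pipeline concrete (per-frame ViT encoding, LSTM aggregation over time, classification into discrete actions), here is a minimal, untrained numpy sketch of the forward pass. It is not the authors' implementation: the class name `TinyViTS2A`, the single pre-norm encoder block with one attention head, the ReLU MLP (standard ViTs use GELU), the use of the CLS token as the frame embedding, and all dimensions are illustrative assumptions.

```python
import numpy as np


def patchify(frame, p):
    """Split an (H, W) frame into non-overlapping p x p patches -> (N, p*p)."""
    H, W = frame.shape
    x = frame.reshape(H // p, p, W // p, p).transpose(0, 2, 1, 3)
    return x.reshape(-1, p * p)


def layer_norm(x, eps=1e-5):
    mu = x.mean(-1, keepdims=True)
    return (x - mu) / np.sqrt(x.var(-1, keepdims=True) + eps)


def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class TinyViTS2A:
    """Hypothetical, randomly initialised ViT encoder + LSTM aggregator."""

    def __init__(self, img=32, patch=8, dim=32, hidden=32, n_actions=3, seed=0):
        rng = np.random.default_rng(seed)
        n_patches = (img // patch) ** 2
        s = 0.02  # small init scale
        self.patch, self.hidden = patch, hidden
        self.W_embed = rng.normal(0, s, (patch * patch, dim))  # linear patch embedding
        self.cls = rng.normal(0, s, (1, dim))                  # learnable CLS token
        self.pos = rng.normal(0, s, (n_patches + 1, dim))      # positional embeddings
        self.Wq, self.Wk, self.Wv, self.Wo = (rng.normal(0, s, (dim, dim)) for _ in range(4))
        self.W1 = rng.normal(0, s, (dim, 4 * dim))
        self.W2 = rng.normal(0, s, (4 * dim, dim))
        # LSTM weights for the gates, stacked as [input, forget, candidate, output].
        self.W_lstm = rng.normal(0, s, (dim + hidden, 4 * hidden))
        self.b_lstm = np.zeros(4 * hidden)
        self.W_head = rng.normal(0, s, (hidden, n_actions))

    def encode_frame(self, frame):
        """One pre-norm transformer block; returns the CLS token as the frame embedding."""
        tokens = patchify(frame, self.patch) @ self.W_embed    # (N, dim)
        x = np.vstack([self.cls, tokens]) + self.pos           # (N + 1, dim)
        h = layer_norm(x)
        q, k, v = h @ self.Wq, h @ self.Wk, h @ self.Wv
        attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))         # global attention over all patches
        x = x + (attn @ v) @ self.Wo                            # attention + residual
        x = x + np.maximum(layer_norm(x) @ self.W1, 0.0) @ self.W2  # ReLU MLP + residual
        return x[0]

    def __call__(self, frames):
        """frames: (T, H, W) -> probability vector over discrete actions."""
        h = np.zeros(self.hidden)
        c = np.zeros(self.hidden)
        for frame in frames:  # LSTM temporal aggregation over per-frame embeddings
            z = np.concatenate([self.encode_frame(frame), h]) @ self.W_lstm + self.b_lstm
            i, f, g, o = np.split(z, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
            h = sigmoid(o) * np.tanh(c)
        return softmax(h @ self.W_head)  # final hidden state -> action probabilities
```

Calling `TinyViTS2A()(clip)` on a `(T, 32, 32)` array returns three probabilities (for example, left, straight, right). Training, such as minimizing cross-entropy against the action label, is omitted; the point of the sketch is the division of labour the abstract describes, with self-attention mixing information across all patches inside a frame and the LSTM carrying information across frames.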
Links and sources