Autonomous driving paper index
Pixel-to-Control: End-to-End Autonomous Driving via Spatio-Temporal BEV Architecture for Control Sequence Prediction
One-line summary
This paper proposes a lightweight hierarchical transformer framework for fully end-to-end autonomous driving in urban environments.
Engineering notes
Key topics: end-to-end autonomous driving, BEV representation, perception, prediction, planning, control. See the paper for implementation details and experimental results.
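Chinese explanation
Chinese explanation pending: this site prioritizes adding Chinese-language notes for high-value papers on end-to-end autonomous driving, BEV perception, 3D object detection, trajectory prediction, path planning, and LiDAR perception.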
Original abstract
This paper proposes a lightweight hierarchical transformer framework for fully end-to-end autonomous driving in urban environments. Unlike prior black-box approaches, the proposed method maintains modular interpretability while directly predicting low-level control commands from multi-view camera inputs. Visual features are encoded into a BEV representation, which is shared across a planning transformer and a control transformer to generate future trajectories and control actions, respectively. To enhance training efficiency, auxiliary perception tasks, such as BEV-based map and object decoding, are introduced only during the training phase. These tasks improve representation learning without increasing inference cost. The proposed framework is validated in a closed-loop simulation environment, achieving real-time performance at 15 Hz across urban scenarios, including intersections and highways. Experimental results demonstrate the strong potential of the method for scalable and interpretable end-to-end driving.
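The data flow described in the abstract can be sketched in a few lines. This is a minimal illustration only, not the authors' implementation: every function name, the toy averaging "encoder", and the stand-in heads are assumptions made for the example. It shows one shared BEV feature feeding both a planning head and a control head, auxiliary map and object heads that run only in training, and the per-frame time budget implied by 15 Hz.

```python
from typing import Dict, List, Tuple

# NOTE: all names and computations below are hypothetical stand-ins.
# The paper's real encoder and transformers are not specified in this entry.
Vector = List[float]


def encode_bev(camera_views: List[Vector]) -> Vector:
    """Stand-in BEV encoder: element-wise mean over per-view features."""
    n = len(camera_views)
    return [sum(vals) / n for vals in zip(*camera_views)]


def planning_head(bev: Vector) -> List[Tuple[float, float]]:
    """Stand-in planning transformer: three future (x, y) waypoints."""
    return [(bev[0] * t, bev[1] * t) for t in (1, 2, 3)]


def control_head(bev: Vector) -> Dict[str, float]:
    """Stand-in control transformer: clamped steer and throttle commands."""
    return {
        "steer": max(-1.0, min(1.0, bev[0])),
        "throttle": max(0.0, min(1.0, bev[1])),
    }


def aux_map_head(bev: Vector) -> Vector:
    """Stand-in auxiliary BEV map decoder (training only)."""
    return [abs(v) for v in bev]


def aux_object_head(bev: Vector) -> Vector:
    """Stand-in auxiliary BEV object decoder (training only)."""
    return [v * v for v in bev]


def forward(camera_views: List[Vector], training: bool) -> Dict[str, object]:
    """Run the shared BEV feature through both heads.

    Auxiliary heads are evaluated only when training is True, so they add
    no work at inference time.
    """
    bev = encode_bev(camera_views)
    outputs: Dict[str, object] = {
        "trajectory": planning_head(bev),
        "control": control_head(bev),
    }
    if training:
        outputs["aux_map"] = aux_map_head(bev)
        outputs["aux_objects"] = aux_object_head(bev)
    return outputs


# Running at 15 Hz leaves 1000 / 15 milliseconds per frame for the whole pipeline.
FRAME_BUDGET_MS = 1000.0 / 15.0
```

Calling `forward(views, training=False)` returns only the `trajectory` and `control` entries, which mirrors the claim that the auxiliary tasks do not increase inference cost. The 15 Hz figure corresponds to a budget of roughly 66.7 ms per frame.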