Autonomous driving paper index
PRIX: Learning to Plan From Raw Pixels for End-to-End Autonomous Driving
One-line summary
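PRIX (Plan from Raw pIXels) is an efficient end-to-end driving architecture that plans trajectories directly from camera images, without LiDAR or an explicit BEV representation.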
Engineering notes
PRIX achieves SOTA performance on the NavSim-v2 and nuScenes datasets. On NavSim-v1, it also outperforms the majority of multimodal planners and other camera-only approaches on most metrics.
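Chinese explanation
Chinese explanation pending: this site prioritizes adding Chinese-language notes for high-value papers on end-to-end autonomous driving, BEV perception, 3D object detection, trajectory prediction, path planning, LiDAR perception, and related topics.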
Original abstract
While end-to-end autonomous driving models show promising results, their practical deployment is often hindered by large model sizes, a reliance on expensive LiDAR sensors and computationally intensive BEV feature representations. This limits their scalability, especially for mass-market vehicles equipped only with cameras. To address these challenges, we propose PRIX (Plan from Raw pIXels). Our novel and efficient end-to-end driving architecture operates using only camera data, without explicit BEV representation and forgoing the need for LiDAR. PRIX leverages a visual feature extractor coupled with a generative planning head to predict safe trajectories from raw pixel inputs directly. A core component of our architecture is the Context-aware Recalibration Transformer (CaRT), a novel module designed to effectively enhance multi-level visual features for more robust planning. PRIX achieves SOTA performance on the NavSim-v2 and nuScenes datasets. On NavSim-v1, it also outperforms the majority of multimodal planners and other camera-only approaches on most metrics. Critically, PRIX is significantly more efficient on NavSim-v1, boasting faster inference speeds and a smaller model size. This combination of performance and efficiency makes it a practical solution for real-world deployment.
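The abstract describes the pipeline only at a high level, so the following is a rough sketch of the data flow, not the authors' implementation. It assumes a squeeze-and-excitation-style channel gate as a stand-in for CaRT and a plain linear map as a stand-in for the generative planning head. All names (`global_context`, `recalibrate`, `plan`), shapes, and random weights are hypothetical.

```python
import math
import random

def global_context(feature_maps):
    """Average each channel over its spatial positions, across all levels."""
    # feature_maps: list of levels; each level is a list of channels;
    # each channel is a flat list of activations (assumed layout).
    return [sum(ch) / len(ch) for level in feature_maps for ch in level]

def recalibrate(feature_maps, gate_weights):
    """Scale every channel by a sigmoid gate computed from the shared context.

    This is a generic channel-gating stand-in, NOT the CaRT module itself.
    """
    context = global_context(feature_maps)
    gated, k = [], 0
    for level in feature_maps:
        new_level = []
        for ch in level:
            logit = sum(w * c for w, c in zip(gate_weights[k], context))
            gate = 1.0 / (1.0 + math.exp(-logit))  # gate lies in (0, 1)
            new_level.append([gate * v for v in ch])
            k += 1
        gated.append(new_level)
    return gated

def plan(feature_maps, head_weights, num_waypoints):
    """Linear stand-in for a planning head: pooled features -> (x, y) waypoints."""
    pooled = global_context(feature_maps)
    flat = [sum(w * p for w, p in zip(row, pooled)) for row in head_weights]
    return [(flat[2 * i], flat[2 * i + 1]) for i in range(num_waypoints)]

# Toy multi-level features standing in for a camera backbone's output.
rng = random.Random(0)
levels = [
    [[rng.random() for _ in range(16)] for _ in range(4)],  # fine level: 4 channels, 16 positions
    [[rng.random() for _ in range(4)] for _ in range(4)],   # coarse level: 4 channels, 4 positions
]
num_channels = 8  # total channels across both levels
gate_w = [[rng.uniform(-1, 1) for _ in range(num_channels)] for _ in range(num_channels)]
head_w = [[rng.uniform(-1, 1) for _ in range(num_channels)] for _ in range(2 * 3)]

recalibrated = recalibrate(levels, gate_w)
trajectory = plan(recalibrated, head_w, num_waypoints=3)
print(trajectory)  # three (x, y) waypoints; values are meaningless with random weights
```

The point of the sketch is the shape of the computation: features at several resolutions share one global context vector, each channel is rescaled by it, and the planner consumes the result directly, with no intermediate BEV grid.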
Links and sources