Pix2Planning: End-to-End Planning by Vision-language Model for Autonomous Driving on Carla Simulator
One-line summary
Inspired by neural language models, the authors propose an end-to-end framework that recasts driving planning as a language sequence generation task conditioned on pixel inputs.
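Treating planning as sequence generation presupposes that continuous waypoint coordinates are turned into discrete tokens. The snippet below is a minimal sketch of one common scheme, uniform binning. The bin count (`NUM_BINS = 200`), the ±50 m coordinate range, and every function name are hypothetical assumptions, not values taken from the paper.

```python
from typing import List, Tuple

# Hypothetical discretization settings (assumed, not from the paper).
NUM_BINS = 200                       # bins per coordinate
COORD_MIN, COORD_MAX = -50.0, 50.0   # assumed ego-frame range in metres


def quantize(value: float) -> int:
    """Clip a coordinate to the assumed range and map it to a bin index."""
    clipped = min(max(value, COORD_MIN), COORD_MAX)
    scaled = (clipped - COORD_MIN) / (COORD_MAX - COORD_MIN)
    return min(int(scaled * NUM_BINS), NUM_BINS - 1)


def dequantize(token: int) -> float:
    """Map a bin index back to the centre of its bin."""
    bin_width = (COORD_MAX - COORD_MIN) / NUM_BINS
    return COORD_MIN + (token + 0.5) * bin_width


def waypoints_to_tokens(waypoints: List[Tuple[float, float]]) -> List[int]:
    """Flatten [(x1, y1), (x2, y2), ...] into tokens [x1, y1, x2, y2, ...]."""
    tokens: List[int] = []
    for x, y in waypoints:
        tokens.extend([quantize(x), quantize(y)])
    return tokens


def tokens_to_waypoints(tokens: List[int]) -> List[Tuple[float, float]]:
    """Inverse of waypoints_to_tokens, up to quantization error."""
    return [(dequantize(tokens[i]), dequantize(tokens[i + 1]))
            for i in range(0, len(tokens) - 1, 2)]
```

With these settings each bin is 0.5 m wide, so a round trip through `waypoints_to_tokens` and `tokens_to_waypoints` recovers in-range coordinates to within 0.25 m; points outside the range are clipped to the edge bins.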
Engineering notes
The model is evaluated on CARLA benchmarks, where the authors report state-of-the-art performance compared with other vision-based methods.
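Chinese explanation
Chinese explanation pending: the site prioritizes adding Chinese-language explanations for high-value papers on end-to-end autonomous driving, BEV perception, 3D object detection, trajectory prediction, path planning, and LiDAR perception.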
Original abstract
The end-to-end neural network has become a hot topic in recent years. Compared with traditional module-based solutions, the end-to-end paradigm can reduce accumulated error and avoid information loss, and it has therefore attracted considerable attention in autonomous driving. However, current end-to-end network designs easily lose useful information during training because mapping high-dimensional visual observations to navigation waypoints is complex. Since each future navigation point is inferred from the previous one, the planning task resembles a sequence generation task. Inspired by the power of neural language models, we propose an end-to-end framework that recasts the planning task as a language sequence generation task conditioned on pixel inputs. The proposed method first extracts image features and transforms them from the camera view to a bird's-eye view (BEV). Then the target navigation point is encoded as a text sequence that serves as the prompt for the vision-language transformer. Finally, the auto-regressive transformer decoder receives the BEV feature and the text sequence to generate sequential waypoints. Overall, our method makes full use of the environmental information and expresses the planning trajectory as a language sequence to learn the correspondence between trajectory sequences and images. We have conducted extensive experiments on CARLA benchmarks, and our model achieves state-of-the-art performance compared with other visual methods.
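The decoding stage described in the abstract can be sketched as a greedy auto-regressive loop: the target navigation point is quantized into a short prompt, and a decoder conditioned on the BEV feature appends one coordinate token at a time. This is a minimal, framework-free sketch under stated assumptions. The prompt layout `[BOS, x, y, SEP]`, the special-token ids, the 200-bin quantization over ±50 m, greedy decoding, and `toy_next_token` (a hypothetical stand-in that steps toward the target instead of running a trained transformer) are all illustrative, not the authors' implementation.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical vocabulary: coordinate bins 0..NUM_BINS-1, then special tokens.
NUM_BINS = 200
COORD_MIN, COORD_MAX = -50.0, 50.0      # assumed ego-frame range in metres
BOS, SEP, EOS = NUM_BINS, NUM_BINS + 1, NUM_BINS + 2


def quantize(value: float) -> int:
    """Clip a coordinate to the assumed range and map it to a bin index."""
    clipped = min(max(value, COORD_MIN), COORD_MAX)
    scaled = (clipped - COORD_MIN) / (COORD_MAX - COORD_MIN)
    return min(int(scaled * NUM_BINS), NUM_BINS - 1)


def build_prompt(target_point: Tuple[float, float]) -> List[int]:
    """Encode the target navigation point as the prompt [BOS, x, y, SEP]."""
    x, y = target_point
    return [BOS, quantize(x), quantize(y), SEP]


# One decoder step: (BEV feature, token prefix) -> next token id.
NextTokenFn = Callable[[Sequence[float], List[int]], int]


def decode_waypoints(next_token: NextTokenFn,
                     bev_feature: Sequence[float],
                     target_point: Tuple[float, float],
                     num_waypoints: int = 4) -> List[int]:
    """Greedy auto-regressive decoding; returns the generated (x, y) tokens."""
    sequence = build_prompt(target_point)
    prompt_len = len(sequence)
    for _ in range(2 * num_waypoints):             # two tokens per waypoint
        token = next_token(bev_feature, sequence)  # conditioned on all prior tokens
        if token == EOS:
            break
        sequence.append(token)
    return sequence[prompt_len:]


def toy_next_token(bev_feature: Sequence[float], prefix: List[int]) -> int:
    """Hypothetical stand-in for a trained decoder: walks roughly evenly from
    the ego position to the target in four steps. It ignores bev_feature,
    which a real decoder would attend to."""
    generated = prefix[4:]                       # tokens after the prompt
    k = len(generated) // 2 + 1                  # 1-based waypoint index
    origin = quantize(0.0)                       # ego vehicle at (0, 0)
    target = prefix[1] if len(generated) % 2 == 0 else prefix[2]
    return origin + (target - origin) * k // 4


if __name__ == "__main__":
    # Decode four waypoints toward a target at (25 m, -25 m).
    print(decode_waypoints(toy_next_token, [0.0] * 8, (25.0, -25.0)))
```

Replacing `toy_next_token` with a real decoder that cross-attends to the BEV feature would leave the loop itself unchanged. Greedy decoding is the simplest choice here; beam search or sampling are common alternatives.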