Autonomous driving paper index

Risk-Sensitive Distributional Proximal Policy Optimization for Safe Highway Lane-Change Decision-Making

2026-06-22 · Applied Sciences

autonomous driving · reinforcement learning

One-line summary

This paper proposes Risk-Sensitive Distributional Proximal Policy Optimization (RSDPPO), a risk-sensitive distributional variant of PPO for highway lane-changing decision-making.

Engineering notes

In simulation under medium-density traffic, RSDPPO outperforms representative baseline algorithms in cumulative reward, success rate, and safety reward.

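Chinese explanation

Chinese explanation pending: this site prioritizes adding Chinese explanations for high-value papers on end-to-end autonomous driving, BEV perception, 3D object detection, trajectory prediction, path planning, LiDAR perception, and related topics.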

Original abstract

Decision-making is a critical module for intelligent vehicles to achieve safe and efficient autonomous driving. However, most existing reinforcement learning-based decision-making methods optimize policies by maximizing the expected return, which may inadequately account for low-probability but high-cost safety risks in complex traffic interactions. To address this issue, this paper proposes a Risk-Sensitive Distributional Proximal Policy Optimization (PPO) method, termed Risk-Sensitive Distributional Proximal Policy Optimization (RSDPPO), for highway lane-changing decision-making. Within the PPO framework, a distributional state-value function is introduced to model the return distribution under the current policy, and a Wang distortion-based risk measure is further incorporated to construct a risk-sensitive advantage function. In this way, risk information contained in the return distribution can be propagated into the policy gradient update, guiding the learned policy to avoid high-risk driving behaviors while maintaining training stability. Simulation experiments are conducted in a highway lane-changing scenario with heterogeneous surrounding vehicles. The results show that, under medium-density traffic, the proposed method outperforms representative baseline algorithms in cumulative reward, success rate, and safety reward. Further evaluation under higher-density traffic demonstrates that RSDPPO maintains better overall performance, indicating stronger adaptability to denser traffic conditions. Ablation studies further show that risk-averse distortion improves the balance between safety and efficiency by increasing safety margins during car-following and lane-changing maneuvers. These results indicate that RSDPPO provides an effective risk-sensitive policy optimization framework for safety-oriented highway lane-changing decision-making.
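
The abstract describes applying a Wang distortion-based risk measure to a distributional state value and feeding the result into the advantage function. The following is a minimal sketch of that idea, not the paper's implementation. It assumes the return distribution is represented by equally weighted quantile samples, uses the standard Wang transform g(u) = Φ(Φ⁻¹(u) + λ) on cumulative probabilities, and uses a one-step temporal-difference advantage. The function names, the parameter `lam`, the sign convention (here `lam > 0` shifts weight toward low returns, i.e. risk-averse), and the one-step form are all illustrative assumptions; the paper may well use a different parameterization or a multi-step estimator.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for Phi and its inverse


def wang_distortion(u, lam):
    """Wang transform g(u) = Phi(Phi^{-1}(u) + lam) of a cumulative probability u.

    Assumed convention: lam > 0 inflates the probability of low returns
    (risk-averse); lam = 0 leaves u unchanged (risk-neutral).
    """
    if u <= 0.0:
        return 0.0
    if u >= 1.0:
        return 1.0
    return _STD_NORMAL.cdf(_STD_NORMAL.inv_cdf(u) + lam)


def distorted_value(quantiles, lam):
    """Distorted expectation of a return distribution given as quantile samples.

    The i-th smallest sample gets weight g(i/n) - g((i-1)/n), so the weights
    are non-negative and sum to 1.
    """
    qs = sorted(quantiles)
    n = len(qs)
    value, prev = 0.0, 0.0
    for i, q in enumerate(qs, start=1):
        cur = wang_distortion(i / n, lam)
        value += (cur - prev) * q
        prev = cur
    return value


def risk_sensitive_advantage(reward, gamma, cur_quantiles, next_quantiles, lam):
    """Hypothetical one-step advantage built from distorted state values."""
    v_cur = distorted_value(cur_quantiles, lam)
    v_next = distorted_value(next_quantiles, lam)
    return reward + gamma * v_next - v_cur


if __name__ == "__main__":
    # Made-up quantile samples with a heavy left tail (e.g. a rare collision).
    returns = [-10.0, 0.0, 1.0, 2.0]
    print(distorted_value(returns, 0.0))  # risk-neutral: the plain mean
    print(distorted_value(returns, 0.5))  # risk-averse: lower than the mean
```

With `lam = 0` the distorted value reduces to the ordinary mean, which recovers the usual expected-return advantage. With `lam > 0` a state whose return distribution has a heavy left tail is valued lower, so actions leading to it receive a smaller advantage in the policy update.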

Engineering value: 5.0
Research novelty: 8.0
Business relevance: 5.0
