Autonomous driving paper index

Risk-Sensitive Distributional Proximal Policy Optimization for Safe Highway Lane-Change Decision-Making

2026-06-22 · Applied Sciences

autonomous driving · reinforcement learning

One-line summary

This paper proposes Risk-Sensitive Distributional Proximal Policy Optimization (RSDPPO), a PPO variant for highway lane-changing decision-making that accounts for low-probability but high-cost safety risks, which expected-return reinforcement learning may handle inadequately.

Engineering notes

In simulation under medium-density traffic, RSDPPO outperforms representative baseline algorithms in cumulative reward, success rate, and safety reward.

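Chinese explanation

Chinese explanation pending: this site prioritizes adding Chinese-language notes for high-value papers on end-to-end autonomous driving, BEV perception, 3D object detection, trajectory prediction, path planning, and LiDAR perception.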

Original abstract

Decision-making is a critical module for intelligent vehicles to achieve safe and efficient autonomous driving. However, most existing reinforcement learning-based decision-making methods optimize policies by maximizing the expected return, which may inadequately account for low-probability but high-cost safety risks in complex traffic interactions. To address this issue, this paper proposes a Risk-Sensitive Distributional Proximal Policy Optimization (PPO) method, termed Risk-Sensitive Distributional Proximal Policy Optimization (RSDPPO), for highway lane-changing decision-making. Within the PPO framework, a distributional state-value function is introduced to model the return distribution under the current policy, and a Wang distortion-based risk measure is further incorporated to construct a risk-sensitive advantage function. In this way, risk information contained in the return distribution can be propagated into the policy gradient update, guiding the learned policy to avoid high-risk driving behaviors while maintaining training stability. Simulation experiments are conducted in a highway lane-changing scenario with heterogeneous surrounding vehicles. The results show that, under medium-density traffic, the proposed method outperforms representative baseline algorithms in cumulative reward, success rate, and safety reward. Further evaluation under higher-density traffic demonstrates that RSDPPO maintains better overall performance, indicating stronger adaptability to denser traffic conditions. Ablation studies further show that risk-averse distortion improves the balance between safety and efficiency by increasing safety margins during car-following and lane-changing maneuvers. These results indicate that RSDPPO provides an effective risk-sensitive policy optimization framework for safety-oriented highway lane-changing decision-making.
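The abstract's core mechanism, a Wang distortion applied to a learned return distribution and then used to form a risk-sensitive advantage, can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the return distribution is represented by N equally weighted quantile estimates, uses the sign convention in which a positive `lam` is risk-averse, and substitutes a one-step TD advantage for whatever estimator the authors use. The function names `wang_distortion`, `distorted_value`, and `risk_sensitive_td_advantage` are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sigma 1


def wang_distortion(u, lam):
    """Wang transform g(u) = Phi(Phi^{-1}(u) + lam) on [0, 1].

    Assumed convention: lam > 0 shifts probability mass toward low returns
    (risk-averse), lam = 0 is the identity, lam < 0 is risk-seeking.
    """
    if u <= 0.0:
        return 0.0
    if u >= 1.0:
        return 1.0
    return _STD_NORMAL.cdf(_STD_NORMAL.inv_cdf(u) + lam)


def distorted_value(quantiles, lam):
    """Distorted expectation of a return distribution.

    `quantiles` holds N equally weighted quantile estimates of the return.
    Each sorted quantile i gets weight g((i + 1) / N) - g(i / N), so the
    weights sum to 1 and reduce to 1 / N when lam = 0.
    """
    qs = sorted(quantiles)
    n = len(qs)
    total = 0.0
    for i, q in enumerate(qs):
        weight = wang_distortion((i + 1) / n, lam) - wang_distortion(i / n, lam)
        total += weight * q
    return total


def risk_sensitive_td_advantage(reward, gamma, next_quantiles, quantiles, lam):
    """One-step TD advantage built from distorted values (illustrative only)."""
    return (
        reward
        + gamma * distorted_value(next_quantiles, lam)
        - distorted_value(quantiles, lam)
    )
```

With `lam = 0` the distorted value equals the plain mean of the quantiles, so the advantage reduces to the usual expected-return TD error; a positive `lam` lowers the value of states whose return distribution has a heavy lower tail, which is how risk information would enter the policy gradient under these assumptions.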

Engineering value: 5.0

Research novelty: 8.0

Business relevance: 5.0

Links and sources

PDF from original source

