Autonomous driving paper index
Comparative evaluation of XGBoost, TabNet, and FT transformer models for fatal crash prediction under extreme class imbalance
One-line summary
Fatal traffic crashes are rare yet catastrophically consequential events in real-world crash data, typically constituting less than 1% of total records.
Engineering notes
To examine the generalizability of the framework beyond a single jurisdiction and a single time window, the analysis is complemented by (i) a temporal hold-out within Batman (training on 2013–2020, testing on 2021–2022) and (ii) external benchmarking on an independent, publicly available rare-event crash corpus (n = 12,316; fatal rate = 1.28%) [61, 62]; the architectural ranking and rank-based operational gains are reproduced in both regimes.
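The temporal hold-out in (i) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the record layout, the field names `year` and `fatal`, and the toy records are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

Record = Dict[str, int]  # hypothetical layout: {"year": ..., "fatal": 0 or 1}


def temporal_holdout(
    records: List[Record],
    year_key: str = "year",
    train_years: Tuple[int, int] = (2013, 2020),
    test_years: Tuple[int, int] = (2021, 2022),
) -> Tuple[List[Record], List[Record]]:
    """Split records by calendar year so the test period lies strictly after training."""
    if train_years[1] >= test_years[0]:
        raise ValueError("training window must end before the test window starts")
    train = [r for r in records if train_years[0] <= r[year_key] <= train_years[1]]
    test = [r for r in records if test_years[0] <= r[year_key] <= test_years[1]]
    return train, test


def fatal_rate(records: List[Record], label_key: str = "fatal") -> float:
    """Share of records labelled fatal; 0.0 for an empty list."""
    return sum(r[label_key] for r in records) / len(records) if records else 0.0


# Invented toy records, only to show the split; 2023 falls outside both windows.
toy_records = [
    {"year": 2013, "fatal": 0},
    {"year": 2016, "fatal": 1},
    {"year": 2020, "fatal": 0},
    {"year": 2021, "fatal": 0},
    {"year": 2022, "fatal": 1},
    {"year": 2023, "fatal": 0},
]
train_set, test_set = temporal_holdout(toy_records)
print(len(train_set), len(test_set))
```

Any resampling step such as SMOTE would then be fitted on `train_set` only, which is consistent with the leakage-controlled design described in the abstract.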
Original abstract
Fatal traffic crashes are rare yet catastrophically consequential events in real-world crash data, typically constituting less than 1% of total records. This extreme class imbalance poses a fundamental challenge for machine learning-based severity prediction, as standard algorithms tend to ignore the minority class in favor of maximizing overall accuracy. This study investigates whether modern deep tabular learning architectures (TabNet, FT-Transformer) offer consistent advantages over the traditional gradient boosting method XGBoost in predicting fatal crashes under conditions of extreme class imbalance. The analysis is conducted on 5,676 traffic crashes recorded in Batman province of Türkiye between 2013 and 2022, with a fatal crash rate of only 0.8%. Methodologically, a leakage-controlled design was implemented through ex-ante variable selection, structured missing value handling, and SMOTE-based balancing applied exclusively to the training set. Model performance was evaluated not only with discrimination metrics such as ROC-AUC, but also with PR-AUC, Recall@K/Lift, and cost-sensitive analyses, which are more meaningful for imbalanced data. The results show that FT-Transformer achieved the strongest performance with ROC-AUC = 0.820 (vs. XGBoost: 0.752, TabNet: 0.760) and PR-AUC = 0.031 (approximately 3.9× the random baseline of 0.008). It captured approximately 44% of fatal crashes in the riskiest 10% of cases, providing a ≈ 4.4-fold lift compared to random selection. Calibration analyses revealed that FT-Transformer produced more reliable risk scores: in the predicted probability band of 0.5–0.8, its observed positive rate reached the 8–15% range, representing a 4–7× elevation above the near-zero rates (0–2%) recorded for XGBoost and TabNet across the same probability range.
These findings indicate that transformer-based tabular architectures offer consistent statistical, operational, and cost-sensitive advantages under extreme imbalance, supporting their use as decision-support tools in traffic safety management. To examine the generalizability of the framework beyond a single jurisdiction and a single time window, the analysis is complemented by (i) a temporal hold-out within Batman (training on 2013–2020, testing on 2021–2022) and (ii) external benchmarking on an independent, publicly available rare-event crash corpus (n = 12,316; fatal rate = 1.28%) [61, 62]; the architectural ranking and rank-based operational gains are reproduced in both regimes.
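The rank-based figures in the abstract (Recall@K and lift) follow from simple arithmetic on sorted risk scores. The sketch below shows one common way to compute them. It is an illustration under stated assumptions rather than the paper's evaluation code: the function names and the toy labels and scores are invented for the example, and lift is taken as Recall@K divided by the fraction of cases reviewed.

```python
import math
from typing import List, Sequence


def top_k_indices(scores: Sequence[float], k_frac: float) -> List[int]:
    """Indices of the highest-scoring k_frac share of cases (at least one case)."""
    n = len(scores)
    # Small tolerance so that, e.g., 10% of 20 cases is exactly 2 despite float rounding.
    k = max(1, math.ceil(k_frac * n - 1e-9))
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return order[:k]


def recall_at_k(y_true: Sequence[int], scores: Sequence[float], k_frac: float = 0.10) -> float:
    """Share of all positives that fall inside the top k_frac of ranked cases."""
    if len(y_true) != len(scores):
        raise ValueError("y_true and scores must have the same length")
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("recall is undefined when there are no positives")
    return sum(y_true[i] for i in top_k_indices(scores, k_frac)) / total_pos


def lift_at_k(y_true: Sequence[int], scores: Sequence[float], k_frac: float = 0.10) -> float:
    """Recall@K relative to random selection, which captures a share equal to the fraction reviewed."""
    reviewed_frac = len(top_k_indices(scores, k_frac)) / len(scores)
    return recall_at_k(y_true, scores, k_frac) / reviewed_frac


# Invented toy data: 20 cases, 2 positives, one of them ranked in the top 10%.
y_true = [0] * 20
y_true[3] = 1
y_true[15] = 1
scores = [0.1] * 20
scores[3], scores[7], scores[15] = 0.9, 0.8, 0.2

toy_recall = recall_at_k(y_true, scores)
toy_lift = lift_at_k(y_true, scores)

# Arithmetic on the figures quoted in the abstract.
reported_lift = 0.44 / 0.10      # 44% of fatal crashes in the riskiest 10% of cases
pr_auc_ratio = 0.031 / 0.008     # PR-AUC relative to the prevalence baseline
print(toy_recall, toy_lift, round(reported_lift, 2), round(pr_auc_ratio, 2))
```

With the abstract's numbers, 0.44 / 0.10 gives the reported ≈ 4.4-fold lift, and 0.031 / 0.008 ≈ 3.9 matches the stated PR-AUC ratio, since the PR-AUC of a random ranking is approximately the positive-class prevalence.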
Links and sources