Autonomous driving paper index
PEC-DETR: Pyramid Edge Enhancement, Dual-Gated Calibration, and Efficient Low-Resolution Attention for Object Detection in Autonomous Driving
One-line summary
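PEC-DETR is an enhanced RT-DETR framework for object detection in autonomous driving that adds three complementary modules: Pyramid Edge Feature Injection (PEFI), the Dual-Gated Cross-Calibration Feature Pyramid Network (DGCC-FPN), and the Efficient Low-Resolution Attention (ELRA) encoder.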
Engineering notes
While the Real-Time DEtection TRansformer (RT-DETR) achieves a favourable speed–accuracy trade-off, its direct application to autonomous driving exhibits three limitations: loss of fine-grained boundary information in the backbone, insufficient multi-scale coverage due to the absence of high-resolution P2 features, and quadratic attention complexity in the Attention-based Intra-scale Feature Interaction (AIFI) encoder. Under a five-seed multi-run protocol, experiments on the KITTI and nuScenes benchmarks demonstrate that PEC-DETR achieves 90.5 ± 0.16% mAP50 and 72.4 ± 0.18% mAP50:95 on KITTI (+1.7/+2.7 over RT-DETR-R18) and 56.0 ± 0.22% mAP50 and 37.0 ± 0.24% mAP50:95 on nuScenes (+2.3/+2.2 over RT-DETR-R18), with 20.7 M parameters and 87 GFLOPs, providing competitive performance among methods of comparable complexity.
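The quadratic-versus-linear attention cost mentioned above can be made concrete with a rough count of multiply-adds. This is only a back-of-the-envelope sketch: the 20 x 20 feature map, the 256 channels, and the pooling bin sizes (1, 2, 3, 6) are illustrative assumptions, not values taken from the paper, and the count ignores the linear projections and the pooling step itself.

```python
def attention_madds(num_queries, num_keys, channels):
    """Approximate multiply-adds for one attention pass.

    Counts the score matrix (Q K^T) and the attention-weighted sum of values,
    each costing num_queries * num_keys * channels multiply-adds.
    """
    return 2 * num_queries * num_keys * channels


def pooled_key_count(bin_sizes):
    """Number of key/value tokens left after pooling to each b x b grid."""
    return sum(b * b for b in bin_sizes)


def compare_costs(height, width, channels, bin_sizes):
    """Compare full self-attention with attention over a fixed pooled key set."""
    n = height * width                 # N: number of query tokens
    m = pooled_key_count(bin_sizes)    # M: pooled keys, independent of N
    return {
        "tokens": n,
        "pooled_keys": m,
        "full": attention_madds(n, n, channels),     # grows as N^2 * C
        "pooled": attention_madds(n, m, channels),   # grows as N * C
    }


# Hypothetical settings: bins (1, 2, 3, 6) and 256 channels are assumptions.
small = compare_costs(20, 20, 256, (1, 2, 3, 6))
large = compare_costs(40, 40, 256, (1, 2, 3, 6))
print(small)
print(large)
```

Under these assumptions, quadrupling the token count (20 x 20 to 40 x 40) multiplies the full-attention cost by 16 but the pooled-attention cost by only 4, which is the practical meaning of going from O(N²C) to O(NC).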
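Chinese explanation
Chinese explanation pending: this site prioritizes adding Chinese-language notes for high-value papers on end-to-end autonomous driving, BEV perception, 3D object detection, trajectory prediction, path planning, and LiDAR perception.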
Original abstract
Object detection in autonomous driving is a safety-critical visual measurement task that demands high accuracy under extreme scale variation, frequent occlusion, and strict computational constraints. While the Real-Time DEtection TRansformer (RT-DETR) achieves a favourable speed–accuracy trade-off, its direct application to autonomous driving exhibits three limitations: loss of fine-grained boundary information in the backbone, insufficient multi-scale coverage due to the absence of high-resolution P2 features, and quadratic attention complexity in the Attention-based Intra-scale Feature Interaction (AIFI) encoder. To address these issues, we propose PEC-DETR, an enhanced RT-DETR framework incorporating three complementary modules. The Pyramid Edge Feature Injection (PEFI) module extracts fixed Sobel-based edge priors and injects them into pyramid levels {P3, P4, P5} via a Channel-Stacked Edge Aggregation (CSEA) operator, preserving fine-grained structural cues throughout the feature hierarchy. The Dual-Gated Cross-Calibration Feature Pyramid Network (DGCC-FPN) extends the neck to four scales {P2, P3, P4, P5} and introduces a sigmoid-gated sequential cross-calibration mechanism for adaptive, content-aware multi-scale fusion. The Efficient Low-Resolution Attention (ELRA) encoder replaces standard multi-head self-attention in AIFI with a pyramid-pooled asymmetric attention mechanism, reducing encoder complexity from O(N²C) to O(NC). Under a five-seed multi-run protocol, experiments on the KITTI and nuScenes benchmarks demonstrate that PEC-DETR achieves 90.5 ± 0.16% mAP50 and 72.4 ± 0.18% mAP50:95 on KITTI (+1.7/+2.7 over RT-DETR-R18) and 56.0 ± 0.22% mAP50 and 37.0 ± 0.24% mAP50:95 on nuScenes (+2.3/+2.2 over RT-DETR-R18), with 20.7 M parameters and 87 GFLOPs, providing competitive performance among methods of comparable complexity.
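To illustrate what a fixed Sobel-based edge prior looks like, here is a minimal pure-Python sketch. It is not the paper's PEFI or CSEA implementation: the zero padding, the gradient-magnitude formulation, and the helper names (`correlate3x3`, `sobel_edge_map`, `stack_edge_channel`) are assumptions made for illustration, and "channel stacking" is shown simply as appending the edge map as one extra channel.

```python
import math

# Fixed (non-learned) 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def correlate3x3(image, kernel):
    """Slide a 3x3 kernel over a 2D list with zero padding (same-size output)."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky in range(3):
                for kx in range(3):
                    yy, xx = y + ky - 1, x + kx - 1
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[ky][kx] * image[yy][xx]
            out[y][x] = acc
    return out


def sobel_edge_map(image):
    """Gradient magnitude sqrt(gx^2 + gy^2) at every pixel."""
    gx = correlate3x3(image, SOBEL_X)
    gy = correlate3x3(image, SOBEL_Y)
    return [[math.hypot(a, b) for a, b in zip(row_x, row_y)]
            for row_x, row_y in zip(gx, gy)]


def stack_edge_channel(feature_channels, edge_map):
    """Hypothetical stand-in for channel stacking: append the edge map as a channel."""
    return feature_channels + [edge_map]


# A 5x5 image with a vertical step edge between columns 1 and 2.
image = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
edges = sobel_edge_map(image)
stacked = stack_edge_channel([image], edges)
print(edges[2])      # strong response next to the step, zero in flat interior regions
print(len(stacked))  # one original channel plus one edge channel
```

Because the kernels are fixed rather than learned, the edge map adds no trainable parameters. In a real detector the same operation would typically be run as a convolution with frozen weights and resized to match each pyramid level. Note that zero padding produces spurious responses along the image border.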