Autonomous driving paper index

PEC-DETR: Pyramid Edge Enhancement, Dual-Gated Calibration, and Efficient Low-Resolution Attention for Object Detection in Autonomous Driving

2026-06-24 · Measurement Science and Technology

autonomous driving · object detection · nuScenes · KITTI

One-line summary

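PEC-DETR is an enhanced RT-DETR framework for object detection in autonomous driving that adds three complementary modules: Pyramid Edge Feature Injection (PEFI), the Dual-Gated Cross-Calibration Feature Pyramid Network (DGCC-FPN), and the Efficient Low-Resolution Attention (ELRA) encoder.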

Engineering notes

While the Real-Time DEtection TRansformer (RT-DETR) achieves a favourable speed–accuracy trade-off, its direct application to autonomous driving exhibits three limitations: loss of fine-grained boundary information in the backbone, insufficient multi-scale coverage due to the absence of high-resolution P2 features, and quadratic attention complexity in the Attention-based Intra-scale Feature Interaction (AIFI) encoder. Under a five-seed multi-run protocol, experiments on the KITTI and nuScenes benchmarks demonstrate that PEC-DETR achieves 90.5 ± 0.16% mAP50 and 72.4 ± 0.18% mAP50:95 on KITTI (+1.7/+2.7 over RT-DETR-R18) and 56.0 ± 0.22% mAP50 and 37.0 ± 0.24% mAP50:95 on nuScenes (+2.3/+2.2 over RT-DETR-R18), with 20.7 M parameters and 87 GFLOPs, providing competitive performance among methods of comparable complexity.
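
The quadratic attention cost mentioned above can be made concrete with a little arithmetic. The sketch below counts multiply-accumulates for the N × N score matrix of full self-attention at a few feature-map resolutions. The 640 × 640 input, the strides, and the 256-channel width are illustrative assumptions, not values reported for RT-DETR or PEC-DETR, and the function names are hypothetical.

```python
# Illustrative only: input size, strides, and channel width are assumptions,
# not figures taken from the paper.

def attention_score_macs(height, width, channels):
    """Multiply-accumulates for the Q @ K^T score matrix of full self-attention."""
    n = height * width           # number of tokens N
    return n * n * channels      # N x N scores, each a C-length dot product


def grid_at_stride(image_size, stride):
    """Feature-map side lengths for a square input at a given stride."""
    side = image_size // stride
    return side, side


for stride in (32, 16, 8):
    h, w = grid_at_stride(640, stride)
    macs = attention_score_macs(h, w, 256)
    print(f"stride {stride:>2}: N = {h * w:>5}, score MACs = {macs:,}")
```

Each halving of the stride quadruples N and therefore multiplies the score-matrix cost by 16, which is why full self-attention becomes expensive on higher-resolution feature maps.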

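Chinese explanation

Chinese explanation pending: this site will prioritize adding Chinese explanations for high-value papers on end-to-end autonomous driving, BEV perception, 3D object detection, trajectory prediction, path planning, LiDAR perception, and related topics.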

Original abstract

Object detection in autonomous driving is a safety-critical visual measurement task that demands high accuracy under extreme scale variation, frequent occlusion, and strict computational constraints. While the Real-Time DEtection TRansformer (RT-DETR) achieves a favourable speed–accuracy trade-off, its direct application to autonomous driving exhibits three limitations: loss of fine-grained boundary information in the backbone, insufficient multi-scale coverage due to the absence of high-resolution P2 features, and quadratic attention complexity in the Attention-based Intra-scale Feature Interaction (AIFI) encoder. To address these issues, we propose PEC-DETR, an enhanced RT-DETR framework incorporating three complementary modules. The Pyramid Edge Feature Injection (PEFI) module extracts fixed Sobel-based edge priors and injects them into pyramid levels {P3, P4, P5} via a Channel-Stacked Edge Aggregation (CSEA) operator, preserving fine-grained structural cues throughout the feature hierarchy. The Dual-Gated Cross-Calibration Feature Pyramid Network (DGCC-FPN) extends the neck to four scales {P2, P3, P4, P5} and introduces a sigmoid-gated sequential cross-calibration mechanism for adaptive, content-aware multi-scale fusion. The Efficient Low-Resolution Attention (ELRA) encoder replaces standard multi-head self-attention in AIFI with a pyramid-pooled asymmetric attention mechanism, reducing encoder complexity from O(N²C) to O(NC). Under a five-seed multi-run protocol, experiments on the KITTI and nuScenes benchmarks demonstrate that PEC-DETR achieves 90.5 ± 0.16% mAP50 and 72.4 ± 0.18% mAP50:95 on KITTI (+1.7/+2.7 over RT-DETR-R18) and 56.0 ± 0.22% mAP50 and 37.0 ± 0.24% mAP50:95 on nuScenes (+2.3/+2.2 over RT-DETR-R18), with 20.7 M parameters and 87 GFLOPs, providing competitive performance among methods of comparable complexity.
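
The abstract's "pyramid-pooled asymmetric attention" can be illustrated with a minimal NumPy sketch. This is not the authors' ELRA implementation: it is a generic pooled key/value attention in which queries stay at full resolution (N tokens) while keys and values come from a small, fixed set of average-pooled tokens (M tokens), so the score matrix is N × M rather than N × N. The pool sizes (1, 2, 4), the single head, the absence of learned projections, and all function names are assumptions made for illustration.

```python
import numpy as np


def adaptive_avg_pool(feat, out_size):
    """Average-pool an (H, W, C) map to (out_size, out_size, C) with even bins.

    Assumes H and W are both at least out_size.
    """
    h, w, c = feat.shape
    rows = np.array_split(np.arange(h), out_size)
    cols = np.array_split(np.arange(w), out_size)
    pooled = np.empty((out_size, out_size, c), dtype=float)
    for i, r in enumerate(rows):
        for j, col in enumerate(cols):
            patch = feat[r[0]:r[-1] + 1, col[0]:col[-1] + 1]
            pooled[i, j] = patch.mean(axis=(0, 1))
    return pooled


def softmax(x, axis=-1):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def pooled_attention(feat, pool_sizes=(1, 2, 4)):
    """Attention with full-resolution queries and pooled keys/values.

    Hypothetical sketch: no learned Q/K/V projections, single head.
    Returns an (N, C) array with N = H * W.
    """
    h, w, c = feat.shape
    q = feat.reshape(h * w, c)                     # N queries
    kv = np.concatenate(
        [adaptive_avg_pool(feat, s).reshape(s * s, c) for s in pool_sizes],
        axis=0,
    )                                              # M pooled tokens, M independent of N
    scores = q @ kv.T / np.sqrt(c)                 # (N, M) instead of (N, N)
    weights = softmax(scores, axis=-1)             # each row sums to 1
    return weights @ kv                            # (N, C)


feat = np.random.default_rng(0).standard_normal((20, 20, 8))
out = pooled_attention(feat)
print(out.shape)  # (400, 8)
```

With the assumed pool sizes, M = 1 + 4 + 16 = 21 regardless of the input resolution, so the cost of the score and output products grows as N·M·C, which is linear in N for fixed M. This is consistent with the O(NC) figure quoted in the abstract, although the paper's actual pooling configuration may differ.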

5.5 Engineering value
7.0 Research novelty
5.0 Business relevance

