DETRs with Collaborative Hybrid Assignments Training

Paper Title

DETRs with Collaborative Hybrid Assignments Training

Authors

Zhuofan Zong, Guanglu Song, Yu Liu

Abstract

In this paper, we observe that assigning too few queries as positive samples in DETR with one-to-one set matching leads to sparse supervision on the encoder's output, which considerably hurts the discriminative feature learning of the encoder, and vice versa for attention learning in the decoder. To alleviate this, we present a novel collaborative hybrid assignments training scheme, namely $\mathcal{C}$o-DETR, to learn more efficient and effective DETR-based detectors from versatile label assignment manners. This new training scheme can easily enhance the encoder's learning ability in end-to-end detectors by training multiple parallel auxiliary heads supervised by one-to-many label assignments such as ATSS and Faster RCNN. In addition, we construct extra customized positive queries by extracting the positive coordinates from these auxiliary heads to improve the training efficiency of positive samples in the decoder. In inference, these auxiliary heads are discarded, and thus our method introduces no additional parameters or computational cost to the original detector while requiring no hand-crafted non-maximum suppression (NMS). We conduct extensive experiments to evaluate the effectiveness of the proposed approach on DETR variants, including DAB-DETR, Deformable-DETR, and DINO-Deformable-DETR. The state-of-the-art DINO-Deformable-DETR with Swin-L can be improved from 58.5% to 59.5% AP on COCO val. Surprisingly, combined with a ViT-L backbone, we achieve 66.0% AP on COCO test-dev and 67.9% AP on LVIS val, outperforming previous methods by clear margins with much smaller model sizes. Code is available at \url{https://github.com/Sense-X/Co-DETR}.
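The sparse-supervision observation in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the boxes are made up, the brute-force search stands in for Hungarian set matching, the plain IoU threshold stands in for assigners such as ATSS or Faster RCNN, and the function names `one_to_one` and `one_to_many` are hypothetical. It only shows that one-to-one matching yields one positive per ground-truth box, while a one-to-many rule marks every sufficiently overlapping candidate as positive.

```python
from itertools import permutations

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def one_to_one(candidates, gts):
    """Assumed stand-in for set matching: each ground truth gets exactly one
    candidate, chosen by brute force to maximize total IoU (tiny inputs only)."""
    best, best_score = None, -1.0
    for perm in permutations(range(len(candidates)), len(gts)):
        score = sum(iou(candidates[c], gts[g]) for g, c in enumerate(perm))
        if score > best_score:
            best, best_score = perm, score
    return set(best)

def one_to_many(candidates, gts, thr=0.5):
    """Assumed stand-in for a one-to-many assigner: every candidate whose IoU
    with some ground truth reaches the threshold counts as positive."""
    return {c for c, box in enumerate(candidates)
            if any(iou(box, g) >= thr for g in gts)}

# Made-up toy data: two ground-truth boxes and six candidate boxes.
gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
candidates = [
    (0, 0, 10, 10), (1, 1, 10, 10), (0, 0, 9, 9),   # near the first object
    (20, 20, 30, 30), (21, 20, 30, 30),             # near the second object
    (50, 50, 60, 60),                               # background
]

o2o = one_to_one(candidates, gts)
o2m = one_to_many(candidates, gts)
print(len(o2o), "positives under one-to-one matching")
print(len(o2m), "positives under one-to-many assignment")
```

In this toy setup the one-to-one rule supervises two candidates and the one-to-many rule supervises five, which is the gap in positive samples that the auxiliary heads described in the abstract are meant to exploit during training.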
