Paper Title

The KFIoU Loss for Rotated Object Detection

Authors

Xue Yang, Yue Zhou, Gefan Zhang, Jirui Yang, Wentao Wang, Junchi Yan, Xiaopeng Zhang, Qi Tian

Abstract

Differing from the well-developed horizontal object detection area, where the computing-friendly IoU-based loss is readily adopted and fits well with the detection metrics, rotation detectors often involve a more complicated loss based on SkewIoU, which is unfriendly to gradient-based training. In this paper, we propose an effective approximate SkewIoU loss based on Gaussian modeling and the Gaussian product, which mainly consists of two terms. The first term is a scale-insensitive center point loss, used to quickly narrow the distance between the center points of the two bounding boxes. In the distance-independent second term, the product of the Gaussian distributions is adopted to inherently mimic the mechanism of SkewIoU by its definition, and we show its trend-level alignment with the SkewIoU loss within a certain distance (i.e. within 9 pixels). This is in contrast to recent Gaussian-modeling-based rotation detectors, e.g. GWD loss and KLD loss, which involve a human-specified distribution distance metric requiring additional hyperparameter tuning that varies across datasets and detectors. The resulting new loss, called KFIoU loss, is easier to implement and works better than the exact SkewIoU loss, thanks to its full differentiability and its ability to handle non-overlapping cases. We further extend our technique to the 3-D case, which suffers from the same issues as 2-D. Extensive results on various public datasets (2-D/3-D, aerial/text/face images) with different base detectors show the effectiveness of our approach.
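The overlap term described above can be sketched numerically: each rotated box (cx, cy, w, h, θ) is modeled as a 2-D Gaussian, the product of the two Gaussians yields an "overlap" Gaussian, and an IoU-like ratio is formed from the areas the covariances imply. This is a minimal illustration based on the abstract, not the authors' implementation; the function names and the omission of the center-point loss term are my own simplifications.

```python
import numpy as np

def box_to_gaussian(cx, cy, w, h, theta):
    """Model a rotated box as a Gaussian: mean = center, covariance = R diag(w^2/4, h^2/4) R^T."""
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    S = np.diag([w ** 2 / 4.0, h ** 2 / 4.0])
    return np.array([cx, cy]), R @ S @ R.T

def kfiou_overlap(box1, box2):
    """Sketch of the Gaussian-product overlap term (the center-distance term is omitted)."""
    _, sigma1 = box_to_gaussian(*box1)
    _, sigma2 = box_to_gaussian(*box2)
    # Product of the two Gaussians gives the overlap covariance:
    # sigma = sigma1 (sigma1 + sigma2)^-1 sigma2
    sigma = sigma1 @ np.linalg.inv(sigma1 + sigma2) @ sigma2
    # Under this modeling, a box's area w*h equals 4 * sqrt(det(covariance)).
    v1 = 4.0 * np.sqrt(np.linalg.det(sigma1))
    v2 = 4.0 * np.sqrt(np.linalg.det(sigma2))
    v3 = 4.0 * np.sqrt(np.linalg.det(sigma))
    return v3 / (v1 + v2 - v3)
```

Note that for two identical boxes the overlap covariance is half the original, so this ratio peaks at 1/3 rather than 1, which is why the resulting quantity approximates SkewIoU at the trend level rather than in absolute value.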
