Better Aggregation in Test-Time Augmentation

Paper title

Better Aggregation in Test-Time Augmentation

Authors

Divya Shanmugam, Davis Blalock, Guha Balakrishnan, John Guttag

Abstract

Test-time augmentation -- the aggregation of predictions across transformed versions of a test input -- is a common practice in image classification. Traditionally, predictions are combined using a simple average. In this paper, we present 1) experimental analyses that shed light on cases in which the simple average is suboptimal and 2) a method to address these shortcomings. A key finding is that even when test-time augmentation produces a net improvement in accuracy, it can change many correct predictions into incorrect predictions. We delve into when and why test-time augmentation changes a prediction from being correct to incorrect and vice versa. Building on these insights, we present a learning-based method for aggregating test-time augmentations. Experiments across a diverse set of models, datasets, and augmentations show that our method delivers consistent improvements over existing approaches.
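
To make the abstract's central observation concrete, the sketch below contrasts a simple average with a weighted aggregation on a toy example. It is only an illustration under stated assumptions: the probabilities are made up, the function names are hypothetical, and the per-augmentation weights are hand-picked rather than learned. The abstract does not specify the form of the learning-based method, so this should not be read as the authors' implementation.

```python
from typing import List, Sequence


def average_aggregate(probs: Sequence[Sequence[float]]) -> List[float]:
    """Simple average: probs[a][c] is the probability of class c under augmentation a."""
    n_aug = len(probs)
    n_cls = len(probs[0])
    return [sum(p[c] for p in probs) / n_aug for c in range(n_cls)]


def weighted_aggregate(probs: Sequence[Sequence[float]],
                       weights: Sequence[float]) -> List[float]:
    """Weighted sum with one weight per augmentation (assumed form, not from the paper)."""
    n_cls = len(probs[0])
    return [sum(w * p[c] for w, p in zip(weights, probs)) for c in range(n_cls)]


def argmax(xs: Sequence[float]) -> int:
    """Index of the largest entry."""
    return max(range(len(xs)), key=lambda i: xs[i])


# Made-up predictions for one test image whose true class is 0.
probs = [
    [0.60, 0.40],  # original image: correct
    [0.30, 0.70],  # hypothetical flipped view: incorrect
    [0.45, 0.55],  # hypothetical cropped view: incorrect
]

avg = average_aggregate(probs)
# Hand-picked weights standing in for weights that would be learned from data.
weights = [0.8, 0.1, 0.1]
wavg = weighted_aggregate(probs, weights)

print("simple average predicts class", argmax(avg))
print("weighted aggregation predicts class", argmax(wavg))
```

In this toy case the simple average flips a correct prediction on the original image to an incorrect one, while down-weighting the two less reliable views preserves it. Which augmentations deserve lower weight would, in practice, have to be estimated from held-out data.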
