Paper Title

How Much Data Are Augmentations Worth? An Investigation into Scaling Laws, Invariance, and Implicit Regularization

Paper Authors

Jonas Geiping, Micah Goldblum, Gowthami Somepalli, Ravid Shwartz-Ziv, Tom Goldstein, Andrew Gordon Wilson

Paper Abstract

Despite the clear performance benefits of data augmentations, little is known about why they are so effective. In this paper, we disentangle several key mechanisms through which data augmentations operate. Establishing an exchange rate between augmented and additional real data, we find that in out-of-distribution testing scenarios, augmentations which yield samples that are diverse, but inconsistent with the data distribution, can be even more valuable than additional training data. Moreover, we find that data augmentations which encourage invariances can be more valuable than invariance alone, especially on small and medium-sized training sets. Following this observation, we show that augmentations induce additional stochasticity during training, effectively flattening the loss landscape.
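
A minimal sketch of the mechanism in the abstract's last sentence: random augmentations resample every image on every pass over the data, so training sees randomness beyond mini-batch shuffling alone. The transforms below (torchvision random crop and horizontal flip on CIFAR-10) are a common illustrative choice and are not taken from the paper's code.

```python
# Sketch: augmentations as a source of training stochasticity.
# Each access to a training image draws a fresh random crop and flip,
# so the effective dataset differs from epoch to epoch.
import torch
import torchvision
import torchvision.transforms as T

# Augmented pipeline: every sample is re-randomized on each access.
augmented = T.Compose([
    T.RandomCrop(32, padding=4),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
])

# Plain pipeline, for contrast: the model would see the same fixed
# tensors every epoch.
plain = T.ToTensor()

train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=augmented
)
loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

# Each pass over `loader` yields differently augmented views of the same
# underlying images -- the extra randomness the abstract connects to a
# flatter loss landscape.
for images, labels in loader:
    pass  # the model's forward/backward pass would go here
```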
