Paper Title


An Empirical Study on Distribution Shift Robustness From the Perspective of Pre-Training and Data Augmentation

Paper Authors

Ziquan Liu, Yi Xu, Yuanhong Xu, Qi Qian, Hao Li, Rong Jin, Xiangyang Ji, Antoni B. Chan

Paper Abstract


The performance of machine learning models under distribution shift has been a focus of the community in recent years. Most current methods aim to improve robustness to distribution shift from the algorithmic perspective, i.e., by designing better training algorithms that help generalization on shifted test distributions. This paper studies the distribution-shift problem from the perspective of pre-training and data augmentation, two important factors in deep learning practice that have not been systematically investigated by existing work. By evaluating seven pre-trained models, including ResNets and ViTs trained in both self-supervised and supervised modes, on five important distribution-shift datasets from the WILDS and DomainBed benchmarks, with five different learning algorithms, we provide the first comprehensive empirical study focused on pre-training and data augmentation. From our empirical results on 1,330 models, we make the following main observations: 1) ERM combined with data augmentation can achieve state-of-the-art performance if we choose a pre-trained model appropriate to the properties of the data; 2) specialized algorithms further improve robustness on top of ERM when handling a specific type of distribution shift, e.g., GroupDRO for spurious correlations and CORAL for large-scale out-of-distribution data; 3) comparing different pre-training modes, architectures, and data sizes, we provide novel observations about pre-training under distribution shift, which shed light on designing or selecting pre-training strategies for different kinds of distribution shift. In summary, our empirical study provides a comprehensive baseline for a wide range of pre-trained models fine-tuned with data augmentation, which may inspire future research exploiting the power of pre-training and data augmentation in distribution-shift studies.
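To illustrate the distinction the abstract draws between ERM and a specialized algorithm such as GroupDRO, the sketch below contrasts the two objectives at the group level: ERM averages the loss over all samples, while GroupDRO up-weights the group with the highest loss before averaging. This is a minimal illustration, not the paper's implementation; the exponentiated-weight update and the step size `eta` follow the standard GroupDRO formulation, and the example group losses are made up.

```python
import math

def erm_loss(group_losses, group_sizes):
    """ERM objective at the group level: size-weighted average loss,
    so a small but badly-handled group barely affects the objective."""
    total = sum(group_sizes)
    return sum(l * n for l, n in zip(group_losses, group_sizes)) / total

def group_dro_loss(group_losses, weights, eta=0.1):
    """One GroupDRO step: multiplicatively shift the group weights
    toward high-loss groups, renormalize, and return the reweighted
    loss together with the updated weights."""
    new_w = [w * math.exp(eta * l) for w, l in zip(weights, group_losses)]
    z = sum(new_w)
    new_w = [w / z for w in new_w]
    loss = sum(w * l for w, l in zip(new_w, group_losses))
    return loss, new_w

# Illustrative scenario: a majority group with low loss and a minority
# group (e.g., one hurt by a spurious correlation) with high loss.
losses, sizes = [1.0, 3.0], [90, 10]
print("ERM:", erm_loss(losses, sizes))
dro, w = group_dro_loss(losses, [0.5, 0.5], eta=1.0)
print("GroupDRO:", dro, "weights:", w)
```

Because the minority group is small, ERM reports a low average loss, whereas GroupDRO shifts weight onto that group and its objective stays high until the worst group improves, which is why it helps under spurious-correlation shift.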
