Paper Title

USB: A Unified Semi-supervised Learning Benchmark for Classification

Paper Authors

Yidong Wang, Hao Chen, Yue Fan, Wang Sun, Ran Tao, Wenxin Hou, Renjie Wang, Linyi Yang, Zhi Zhou, Lan-Zhe Guo, Heli Qi, Zhen Wu, Yu-Feng Li, Satoshi Nakamura, Wei Ye, Marios Savvides, Bhiksha Raj, Takahiro Shinozaki, Bernt Schiele, Jindong Wang, Xing Xie, Yue Zhang

Paper Abstract

Semi-supervised learning (SSL) improves model generalization by leveraging massive unlabeled data to augment limited labeled samples. However, currently, popular SSL evaluation protocols are often constrained to computer vision (CV) tasks. In addition, previous work typically trains deep neural networks from scratch, which is time-consuming and environmentally unfriendly. To address the above issues, we construct a Unified SSL Benchmark (USB) for classification by selecting 15 diverse, challenging, and comprehensive tasks from CV, natural language processing (NLP), and audio processing (Audio), on which we systematically evaluate the dominant SSL methods, and also open-source a modular and extensible codebase for fair evaluation of these SSL methods. We further provide the pre-trained versions of the state-of-the-art neural models for CV tasks to make the cost affordable for further tuning. USB enables the evaluation of a single SSL algorithm on more tasks from multiple domains but with less cost. Specifically, on a single NVIDIA V100, only 39 GPU days are required to evaluate FixMatch on 15 tasks in USB while 335 GPU days (279 GPU days on 4 CV datasets except for ImageNet) are needed on 5 CV tasks with TorchSSL.
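To make the SSL mechanism referenced in the abstract concrete, below is a minimal sketch of the unlabeled-data objective used by FixMatch, the algorithm whose evaluation cost is quoted above: confidence-thresholded pseudo-labeling on a weakly augmented view combined with consistency training on a strongly augmented view. This is plain PyTorch for illustration only, not the USB codebase; `model`, `x_weak`, and `x_strong` are hypothetical placeholders for a classifier and the two augmented unlabeled batches.

    # Illustrative sketch of the FixMatch unlabeled loss (not the USB reference code).
    import torch
    import torch.nn.functional as F

    def fixmatch_unlabeled_loss(model, x_weak, x_strong, threshold=0.95):
        # 1) Pseudo-label each unlabeled sample from its weakly augmented view.
        with torch.no_grad():
            probs = torch.softmax(model(x_weak), dim=-1)
            max_probs, pseudo_labels = probs.max(dim=-1)
            # 2) Keep only predictions whose confidence exceeds the threshold.
            mask = (max_probs >= threshold).float()

        # 3) Consistency: the strongly augmented view is trained to predict the
        #    pseudo-label, with low-confidence samples masked out of the loss.
        logits_strong = model(x_strong)
        loss = F.cross_entropy(logits_strong, pseudo_labels, reduction="none")
        return (loss * mask).mean()

The full training objective adds this term, weighted by an unlabeled-loss coefficient, to the standard supervised cross-entropy on the labeled batch.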
