Paper Title
BatchEnsemble: An Alternative Approach to Efficient Ensemble and Lifelong Learning
Paper Authors
Paper Abstract
Ensembles, where multiple neural networks are trained individually and their predictions are averaged, have proven widely successful at improving both the accuracy and the predictive uncertainty of single neural networks. However, an ensemble's cost for both training and testing increases linearly with the number of networks, which quickly becomes untenable. In this paper, we propose BatchEnsemble, an ensemble method whose computational and memory costs are significantly lower than those of typical ensembles. BatchEnsemble achieves this by defining each weight matrix to be the Hadamard product of a weight shared among all ensemble members and a rank-one matrix per member. Unlike ensembles, BatchEnsemble is not only parallelizable across devices, where one device trains one member, but also parallelizable within a device, where multiple ensemble members are updated simultaneously for a given mini-batch. Across CIFAR-10, CIFAR-100, WMT14 EN-DE/EN-FR translation, and out-of-distribution tasks, BatchEnsemble yields accuracy and uncertainty estimates competitive with typical ensembles, with a 3X speedup at test time and a 3X memory reduction for an ensemble of size 4. We also apply BatchEnsemble to lifelong learning, where on Split-CIFAR-100 it yields performance comparable to progressive neural networks while having much lower computational and memory costs. We further show that BatchEnsemble easily scales up to lifelong learning on Split-ImageNet, which involves 100 sequential learning tasks.
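The abstract describes each member's weight matrix as the Hadamard product of a shared weight and a per-member rank-one matrix, which is also what allows several members to be evaluated together within one mini-batch. The following is a minimal NumPy sketch of that idea only; the shapes, the names of the rank-one factors (R, S), and the per-member batch routing are illustrative assumptions, not the paper's reference implementation.

```python
# Minimal sketch of a BatchEnsemble-style dense layer (illustrative assumptions,
# not the authors' code): member i's effective weight is W * outer(R[i], S[i]).
import numpy as np

rng = np.random.default_rng(0)

m, n = 8, 4           # input / output dimensions of the layer
n_members = 4         # ensemble size
batch_per_member = 2  # examples routed to each member within one mini-batch

W = rng.standard_normal((m, n))          # shared weight, stored once for all members
R = rng.standard_normal((n_members, m))  # per-member rank-one factor on the input side
S = rng.standard_normal((n_members, n))  # per-member rank-one factor on the output side

X = rng.standard_normal((n_members, batch_per_member, m))  # inputs grouped by member

# Naive computation: materialize each member's full weight matrix.
Y_naive = np.stack([X[i] @ (W * np.outer(R[i], S[i])) for i in range(n_members)])

# Vectorized computation within a single device/mini-batch:
# scale inputs element-wise, apply the shared W once, then scale outputs element-wise.
Y_fast = ((X * R[:, None, :]) @ W) * S[:, None, :]

assert np.allclose(Y_naive, Y_fast)
```

Because the shared matrix W dominates the parameter count and the rank-one factors add only vectors per member, this construction is one way to see where the reported memory reduction and test-time speedup come from relative to storing and evaluating independent networks.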