Paper Title
On Power Laws in Deep Ensembles
Paper Authors
Paper Abstract
Ensembles of deep neural networks are known to achieve state-of-the-art performance in uncertainty estimation and lead to accuracy improvement. In this work, we focus on a classification problem and investigate the behavior of both non-calibrated and calibrated negative log-likelihood (CNLL) of a deep ensemble as a function of the ensemble size and the member network size. We indicate the conditions under which CNLL follows a power law w.r.t. ensemble size or member network size, and analyze the dynamics of the parameters of the discovered power laws. Our important practical finding is that one large network may perform worse than an ensemble of several medium-size networks with the same total number of parameters (we call this ensemble a memory split). Using the detected power-law-like dependencies, we can predict (1) the possible gain from the ensembling of networks with a given structure, and (2) the optimal memory split given a memory budget, based on a relatively small number of trained networks. We describe the memory split advantage effect in more detail in arXiv:2005.07292.
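To illustrate the kind of extrapolation the abstract describes, below is a minimal sketch (not the authors' code) of fitting a power law to CNLL measured for a few ensemble sizes and extrapolating to larger ensembles. The parameterization CNLL(n) ≈ c + b·n^(−a), the use of `scipy.optimize.curve_fit`, and all numeric values are illustrative assumptions; the paper's exact functional form and data are in arXiv:2005.07292.

```python
# Sketch: fit a power law CNLL(n) ~ c + b * n**(-a) to CNLL measured for
# a handful of ensemble sizes, then extrapolate to larger ensembles.
# All numbers below are hypothetical, not taken from the paper.
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, c, b, a):
    """Assumed power-law form of CNLL as a function of ensemble size n."""
    return c + b * n ** (-a)

# Hypothetical CNLL values measured on a validation set for small ensembles.
ensemble_sizes = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
cnll_values = np.array([0.95, 0.82, 0.76, 0.73, 0.71])

# Fit the three power-law parameters from the few trained networks.
(c, b, a), _ = curve_fit(power_law, ensemble_sizes, cnll_values,
                         p0=(0.6, 0.4, 1.0))

# Extrapolate: predicted CNLL of a 20-network ensemble, and the asymptotic
# gain over a single network implied by the fitted plateau c.
print(f"predicted CNLL at n=20: {power_law(20.0, c, b, a):.3f}")
print(f"predicted asymptotic gain over n=1: {cnll_values[0] - c:.3f}")
```

Under these assumptions, the same fitted curves for networks of different sizes could be compared under a fixed parameter budget to pick the memory split (number and size of member networks) with the lowest predicted CNLL, which is the practical use case the abstract highlights.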