Paper Title
Compression-aware Training of Neural Networks using Frank-Wolfe
Paper Authors
Paper Abstract
Many existing neural network pruning approaches rely on either retraining or inducing a strong bias in order to converge to a sparse solution throughout training. A third paradigm, 'compression-aware' training, aims to obtain state-of-the-art dense models that are robust to a wide range of compression ratios using a single dense training run while also avoiding retraining. We propose a framework centered around a versatile family of norm constraints and the Stochastic Frank-Wolfe (SFW) algorithm, which together encourage convergence to well-performing solutions while inducing robustness towards convolutional filter pruning and low-rank matrix decomposition. Our method outperforms existing compression-aware approaches and, in the case of low-rank matrix decomposition, requires significantly fewer computational resources than approaches based on nuclear-norm regularization. Our findings indicate that dynamically adjusting the learning rate of SFW, as suggested by Pokutta et al. (2020), is crucial for the convergence and robustness of SFW-trained models, and we establish a theoretical foundation for that practice.
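To make the abstract's training scheme concrete, below is a minimal, self-contained sketch of one Stochastic Frank-Wolfe step over an L2-norm ball constraint, applied to a toy quadratic objective. The choice of constraint set, the toy problem, and the specific learning-rate rescaling rule (`lr * radius / ||v - x||`) are illustrative assumptions for this sketch; they are not the paper's exact constraint family nor the precise dynamic-rescaling rule of Pokutta et al. (2020).

```python
import numpy as np

def lmo_l2_ball(grad, radius):
    """Linear minimization oracle (LMO) for the L2 ball of the given radius:
    returns argmin_{||v||_2 <= radius} <grad, v> = -radius * grad / ||grad||_2."""
    norm = np.linalg.norm(grad)
    if norm == 0.0:
        return np.zeros_like(grad)
    return -radius * grad / norm

def sfw_step(x, grad, radius, lr, rescale=True):
    """One Stochastic Frank-Wolfe step over the L2 ball.

    If rescale is True, the learning rate is shrunk by radius / ||v - x||,
    a hypothetical stand-in for the dynamic learning-rate adjustment
    discussed in the abstract (the exact rule may differ)."""
    v = lmo_l2_ball(grad, radius)
    if rescale:
        d = np.linalg.norm(v - x)
        if d > 0.0:
            lr = min(1.0, lr * radius / d)
    # Convex combination (1 - lr) * x + lr * v keeps the iterate feasible.
    return x + lr * (v - x)

# Toy usage: minimize f(x) = 0.5 * ||x - b||^2 subject to ||x||_2 <= 1,
# using noisy gradients to mimic the stochastic setting.
rng = np.random.default_rng(0)
b = rng.normal(size=5)
x = np.zeros(5)
for t in range(200):
    grad = (x - b) + 0.01 * rng.normal(size=5)  # stochastic gradient of f
    x = sfw_step(x, grad, radius=1.0, lr=2.0 / (t + 2))
print("final iterate:", x, "norm:", np.linalg.norm(x))
```

Because every iterate is a convex combination of feasible points, the norm constraint is enforced throughout training without any projection step, which is what makes Frank-Wolfe-style constraints attractive for keeping weights in a compression-friendly region.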