Paper Title

Joint Multi-Dimension Pruning via Numerical Gradient Update

Paper Authors

Zechun Liu, Xiangyu Zhang, Zhiqiang Shen, Zhe Li, Yichen Wei, Kwang-Ting Cheng, Jian Sun

Paper Abstract

We present joint multi-dimension pruning (abbreviated as JointPruning), an effective method of pruning a network along three crucial aspects simultaneously: spatial size, depth, and channels. To tackle these three naturally different dimensions, we propose a general framework that defines pruning as seeking the best pruning vector (i.e., the numerical values of layer-wise channel number, spatial size, and depth) and constructs a unique mapping from the pruning vector to the pruned network structure. We then optimize the pruning vector with gradient updates, modeling joint pruning as a numerical gradient optimization process. To overcome the challenge that there is no explicit function between the loss and the pruning vector, we propose self-adapted stochastic gradient estimation to construct a gradient path from the network loss to the pruning vector and enable efficient gradient updates. We show that the joint strategy discovers a better configuration than previous studies that focus on a single dimension alone, since our method optimizes the three dimensions collaboratively in a single end-to-end training run, and it is more efficient than previous exhaustive methods. Extensive experiments on the large-scale ImageNet dataset across a variety of network architectures (MobileNet V1&V2&V3 and ResNet) demonstrate the effectiveness of our proposed method. For instance, we achieve significant margins of 2.5% and 2.6% improvement over the state-of-the-art approach on the already compact MobileNet V1&V2 under an extremely large compression ratio.
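The abstract describes encoding the pruning configuration as a numerical vector and optimizing it with estimated gradients, because the loss is not an explicit differentiable function of that vector. The following is a minimal PyTorch sketch of that idea only, not the paper's implementation: it uses a channel-only pruning vector, a toy three-layer network, freshly initialized weights (rather than weights inherited from a trained supernet), and a plain two-sided SPSA-style finite-difference estimator in place of the paper's self-adapted stochastic gradient estimation. The names `build_pruned_net`, `proxy_loss`, and `estimate_gradient` are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def build_pruned_net(channels, num_classes=10):
    """Map a pruning vector (layer-wise channel counts) to a concrete network.

    Only channel numbers are encoded here; the paper's full vector also covers
    input spatial size and network depth.
    """
    layers, in_ch = [], 3
    for c in channels:
        out_ch = max(1, int(round(c)))
        layers += [nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU()]
        in_ch = out_ch
    layers += [nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(in_ch, num_classes)]
    return nn.Sequential(*layers)


def proxy_loss(channels, batch):
    """Loss of the architecture encoded by `channels` on one mini-batch.

    A real implementation would inherit weights from a trained supernet; a
    freshly initialized net keeps this sketch self-contained.
    """
    x, y = batch
    return F.cross_entropy(build_pruned_net(channels)(x), y).item()


def estimate_gradient(vector, batch, sigma=1.0):
    """Two-sided stochastic finite-difference estimate of d(loss)/d(vector)."""
    delta = torch.randint(0, 2, vector.shape).float() * 2 - 1  # random +/-1 directions
    loss_plus = proxy_loss((vector + sigma * delta).tolist(), batch)
    loss_minus = proxy_loss((vector - sigma * delta).tolist(), batch)
    return (loss_plus - loss_minus) / (2 * sigma) * delta


if __name__ == "__main__":
    batch = (torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,)))
    vector = torch.tensor([32.0, 64.0, 128.0])  # channel counts of a 3-layer net
    lr = 5.0
    for step in range(3):  # numerical gradient descent on the pruning vector
        vector = torch.clamp(vector - lr * estimate_gradient(vector, batch), min=1.0)
        print(f"step {step}: channels = {[round(c, 1) for c in vector.tolist()]}")
```

In the paper, such estimated gradients drive a single end-to-end optimization over channels, spatial size, and depth jointly; the sketch above only walks the channel dimension to keep the mechanics visible.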
