Paper Title
Compressing LSTM Networks by Matrix Product Operators
Paper Authors
Paper Abstract
Long Short-Term Memory (LSTM) models are the building blocks of many state-of-the-art natural language processing (NLP) and speech enhancement (SE) algorithms. However, an LSTM model contains a large number of parameters, which typically consumes substantial resources during training; LSTM models also suffer from computational inefficiency in the inference phase. Existing model compression methods (e.g., model pruning) discriminate among parameters only by their magnitudes, ignoring how importance is actually distributed across the model's information. Here we introduce the matrix product operator (MPO) decomposition, which describes the local correlations of quantum states in quantum many-body physics, to represent the large parameter matrices in a neural network; the network can then be compressed by truncating the unimportant information in the weight matrices. In this paper, we propose an MPO-based neural network architecture to replace the LSTM model. The efficient MPO representation reduces the computational cost of training LSTM models on the one hand, and speeds up computation in the inference phase on the other. In our experiments, we compare the proposed MPO-LSTM compression model with a traditional LSTM model compressed by pruning on sequence classification, sequence prediction, and speech enhancement tasks. The experimental results show that our proposed MPO-based neural network architecture significantly outperforms the pruning approach.
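To make the compression step concrete: the abstract describes factoring a dense weight matrix into MPO cores and truncating the unimportant information. The sketch below is a minimal illustration, not the authors' implementation; the function names, the local dimensions (4, 4, 4), and the bond-dimension cap `max_rank` are assumptions chosen for demonstration. It performs the standard tensor-train-style sequence of truncated SVDs, where dropping small singular values is the truncation the abstract refers to.

```python
# Illustrative sketch (assumed details, not the paper's code): factor a dense
# weight matrix into MPO cores via sequential truncated SVDs.
import numpy as np

def mpo_decompose(W, in_dims, out_dims, max_rank):
    """Factor W (prod(in_dims) x prod(out_dims)) into MPO cores.

    Each core has shape (r_prev, in_k, out_k, r_next). Capping the bond
    dimension at `max_rank` discards the smallest singular values, i.e.
    the least important correlations -- this is the compression step.
    """
    n = len(in_dims)
    # Reorder axes so each (in_k, out_k) pair is adjacent, the standard
    # reshaping done before a TT/MPO decomposition sweep.
    T = W.reshape(list(in_dims) + list(out_dims))
    perm = [i for k in range(n) for i in (k, n + k)]
    T = T.transpose(perm)
    cores, r_prev = [], 1
    for k in range(n - 1):
        rows = r_prev * in_dims[k] * out_dims[k]
        U, S, Vt = np.linalg.svd(T.reshape(rows, -1), full_matrices=False)
        r = min(max_rank, len(S))  # truncate to bond dimension r
        cores.append(U[:, :r].reshape(r_prev, in_dims[k], out_dims[k], r))
        T = S[:r, None] * Vt[:r]   # carry the remainder to the next core
        r_prev = r
    cores.append(T.reshape(r_prev, in_dims[-1], out_dims[-1], 1))
    return cores

def mpo_to_matrix(cores, in_dims, out_dims):
    """Contract the cores back into a dense matrix (to check the error)."""
    T = cores[0]
    for c in cores[1:]:
        T = np.tensordot(T, c, axes=([-1], [0]))
    n = len(in_dims)
    T = T.reshape([d for k in range(n) for d in (in_dims[k], out_dims[k])])
    perm = list(range(0, 2 * n, 2)) + list(range(1, 2 * n, 2))
    return T.transpose(perm).reshape(int(np.prod(in_dims)),
                                     int(np.prod(out_dims)))

# Example: compress a 64x64 matrix with local dimensions (4,4,4) x (4,4,4).
W = np.random.randn(64, 64)
cores = mpo_decompose(W, (4, 4, 4), (4, 4, 4), max_rank=8)
W_hat = mpo_to_matrix(cores, (4, 4, 4), (4, 4, 4))
print("relative error:", np.linalg.norm(W - W_hat) / np.linalg.norm(W))
print("parameters:", sum(c.size for c in cores), "vs dense:", W.size)
```

In an MPO-LSTM layer of the kind the abstract proposes, factorizations like this would stand in for the gate weight matrices, so both the parameter count and the per-step matrix-multiply cost shrink with the chosen bond dimension.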