Layer Freezing & Data Sieving: Missing Pieces of a Generic Framework for Sparse Training

Paper Title

Layer Freezing & Data Sieving: Missing Pieces of a Generic Framework for Sparse Training

Authors

Yuan, Geng; Li, Yanyu; Li, Sheng; Kong, Zhenglun; Tulyakov, Sergey; Tang, Xulong; Wang, Yanzhi; Ren, Jian

Abstract

Sparse training has recently emerged as a promising paradigm for efficient deep learning on edge devices. Current research mainly focuses on reducing training costs by further increasing model sparsity. However, increasing sparsity is not always ideal, since it inevitably introduces severe accuracy degradation at extremely high sparsity levels. This paper explores other directions for reducing sparse training costs effectively and efficiently while preserving accuracy. To this end, we investigate two techniques: layer freezing and data sieving. First, layer freezing has proven successful in dense model training and fine-tuning, yet it has never been adopted in the sparse training domain, where the unique characteristics of sparse training may hinder its incorporation. We therefore analyze the feasibility and potential of layer freezing in sparse training and find that it can save considerable training costs. Second, we propose a data sieving method for dataset-efficient training, which further reduces training costs by ensuring that only part of the dataset is used throughout the entire training process. We show that both techniques can be incorporated into sparse training algorithms to form a generic framework, which we dub SpFDE. Our extensive experiments demonstrate that SpFDE significantly reduces training costs while preserving accuracy along three dimensions: weight sparsity, layer freezing, and dataset sieving.
