Paper Title

Efficient Dataset Distillation Using Random Feature Approximation

Paper Authors

Loo, Noel, Hasani, Ramin, Amini, Alexander, Rus, Daniela

Paper Abstract

Dataset distillation compresses large datasets into smaller synthetic coresets which retain performance, with the aim of reducing the storage and computational burden of processing the entire dataset. Today's best-performing algorithm, \textit{Kernel Inducing Points} (KIP), which makes use of the correspondence between infinite-width neural networks and kernel-ridge regression, is prohibitively slow due to the exact computation of the neural tangent kernel matrix, scaling $O(|S|^2)$, with $|S|$ being the coreset size. To improve this, we propose a novel algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel, which reduces the kernel matrix computation to $O(|S|)$. Our algorithm provides at least a 100-fold speedup over KIP and can run on a single GPU. Our new method, termed RFA Distillation (RFAD), performs competitively with KIP and other dataset condensation algorithms in accuracy over a range of large-scale datasets, both in kernel regression and finite-width network training. We demonstrate the effectiveness of our approach on tasks involving model interpretability and privacy preservation.
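The abstract's key idea is that the NNGP kernel can be approximated with explicit random features, so building the kernel matrix needs only O(|S|) evaluations of randomly initialized networks instead of O(|S|^2) exact pairwise kernel computations. The sketch below is not the authors' implementation; it is a minimal illustration of the general random-feature idea, assuming a one-hidden-layer ReLU feature map and placeholder random data. Names such as `random_relu_features`, `num_models`, and `width` are illustrative assumptions.

```python
import jax
import jax.numpy as jnp

def random_relu_features(x, key, num_models=8, width=1024):
    """Features phi(x) from several randomly initialized one-hidden-layer ReLU
    networks; phi(x) @ phi(x').T is a Monte Carlo estimate of the NNGP kernel."""
    d = x.shape[-1]
    feats = []
    for k in jax.random.split(key, num_models):
        w = jax.random.normal(k, (d, width)) / jnp.sqrt(d)      # NNGP-style weight scaling
        feats.append(jnp.sqrt(2.0 / width) * jax.nn.relu(x @ w))
    return jnp.concatenate(feats, axis=-1) / jnp.sqrt(num_models)

data_key, feat_key = jax.random.split(jax.random.PRNGKey(0))
X_s = jax.random.normal(data_key, (10, 32 * 32 * 3))            # placeholder synthetic coreset S (|S| = 10)
Y_s = jax.nn.one_hot(jnp.arange(10), 10)                        # one-hot labels for the coreset
X_t = jax.random.normal(jax.random.PRNGKey(1), (256, 32 * 32 * 3))  # placeholder target batch

phi_s = random_relu_features(X_s, feat_key)                     # O(|S|) network evaluations
phi_t = random_relu_features(X_t, feat_key)                     # same random networks for both sets
K_ss = phi_s @ phi_s.T                                          # approximate |S| x |S| Gram matrix
K_ts = phi_t @ phi_s.T
reg = 1e-6 * jnp.eye(K_ss.shape[0])                             # small ridge term
preds = K_ts @ jnp.linalg.solve(K_ss + reg, Y_s)                # kernel ridge regression predictions
```

Because the Gram matrices are assembled from explicit features, the kernel entries come from O(|S|) forward passes through sampled networks rather than O(|S|^2) exact NTK evaluations as in KIP; in the paper this approximate kernel regression loss is what drives the optimization of the synthetic coreset.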
