Paper Title

Memory and Computation-Efficient Kernel SVM via Binary Embedding and Ternary Model Coefficients

Paper Authors

Zijian Lei, Liang Lan

Paper Abstract

Kernel approximation is widely used to scale up kernel SVM training and prediction. However, the memory and computation costs of kernel approximation models are still too high if we want to deploy them on memory-limited devices such as mobile phones, smartwatches, and IoT devices. To address this challenge, we propose a novel memory and computation-efficient kernel SVM model by using both binary embedding and binary model coefficients. First, we propose an efficient way to generate a compact binary embedding of the data that preserves kernel similarity. Second, we propose a simple but effective algorithm to learn a linear classification model with ternary coefficients that can support different types of loss functions and regularizers. Our algorithm can achieve better generalization accuracy than existing works on learning binary coefficients, since we allow each coefficient to be $-1$, $0$, or $1$ during the training stage, and coefficients equal to $0$ can be removed during binary classification model inference. Moreover, we provide a detailed analysis of the convergence of our algorithm and the inference complexity of our model. The analysis shows that convergence to a local optimum is guaranteed, and the inference complexity of our model is much lower than that of other competing methods. Our experimental results on five large real-world datasets demonstrate that our proposed method can build accurate nonlinear SVM models with a memory cost of less than 30 KB.
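
The abstract does not spell out the embedding construction or the training procedure, so the following is only a minimal illustrative sketch, not the paper's algorithm: it uses signed random projections (a standard binary embedding that approximately preserves angular kernel similarity) and shows why ternary coefficients $\{-1, 0, 1\}$ are cheap at inference time, since zero entries can simply be dropped. The projection matrix `R`, dimensions, and function names below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def binary_embed(X, R):
    """Map real-valued rows of X to {-1, +1}^D via signs of random projections.

    This is a generic binary embedding (sign random projections), used here
    only to illustrate the idea of kernel-similarity-preserving binary codes.
    """
    return np.where(X @ R >= 0, 1, -1).astype(np.int8)

def ternary_predict(B, w, bias=0.0):
    """Linear prediction with ternary coefficients w in {-1, 0, +1}.

    Zero coefficients contribute nothing, so only the nonzero positions of w
    need to be stored and touched at inference time; the remaining dot
    products are integer additions/subtractions over binary features.
    """
    nz = np.flatnonzero(w)                                   # indices with w != 0
    scores = B[:, nz].astype(np.int32) @ w[nz].astype(np.int32) + bias
    return np.sign(scores)

# Toy usage: d input features embedded into D binary features (illustrative sizes).
d, D, n = 20, 64, 5
R = rng.standard_normal((d, D))                              # random projection (assumption)
X = rng.standard_normal((n, d))
B = binary_embed(X, R)
w = rng.choice(np.array([-1, 0, 1], dtype=np.int8), size=D, p=[0.25, 0.5, 0.25])
print(ternary_predict(B, w))
```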
