Paper Title

Support Vector Machines under Adversarial Label Contamination

Authors

Huang Xiao, Battista Biggio, Blaine Nelson, Han Xiao, Claudia Eckert, Fabio Roli

Abstract


Machine learning algorithms are increasingly being applied in security-related tasks such as spam and malware detection, although their security properties against deliberate attacks have not yet been widely understood. Intelligent and adaptive attackers may indeed exploit specific vulnerabilities exposed by machine learning techniques to violate system security. Being robust to adversarial data manipulation is thus an important, additional requirement for machine learning algorithms to successfully operate in adversarial settings. In this work, we evaluate the security of Support Vector Machines (SVMs) to well-crafted, adversarial label noise attacks. In particular, we consider an attacker that aims to maximize the SVM's classification error by flipping a number of labels in the training data. We formalize a corresponding optimal attack strategy, and solve it by means of heuristic approaches to keep the computational complexity tractable. We report an extensive experimental analysis on the effectiveness of the considered attacks against linear and non-linear SVMs, both on synthetic and real-world datasets. We finally argue that our approach can also provide useful insights for developing more secure SVM learning algorithms, and also novel techniques in a number of related research areas, such as semi-supervised and active learning.
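The attack described above can be illustrated with a small sketch. The paper formalizes an optimal label-flip attack and solves it heuristically; the code below shows one simple greedy heuristic in that spirit (not the authors' exact algorithm): within a fixed flip budget, repeatedly retrain a linear SVM and keep whichever single label flip most increases the classifier's error on held-out data. The dataset, budget, and greedy search are all illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def greedy_label_flip_attack(X_train, y_train, X_val, y_val, budget=5, C=1.0):
    """Greedily flip the training labels that most increase validation error.

    Illustrative heuristic only, not the paper's exact optimization: each
    round tries every remaining candidate flip, retrains the SVM, and keeps
    the flip that maximizes error on the held-out set.
    """
    y_poisoned = y_train.copy()
    flipped = set()
    for _ in range(budget):
        best_idx, best_err = None, -1.0
        for i in range(len(y_poisoned)):
            if i in flipped:
                continue
            y_try = y_poisoned.copy()
            y_try[i] = 1 - y_try[i]  # flip a binary {0, 1} label
            clf = SVC(kernel="linear", C=C).fit(X_train, y_try)
            err = 1.0 - clf.score(X_val, y_val)
            if err > best_err:
                best_idx, best_err = i, err
        y_poisoned[best_idx] = 1 - y_poisoned[best_idx]
        flipped.add(best_idx)
    return y_poisoned

# Synthetic binary problem; the attacker poisons the training labels only.
X, y = make_classification(n_samples=120, n_features=2, n_redundant=0,
                           random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.5,
                                            random_state=0)

clean_err = 1.0 - SVC(kernel="linear").fit(X_tr, y_tr).score(X_val, y_val)
y_bad = greedy_label_flip_attack(X_tr, y_tr, X_val, y_val, budget=5)
poisoned_err = 1.0 - SVC(kernel="linear").fit(X_tr, y_bad).score(X_val, y_val)
print(clean_err, poisoned_err)  # poisoned error should typically be higher
```

Exhaustively retrying every flip each round costs one SVM retraining per candidate, which is exactly the computational burden that motivates the heuristic approximations discussed in the paper.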
