Paper title
Less is More: A privacy-respecting Android malware classifier using Federated Learning
Paper authors
Paper abstract
In this paper, we present LiM ("Less is More"), a malware classification framework that leverages Federated Learning to detect and classify malicious apps in a privacy-respecting manner. Information about newly installed apps is kept locally on users' devices, so the provider cannot infer which apps users have installed. At the same time, input from all users is taken into account in the federated learning process, and all users benefit from better classification performance. A key challenge of this setting is that users do not have access to the ground truth (i.e., they cannot correctly identify whether an app is malicious). To tackle this, LiM uses a safe semi-supervised ensemble that maximizes classification accuracy with respect to a baseline classifier trained by the service provider (i.e., the cloud). We implement LiM and show that the cloud server achieves an F1 score of 95%, while clients have perfect recall with only 1 false positive in >100 apps, using a dataset of 25K clean apps and 25K malicious apps, 200 users, and 50 rounds of federation. Furthermore, we conduct a security analysis and demonstrate that LiM is robust against both poisoning attacks by adversaries who control half of the clients and inference attacks performed by an honest-but-curious cloud server. Further experiments with MaMaDroid's dataset confirm resistance against poisoning attacks and a performance improvement due to the federation.
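The abstract refers to rounds of federation and to an F1 score without stating how either is computed. As a rough illustration only, the sketch below shows a generic sample-weighted averaging step of the kind commonly used in federated learning, together with the standard F1 formula. It is not LiM's safe semi-supervised ensemble, and the abstract does not say which aggregation rule LiM uses. The function names (`federated_average`, `f1_score`) and all numeric values are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Sample-weighted mean of client parameter vectors.

    Generic FedAvg-style aggregation, assumed here for illustration;
    it is not claimed to be the rule LiM applies.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must share the same parameter shape")
        for i, w in enumerate(weights):
            # Each client contributes in proportion to its local sample count.
            averaged[i] += w * size / total
    return averaged


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Standard F1 = 2*TP / (2*TP + FP + FN); 0.0 when the ratio is undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical toy values, not results from the paper.
    updates = [[1.0, 2.0], [3.0, 4.0]]
    sizes = [1, 3]
    print(federated_average(updates, sizes))
    print(f1_score(tp=50, fp=1, fn=0))
```

In this toy setup the second client holds three times as many samples as the first, so its parameters dominate the average. The F1 helper shows why perfect recall (no false negatives) combined with a single false positive still yields a score just below 1.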