A Multi-modal Neural Embeddings Approach for Detecting Mobile Counterfeit Apps: A Case Study on Google Play Store

Paper Title

A Multi-modal Neural Embeddings Approach for Detecting Mobile Counterfeit Apps: A Case Study on Google Play Store

Authors

Naveen Karunanayake, Jathushan Rajasegaran, Ashanie Gunathillake, Suranga Seneviratne, Guillaume Jourjon

Abstract

Counterfeit apps impersonate existing popular apps in an attempt to mislead users into installing them, for purposes such as collecting personal information or spreading malware. Many counterfeits can be identified once installed; however, even a tech-savvy user may struggle to detect them before installation. To this end, this paper proposes to leverage recent advances in deep learning to create image and text embeddings so that counterfeit apps can be efficiently identified when they are submitted for publication. We show that a novel approach of combining content embeddings and style embeddings outperforms baseline methods for image similarity such as SIFT, SURF, and various image hashing methods. We first evaluate the performance of the proposed method on two well-known datasets for evaluating image similarity methods and show that content, style, and combined embeddings increase precision@k and recall@k by 10%-15% and 12%-25%, respectively, when retrieving five nearest neighbours. Second, specifically for the app counterfeit detection problem, combined content and style embeddings achieve 12% and 14% increases in precision@k and recall@k, respectively, compared to the baseline methods. Third, we present an analysis of approximately 1.2 million apps from Google Play Store and identify a set of potential counterfeits for the top-10,000 popular apps. Under a conservative assumption, we were able to find 2,040 potential counterfeits that contain malware in a set of 49,608 apps that showed high similarity to one of the top-10,000 popular apps in Google Play Store. We also find 1,565 potential counterfeits requesting at least five more dangerous permissions than the original app and 1,407 potential counterfeits having at least five extra third-party advertisement libraries.
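The abstract reports its results as precision@k and recall@k over nearest-neighbour retrieval. As a minimal sketch of how these two metrics are commonly computed, the Python below ranks a toy gallery by cosine distance to a query embedding and scores the top k results. The function names, the two-dimensional vectors, the app labels, and the use of cosine distance are all illustrative assumptions and are not taken from the paper.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query, gallery, k):
    """Return the k gallery keys whose embeddings are closest to the query."""
    ranked = sorted(gallery, key=lambda name: cosine_distance(query, gallery[name]))
    return ranked[:k]


def precision_recall_at_k(retrieved, relevant, k):
    """precision@k = hits / k; recall@k = hits / number of relevant items."""
    hits = sum(1 for item in retrieved[:k] if item in relevant)
    return hits / k, hits / len(relevant)


# Hypothetical toy embeddings (assumption: real embeddings would come from
# a neural network and have far more dimensions).
gallery = {
    "app_a": [1.0, 0.0],
    "app_b": [0.9, 0.1],
    "app_c": [0.0, 1.0],
    "app_d": [0.1, 0.9],
}
query = [1.0, 0.05]            # embedding of the app being checked
relevant = {"app_a", "app_b"}  # hypothetical ground-truth lookalikes

retrieved = top_k(query, gallery, k=3)
precision, recall = precision_recall_at_k(retrieved, relevant, k=3)
print(retrieved, precision, recall)
```

In this toy setting, precision@k penalises unrelated apps that appear among the k results, while recall@k measures how many of the known lookalikes were recovered at all.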
