Paper Title
Detecting Unknown DGAs without Context Information
Authors
Abstract
New malware emerges at a rapid pace and often incorporates Domain Generation Algorithms (DGAs) to prevent the malware's connection to its command and control (C2) server from being blocked. Current state-of-the-art classifiers are able to separate benign from malicious domains (binary classification) and attribute them with high probability to the DGAs that generated them (multiclass classification). While binary classifiers can label domains of as-yet-unknown DGAs as malicious, multiclass classifiers can only assign domains to DGAs that are known at the time of training, which limits the ability to uncover new malware families. In this work, we perform a comprehensive study on the detection of new DGAs, which includes an evaluation of 59,690 classifiers. We examine four different approaches in 15 different configurations and propose a simple yet effective approach based on the combination of a softmax classifier and regular expressions (regexes) to detect multiple unknown DGAs with high probability. At the same time, our approach retains state-of-the-art classification performance for known DGAs. Our evaluation is based on a leave-one-group-out cross-validation with a total of 94 DGA families. By using the maximum number of known DGAs, our evaluation scenario is particularly difficult and close to the real world. All of the approaches examined are privacy-preserving, since they operate without context and exclusively on the single domain to be classified. We round off our study with a thorough discussion of class-incremental learning strategies that can adapt an existing classifier to newly discovered classes.
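To make the idea of combining a softmax classifier with regexes more concrete, the following is a minimal sketch, not the implementation evaluated in the paper. It assumes that the domain has already been flagged as malicious, that a multiclass model supplies one logit per known DGA family, and that each known family has a regex describing its domain structure. The family names, the regex patterns, the confidence threshold of 0.9, and the function `classify` are all hypothetical and chosen only for illustration.

```python
import math
import re

# Hypothetical known DGA families with illustrative regexes (assumptions,
# not patterns taken from the paper or from real malware families).
CLASS_REGEXES = {
    "dga_hex": re.compile(r"[0-9a-f]{16}\.com"),
    "dga_alpha": re.compile(r"[a-z]{8,12}\.(net|org)"),
}
CLASSES = list(CLASS_REGEXES)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(domain, logits, threshold=0.9):
    """Return a known family name, or "unknown" if the prediction is rejected.

    A prediction is rejected when the softmax confidence is below the
    (assumed) threshold or when the domain does not fully match the regex
    of the predicted family.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    label = CLASSES[best]
    if probs[best] < threshold:
        return "unknown"
    if CLASS_REGEXES[label].fullmatch(domain) is None:
        return "unknown"
    return label
```

In this sketch the regex acts as a structural sanity check on the softmax decision: a domain that is confidently assigned to a known family but does not match that family's pattern is treated as a candidate for an unknown DGA. Both checks use only the domain string itself, in line with the context-free setting described in the abstract.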