Paper Title
Maintainable Log Datasets for Evaluation of Intrusion Detection Systems
Paper Authors
Paper Abstract
Intrusion detection systems (IDS) monitor system logs and network traffic to recognize malicious activities in computer networks. Evaluating and comparing IDSs with respect to their detection accuracy is therefore essential for selecting them for specific use cases. Despite the great need, hardly any labeled intrusion detection datasets are publicly available. As a consequence, evaluations are often carried out on datasets from real infrastructures, where analysts cannot control system parameters or generate a reliable ground truth, or on private datasets that prevent reproducibility of results. As a solution, we present a collection of maintainable log datasets collected in a testbed representing a small enterprise. To this end, we employ extensive state machines to simulate normal user behavior and inject a multi-step attack. For scalable testbed deployment, we use concepts from model-driven engineering that enable automatic generation and labeling of an arbitrary number of datasets, comprising repeated attack executions with varying parameters. In total, we provide 8 datasets containing 20 distinct types of log files, of which we label 8 files for 10 unique attack steps. We publish the labeled log datasets and the code for testbed setup and simulation online as open source to enable others to reproduce and extend our results.
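To give an intuition for the state-machine-driven simulation of normal user behavior described in the abstract, the following minimal Python sketch walks one simulated user through a probabilistic transition table and emits a log-like line per action. All names, states, and probabilities here are hypothetical illustrations, not the authors' released testbed code.

import random
import time
from datetime import datetime, timezone

# Hypothetical transition table: each state maps to candidate next states
# with selection probabilities (weights per state sum to 1.0).
TRANSITIONS = {
    "idle":       [("browse_web", 0.5), ("read_mail", 0.3), ("idle", 0.2)],
    "browse_web": [("browse_web", 0.6), ("idle", 0.4)],
    "read_mail":  [("send_mail", 0.4), ("idle", 0.6)],
    "send_mail":  [("idle", 1.0)],
}

def next_state(state):
    # Draw the next state according to the current state's weights.
    states, weights = zip(*TRANSITIONS[state])
    return random.choices(states, weights=weights, k=1)[0]

def simulate(user, steps=15):
    # Walk the state machine and print one log-like line per action.
    state = "idle"
    for _ in range(steps):
        state = next_state(state)
        timestamp = datetime.now(timezone.utc).isoformat()
        print(f"{timestamp} user={user} action={state}")
        time.sleep(random.uniform(0.1, 0.5))  # vary inter-action timing

if __name__ == "__main__":
    simulate("alice")

A full testbed would run many such simulated users concurrently, with far richer state machines, and inject attack steps at known points in time; knowing when and where attacks were executed is what enables the automatic labeling of the resulting log lines that the abstract describes.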