Title
Learning from, Understanding, and Supporting DevOps Artifacts for Docker
Authors
Abstract
With the growing use of DevOps tools and frameworks, there is an increased need for tools and techniques that support more than code. The current state-of-the-art in static developer assistance for tools like Docker is limited to shallow syntactic validation. We identify three core challenges in the realm of learning from, understanding, and supporting developers writing DevOps artifacts: (i) nested languages in DevOps artifacts, (ii) rule mining, and (iii) the lack of semantic rule-based analysis. To address these challenges we introduce a toolset, binnacle, that enabled us to ingest 900,000 GitHub repositories. Focusing on Docker, we extracted approximately 178,000 unique Dockerfiles, and also identified a Gold Set of Dockerfiles written by Docker experts. We addressed challenge (i) by reducing the number of effectively uninterpretable nodes in our ASTs by over 80% via a technique we call phased parsing. To address challenge (ii), we introduced a novel rule-mining technique capable of recovering two-thirds of the rules in a benchmark we curated. Through this automated mining, we were able to recover 16 new rules that were not found during manual rule collection. To address challenge (iii), we manually collected a set of rules for Dockerfiles from commits to the files in the Gold Set. These rules encapsulate best practices, avoid docker build failures, and improve image size and build latency. We created an analyzer that used these rules, and found that, on average, Dockerfiles on GitHub violated the rules five times more frequently than the Dockerfiles in our Gold Set. We also found that industrial Dockerfiles fared no better than those sourced from GitHub. The learned rules and analyzer in binnacle can be used to aid developers in the IDE when creating Dockerfiles, and in a post-hoc fashion to identify issues in, and to improve, existing Dockerfiles.
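The abstract names phased parsing of the shell code nested inside Dockerfile instructions and a rule-based analyzer, but does not show either. The Python sketch below illustrates the general idea under stated assumptions; it is not binnacle's implementation. The function names (`phase1_instructions`, `phase2_commands`, `check_apt_rules`), the three-phase split, and the two apt-get rules (install non-interactively with `-y`, and pair `apt-get install` with `apt-get update` in the same `RUN`) are hypothetical choices that reflect common Dockerfile best practices.

```python
from __future__ import annotations

import shlex

# Shell operators that separate simple commands inside a RUN body.
SEPARATORS = {"&&", "||", ";", "|"}


def phase1_instructions(dockerfile: str) -> list[tuple[str, str]]:
    """Phase 1 (hypothetical): split a Dockerfile into (KEYWORD, body) pairs,
    joining lines that end with a backslash continuation."""
    instructions, pending = [], ""
    for raw in dockerfile.splitlines():
        line = raw.strip()
        if not pending and (not line or line.startswith("#")):
            continue  # blank line or Dockerfile comment
        if line.endswith("\\"):
            pending += line[:-1] + " "  # drop the backslash, keep collecting
            continue
        full = (pending + line).strip()
        pending = ""
        keyword, _, body = full.partition(" ")
        instructions.append((keyword.upper(), body.strip()))
    return instructions


def phase2_commands(run_body: str) -> list[list[str]]:
    """Phase 2 (hypothetical): re-parse a RUN body as shell and split it into
    simple commands. Quoted text such as "a && b" stays a single word."""
    lexer = shlex.shlex(run_body, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    commands, current = [], []
    for token in lexer:
        if token in SEPARATORS:
            if current:
                commands.append(current)
            current = []
        else:
            current.append(token)
    if current:
        commands.append(current)
    return commands


def check_apt_rules(dockerfile: str) -> list[str]:
    """Apply two illustrative rules to every RUN instruction (0-based index)."""
    violations = []
    for index, (keyword, body) in enumerate(phase1_instructions(dockerfile)):
        if keyword != "RUN":
            continue
        seen_update = False
        for cmd in phase2_commands(body):
            # Phase 3 (hypothetical): interpret the arguments of one known command.
            if cmd[:1] != ["apt-get"]:
                continue
            flags = {arg for arg in cmd[1:] if arg.startswith("-")}
            verbs = [arg for arg in cmd[1:] if not arg.startswith("-")]
            if verbs[:1] == ["update"]:
                seen_update = True
            elif verbs[:1] == ["install"]:
                # Without -y, apt-get asks for confirmation, which a
                # non-interactive build cannot answer, so the build can abort.
                if not flags & {"-y", "--yes", "--assume-yes"}:
                    violations.append(f"instruction {index}: apt-get install without -y")
                if not seen_update:
                    violations.append(
                        f"instruction {index}: apt-get install without apt-get update in the same RUN"
                    )
    return violations


EXAMPLE = """\
FROM ubuntu:20.04
RUN apt-get install curl
RUN apt-get update && \\
    apt-get install -y --no-install-recommends git
"""

if __name__ == "__main__":
    for message in check_apt_rules(EXAMPLE):
        print(message)
```

On the example, the sketch should flag both rules for the second instruction (index 1) and nothing for the third, whose `&&`-chained `RUN` is split into two shell commands before the rules are checked. A realistic analyzer would need a full shell grammar (subshells, variables, combined flags such as `-qy`), which `shlex` tokenization alone does not provide.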