Paper title
Workflow environments for advanced cyberinfrastructure platforms
Paper authors
Paper abstract
Progress in science is deeply bound to the effective use of high-performance computing infrastructures and to the efficient extraction of knowledge from vast amounts of data. Such data comes from different sources and follows a cycle composed of pre-processing steps for data curation and preparation for subsequent computing steps, followed by analysis and analytics steps applied to the results. However, scientific workflows are currently fragmented across multiple components, with different processes for computing and data management, and with gaps between the viewpoints of the user profiles involved. Our vision is that future workflow environments and tools for the development of scientific workflows should follow a holistic approach, where both data and computing are integrated in a single flow built on simple, high-level interfaces. The research topics that we propose involve novel ways to express workflows that integrate the different data and compute processes, and dynamic runtimes that support the execution of those workflows on complex and heterogeneous computing infrastructures efficiently, in terms of both performance and energy. These infrastructures include highly distributed resources, from sensors, instruments, and devices at the edge to High-Performance Computing and Cloud computing resources. This paper presents our vision for developing these workflow environments and the steps we are currently following to achieve it.