Paper title
FLOWViZ: Framework for Phylogenetic Processing
Paper authors
Paper abstract
The increasing risk of epidemics and a fast-growing world population have contributed to a great investment in phylogenetic analysis, in order to track numerous diseases and conceive effective medication and treatments. Phylogenetic analysis requires large quantities of information to be analyzed and processed for knowledge extraction, using suitable techniques and, nowadays, specific software and algorithms, to deliver results as efficiently and quickly as possible. These algorithms and techniques are already provided by several freely available frameworks and tools. Usually, the process of phylogenetic analysis consists of several processing steps, which together define a pipeline. Some phylogenetic frameworks offer more than one processing step, such as phylogenetic tree inference, data integration, and visualization, but because the amount of data involved keeps growing, each step may take several hours or days. Scientific workflow systems can use high-performance computing facilities, if available, to process large volumes of data concurrently. However, most of these scientific workflow systems cannot be easily installed and configured, are available only as centralized services, and usually do not make it easy to integrate the tools and processing steps available in phylogenetic frameworks. This paper summarizes the thesis document of the FLOWViZ framework, whose main goal is to provide a software integration framework between a phylogenetic framework and a scientific workflow system. This framework makes it possible to build a customized integration with far fewer lines of code, while providing existing phylogenetic frameworks with workflow building and execution, so that they can manage the processing of large amounts of data. The project was supported by funds for a student grant from FCT (NGPHYLO, PTDC/CCI-BIO/29676/2017) and from an IPL project (IPL/2021/DIVA).