论文标题
通过ILP快速,灵活和精确的最小流量分解
Fast, Flexible, and Exact Minimum Flow Decompositions via ILP
论文作者
论文摘要
最小流量分解(MFD)(找到完美分解流的最小路径集的问题)是计算机科学中的经典问题,它的变体是生物信息学中多重组问题的强大模型(例如,RNA组装)。但是,由于此问题及其变体是NP硬化,因此实用的多重组合工具要么使用启发式方法,要么求解了问题的简单,多项式时间可溶解版本,这可能会产生不是迷你 - 麦尔巴尔的解决方案,或者不完全分解流量。许多RNA组装程序还使用此类实用变体的整数线性编程(ILP)公式,其主要限制是他们需要编码所有可能指数的许多解决方案路径所需的主要限制。此外,MFD的唯一确切求解器不会扩展到大型实例,也不能有效地推广到实用的MFD变体。在这项工作中,我们为MFD提供了第一个实用的ILP公式(因此,是MFD的第一个快速,精确的求解器),基于仅使用二次数量变量的所有指数级编码许多解决方案路径。在模拟和真实流程图上,我们的方法在13秒内求解了任何实例。我们还表明,我们的ILP公式可以轻松有效地适用于许多实际变体,例如合并更长或配对的末端读取或最小化流误差。我们希望我们的结果可以消除当前的多组装模型的复杂性及其障碍性之间的权衡,并且可以属于未来实用的RNA组装工具的核心。
Minimum flow decomposition (MFD) (the problem of finding a minimum set of paths that perfectly decomposes a flow) is a classical problem in Computer Science, and variants of it are powerful models in multiassembly problems in Bioinformatics (e.g. RNA assembly). However, because this problem and its variants are NP-hard, practical multiassembly tools either use heuristics or solve simpler, polynomial-time solvable versions of the problem, which may yield solutions that are not mini-mal or do not perfectly decompose the flow. Many RNA assemblers also use integer linear programming(ILP) formulations of such practical variants, having the major limitation they need to encode all the potentially exponentially many solution paths. Moreover, the only exact solver for MFD does not scale to large instances and cannot be efficiently generalized to practical MFD variants. In this work, we provide the first practical ILP formulation for MFD (and thus the first fast and exact solver for MFD), based on encoding all of the exponentially many solution paths using only a quadratic number of variables. On both simulated and real flow graphs, our approach solves any instance in under 13 seconds. We also show that our ILP formulation can be easily and efficiently adapted for many practical variants, such as incorporating longer or paired-end reads or minimizing flow errors. We hope that our results can remove the current tradeoff between the complexity of a multi assembly model and its tractability and can lie at the core of future practical RNA assembly tools.