Title
Incomplete Directed Perfect Phylogeny in Linear Time
Authors
Abstract
Reconstructing the evolutionary history of a set of species is a central task in computational biology. In real data, some information is often missing: given a collection of species described by a set of binary characters with some unknown states, the Incomplete Directed Perfect Phylogeny (IDPP) problem asks to complete the missing states in such a way that the result can be explained by a perfect directed phylogeny. Pe'er et al. proposed a solution that takes $\tilde{O}(nm)$ time for $n$ species and $m$ characters. Their algorithm relies on pre-existing dynamic connectivity data structures: a computational study recently conducted by Fernández-Baca and Liu showed that, in this context, complex data structures perform worse than simpler ones with worse asymptotic bounds. This motivates us to look into the particular properties of the dynamic connectivity problem in this setting, so as to avoid using sophisticated data structures as a black box. Not only do we succeed in doing so, giving a much simpler $\tilde{O}(nm)$-time algorithm for the IDPP problem, but our insights into the specific structure of the problem also lead to an asymptotically faster algorithm that runs in optimal $O(nm)$ time.
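To make the problem statement concrete, the following is a minimal brute-force sketch in Python. It is neither the algorithm of Pe'er et al. nor the $O(nm)$ algorithm described above. It simply tries every completion of the unknown entries (exponential time, so only usable on tiny inputs) and tests each one with the classical pairwise criterion for directed perfect phylogeny. The sketch assumes that every character is in state 0 at the root and switches to 1 at most once. Under that assumption, a complete matrix qualifies if and only if, for every pair of characters, the sets of species having state 1 are either disjoint or nested. The function names and the use of `None` to mark an unknown state are illustrative assumptions.

```python
from itertools import product
from typing import List, Optional

# Rows are species, columns are binary characters; None marks an unknown state
# (a hypothetical encoding chosen for this sketch).
Matrix = List[List[Optional[int]]]


def has_directed_perfect_phylogeny(m: List[List[int]]) -> bool:
    """Check a complete 0/1 matrix, assuming the root has all characters in state 0.

    A directed perfect phylogeny exists iff, for every pair of characters,
    the sets of species with state 1 are disjoint or one contains the other.
    """
    if not m:
        return True
    n_chars = len(m[0])
    # For each character, the set of species (row indices) that have state 1.
    sets = [frozenset(i for i, row in enumerate(m) if row[c] == 1) for c in range(n_chars)]
    for a in range(n_chars):
        for b in range(a + 1, n_chars):
            sa, sb = sets[a], sets[b]
            # Overlapping but not nested sets cannot both sit on a single tree.
            if sa & sb and not (sa <= sb or sb <= sa):
                return False
    return True


def brute_force_idpp(m: Matrix) -> Optional[List[List[int]]]:
    """Return some completion admitting a directed perfect phylogeny, or None.

    Exhaustive search over all 2^k fillings of the k unknown entries:
    for illustration only, not an efficient IDPP algorithm.
    """
    unknown = [(i, j) for i, row in enumerate(m) for j, v in enumerate(row) if v is None]
    for assignment in product((0, 1), repeat=len(unknown)):
        filled = [list(row) for row in m]
        for (i, j), v in zip(unknown, assignment):
            filled[i][j] = v
        if has_directed_perfect_phylogeny(filled):
            return filled
    return None


if __name__ == "__main__":
    # Two species, each with one unknown state.
    print(brute_force_idpp([[1, None], [None, 1]]))
```

The matrix `[[1, 1], [1, 0], [0, 1]]` has no unknowns and contains the forbidden overlap between its two characters, so `brute_force_idpp` returns `None` for it. An efficient IDPP algorithm must reach the same yes/no answer without enumerating completions.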