Paper title

SOPanG 2: online searching over a pan-genome without false positives

Authors

Aleksander Cisłak, Szymon Grabowski

Abstract

Motivation: The pan-genome can be stored as an elastic-degenerate (ED) string, a recently introduced compact representation of multiple overlapping sequences. However, a search over the ED string does not indicate which individuals (if any) match the entire query. Results: We augment the ED string with sources (individuals' indexes) and propose an extension of the SOPanG (Shift-Or for Pan-Genome) tool that reports only true positive matches, omitting those that do not occur in any of the haplotypes. The additional stage for checking the matches incurs a speed penalty of less than 3.5% in practice, which means that SOPanG 2 is able to report pattern matches in a pan-genome, mapping them onto individuals, at a single-thread throughput above 430 MB/s on real data. Availability and implementation: SOPanG 2 can be downloaded here: github.com/MrAlexSee/sopang
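To make the notion of a false positive concrete, the following is a minimal illustrative sketch in Python. It is not the bit-parallel Shift-Or algorithm used by SOPanG 2 and does not reflect the tool's input format. It assumes a hypothetical in-memory layout in which every variant of every segment carries an explicit set of individuals (its sources), and it uses a naive recursive search over a non-empty pattern. A match is kept only if at least one individual is shared by all the variants it passes through. The function name `find_matches` and the toy data are invented for this example.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical layout: an ED string is a list of segments, each segment is a
# list of (variant_text, sources) pairs; sources are indexes of individuals.
Variant = Tuple[str, FrozenSet[int]]
Segment = List[Variant]


def find_matches(ed: List[Segment], pattern: str,
                 check_sources: bool = True) -> Dict[int, FrozenSet[int]]:
    """Map each segment index where a match ends to the individuals having it.

    With check_sources=False every variant is treated as present in all
    individuals, which mimics a search over a plain ED string.
    """
    everyone = frozenset().union(*(src for seg in ed for _, src in seg))
    hits: Dict[int, FrozenSet[int]] = {}

    def effective(sources: FrozenSet[int]) -> FrozenSet[int]:
        return sources if check_sources else everyone

    def extend(seg_idx: int, pat_pos: int, alive: FrozenSet[int]) -> None:
        # Continue a partial match from the start of segment seg_idx.
        if seg_idx == len(ed):
            return
        rest = pattern[pat_pos:]
        for text, sources in ed[seg_idx]:
            still = alive & effective(sources)
            if not still:
                continue  # no individual carries this combination of variants
            if len(text) >= len(rest):
                if text.startswith(rest):  # pattern ends inside this variant
                    hits[seg_idx] = hits.get(seg_idx, frozenset()) | still
            elif rest.startswith(text):  # whole variant consumed (may be empty)
                extend(seg_idx + 1, pat_pos + len(text), still)

    for i, seg in enumerate(ed):
        for text, sources in seg:
            for start in range(len(text)):  # every start offset in the variant
                tail = text[start:]
                if len(tail) >= len(pattern):
                    if tail.startswith(pattern):
                        hits[i] = hits.get(i, frozenset()) | effective(sources)
                elif pattern.startswith(tail):
                    extend(i + 1, len(tail), effective(sources))
    return hits


fs = frozenset
# Toy pan-genome of three individuals: 0 = ACGACT, 1 = ACGAT, 2 = ACTAT.
ed_string: List[Segment] = [
    [("AC", fs({0, 1, 2}))],
    [("G", fs({0, 1})), ("T", fs({2}))],
    [("A", fs({0, 1, 2}))],
    [("C", fs({0})), ("", fs({1, 2}))],
    [("T", fs({0, 1, 2}))],
]

print(find_matches(ed_string, "TAC", check_sources=False))  # spurious hit
print(find_matches(ed_string, "TAC"))  # empty: no individual contains TAC
print(find_matches(ed_string, "GAC"))  # only individual 0
```

In this toy input, "TAC" can be spelled by combining the variant "T" (individual 2 only) with the variant "C" (individual 0 only), so a search that ignores sources reports it even though none of the three sequences contains it. Intersecting the source sets along the match removes that hit, while "GAC" is still reported and attributed to individual 0.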
