Paper title
Recommendations for extending the GFF3 specification for improved interoperability of genomic data
Paper authors
Paper abstract
The GFF3 format is a common, flexible, tab-delimited format for representing the structure and function of genes or other mapped features (https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md). However, with increasing re-use of annotation data, this flexibility has become an obstacle to standardized downstream processing. Common software packages that export annotations in GFF3 format model the same data and metadata in different notations, which puts the burden of interpreting the data model on end users. The AgBioData consortium is a group of genomics, genetics, and breeding databases and partners working towards shared practices and standards. Providing concrete guidelines for generating GFF3 and creating a standard representation of the most common biological data types would provide a major increase in efficiency for AgBioData databases and for the genomics research community that uses the GFF3 format in its daily operations. The AgBioData GFF3 working group has developed recommendations to solve common problems in the GFF3 format. We suggest improvements for each of the GFF3 fields, as well as for the special cases of modeling functional annotations and standard protein-coding genes. We welcome further discussion of these recommendations. We ask the genomics and bioinformatics community to use the GitHub repository (https://github.com/NAL-i5K/AgBioData_GFF3_recommendation) to provide feedback via issues or pull requests.
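For readers less familiar with the format, the following is a minimal Python sketch of how a single tab-delimited GFF3 feature line breaks down into its nine columns, assuming the column layout described in the GFF3 specification linked above. The function name `parse_gff3_line`, the example line, and the source name `example_source` are hypothetical illustrations and are not taken from the working group's recommendations.

```python
from urllib.parse import unquote

# The nine tab-delimited columns defined by the GFF3 specification.
GFF3_COLUMNS = (
    "seqid", "source", "type", "start", "end",
    "score", "strand", "phase", "attributes",
)


def parse_gff3_line(line):
    """Split one GFF3 feature line into a dict keyed by column name.

    Hypothetical helper for illustration only; it does not handle
    directives, comments, or embedded FASTA sections.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(GFF3_COLUMNS):
        raise ValueError(f"expected 9 columns, found {len(fields)}")

    record = dict(zip(GFF3_COLUMNS, fields))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])

    # Column 9 holds semicolon-separated tag=value pairs; a tag may carry
    # several comma-separated values, and reserved characters are URL-escaped.
    attributes = {}
    for pair in record["attributes"].split(";"):
        if not pair:
            continue
        tag, _, value = pair.partition("=")
        attributes[unquote(tag)] = [unquote(v) for v in value.split(",")]
    record["attributes"] = attributes
    return record


# Made-up example line (assumption, not from the paper).
example = "chr1\texample_source\tgene\t1000\t9000\t.\t+\t.\tID=gene00001;Name=EDEN"
record = parse_gff3_line(example)
print(record["type"], record["start"], record["end"], record["attributes"])
```

Because the attributes column is free-form, two tools can encode the same metadata under different tags, which is the kind of divergence the abstract describes.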