Paper Title

Dataset Search In Biodiversity Research: Do Metadata In Data Repositories Reflect Scholarly Information Needs?

Authors

Felicitas Löffler, Valentin Wesp, Birgitta König-Ries, Friederike Klan

Abstract

The increasing amount of research data provides the opportunity to link and integrate data to create novel hypotheses, to repeat experiments, or to compare recent data with data collected at a different time or place. However, recent studies have shown that retrieving relevant data for reuse is a time-consuming task in daily research practice. In this study, we explore what hampers dataset retrieval in biodiversity research, a field that produces a large amount of heterogeneous data. We analyze the primary source in dataset search, namely metadata, and determine whether they reflect scholarly search interests. We examine whether metadata standards provide elements corresponding to search interests, we inspect whether selected data repositories use metadata standards representing scholarly interests, and we determine how many fields of the metadata standards used are filled. To determine search interests in biodiversity research, we gathered 169 questions that researchers aimed to answer with the help of retrieved data, identified the biological entities in them, and grouped these entities into 13 categories. Our findings indicate that environments, materials and chemicals, species, biological and chemical processes, locations, data parameters, and data types are important search interests in biodiversity research. The comparison with existing metadata standards shows that domain-specific standards cover search interests quite well, whereas general standards do not explicitly contain elements that reflect search interests. We inspect metadata from five large data repositories. Our results confirm that metadata currently reflect search interests in biodiversity research only poorly. From these findings, we derive recommendations for researchers and data repositories on how to bridge the gap between search interests and the metadata provided.