Academic Journal

Open science products.

Bibliographic Details
Title: Open science products.
Authors: Heidi J. Imker, Kenneth E. Schackart III, Ana-Maria Istrate, Charles E. Cook
Publication Year: 2023
Subject Terms: Molecular Biology, Pharmacology, Evolutionary Biology, Ecology, Marine Biology, Science Policy, Plant Biology, Biological Sciences not elsewhere classified, Information Systems not elsewhere classified, minimal human intervention, includes automated pipelines, bidirectional encoder representations, enabled incredible research, resources archive difficult, global biodata coalition, many biodata resources, identify biodata resources, biodata resource inventory, biodata resource, biodata resources, research funders, resource infrastructure, global inventory, global infrastructure, individual resources, data resources, value aggregation, sustained support, scientific literature
Description: Modern biological research depends on data resources. These resources archive difficult-to-reproduce data and provide added-value aggregation, curation, and analyses. Collectively, they constitute a global infrastructure of biodata resources. While the organic proliferation of biodata resources has enabled incredible research, sustained support for the individual resources that make up this distributed infrastructure is a challenge. The Global Biodata Coalition (GBC) was established by research funders in part to aid in developing sustainable funding strategies for biodata resources. An important component of this work is understanding the scope of the resource infrastructure: how many biodata resources there are, where they are, and how they are supported. Existing registries require self-registration and/or extensive curation, and we sought to develop a method for assembling a global inventory of biodata resources that could be periodically updated with minimal human intervention. The approach we developed identifies biodata resources using open data from the scientific literature. Specifically, we used a machine learning-enabled natural language processing approach to identify biodata resources from titles and abstracts of life sciences publications contained in Europe PMC. Pretrained BERT (Bidirectional Encoder Representations from Transformers) models were fine-tuned to classify publications as describing a biodata resource or not and to predict the resource name using named entity recognition. To improve the quality of the resulting inventory, low-confidence predictions and potential duplicates were manually reviewed. Further information about the resources was then obtained using article metadata, such as funder and geolocation information. These efforts yielded an inventory of 3112 unique biodata resources based on articles published from 2011 to 2021. The code was developed to facilitate reuse and includes automated pipelines.
All products of this effort are released under permissive licensing, including ...
Document Type: article in journal/newspaper
Language: unknown
Relation: https://figshare.com/articles/journal_contribution/Open_science_products_/24652969
DOI: 10.1371/journal.pone.0294812.s011
Availability: https://doi.org/10.1371/journal.pone.0294812.s011
https://figshare.com/articles/journal_contribution/Open_science_products_/24652969
Rights: CC BY 4.0
Accession Number: edsbas.E86D05BF
Database: BASE