A clinical trials corpus annotated with UMLS entities to enhance the access to evidence-based medicine

Bibliographic details
Title:	A clinical trials corpus annotated with UMLS entities to enhance the access to evidence-based medicine
Authors:	Leonardo Campillos-Llanos, Ana Valverde-Mateos, Adrián Capllonch-Carrión, Antonio Moreno-Sandoval
Source:	BMC Medical Informatics and Decision Making, Vol 21, Iss 1, Pp 1-19 (2021)
Publication info:	BMC, 2021.
Publication year:	2021
Collection:	LCC:Computer applications to medicine. Medical informatics
Subject terms:	Clinical Trials, Evidence-Based Medicine, Semantic Annotation, Inter-Annotator Agreement, Natural Language Processing, Computer applications to medicine. Medical informatics, R858-859.7
Description:	Abstract. Background: The large volume of medical literature makes it difficult for healthcare professionals to keep abreast of the latest studies that support Evidence-Based Medicine. Natural language processing enhances access to relevant information, and gold standard corpora are required to improve systems. To contribute a new dataset for this domain, we collected the Clinical Trials for Evidence-Based Medicine in Spanish (CT-EBM-SP) corpus. Methods: We annotated 1200 texts about clinical trials with entities from the Unified Medical Language System semantic groups: anatomy (ANAT), pharmacological and chemical substances (CHEM), pathologies (DISO), and lab tests, diagnostic or therapeutic procedures (PROC). We doubly annotated 10% of the corpus and measured inter-annotator agreement (IAA) using F-measure. As a use case, we ran medical entity recognition experiments with neural network models. Results: This resource contains 500 abstracts of journal articles about clinical trials and 700 announcements of trial protocols (292 173 tokens). We annotated 46 699 entities (13.98% are nested entities). Regarding IAA, we obtained an average F-measure of 85.65% (±4.79, strict match) and 93.94% (±3.31, relaxed match). In the use case experiments, we achieved recognition results ranging from 80.28% (±0.99) to 86.74% (±0.19) of average F-measure. Conclusions: Our results show that this resource is adequate for experiments with state-of-the-art approaches to biomedical named entity recognition. It is freely distributed at: http://www.lllf.uam.es/ESP/nlpmedterm_en.html . The methods are generalizable to other languages with similar available sources.
Document type:	article
File description:	electronic resource
Language:	English
ISSN:	1472-6947
Relation:	https://doaj.org/toc/1472-6947
DOI:	10.1186/s12911-021-01395-z
Access URL:	https://doaj.org/article/23ec51d897d64bcd879152e951fea49c
Accession number:	edsdoj.23ec51d897d64bcd879152e951fea49c
Database:	Directory of Open Access Journals
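
The abstract reports inter-annotator agreement as an F-measure under strict and relaxed matching, but this record does not say how the two criteria were defined. The following is a minimal sketch under stated assumptions, not the authors' procedure: strict match is taken to mean identical offsets and semantic group, relaxed match is taken to mean overlapping offsets with the same semantic group, and one annotator is treated as the reference for the other. The names `Entity` and `f_measure` and the sample annotations are hypothetical.

```python
from typing import List, Tuple

# Hypothetical entity representation: (start offset, end offset, semantic group).
Entity = Tuple[int, int, str]


def _overlaps(a: Entity, b: Entity) -> bool:
    """Assumed relaxed criterion: same group and overlapping character spans."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def _identical(a: Entity, b: Entity) -> bool:
    """Assumed strict criterion: same group and identical offsets."""
    return a == b


def f_measure(ann_a: List[Entity], ann_b: List[Entity], relaxed: bool = False) -> float:
    """F-measure between two annotators, treating ann_a as the reference."""
    if not ann_a or not ann_b:
        return 0.0
    match = _overlaps if relaxed else _identical
    # Precision: share of annotator B's entities that match some entity of A.
    hits_b = sum(any(match(a, b) for a in ann_a) for b in ann_b)
    # Recall: share of annotator A's entities that match some entity of B.
    hits_a = sum(any(match(a, b) for b in ann_b) for a in ann_a)
    precision = hits_b / len(ann_b)
    recall = hits_a / len(ann_a)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Invented sample annotations, for illustration only.
annotator_a = [(0, 9, "DISO"), (15, 26, "CHEM"), (30, 35, "ANAT")]
annotator_b = [(0, 9, "DISO"), (15, 22, "CHEM"), (40, 45, "PROC")]

strict_f = f_measure(annotator_a, annotator_b)
relaxed_f = f_measure(annotator_a, annotator_b, relaxed=True)
print(f"strict: {strict_f:.4f}, relaxed: {relaxed_f:.4f}")
```

In this invented sample the two annotators agree exactly on one entity and disagree on the boundary of a second, so the relaxed score is higher than the strict one, which is the same direction as the gap reported in the abstract.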
