Academic Journal

Accuracy Analysis of the End-to-End Extraction of Related Named Entities from Russian Drug Review Texts by Modern Approaches Validated on English Biomedical Corpora

Bibliographic Details
Title: Accuracy Analysis of the End-to-End Extraction of Related Named Entities from Russian Drug Review Texts by Modern Approaches Validated on English Biomedical Corpora
Authors: Alexander Sboev, Roman Rybka, Anton Selivanov, Ivan Moloshnikov, Artem Gryaznov, Alexander Naumov, Sanna Sboeva, Gleb Rylkov, Soyora Zakirova
Source: Mathematics, Vol 11, Iss 2, p 354 (2023)
Publisher Information: MDPI AG, 2023.
Publication Year: 2023
Collection: LCC:Mathematics
Subject Terms: Russian Drug Review Corpus, deep learning, language models, named-entity recognition, relation extraction, joint model, Mathematics, QA1-939
Description: Extracting significant information from Internet sources is an important task in pharmacovigilance because of the need for post-clinical drug monitoring. This research considers the task of end-to-end recognition of pharmaceutically significant named entities and their relations in natural-language texts. Here “end-to-end” means that both tasks are performed within a single process on the “raw” text without annotation. The study is based on the current version of the Russian Drug Review Corpus, a dataset of 3800 review texts from the Russian segment of the Internet. Currently, this is the only Russian-language corpus appropriate for research of this type. We estimated the accuracy of recognizing pharmaceutically significant entities and their relations with two approaches based on neural-network language models. The first approach solves the tasks of named-entity recognition and relation extraction sequentially (the sequential approach). The second solves both tasks simultaneously with a single neural network (the joint approach). The study compares the two approaches and selects hyperparameters to maximize the resulting accuracy. It is shown that both approaches solve the target task at the same level of accuracy: a macro-averaged F1-score of 52–53%, which is the current level of accuracy for “end-to-end” tasks in the Russian language. Additionally, the paper presents results of the joint approach on the open English datasets ADE and DDI, with hyperparameter selection for modern domain-specific language models. The achieved accuracies of 84.2% (ADE) and 73.3% (DDI) are comparable to or better than other published results for these datasets.
Document Type: article
File Description: electronic resource
Language: English
ISSN: 2227-7390
Relation: https://www.mdpi.com/2227-7390/11/2/354; https://doaj.org/toc/2227-7390
DOI: 10.3390/math11020354
Access URL: https://doaj.org/article/ebe029668f60475aa64c7f3482f810ac
Accession Number: edsdoj.be029668f60475aa64c7f3482f810ac
Database: Directory of Open Access Journals