Academic Journal

End-to-end pseudonymization of fine-tuned clinical BERT models

Bibliographic Details
Title: End-to-end pseudonymization of fine-tuned clinical BERT models
Authors: Thomas Vakili, Aron Henriksson, Hercules Dalianis
Source: BMC Medical Informatics and Decision Making, Vol 24, Iss 1, Pp 1-15 (2024)
Publication Information: BMC, 2024.
Publication Year: 2024
Collection: LCC:Computer applications to medicine. Medical informatics
Subject Terms: Natural language processing, Language models, BERT, Electronic health records, Clinical text, De-identification, Computer applications to medicine. Medical informatics, R858-859.7
Description: Abstract: Many state-of-the-art results in natural language processing (NLP) rely on large pre-trained language models (PLMs). These models contain large numbers of parameters that are tuned using vast amounts of training data. These factors cause the models to memorize parts of their training data, making them vulnerable to various privacy attacks. This is cause for concern, especially when these models are applied in the clinical domain, where data are highly sensitive. Training data pseudonymization is a privacy-preserving technique that aims to mitigate these problems by automatically identifying sensitive entities and replacing them with realistic but non-sensitive surrogates. Pseudonymization has yielded promising results in previous studies. However, no previous study has applied pseudonymization to both the pre-training data of PLMs and the fine-tuning data used to solve clinical NLP tasks. This study evaluates the effects on predictive performance of end-to-end pseudonymization of Swedish clinical BERT models fine-tuned for five clinical NLP tasks. A large number of statistical tests are performed, revealing minimal harm to performance when using pseudonymized fine-tuning data. The results also show no deterioration from end-to-end pseudonymization of pre-training and fine-tuning data. These results demonstrate that pseudonymizing training data to reduce privacy risks can be done without harming data utility for training PLMs.
Document Type: article
File Description: electronic resource
Language: English
ISSN: 1472-6947
Relation: https://doaj.org/toc/1472-6947
DOI: 10.1186/s12911-024-02546-8
Access URL: https://doaj.org/article/1c98eef1d97f416a9826056d2fc27306
Accession Number: edsdoj.1c98eef1d97f416a9826056d2fc27306
Database: Directory of Open Access Journals
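
The abstract describes pseudonymization as automatically identifying sensitive entities and replacing them with realistic but non-sensitive surrogates. The following Python sketch illustrates that idea in miniature; the regex-based detector and surrogate pools are hypothetical toy stand-ins for illustration only (a real pipeline, such as the one studied in this article, would use a trained NER de-identification model), not the authors' actual implementation.

# A minimal, illustrative sketch of training-data pseudonymization as the
# abstract describes it: detect sensitive entities in clinical text and swap
# them for realistic but non-sensitive surrogates. The detector and surrogate
# pools below are hypothetical toy stand-ins; a real pipeline would use a
# trained NER de-identification model and realistic surrogate lists.
import random
import re

# Hypothetical surrogate pools, one per entity type.
SURROGATES = {
    "NAME": ["Anna Lind", "Erik Berg", "Maria Holm"],
    "DATE": ["2019-03-14", "2020-11-02", "2018-07-29"],
}

# Hypothetical detectors: toy regexes standing in for a trained NER tagger.
PATTERNS = {
    "NAME": re.compile(r"\b(?:Mr|Mrs|Dr)\.\s+[A-Z][a-z]+\b"),
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}

def pseudonymize(text: str) -> str:
    """Replace every detected sensitive span with a random surrogate of the
    same entity type, keeping the surrounding text intact."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(lambda _m, lbl=label: random.choice(SURROGATES[lbl]), text)
    return text

if __name__ == "__main__":
    note = "Dr. Smith examined the patient on 2021-05-07 and scheduled a follow-up."
    print(pseudonymize(note))
    # e.g.: "Erik Berg examined the patient on 2019-03-14 and scheduled a follow-up."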