Academic Journal

The Potential Clinical Utility of the Customized Large Language Model in Gastroenterology: A Pilot Study

Bibliographic Details
Title: The Potential Clinical Utility of the Customized Large Language Model in Gastroenterology: A Pilot Study
Authors: Eun Jeong Gong, Chang Seok Bang, Jae Jun Lee, Jonghyung Park, Eunsil Kim, Subeen Kim, Minjae Kimm, Seoung-Ho Choi
Source: Bioengineering, Vol 12, Iss 1, p 1 (2024)
Publisher Information: MDPI AG, 2024.
Publication Year: 2024
Collection: LCC:Technology
LCC:Biology (General)
Subject Terms: large language model, artificial intelligence, gastroenterology, Technology, Biology (General), QH301-705.5
Description: Background: Large language models (LLMs) have the potential to be applied to clinical practice. However, few studies have examined this in the field of gastroenterology. Aim: This study explores the potential clinical utility of two LLMs in the field of gastroenterology: a customized GPT model and a conventional GPT-4o, an advanced LLM capable of retrieval-augmented generation (RAG). Method: We built a customized GPT with the BM25 algorithm on OpenAI’s GPT-4o model, which allows it to produce responses in the context of specific documents, including textbooks of internal medicine (in English) and gastroenterology (in Korean). We also used the conventional ChatGPT 4o (accessed on 16 October 2024). The benchmark (written in Korean) consisted of 15 clinical questions developed by four clinical experts, representing typical questions for medical students. The two LLMs, a gastroenterology fellow, and an expert gastroenterologist were tested to assess their performance. Results: The customized LLM correctly answered 8 of 15 questions, whereas the fellow answered 10 correctly. When the standardized Korean medical terms were replaced with English terminology, the customized LLM’s performance improved: it answered two additional knowledge-based questions correctly, matching the fellow’s score. However, judgment-based questions remained a challenge for the model. Even with ‘Chain of Thought’ prompt engineering, the customized GPT did not achieve improved reasoning. Conventional GPT-4o achieved the highest score among the AI models (14/15). Although both models performed slightly below the expert gastroenterologist’s level (15/15), they show promising potential for clinical applications (scores comparable with or higher than that of the gastroenterology fellow). Conclusions: LLMs could be utilized to assist with specialized tasks such as patient counseling. However, RAG capabilities, which enable real-time retrieval of external data not included in the training dataset, appear essential for managing complex, specialized content, and clinician oversight will remain crucial to ensure safe and effective use in clinical practice.
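The abstract names BM25 as the algorithm used to ground the customized GPT in specific documents. As a rough illustration only, the sketch below scores a handful of passages against a query with the standard Okapi BM25 formula and returns the best match, which could then be placed in the model's prompt. It is not the authors' implementation: the function names, the parameter defaults (`k1=1.5`, `b=0.75`), the whitespace tokenizer, and the sample passages are all assumptions made for this example. Whitespace tokenization in particular would likely be inadequate for Korean text.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one Okapi BM25 score per document for the given query.

    Tokenization is naive lowercase whitespace splitting (an assumption
    made to keep the sketch short).
    """
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Smoothed IDF that stays non-negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def top_passage(query, docs):
    """Return the passage with the highest BM25 score (hypothetical helper)."""
    scores = bm25_scores(query, docs)
    best = max(range(len(docs)), key=scores.__getitem__)
    return docs[best]


# Made-up passages standing in for textbook excerpts.
passages = [
    "peptic ulcer treatment with proton pump inhibitors",
    "colonoscopy screening interval after polypectomy",
    "hepatitis b antiviral therapy",
]
context = top_passage("colonoscopy screening", passages)
```

In a retrieval-augmented setup, `context` would be prepended to the clinical question before it is sent to the language model; how the study actually assembled its prompts is not stated in this record.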
Document Type: article
File Description: electronic resource
Language: English
ISSN: 2306-5354
Relation: https://www.mdpi.com/2306-5354/12/1/1; https://doaj.org/toc/2306-5354
DOI: 10.3390/bioengineering12010001
Access URL: https://doaj.org/article/20b736aa29e344ec98f5dc3bd81f9d87
Accession Number: edsdoj.20b736aa29e344ec98f5dc3bd81f9d87
Database: Directory of Open Access Journals