Academic Journal

Development and Evaluation of a GPT4-Based Orofacial Pain Clinical Decision Support System

Bibliographic Details
Title: Development and Evaluation of a GPT4-Based Orofacial Pain Clinical Decision Support System
Authors: Charlotte Vueghs, Hamid Shakeri, Tara Renton, Frederic Van der Cruyssen
Source: Diagnostics, Vol 14, Iss 24, p 2835 (2024)
Publication Information: MDPI AG, 2024.
Publication Year: 2024
Collection: LCC:Medicine (General)
Subject Terms: validation, development, large language model, GPT4, clinical decision support system, Medicine (General), R5-920
Description: Background: Orofacial pain (OFP) encompasses a complex array of conditions affecting the face, mouth, and jaws, often leading to significant diagnostic challenges and high rates of misdiagnosis. Artificial intelligence, particularly large language models like GPT4 (OpenAI, San Francisco, CA, USA), offers potential as a diagnostic aid in healthcare settings.
Objective: To evaluate the diagnostic accuracy of GPT4 in OFP cases as a clinical decision support system (CDSS) and compare its performance against treating clinicians, expert evaluators, medical students, and general practitioners.
Methods: A total of 100 anonymized patient case descriptions involving diverse OFP conditions were collected. GPT4 was prompted to generate primary and differential diagnoses for each case using the International Classification of Orofacial Pain (ICOP) criteria. Diagnoses were compared to gold-standard diagnoses established by treating clinicians, and a scoring system was used to assess accuracy at three hierarchical ICOP levels. A subset of 24 cases was also evaluated by two clinical experts, two final-year medical students, and two general practitioners for comparative analysis. Diagnostic performance and interrater reliability were calculated.
Results: GPT4 achieved the highest accuracy level (ICOP level 3) in 38% of cases, with an overall diagnostic performance score of 157 out of 300 points (52%). The model provided accurate differential diagnoses in 80% of cases (400 out of 500 points). In the subset of 24 cases, the model’s performance was comparable to non-expert human evaluators but was surpassed by clinical experts, who correctly diagnosed 54% of cases at level 3. GPT4 demonstrated high accuracy in specific categories, correctly diagnosing 81% of trigeminal neuralgia cases at level 3. Interrater reliability between GPT4 and human evaluators was low (κ = 0.219, p < 0.001), indicating variability in diagnostic agreement.
Conclusions: GPT4 shows promise as a CDSS for OFP by improving diagnostic accuracy and offering structured differential diagnoses. While not yet outperforming expert clinicians, GPT4 can augment diagnostic workflows, particularly in primary care or educational settings. Effective integration into clinical practice requires adherence to rigorous guidelines, thorough validation, and ongoing professional oversight to ensure patient safety and diagnostic reliability.
Document Type: article
File Description: electronic resource
Language: English
ISSN: 2075-4418
Relation: https://www.mdpi.com/2075-4418/14/24/2835; https://doaj.org/toc/2075-4418
DOI: 10.3390/diagnostics14242835
Access URL: https://doaj.org/article/ff5c26270dec4818bab241fc94b2f415
Accession Number: edsdoj.ff5c26270dec4818bab241fc94b2f415
Database: Directory of Open Access Journals