Description: |
Importance: Electronic health records (EHR) data are growing in importance as a source of evidence on real-world treatment effects. However, many clinically important measures are not directly captured as structured data by these systems, limiting their utility for research and quality improvement. Although this information can usually be abstracted manually from clinical notes, that process is expensive and subject to variability. Natural language processing (NLP) is a scalable alternative but has historically been subject to multiple limitations, including insufficient accuracy, the need for large training datasets, technical complexity, poor generalizability, algorithmic unfairness, and an outsized carbon footprint.

Objective: To compare different algorithmic approaches for classifying colonoscopy reports according to their ulcerative colitis Mayo endoscopic subscores.

Design: Other observational study – NLP algorithm development and validation.

Setting: Academic medical center (UCSF) and safety-net hospital (ZSFG) in California.

Participants: Patients with ulcerative colitis.

Exposures: Colonoscopy.

Main Outcomes and Measures: The primary outcome was accuracy in identifying reports suitable for Mayo subscoring (binary yes/no) and then separately assigning a Mayo subscore where relevant (ordinal). Secondary outcomes included learning efficiency from training data, generalizability, computational costs, fairness, and sustainability.

Results: Using automated machine learning (autoML), we trained a pair of classifiers that were 98% [91-99%] accurate at determining which reports to score and 97% [88-99%] accurate at assigning the correct Mayo endoscopic subscore. The binary classifiers trained on UCSF data achieved 96% accuracy on hold-out test data from ZSFG. Training these classifiers required 4 hours of computation on a standard laptop. Classification errors were not associated with either gender or area deprivation index. The carbon footprint of this approach was 24 times smaller than that of current deep learning algorithms for clinical text classification.

Conclusions and Relevance: We identified autoML as an efficient and robust method for training clinical text classifiers. AutoML-trained classifiers demonstrated many favorable properties, including generalizability, limited effort needed for data annotation and algorithm training, fairness, and sustainability. More generally, these results support the feasibility of using unstructured EHR data to generate real-world evidence and drive continuous improvements in learning health systems.

Key Points

Question: Is natural language processing (NLP) a viable alternative to manually abstracting disease activity from procedure notes?

Findings: We compared different methods for abstracting the ulcerative colitis Mayo endoscopic subscore from colonoscopy reports. Classifiers trained using automated machine learning (autoML) achieved the greatest accuracy (97%), recognized when to abstain, generalized well to other health systems, required limited effort for annotation and programming, demonstrated fairness, and had a small carbon footprint.

Meaning: NLP methods like autoML appear to be sufficiently mature technologies for clinical text classification and are thus poised to enable many downstream endeavors using electronic health records data.
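
To make the two-stage design summarized in the Results concrete, below is a minimal sketch of such a pipeline. It is not the authors' implementation: the study used autoML to select models and hyperparameters, whereas this sketch substitutes a fixed scikit-learn bag-of-words pipeline as a stand-in, and every function and variable name here is a hypothetical placeholder.

```python
# Illustrative sketch only: the study used autoML to search over models and
# hyperparameters; this fixed scikit-learn pipeline merely stands in for
# whatever model such a search would select. All names are hypothetical.
from typing import Optional, Sequence

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


def build_text_classifier() -> Pipeline:
    """Bag-of-words text classifier; an autoML run would instead choose the
    featurizer, model family, and hyperparameters automatically."""
    return Pipeline([
        ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
        ("clf", LogisticRegression(max_iter=1000)),
    ])


def train_two_stage(reports: Sequence[str],
                    is_scorable: Sequence[int],
                    scorable_reports: Sequence[str],
                    mayo_labels: Sequence[int]):
    """Stage 1: binary classifier -- is this colonoscopy report scorable?
    Stage 2: Mayo endoscopic subscore (0-3) classifier, trained only on
    reports labeled as scorable."""
    scorable_clf = build_text_classifier().fit(reports, is_scorable)
    mayo_clf = build_text_classifier().fit(scorable_reports, mayo_labels)
    return scorable_clf, mayo_clf


def predict_mayo(scorable_clf: Pipeline, mayo_clf: Pipeline,
                 report_text: str) -> Optional[int]:
    """Abstain (return None) unless stage 1 judges the report scorable."""
    if scorable_clf.predict([report_text])[0] == 1:
        return int(mayo_clf.predict([report_text])[0])
    return None
```

The abstention step in `predict_mayo` mirrors the reported behavior of first deciding which reports are suitable for Mayo subscoring and only then assigning a subscore.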