Detection of Recurrent Ventricular Tachycardia in Medical Notes Using Natural Language Processing with Synthetic Notes
Purpose: We aim to overcome challenges posed by inconsistent clinical terminology and ambiguous documentation by using ChatGPT-generated synthetic notes and standardizing medical synonyms, thereby enhancing Natural Language Processing (NLP) accuracy for detecting recurrent ventricular tachycardia (VT) in Electronic Health Records (EHR).
Materials and Methods: We analyzed 499 full-text clinical notes (474.6 ± 164.3 words) from 125 patients (32.0% female, LVEF 48.9 ± 13.9%, age 61.0 ± 14.0 years), using Bio_ClinicalBERT to establish baseline performance. Experiment 1 standardized VT terminology using expert-defined keywords, for example unifying “TdP”, “V>A”, and “Vtach” as “VT”. Experiment 2 used GPT-3.5 Turbo to augment the positive class (notes expert-labeled “Yes” for recurrent VT) from 33 to 99 notes, reducing the class imbalance from 14.1:1 to 4.7:1. Experiment 3 combined both methods.
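As an illustration of the two preprocessing ideas described above, the following minimal Python sketch shows how VT synonyms might be unified and how the reported class-imbalance ratios can be reproduced. It is not the study's implementation: the function names are hypothetical, the synonym list contains only the three examples given in the abstract, and the ratio calculation assumes that the 499 notes include the 33 positive notes and that synthetic notes are added on top of them.

```python
import re

# Hypothetical synonym list: only these three examples appear in the abstract;
# the study's full expert keyword list is not given there.
VT_SYNONYMS = ["TdP", "V>A", "Vtach"]

# Lookarounds are used instead of \b so that "V>A", which contains a
# non-word character, is still matched as a whole token.
_VT_PATTERN = re.compile(
    r"(?<!\w)(?:" + "|".join(re.escape(s) for s in VT_SYNONYMS) + r")(?!\w)",
    flags=re.IGNORECASE,
)


def standardize_vt_terms(note: str) -> str:
    """Replace each listed synonym in a note with the canonical term 'VT'."""
    return _VT_PATTERN.sub("VT", note)


def imbalance_ratio(n_total: int, n_pos_original: int, n_pos_now: int) -> float:
    """Negative-to-positive ratio, assuming n_total includes the original positives."""
    n_negative = n_total - n_pos_original
    return n_negative / n_pos_now


if __name__ == "__main__":
    print(standardize_vt_terms("Pt had Vtach overnight; TdP noted, V>A on EGM."))
    print(round(imbalance_ratio(499, 33, 33), 1))  # before augmentation
    print(round(imbalance_ratio(499, 33, 99), 1))  # after augmentation
```

Under these assumptions, 466 negative notes against 33 and then 99 positive notes give ratios that round to the 14.1:1 and 4.7:1 reported above.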
Results: Baseline Bio_ClinicalBERT achieved an accuracy of 70.1% (13.3% F1; 8.2% precision, 39.0% recall). Accuracy increased with standardized VT synonyms (78.8% accuracy, 15.1% F1; 10.4% precision, 28.7% recall; p < 0.05) and with ChatGPT augmentation (80.8% accuracy, 38.8% F1; 42.4% precision, 35.7% recall; p < 0.05). The best performance was achieved by combining ChatGPT augmentation with standardized VT synonyms (82.8% accuracy, 46.0% F1; 52.7% precision, 42.1% recall; p < 0.05).
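For readers relating the reported metrics to one another, F1 is the harmonic mean of precision and recall. The sketch below (the helper name is hypothetical) recomputes F1 from the pooled precision and recall of each experiment. The recomputed values are close to, but not all identical to, the reported F1 scores. Small differences of this kind would be expected if the metrics were averaged over folds or runs, which the abstract does not state.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units in, same units out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %) pairs as reported in the Results above.
reported = {
    "baseline": (8.2, 39.0),
    "standardized synonyms": (10.4, 28.7),
    "ChatGPT augmentation": (42.4, 35.7),
    "augmentation + synonyms": (52.7, 42.1),
}

if __name__ == "__main__":
    for name, (p, r) in reported.items():
        print(f"{name}: F1 = {f1_score(p, r):.1f}%")
```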
Conclusions: Using generative AI augmentation and standardized medical synonyms, we improved the ability of NLP models to identify recurrent VT in EHR. This method could refine automated diagnostics and aid patient comprehension of their medical records. Future studies will assess the potential of generative AI in streamlining medical terminology standardization and its application to various clinical endpoints.