Editor's Note: Dr. Claire Castellano (she/her/hers) is a resident physician in pediatrics at the Children’s Hospital of Philadelphia. In addition to her MD, Claire has a Master’s in Public Health, focusing on global epidemiology. Claire hopes to combine her interests in medical education and global health in her career as a pediatrician. -Rachel Y. Moon, MD, Associate Editor, Digital Media, Pediatrics
Artificial intelligence (AI) has many potential benefits, but how does it fare with the more “human” task of communicating with diverse, feeling humans? Since its release in 2022, ChatGPT, an AI chatbot built on advanced large language models, has changed the AI landscape, offering real alternatives for communication tasks, including medical writing and clinical decision making. But how does it fare in delivering “patient-specific, accurate, empathetic, and understandable answers” to families?
R. Brandon Hunter, MD, and colleagues from Baylor College of Medicine, Emory University, Harvard Medical School, and the University of Pennsylvania address this question in their cross-sectional feasibility study, “Using ChatGPT to Provide Patient-Specific Answers to Parental Questions in the PICU,” released early in Pediatrics this week (10.1542/peds.2024-066615).
The authors creatively designed their study, crafting:
- Three clinical scenarios common to the PICU: respiratory failure, septic shock, and status epilepticus, each complete with an assessment and plan
- Eight typical parental questions, covering pathophysiology, diagnosis, management, and prognosis
The authors instructed ChatGPT-4, acting as the PICU physician, to answer the same set of questions for each scenario. Six PICU physicians then rated each response on the following criteria:
- Accuracy: graded 1–6, with 6 being completely accurate
- Completeness: graded yes versus no
- Empathy: graded 1–6 based on how much the response acknowledged the family’s feelings or perspective, with 6 being completely empathetic
- Understandability: graded using a defined, reproducible patient education assessment tool and a calculated Flesch-Kincaid grade level, with a 6th grade reading level as the goal (a rough sketch of how such a pipeline might be scripted follows this list).
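For readers curious about the mechanics, below is a minimal, hypothetical sketch in Python of how such a pipeline might be scripted. The scenario text, prompt wording, and model settings are illustrative assumptions rather than the authors' actual materials; the readability function, however, implements the standard published Flesch-Kincaid grade level formula: 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59.

```python
# Hypothetical sketch of the study's workflow: prompt a chat model as the
# "PICU doctor," then score the reply's readability. The scenario, prompt
# wording, and model settings are illustrative, not the authors' materials.
import re

from openai import OpenAI  # assumes the openai Python package is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment

scenario = (
    "Assessment: 2-year-old with septic shock from a urinary source. "
    "Plan: broad-spectrum antibiotics, fluid resuscitation, norepinephrine."
)
question = "Why does my child need a medication to support her blood pressure?"

response = client.chat.completions.create(
    model="gpt-4",  # the study used ChatGPT-4; exact settings are not public here
    messages=[
        {"role": "system",
         "content": "You are the PICU physician caring for this patient. "
                    "Answer the parent's question about the case below."},
        {"role": "user",
         "content": f"{scenario}\n\nParent's question: {question}"},
    ],
)
answer = response.choices[0].message.content


def count_syllables(word: str) -> int:
    """Crude heuristic: count groups of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_kincaid_grade(text: str) -> float:
    """Standard Flesch-Kincaid grade level formula."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)


print(f"Reading grade level: {flesch_kincaid_grade(answer):.1f}")
```

Of the four criteria, readability is the only one that lends itself to simple automation like this; accuracy, completeness, and empathy still required physician judgment.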
Overall, the ChatGPT responses had:
- Patient-specific information: present in every response and in 31% of the sentences within a response.
- Additional information to enhance understanding: 59% of sentences explained why medications were used, even when not directly asked to do so.
- Nearly complete accuracy: median score 5 (80–99% accurate), with fewer than 3% of responses rated as more inaccurate than accurate and none deemed to cause harm.
- Very complete: 97% of responses scored yes, answered completely
- Mostly empathetic: median score 5 (80–99% empathetic), with all responses graded as at least more empathetic than not
- Very high understandability: 100% understandable based on the assessment tool, though at an 8th–9th grade reading level, above the 6th grade goal.
These results suggest that ChatGPT not only has the ability to accurately convey information in a pediatric subspecialty, but also to do so in a way that may “addres[s] the emotional needs of parents in the PICU” by including patient-specific, empathetic statements at an appropriate reading level. Although the grading criteria were specific, it is important to note the limitations of having fictional scenarios graded by PICU physicians. However, when compared with a generic, jargon-ridden online webpage of questionable accuracy, these results suggest that ChatGPT and other large language models may offer an exciting alternative to help support families as they navigate an often scary, confusing, and uncertain healthcare landscape.