Artificial intelligence (AI) technologies are increasingly used in pediatrics and have the potential to help inpatient physicians provide high-quality care for critically ill children.
We aimed to describe the use of AI to improve any health outcome(s) in neonatal and pediatric intensive care.
PubMed, IEEE Xplore, Cochrane, and Web of Science databases.
We used peer-reviewed studies published between June 1, 2010, and May 31, 2020, in which researchers described (1) AI, (2) pediatrics, and (3) intensive care. Studies were included if researchers assessed AI use to improve at least 1 health outcome (eg, mortality).
Data extraction was conducted independently by 2 researchers. Articles were categorized by direct or indirect impact of AI, defined by the European Institute of Innovation and Technology Health joint report.
Of the 287 publications screened, 32 met inclusion criteria. Approximately 22% (n = 7) of studies revealed a direct impact and improvement in health outcomes after AI implementation. The majority were in prototype testing, and few were deployed into an ICU setting. Among the remaining 78% (n = 25), AI models outperformed standard clinical modalities and may have indirectly influenced patient outcomes. Quantitative assessment of health outcomes using statistical measures, such as area under the receiver operating characteristic curve (56%; n = 18) and specificity (38%; n = 12), revealed marked heterogeneity in metrics and standardization.
Few studies have revealed that AI has directly improved health outcomes for pediatric critical care patients. Further prospective, experimental studies are needed to assess AI’s impact by using established implementation frameworks, standardized metrics, and validated outcome measures.
Artificial intelligence (AI) is making inroads into health care with a mix of hope and hype. AI is broadly defined as a computer program or intelligent system capable of mimicking human cognitive function.1 It operates with predefined rules, relies on if-then statements, and uses models derived from statistical analyses of large data sets. Historically, AI has had impressive achievements, such as assisting humans in driving cars and conducting surveillance with aerial drones.2 More recently, the US Food and Drug Administration has approved several AI-based products, signaling the entry of AI into health care.3,4
AI encompasses several different technologies, of which machine learning (ML) has the most immediate relevance to the health care field. ML relies on structured or unstructured data to identify hidden information.5 Structured data can be easily organized into predefined structures, such as an Excel worksheet, whereas unstructured data are not readily organized by using predefined structures (eg, clinical notes). ML is used to develop either “locked” or “adaptive” software algorithms to augment decision support systems.6 Thus far, the vast majority of AI systems approved by the US Food and Drug Administration are based on locked algorithms, which generate the same result each time for the same input. Decision support systems assist clinicians in analyzing large amounts of information by identifying potential health risks and improving diagnostic accuracy.7–10 Other AI technologies, such as natural language processing (NLP), help computers understand and interpret human language.11 NLP converts complex, unstructured data, such as clinical notes, into structured information that can be used to help clinicians make more effective decisions12 and improve learning from adverse events.13,14
Hospitals are ideal settings for AI use. Hospitalized children are at an increased risk of rapid and fatal decompensation, and they rely on clinicians to quickly and effectively process large volumes of medical information to diagnose and treat severe diseases. AI has the potential to help inpatient teams process these big data sets, reduce the burden of medical decision-making, and ultimately, facilitate higher quality care while improving patient safety.15 Currently, there are several applications of AI in pediatric general care and neonatal and pediatric ICUs. For example, AI-based models have been used to predict critical diagnoses, such as hemodynamic shock,16 cardiac arrest,17,18 and traumatic brain injury.19,20 Other models have helped intensivists diagnose conditions like sepsis21–24 and cognitive dysfunction,25,26 and improve survival outcomes27–29 and baseline health status30–32 among neonates.
Although recent reviews have focused on the increasing use of AI in pediatrics in general33 and AI in critical care more broadly,34,35 there is a need to identify the use of AI technologies, particularly ML and NLP, that improve health outcomes for critically ill children.2,15,36,37 In this systematic review, we aim to describe the use of AI to improve any health outcome for NICU and PICU patients. We then assess readiness for real-world use and discuss potential barriers to implementation and areas for future investigation.
Methods
Protocol Registration
Information Sources and Search Strategy
We searched the PubMed, IEEE Xplore, Cochrane, and Web of Science databases to identify available, peer-reviewed articles within the scope and eligibility criteria of this systematic review. All databases were last accessed in January 2021. Search terms and phrases were selected to identify studies in which researchers described (1) AI, (2) pediatrics, and (3) intensive care. Specifically, AI terms included deep learning, reinforcement learning, and supervised or unsupervised ML. Supervised ML models are trained by using labeled, structured data, typically for classification and regression problems, whereas unsupervised models are used to extract patterns from unlabeled data, typically for clustering analysis. Keywords were initially chosen on the basis of a preliminary analysis of the literature and Medical Subject Headings (MeSH) terms. They were then modified on the basis of feedback from subject experts as well as our institution’s librarian. Our completed search strategy is shown in Supplemental Table 3.
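The distinction between supervised and unsupervised learning drawn above can be illustrated with a minimal sketch. The data, labels, and function names below are hypothetical and were not taken from any included study; the supervised half is a simple nearest-centroid classifier and the unsupervised half is a one-dimensional 2-means clustering, chosen only because both fit in a few lines.

```python
from statistics import mean

# Hypothetical toy data: one numeric feature per patient (illustrative only).
values = [1.0, 1.2, 0.8, 4.0, 4.2, 3.9]
labels = ["low", "low", "low", "high", "high", "high"]


def fit_centroids(xs, ys):
    """Supervised: use the known labels to compute one centroid per class."""
    classes = sorted(set(ys))
    return {c: mean([x for x, y in zip(xs, ys) if y == c]) for c in classes}


def predict(centroids, x):
    """Classify x as the class whose centroid is nearest."""
    return min(centroids, key=lambda c: abs(x - centroids[c]))


def two_means(xs, iterations=10):
    """Unsupervised: split unlabeled values into 2 clusters (1-D k-means).

    Assumes the values are not all identical, so neither cluster is empty.
    """
    lo, hi = min(xs), max(xs)
    for _ in range(iterations):
        a = [x for x in xs if abs(x - lo) <= abs(x - hi)]
        b = [x for x in xs if abs(x - lo) > abs(x - hi)]
        lo, hi = mean(a), mean(b)
    return sorted(a), sorted(b)


centroids = fit_centroids(values, labels)   # learns from labeled data
new_label = predict(centroids, 3.5)         # classification of a new value
clusters = two_means(values)                # pattern found without labels
```

The supervised model needs `labels` to learn anything, whereas `two_means` recovers the same two groups from `values` alone but cannot name them.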
Inclusion and Exclusion Criteria
We included articles that satisfied the following criteria: (1) use of AI algorithms in a NICU or PICU setting, (2) description of the impact of AI on at least 1 patient health outcome (eg, mortality), (3) pediatric-only populations (defined as patients <18 years of age), and (4) publications printed in English from June 1, 2010, to May 31, 2020. Studies involving adult patients (defined as >18 years of age) and secondary or gray literature (eg, editorials) were excluded.
Study Selection and Quality Assurance
Using a multidatabase query, we identified 287 publications, of which 32 met inclusion criteria after the title, abstract, and full-text review (Fig 1). After removing duplicates, 2 authors (C.A. and A.C.) independently evaluated all publications for eligibility. The initial title and abstract screening of each publication was followed by a full-text review. To minimize selection bias, all discrepancies were resolved in a stepwise fashion, through discussion, and required consensus from 2 additional authors (O.A. and M.K.).
Two authors (C.A. and A.C.) used a data abstraction form to independently record details from each publication, including study aim(s) and/or objective(s), design, setting, patient population, disease process and health outcome(s), AI type, and area of impact. We also captured data types and sources, attributes defining the AI classification and learning types, and whether results were cross-validated and compared with clinicians. Studies were then categorized by AI impact on outcomes (direct or indirect) as well as by impact focus area, both defined below.
The studies in our review had marked heterogeneity in design and outcomes. Coupled with the multiple classification and mixed learning types, this level of heterogeneity precluded our ability to combine results. Given that statistical and study heterogeneity reflects a variability in outcomes so great that it cannot be explained by measurement error alone, a meta-analysis of pooled data was not performed.
Defining AI’s Impact on Health Outcomes
The impact of AI on health outcomes was assessed by using a definition and focus area framework developed by the European Institute of Innovation and Technology (EIT) Health joint report.40 In this report, the authors defined direct impact as AI that augmented any patient health outcome after clinical diagnosis or that aided physicians by allowing them to “spend more time in direct patient care (while reducing provider burnout).”40 We derived the term indirect impact as the inverse of this concept and defined it as AI that did not directly, or actively, change the patient health outcome after clinical diagnosis or aid physicians in spending more time in direct care. One such example involves early detection of critical events in infants with single-ventricle physiology between first- and second-stage cardiac repair.41 Although this AI modality had a “statistically significant higher performance” compared with expert-defined assessment tools, the authors did not report an improvement in patient health outcomes following its implementation.
Technology Readiness Level
In another European technical report, the authors detail a methodology for assessing AI real-world applications using technology readiness levels (TRLs) (Fig 2). The TRL scale classifies any given technology into 1 of 9 levels. TRL 1 is assigned to technologies that have basic principles observed and reported (the lowest level of technology readiness); TRL 2 is assigned to technologies with a concept and/or application formulated; TRL 3 identifies technologies with a proof of concept; TRLs 4 and 5 are for technologies with components validated in the laboratory or in a relevant environment, respectively; TRLs 6 and 7 identify technologies with a prototype functioning in a relevant environment or an operational environment, respectively; TRL 8 is for technologies that are completed and qualified through test and demonstration; and finally, TRL 9 is for technologies that have been proven through successful mission operations. These 9 TRLs are used to evaluate the stages of AI real-world application, ranging from laboratory to operational environments, with research to implementation goals and evaluations from the prototype to deployment phases.42 We classified all included studies using the TRL framework. For AI applications in some stage of bedside or clinical implementation, we further categorized them into the 6 impact focus areas in accordance with the EIT Health joint report: self-care/prevention/wellness, triage and early diagnosis, diagnostics, clinical decision support, care delivery, and chronic management.40
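The 9-level scale described above can be expressed as a small lookup table. This is a minimal sketch: the level descriptions paraphrase the text, whereas the `stage` grouping (concept, validation, prototype, completed or deployed) and all function names are assumptions introduced here for illustration and are not part of the cited report.

```python
# TRL descriptions paraphrased from the text above.
TRL_DESCRIPTIONS = {
    1: "Basic principles observed and reported",
    2: "Technology concept and/or application formulated",
    3: "Proof of concept",
    4: "Component validation in a laboratory environment",
    5: "Component validation in a relevant environment",
    6: "Prototype demonstration in a relevant environment",
    7: "Prototype demonstration in an operational environment",
    8: "System completed and qualified through test and demonstration",
    9: "System proven through successful mission operations",
}


def describe_trl(level: int) -> str:
    """Return a one-line description for a TRL between 1 and 9."""
    if level not in TRL_DESCRIPTIONS:
        raise ValueError(f"TRL must be an integer from 1 to 9, got {level!r}")
    return f"TRL {level}: {TRL_DESCRIPTIONS[level]}"


def stage(level: int) -> str:
    """Hypothetical coarse grouping of TRLs (an illustrative assumption)."""
    if level not in TRL_DESCRIPTIONS:
        raise ValueError(f"TRL must be an integer from 1 to 9, got {level!r}")
    if level <= 3:
        return "concept"
    if level <= 5:
        return "validation"
    if level <= 7:
        return "prototype"
    return "completed or deployed"
```

For example, `stage(7)` returns `"prototype"`, which corresponds to a prototype demonstrated in an operational environment.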
Summary of TRLs according to several characteristics. Reprinted with permission from Martinez-Plumed F, Gomez E, Hernandez-Orallo J. Futures of artificial intelligence through technology readiness levels. Telematics and Informatics. 2021;58:101525, under CC-BY-4.0 license.
Results
Study Characteristics
In Table 1, we summarize study characteristics. The majority of study participants were premature infants or newborns hospitalized at tertiary academic centers. In several studies, researchers used AI to provide early diagnosis or improvement of outcomes among diseases like sepsis and severe respiratory infections. Researchers used AI models, particularly deep learning frameworks and neural networks, to enhance survival prediction, hone patient risk stratification, and modify diagnostic approaches. Our analysis included 22 NICUs and 10 PICUs. Approximately one-half (53%; n = 17) of studies took place in ICUs outside of the United States, of which 12 studies originated from high-income countries and 5 from low- to middle-income countries. Among the 32 studies, 14 had a retrospective cohort design, whereas 5 were described by authors as prospective longitudinal, 4 as atypical, 3 as exploratory, 2 as clinical trials, 2 as case-control, and 2 as observational.
Characteristics of Included Studies (N = 32) Categorized by AI’s Direct or Indirect Impact on Health Outcome(s)
Author | Study Aim/Objective | Setting | Type of AI Technology/Algorithm | Outcome(s) | Impact | Focus Area of Impact | TRL |
---|---|---|---|---|---|---|---|
Saria et al (2010)30 | Develop a tool that reflects physiologic status and predicts future illness severity in newborns | NICU | PhysiScore | Morbidity and mortality | Direct | Triage and early diagnosis | 7 |
Saadah et al (2014)44 | Identify premature infants who are likely to benefit from palivizumab prophylaxis during nosocomial outbreaks of respiratory syncytial virus | NICU | Artificial neural network | Mortality, days of supplemental oxygen, and length of NICU stay | Direct | Clinical decision support | 7 |
Matic et al (2016)19 | Develop an algorithm to quantify background electroencephalography dynamics in term neonates with hypoxic ischemic encephalopathy | NICU | Least-square SVM | Neurodevelopmental outcome | Direct | Clinical decision support | 7 |
Caparros-Gonzalez et al (2018)46 | Clarify the effects of a music therapy intervention on the respiratory rate, oxygen saturation, blood pressure, and heart rate of premature infants | NICU | Classification and regression tree | Stress-related state characterized by autonomic changes (eg, heart rate and blood pressure elevations) | Direct | Prevention and wellness | 7 |
He et al (2018)26 | Propose a framework based on resting state fMRI functional connectome data to predict cognitive deficits/outcomes in preterm infants | NICU | Artificial neural network | Cognitive outcome | Direct | Triage and early diagnosis | 7 |
Podda et al (2018)27 | Develop a Preterm Infants Survival Assessment tool to predict survival of very preterm and very low birth wt infants | NICU | Artificial neural network | Mortality before NICU discharge | Direct | Triage and early diagnosis | 9 |
Clark et al (2019)70 | Use rapid whole-genome sequencing to diagnose genetic diseases to predict mortality and morbidity risk in seriously ill children | NICU/PICU | NLP, decision trees, Bayesian models, and neural networks | Morbidity and mortality | Direct | Diagnostics | 9 |
Temko et al (2011)25 | Present a multichannel patient-independent system to detect seizures, which are often a precursor of brain injury in neonates | NICU | SVM | Brain damage | Indirect | Triage and early diagnosis | 7 |
Wang et al (2013)21 | Establish an approach to identify a minimum set of predictive biomarkers and, ultimately, improve early detection of sepsis in infants | NICU | Sparse SVM and Lasso linear regression | Morbidity and mortality | Indirect | Triage and early diagnosis | 8 |
Chaves et al (2014)28 | Build a linguistic model to estimate the risk of death in neonates | NICU | Fuzzy logic model | Mortality before hospital discharge | Indirect | Triage and early diagnosis | 9 |
Mani et al (2014)22 | Develop models from “off-the-shelf” medical data to predict late-onset sepsis and, ultimately, decrease the incidence of mortality in neonates | NICU | Naïve Bayes, classification and regression tree, AODE, and RF | Mortality | Indirect | Triage and early diagnosis | 3 |
Wang et al (2014)71 | Identify diagnostic biomarkers (ie, angiopoietin-1, angiopoietin-2, and bicarbonate) to predict mortality in children with severe sepsis | PICU | SVM | Mortality | Indirect | Clinical decision support | 7 |
Kennedy et al (2015)17 | Build and test cardiac arrest prediction models to measure changes in prediction accuracy among pediatric patients at risk for disability and death | PICU | SVM | Morbidity and mortality | Indirect | Triage and early diagnosis | 8 |
Toltzis et al (2015)72 | Devise a Crisis Standards of Care triage allocation scheme to determine the need for ventilation, length of stay, and mortality risk in children | PICU | Linear regression | Length of PICU stay, mechanical ventilation, and mortality | Indirect | Care delivery and chronic management | 7 |
Campbell et al (2016)73 | Use quantitative image analysis to identify vascular features of the retina to diagnose plus disease and possible blindness in premature infants | NICU | iROP | Percentage accuracy of iROP classification of plus disease | Indirect | Triage and early diagnosis | 7 |
Carlin et al (2018)74 | Predict individual physiologically acceptable states at PICU discharge to determine the total length of stay | PICU | Recursive neural network | Length of stay | Indirect | Care delivery and chronic management | 8 |
Irles et al (2018)43 | Forecast intestinal perforation related to necrotizing enterocolitis and investigate variables that may predict neurodevelopmental impairment and mortality in neonates | NICU | Artificial neural network | Development of colitis; however, mortality and neurodevelopmental outcomes mentioned | Indirect | Triage and early diagnosis | 8 |
Lamping et al (2018)75 | Develop and validate a diagnostic model based on routinely available parameters to discriminate sepsis and noninfectious systemic inflammatory response syndrome in children | PICU | RF | Presence of noninfectious systemic inflammatory response syndrome or sepsis | Indirect | Triage and early diagnosis | 9 |
Shirwaikar et al (2018)76 | Use classification models to predict the adequacy and effectiveness of caffeine to treat apneic episodes in neonates | NICU | Deep belief network and multilayered perceptron | Apneic episodes, drug effectiveness, and mortality | Indirect | Clinical decision support | 7 |
Williams et al (2018)63 | Explore medical data that may predict the length of stay, use of ventilation and inotropes, and mortality risk in PICU patients | PICU | k-means clustering | Length of stay, the use of ventilation, inotropes and intubation, and mortality | Indirect | Clinical decision support | 7 |
Chaichulee et al (2019)77 | Use frameworks to detect time periods and skin regions of interest to estimate vital signs in NICU patients | NICU | Convolutional neural network | Cardiorespiratory signal | Indirect | Diagnostics | 7 |
Kayhanian et al (2019)20 | Identify admission laboratory variables correlated with outcomes after traumatic brain injury in children | PICU | SVM | Favorable versus unfavorable outcomes not clearly defined; mortality mentioned in background | Indirect | Triage and early diagnosis | 7 |
Kim et al (2019)78 | Describe the development and evaluation of the Pediatric Risk of Mortality Prediction Tool for real-time mortality prediction in PICU patients | PICU | Convolutional neural network | All-cause PICU mortality | Indirect | Triage and early diagnosis | 7 |
Masino et al (2019)24 | Develop a model capable of recognizing sepsis at least 4 h before clinical recognition, with the goal of decreasing mortality in neonates | NICU | AdaBoost, GB, Gaussian process, k-NN, LR, Naïve Bayes, RF, and SVM | Mortality | Indirect | Triage and early diagnosis | 9 |
Matam et al (2019)18 | Evaluate if an automated analysis of multivariate physiologic data would enable early identification and prediction of cardiac arrests and reduce mortality in children | PICU | Nonlinear signal-processing algorithms | Mortality | Indirect | Triage and early diagnosis | 7 |
Moccia et al (2019)32 | Propose a new approach to limb pose estimation to assess health status and detect cognitive/motor disorders in preterm infants | NICU | Convolutional neural network | Ability to detect both spatial and temporal features from video recordings of neonatal limb movement; theoretically aimed at assessing health status and early detection cognitive/motor disorders | Indirect | Triage and early diagnosis | 7 |
Nagori et al (2019)16 | Construct a noninvasive and automated model to predict shock in children aged 0 to 12 y | PICU | Generalized linear model and RF | Assessment of hemodynamic shock, shock index, and model learning | Indirect | Triage and early diagnosis | 7 |
Ornek et al (2019)31 | Use Infrared Thermography to detect health status and diagnose disease to reduce mortality in neonates | NICU | Convolutional neural network | Detection of neonatal health status and reduction of mortality in the setting of timely diagnosis of disease and anomalies | Indirect | Triage and early diagnosis | 7 |
Ruiz et al (2019)42 | Achieve early prediction of critical events (eg, cardiopulmonary resuscitation) to reduce the length of stay, morbidity, and mortality in infants aged <6 mo with single-ventricle physiology before second-stage surgery | NICU/PICU/CICU | Naïve Bayes | Ability to predict critical events and patient deterioration within up to 8 h; theoretically aimed at reducing morbidity, mortality, length of stay, and health care costs | Indirect | Triage and early diagnosis | 7 |
Fraiwan et al (2020)79 | Investigate the use of a long short-term memory learning system in automatic sleep stage scoring to optimize healthy brain development in neonates | NICU | Long short-term memory neural network | Associated with healthy brain development | Indirect | Triage and early diagnosis | 9 |
Hamilton et al (2020)80 | Estimate the risk of severe neonatal morbidity (ie, death, intraventricular hemorrhage, ≥28 d on a ventilator, periventricular leukomalacia, or stage III necrotizing enterocolitis) in preterm births <32 wk gestation | NICU | X | Severe neonatal morbidity was defined by the presence of any of 5 outcomes: death, grade 3 or 4 intraventricular hemorrhage, ≥28 d on ventilator, periventricular leukomalacia, or stage III necrotizing enterocolitis | Indirect | Clinical decision support | 7 |
Scott et al (2020)81 | Identify and compare altered metabolites and metabolic pathways in infants with culture-proven bacterial meningitis | NICU | RF | Morbidity and mortality | Indirect | Triage and early diagnosis | 7 |
TRL 1: Basic principles observed and reported (lowest level of technology readiness). TRL 2: Technology concept and/or application formulated. TRL 3: Analytical and experimental critical function and/or characteristic proof of concept. TRL 4: Component and/or breadboard validation in laboratory environment. TRL 5: Component and/or breadboard validation in relevant environment. TRL 6: System and/or subsystem model or prototype demonstration in a relevant environment. TRL 7: System prototype demonstration in an operational environment. TRL 8: Actual system completed and qualified through test and demonstration. TRL 9: Actual system proven through successful mission operations. AdaBoost, Adaptive Boosting; AODE, averaged one dependence estimators; CICU, cardiac intensive care unit; GB, gradient boosting; iROP, imaging and informatics in retinopathy of prematurity; k-NN, k nearest neighbor; LR, logistic regression; RF, random forest; SVM, support vector machine; X, not reported.
Impact of AI Use on Health Outcome(s) and Implementation Readiness
A direct impact of AI on at least 1 health outcome was described in 7 studies (22%). In each study, researchers documented an improvement in health outcome after AI implementation. These studies aligned with the following 4 impact focus areas: triage and early diagnosis (43%; n = 3), clinical decision support (29%; n = 2), diagnostics (14%; n = 1), and self-care/prevention/wellness (14%; n = 1) (Table 1). For example, AI was used in predictive models to identify and reduce the risk of morbidity and mortality among children with clinical illness.30,41,43 In one study, trained, tested, and validated AI algorithms were used to successfully identify preterm infants with congenital heart disease who were at risk for respiratory syncytial virus and required vaccinations to prevent severe illnesses and increased mortality.44 These preterm infants were noted to have a decreased oxygen need and an overall improved health outcome after vaccination. In another study, AI was used to accurately diagnose 3 out of 7 critically ill PICU patients with genetic diseases.45 Finally, researchers from Spain used AI to develop relaxing music tunes for preterm infants in the NICU.46 These neonates had reduced stress levels, identified by changes in vital signs.
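The proportions reported above can be re-derived from the raw counts. The short Python sketch below is illustrative only: the helper name `share` and the variable names are hypothetical, and the counts are those stated in the text (7 direct-impact studies out of 32 included studies).

```python
def share(n: int, total: int) -> int:
    """Return n/total expressed as a whole-number percentage."""
    return round(100 * n / total)


TOTAL_STUDIES = 32  # 7 direct-impact studies + 25 indirect-impact studies
DIRECT = 7          # studies reporting a direct impact on a health outcome

# Impact focus areas among the 7 direct-impact studies, as listed in the text.
focus_areas = {
    "triage and early diagnosis": 3,
    "clinical decision support": 2,
    "diagnostics": 1,
    "self-care/prevention/wellness": 1,
}

direct_pct = share(DIRECT, TOTAL_STUDIES)
focus_pct = {area: share(n, DIRECT) for area, n in focus_areas.items()}

print(direct_pct)  # percentage of included studies with a direct impact
print(focus_pct)   # percentage of direct-impact studies per focus area
```

Note that the focus-area percentages use the 7 direct-impact studies as the denominator, not all 32 included studies.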
In the remaining 25 studies, researchers described AI algorithms with a more limited, indirect impact on health outcomes. In these studies, AI models outperformed current non-AI modalities (eg, expert human opinion and statistical analysis models). However, a change in outcome, such as reduced disease burden, improved recovery times, and decreased length of stay or mortality, was not reported. Other outcome measures, such as surgical outcomes or effects on morbidity or mortality, were also not reported. None of the studies suggested AI use was associated with worsened health outcomes or patient care.
Applying the TRL system to the articles in our review, only 1 AI model and/or algorithm was in the proof-of-concept phase. The majority were in prototype testing in an operational environment with near-implementation readiness. A handful of studies were in the implementation phase and had been either certified for use or deployed into an ICU setting (Table 1).
AI Classifications and Metrics
In Table 2, we detail AI model data sources and classification metrics. Most relied on electronic medical record (EMR) data, and 29 had defined statistical classification types. In the remaining 3 articles, there were no clear details on the statistical method used to compare AI models to more traditional modalities. Binary classification was the predominant classification type and represented 47% (n = 15) of included studies. In addition to this statistical categorization, researchers also reported a measurable quantitative outcome to compare AI models to current gold standards. Quantitative outcomes or metrics ranged from area under the receiver operating curve (AUROC) (most common at 56%; n = 18) to sensitivity and/or recall (47%; n = 15) to specificity (38%; n = 12) (Supplemental Table 4). Researchers used anywhere from 1 to 6 different types of quantitative outcomes or metrics. For instance, in 1 study, researchers used accuracy, AUROC, recall, precision, F-measure, and specificity, all to improve reporting quality. Two articles discussed the analysis of AI’s impact on health status but did not clearly define the quantitative metric or outcome.
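To make the six metrics named above concrete, the following Python sketch computes each of them for a binary classifier using only the standard library. It is a minimal illustration under stated assumptions, not the method of any included study: the function names, the 0.5 decision threshold, and the eight synthetic labels and scores are all hypothetical. AUROC is computed here as the fraction of positive-negative pairs in which the positive case receives the higher score (ties count as half).

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, recall (sensitivity), specificity, precision, and F-measure."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure,
    }


def auroc(y_true, scores):
    """Pairwise (rank-based) AUROC; requires at least 1 positive and 1 negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs both positive and negative cases")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic example (hypothetical data): 1 = outcome present, 0 = absent.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

metrics = binary_metrics(y_true, y_pred)
metrics["auroc"] = auroc(y_true, scores)
print(metrics)
```

Five of the six metrics depend on the chosen threshold, whereas AUROC is computed from the raw scores, which is one reason values reported under different metrics are not directly interchangeable.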
Data Source, Classification and Learning Types, and Validation for Included Studies
Author | Institution(s) | No. Patients | Data Source and Type | Classification Type | Learning Type | Cross Validation | Compared With Clinicians |
---|---|---|---|---|---|---|---|
Saria et al (2010)30 | X | 138 | EHR: Numerical | Regression | Supervised | X | No |
Saadah et al (2014)44 | X | 176 | Research database: Numerical | Binary | Supervised | X | No |
Matic et al (2016)19 | Sophia Children's Hospital, Erasmus University Medical Center | 53 | Polygraphic EEG video monitoring system: EEG signals | Multiclass | Supervised | fivefold | Yes |
Caparros-Gonzalez et al (2018)46 | 2 public hospitals | 1039 | Observation and vitals: Numerical | Regression | Supervised | fivefold | No |
He et al (2018)26 | Autism Brain Imaging Data Exchange Database | 28 | Research database: Images | Binary | Unsupervised | 10-fold | No |
Podda et al (2018)27 | Italian Neonatal Network | 23 747 | Research database: Numerical | Binary | Supervised | fivefold | No |
Clark et al (2019)70 | Rady Children’s Hospital Epic EHR | 401 | EHR: Textual | Multiclass | Supervised | X | Yes |
Temko et al (2011)25 | NICU of Cork University Maternity Hospital | 17 | Clinical database: EEG signals | Binary | Supervised | fivefold | No |
Wang et al (2013)21 | Hematologic data set | 647 | Clinical database: Biomarkers | Binary | Supervised | X | No |
Chaves et al (2014)28 | NICU of Taubaté | 92 | EHR: Numerical | X | X | X | No |
Mani et al (2014)22 | Monroe Carell Jr Children’s Hospital | 299 | EHR: Numerical | Binary | Supervised | fivefold | Yes |
Wang et al (2014)71 | Tertiary PICU care center | 45 | Clinical database: Biomarkers | Binary | Supervised | X | No |
Kennedy et al (2015)17 | Academic PICU | 212 | Research database: Numerical | Multiclass | Supervised | 10-fold | No |
Toltzis et al (2015)72 | X | 150 000 | Virtual PICU database: Numerical | Regression | Supervised | X | No |
Campbell et al (2016)73 | 8 academic institutions | X | Clinical database: Images | Multiclass | Supervised | X | Yes |
Carlin et al (2018)74 | Children's hospital | 7256 | EHR: Numerical | Regression | Supervised | X | No |
Irles et al (2018)43 | Tertiary care hospital | 76 | EHR: Numerical | Regression | Supervised | X | No |
Lamping et al (2018)75 | German tertiary care PICU | 296 | EHR: Numerical | Binary | Supervised | threefold | No |
Shirwaikar et al (2018)76 | Population PK study | X | Research database: Numerical | Regression | Supervised | X | No |
Williams et al (2018)63 | Children’s Hospital Los Angeles; Cerner Millennium EMR; and The VPS Database | X | Clinical database and EHR: Numerical | X | Unsupervised | X | No |
Chaichulee et al (2019)78 | X | 15 | Video camera: Videos | Binary | Supervised | twofold | No |
Kayhanian et al (2019)20 | Cambridge University | 94 | EHR: Numerical | Binary | Supervised | fivefold | No |
Kim SY et al (2019)78 | Severance Hospital and Samsung Medical Center | 1723 | EHR: Numerical | Binary | Supervised | fivefold | No |
Masino et al (2019)24 | Children’s Hospital of Philadelphia | 618 | EHR: Numerical | Binary | Supervised | 10-fold | No |
Matam et al (2019)18 | Young Lives Project | 538 | Research database: Numerical and EEG signals | Binary | Supervised | X | Yes |
Moccia et al (2019)32 | X | 16 | Video camera: Videos | Regression | Supervised | No | No |
Nagori et al (2019)16 | X | X | Thermal camera: Images | Regression | Supervised | X | Yes |
Ornek et al (2019)31 | X | 38 | Thermal camera: Images | Binary | Supervised | 10-fold | No |
Ruiz et al (2019)42 | University hospital EHR | 93 | EHR: Numerical | Multiclass | Supervised | fivefold | Yes |
Fraiwan et al (2020)79 | University of Pittsburgh | 37 | Research database: EEG signals | Multiclass | Supervised | 10-fold | No |
Hamilton et al (2020)80 | 10 hospitals, Pediatrix/Obstetrix Perinatal Collaborative Research Network | X | Research database: Numerical | X | X | X | No |
Scott et al (2020)81 | X | 38 | EHR: Numerical | Binary | Supervised | X | No |
EHR, electronic health record; PK, pharmacokinetics; VPS, virtual pediatric systems; X, not reported.
Discussion
AI has the potential to transform pediatric care by improving clinical decision-making and optimizing care delivery.47 This is particularly true in fast-paced, critical care environments where physicians regularly make life-saving decisions. This systematic review is the first in which researchers explore the use of AI to improve health outcomes for patients in NICUs and PICUs. Bedside application of AI was identified in 7 studies, and all were associated with an improved health outcome. Additionally, the articles in this review were categorized into specific impact focus areas and appear consistent with the makeup of health care AI today: applications to support clinical decision-making, patient triage, and early disease detection.
Technology Readiness and AI Impact
Although some experts believe AI will change the future of health care, most emerging AI technologies have yet to be adopted by the general medical community.48 Technological breakthroughs do not always translate into tools that are ready for real-world use. AI has certainly made strides in adult medicine, particularly in inpatient and ICU settings,49,50 but its use in pediatric medicine lags behind.51 AI health care applications, like most other tools and medical interventions, must undergo rigorous testing before being adopted into everyday clinical practice.48 AI use in neonatal and pediatric critical care may lag behind other fields of medicine because of uncertainties surrounding the safety and efficacy of these tools and techniques among particularly vulnerable populations, such as extremely preterm infants. There is a dearth of research on the use of AI in low- to middle-income countries, in which many neonates and children require intensive care treatment. The impact of AI in such settings is an important area for future investigation.
Applying the TRL system to the articles in our review, few were in the implementation phase or deployed into an ICU setting. The AI models in these advanced phases were focused on narrow tasks, which is consistent with the original European Commission report, revealing that higher TRLs are achievable for AI technologies that are focused on specific capabilities.42 Because it is difficult to predict which technologies will become an integral part of everyday care in the NICU and PICU, the TRL system may provide some insight into which AI models are making headway into pediatric critical care.
In the majority of studies, researchers described AI algorithms that outperformed conventional modalities and may have indirectly impacted health outcomes. Whether these AI modalities will lead to actual changes in health outcomes for patients is unknown. Future AI research will need to use prospective, experimental study designs to assess the impact of AI on outcomes (eg, disease burden, length of stay, readmissions, and mortality) and use reliable, validated measures and implementation frameworks.
AI Metrics and Reporting Heterogeneity
It was difficult to assess the AI models in our review because of marked heterogeneity in reporting. For instance, in some studies, researchers measured AI model performance on the basis of AUROC, whereas in others they used recall. Metrics like AUROC may be deemed superior52–54; however, they require that researchers have extensive technical knowledge to provide appropriate analysis.8,9,55 Although it is reasonable for some AI models to use different performance metrics, this level of heterogeneity in reporting makes the comparison and evaluation of AI algorithms challenging. Moreover, the translation of these algorithms into clinical practice is also impaired because of heterogeneity in the algorithms used, the data used for training, and a lack of spectral or geographical validation. AI models need to be comparable when analyzing data from the same target population and when using metrics to assess overall AI performance. Otherwise, medical decisions made by using inappropriate AI algorithms could lead to poor patient outcomes.
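The comparability problem described above can be illustrated with a small synthetic example. In the Python sketch below, two hypothetical models are scored on the same 8 cases; one is better by AUROC and the other is better by recall at a fixed threshold, so a reader comparing a study that reports only AUROC with one that reports only recall cannot tell which model performs better. The model names, scores, and the 0.5 threshold are assumptions made for illustration and are not drawn from any included study.

```python
def recall_at(y_true, scores, threshold):
    """Recall (sensitivity) when scores >= threshold are called positive."""
    tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= threshold)
    return tp / sum(y_true)


def auroc(y_true, scores):
    """Fraction of positive-negative pairs ranked correctly (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic labels: 4 cases with the outcome (1) and 4 without (0).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]

# Model A ranks every positive above every negative but gives 2 positives
# scores below 0.5. Model B scores all positives above 0.5 but also ranks
# 2 negatives above some positives.
model_a = [0.95, 0.90, 0.45, 0.40, 0.35, 0.30, 0.20, 0.10]
model_b = [0.90, 0.80, 0.70, 0.60, 0.85, 0.75, 0.20, 0.10]

results = {
    name: {"auroc": auroc(y_true, s), "recall": recall_at(y_true, s, 0.5)}
    for name, s in (("A", model_a), ("B", model_b))
}
print(results)
```

Reporting a shared core set of metrics on the same evaluation data would remove this ambiguity.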
Barriers and Next Steps of AI in Pediatric Critical Care
AI technologies must be able to decipher and translate gigabytes of clinical data to function as effective decision support systems and aid intensivists in real time. Notably, although this review is focused on intensive care, many of these challenges may be broadly applicable across pediatric inpatient settings. A significant challenge in applying AI technologies is that the inpatient data needed for translation and training are typically complex, heterogeneous across institutions,50 and largely unstructured.9,39,50,56–58 Although digital data has become more easily accessible for analysis, it is not regularly sampled or accurately measured and recorded. Unstructured text data, like clinical notes in most inpatient settings, necessitate the use of NLP.59 Unfortunately, annotating large chunks of text to train any model is expensive and time consuming.59 In our review, there was only 1 study in which researchers used NLP, suggesting that this technology remains a more nascent tool among AI developers in settings of neonatal and pediatric critical care.
Structuring, annotating, and analyzing unstructured data is often cumbersome. In very few studies60,61 have researchers been able to develop AI-based approaches for extracting unstructured data from the EMR. This poses a dilemma in adopting AI into hospital-based practice because physicians rely on unstructured EMR data to understand complex disease processes. In addition, AI models trained on unstructured and/or potentially biased input data might also generate misleading and biased outputs and negatively affect patient care.62–64 The studies in our review neither checked for bias in data nor discussed specific techniques used to handle unbalanced or biased data, which contributes to the lack of reproducibility of results.
Algorithms trained on structured data can make AI input more user-friendly, ensure the extraction of quality data, and improve AI performance.65 Future directions should explore a concept known as “representation learning.” In representation learning, AI models automatically learn features and use a predictive model to produce an abstract representation of individual patient data for clinical data extraction.66,67 Apart from the technical limitations, adopting AI into neonatal and pediatric intensive care can also be hindered by several human factors and system-integration barriers. Future studies should focus on factors that influence clinician trust in AI and explore the impact of AI on clinical workflow. The use of AI over time may also induce cognitive biases among providers. For instance, clinicians may begin to blindly trust the information generated by the AI system (because of its consistent and highly efficient performance) or reject AI output data without ultimately considering its effect on outcome (because of past experiences). From a quantitative perspective, AI trained on a particular subset of patients may be biased and produce results that hold only for that particular patient demographic. To address such issues, future researchers will need to enlist the aid of computer scientists, human factors engineers, and medical experts.
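As one illustration of what a basic check for unbalanced or biased data could look like, the Python sketch below reports the proportion of positive labels in a data set and the recall (sensitivity) within each patient subgroup. This is a hedged example, not a technique used by any included study: the function names, the subgroup labels ("term" and "preterm"), and the 10 synthetic records are all hypothetical.

```python
from collections import defaultdict


def positive_fraction(labels):
    """Share of records labeled positive; values far from 0.5 indicate imbalance."""
    return sum(labels) / len(labels)


def recall_by_group(y_true, y_pred, groups):
    """Recall computed separately for each subgroup that has positive cases."""
    true_pos = defaultdict(int)
    positives = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            positives[group] += 1
            if pred == 1:
                true_pos[group] += 1
    return {group: true_pos[group] / positives[group] for group in positives}


# Synthetic records (hypothetical): 1 = outcome present, 0 = absent.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
groups = ["term", "term", "preterm", "preterm", "term",
          "term", "term", "preterm", "preterm", "term"]

balance = positive_fraction(y_true)
per_group = recall_by_group(y_true, y_pred, groups)
print(balance)    # fraction of positive labels
print(per_group)  # recall within each subgroup
```

In this synthetic case the overall recall of 0.5 hides the fact that every positive case in one subgroup is missed, which is the kind of disparity that aggregate reporting alone would not reveal.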
In addition to the barriers surrounding AI and data extraction from the EMR, AI algorithms are often unable to decipher the complexities and comorbidities involved in pediatric inpatient care. AI algorithms use all potential and available signals to achieve the best possible performance. Unreliable confounders, however, may be included and, as a result, impair the algorithm’s ability to generalize to new data sets.9,39,68 It is, thus, necessary to understand the input data on which an AI has been trained and compared. In studies where algorithms were compared with physicians, the assumption that every physician is a gold standard may not necessarily be true. Several studies had a retrospective research design, meaning that researchers used historically labeled data to train and test algorithms (supervised learning). Importantly, AI's utility can only be realized through prospective studies because AI performance is likely to worsen when encountering real-world data that differ from the data used for algorithm training.
Finally, most studies simplified pediatric complications into binary classification types: assigning an individual to 1 of 2 categories (eg, disease versus no disease) by measuring a series of attributes. However, such an approach neglects that a patient might suffer from multiple comorbidities, each having a different severity or interdependency. Given that barriers to adopting AI are seldom discussed in the methodology of these studies, we did not include a summary or analysis of these factors in our review. This is another important area for future investigation.
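The information lost by a binary framing can be shown with a small data-structure sketch. In the hypothetical Python example below, a patient record stores a severity grade for each condition, and collapsing it to a single disease versus no-disease label discards both the co-occurring conditions and their grades. The class name, condition names, and 0 to 3 grading scale are assumptions for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class PatientLabels:
    """Multi-label record: condition name -> severity grade (0 = absent)."""
    patient_id: str
    conditions: dict = field(default_factory=dict)

    def binary_label(self, condition: str) -> int:
        """Collapse one condition to the disease (1) vs no disease (0) label."""
        return int(self.conditions.get(condition, 0) > 0)

    def comorbidity_count(self) -> int:
        """Number of conditions present at any severity."""
        return sum(1 for grade in self.conditions.values() if grade > 0)


# Hypothetical patient with 2 co-occurring conditions of differing severity.
patient = PatientLabels(
    patient_id="example-001",
    conditions={
        "sepsis": 2,
        "intraventricular hemorrhage": 3,
        "necrotizing enterocolitis": 0,
    },
)

sepsis_label = patient.binary_label("sepsis")
n_conditions = patient.comorbidity_count()
print(sepsis_label, n_conditions)
```

A binary sepsis classifier would see only `sepsis_label`; the second condition and both severity grades never reach the model.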
Limitations
This review has important limitations that should be considered. Our scope was limited to the last 10 years; coupled with the fact that AI technology and its conceptual definition are rapidly evolving, it is possible that not all relevant articles were captured. To assess AI's impact on health outcomes, we used the EIT Health joint report, which was designed with adult health care settings in mind. This categorization system, however, remained applicable to our patient population in a majority of the focus areas. We excluded non-English articles, which may have impaired our ability to describe the use of AI in an even broader population. Finally, we were unable to conduct a meta-analysis because of the heterogeneity in study designs and outcomes measured.
Conclusions
There is growing literature supporting the use of AI technologies to improve pediatric health outcomes; however, it will be some time before AI will achieve widespread implementation into everyday pediatric hospital practice. Barriers remain to using supervised AI models for inpatient data. Input training on supervised models does not allow for effective application of AI in real-world scenarios (eg, complex critical illnesses with comorbidities), and AI typically has difficulty processing unstructured data for clinical data extraction. Although there is a need for more research investigating the use of AI technologies, particularly around handling unstructured data and unsupervised analysis, the potential benefits of AI in pediatric hospital-based care are yet to be fully realized.
FUNDING: No external funding.
Dr Adegboro was responsible for study design, literature review, data analysis and interpretation, and drafting and editing; Mr Choudhury was responsible for concept and study design, literature review, data analysis and interpretation, and writing and editing; Dr Asan was responsible for concept and study design, data interpretation, and editing; Dr Kelly was responsible for study design, data interpretation, and editing; and all authors approved the final manuscript as submitted and agree to be accountable for all aspects of the work.
The protocol for this systematic literature review was registered with the Open Science Framework (identifier DOI 10.17605/OSF.IO/UJFVG).
Competing Interests
FINANCIAL DISCLOSURE: All authors indicate they have no financial relationships relevant to this article to disclose.
POTENTIAL CONFLICT OF INTEREST: The authors have indicated they have no potential conflicts of interest to declare.