BACKGROUND

Identifying high-risk children with complex health needs (CCHN), who have intersecting medical and social needs, is challenging. This study’s objectives were to (1) develop and evaluate an electronic health record (EHR)-based clinical predictive model (“model”) for identifying high-risk CCHN and (2) compare the model’s performance as a clinical decision support (CDS) tool with that of other CDS tools available for identifying high-risk CCHN.

METHODS

This retrospective cohort study included children aged 0 to 20 years with established care within a single health system. The model development/validation cohort included 33 months (January 1, 2016–September 30, 2018) and the testing cohort included 18 months (October 1, 2018–March 31, 2020) of EHR data. Machine learning methods generated a model that predicted probability (0%–100%) for hospitalization within 6 months. Model performance measures included sensitivity, positive predictive value, area under receiver-operator curve, and area under precision-recall curve. Three CDS rules for identifying high-risk CCHN were compared: (1) model-predicted hospitalization probability ≥10%; (2) complex chronic disease classification (using Pediatric Medical Complexity Algorithm [PMCA]); and (3) previous high hospital utilization.

RESULTS

Model development and testing cohorts included 116 799 and 27 087 patients, respectively. The model demonstrated area under receiver-operator curve = 0.79 and area under precision-recall curve = 0.13. PMCA had the highest sensitivity (52.4%) and classified the most children as high risk (17.3%). Positive predictive value of the model-based CDS rule (19%) was higher than that of CDS rules based on the PMCA (1.9%) and previous high hospital utilization (15%).

CONCLUSIONS

A novel EHR-based predictive model was developed and validated as a population-level CDS tool for identifying CCHN at high risk for future hospitalization.

Children with complex health needs (CCHN) are an important high-need, high-cost population1 that includes children and youth with special health care needs2,3 and children with medical complexity4 and that is defined by the presence of high medical needs and intersecting social needs.5 Because their complex “neighborhood”6 of health services and providers is often fragmented, care coordination services7 and integrated systems of care2,8 are essential for CCHN and their families. However, CCHN often receive inadequate care coordination,9 and most lack access to a well-functioning system of care.10

A critical and challenging first step toward improving systems and facilitating care coordination for CCHN is population-level identification of patients at high risk for adverse outcomes. Child population risk stratification often relies on categorization by medical complexity level; multiple tools are available for this purpose.11  For example, Complex Chronic Conditions12  and the Pediatric Medical Complexity Algorithm (PMCA)13  are diagnosis-based complexity classification tools used by health systems and payers14 ; and previous patterns of high health-service utilization are commonly used to prioritize children for complex care coordination services.11,15  An important limitation of these complexity classification tools is exclusion of nonmedical health drivers (eg, socioeconomic).16  Furthermore, available tools rely on past administrative data to classify a patient’s current complexity level, which risks facilitating care coordination services reactive to established health needs. In contrast, tools that use past data to evaluate risk for future outcomes could facilitate services that proactively address anticipated health needs and mitigate risk for future adverse outcomes.

The breadth of data within electronic health records (EHRs) provides an opportunity to develop more granular risk prediction tools. EHR-based predictive models offer advantages, including lower-cost assembly of large datasets, the ability to analyze numerous variables and outcomes, and readiness for use in clinical settings.17 Machine learning methods have been applied to develop predictive models as clinical decision support (CDS) tools, defined as tools and systems that provide timely information to help inform decisions about patient care.17–19 CDS informed by EHR-based clinical predictive models using machine learning methods are appealing because they use large datasets gathered during routine care, thereby enhancing real-world relevance.17,20

Despite the potential value of EHR-based clinical predictive modeling, experience applying these methods in pediatrics for CCHN is limited. This study describes: (1) the development of a novel EHR-based, machine learning clinical predictive model as a CDS tool to identify high-risk CCHN; and (2) comparison of the predictive model to CDS based on two available complexity classification tools, the PMCA and prior high hospital utilization, for identification of high-risk CCHN.

We conducted this study at an academic health system that includes a tertiary children’s hospital with more than 6000 annual discharges and provides care across urban and rural communities in the southern United States. The health system has used an enterprise-wide EHR (Epic; Verona, WI) since 2014. A large pediatric primary care practice in the health system that sees >24 000 children and conducts >59 000 visits/y was selected a priori as a primary care testing site for the developed CDS tool.

Cohort Definitions

Patient eligibility for the study cohort for development and testing of the clinical predictive model (referred to as the “model”) included age 0 to 20 years and attribution to a primary care physician within our health system. We abstracted EHR data on included patients from January 1, 2016 to March 31, 2020. To reflect how the tool would be used in practice to risk-stratify patients on a monthly basis, each month a patient was study-eligible, their EHR data contributed a unique person-month of patient-level data analyzed by the model. Person-months during pregnancy were excluded to reduce confounding of model predictions by routine pregnancy-related hospitalizations. Children without clinical encounters in our health system in the previous 18 months were excluded to focus model predictions on patients more likely to have established care within our health system.
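As a rough illustration of this person-month construction (not the authors’ code; table and column names such as `patients`, `encounters`, and `pregnancy_months` are hypothetical), eligible person-months could be assembled as follows:

```python
import pandas as pd

def build_person_months(patients, encounters, pregnancy_months, start, end):
    """Expand eligible patients into one row per study month.

    patients: DataFrame [patient_id, birth_date (datetime)]
    encounters: DataFrame [patient_id, encounter_date (datetime)]
    pregnancy_months: DataFrame [patient_id, month_start] flagging months during pregnancy
    """
    rows = []
    for month_start in pd.date_range(start, end, freq="MS"):  # first day of each study month
        age_years = (month_start - patients["birth_date"]).dt.days / 365.25
        eligible = patients[(age_years >= 0) & (age_years <= 20)].copy()
        eligible["month_start"] = month_start
        # Require >=1 clinical encounter in the previous 18 months (established care)
        recent_ids = encounters.loc[
            (encounters["encounter_date"] < month_start)
            & (encounters["encounter_date"] >= month_start - pd.DateOffset(months=18)),
            "patient_id",
        ].unique()
        rows.append(eligible[eligible["patient_id"].isin(recent_ids)])
    person_months = pd.concat(rows, ignore_index=True)
    # Drop person-months that occur during pregnancy
    person_months = person_months.merge(
        pregnancy_months, on=["patient_id", "month_start"], how="left", indicator=True
    )
    return person_months.loc[person_months["_merge"] == "left_only"].drop(columns="_merge")
```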

Outcome Definition

The primary outcome predicted by the model was all-cause hospitalization within 6 months of the current eligible patient-month. Model output was the probability (0%–100%) for each patient to be hospitalized in the next 6 months.
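A minimal sketch of the corresponding labeling step, assuming a hypothetical `admissions` table with one row per hospital admission (illustrative only, not the study code):

```python
import pandas as pd

def label_hospitalization(person_months, admissions, horizon_months=6):
    """Flag each person-month if any admission falls within the next `horizon_months`."""
    merged = person_months.merge(
        admissions[["patient_id", "admit_date"]], on="patient_id", how="left"
    )
    window_end = merged["month_start"] + pd.DateOffset(months=horizon_months)
    merged["hospitalized_6mo"] = (
        (merged["admit_date"] >= merged["month_start"]) & (merged["admit_date"] < window_end)
    )
    return (
        merged.groupby(["patient_id", "month_start"])["hospitalized_6mo"]
        .any()
        .astype(int)
        .reset_index()
    )
```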

Model Features

We abstracted patient demographics (age, sex, race, ethnicity, payer), diagnosis codes, and health service utilization data (outpatient, emergency, inpatient) from the EHR. We grouped International Classification of Diseases, 10th revision (ICD-10) diagnosis codes by organ systems as defined by the PMCA.13  We supplemented clinical, demographic, and health service utilization data with Area Deprivation Index (ADI)21  scores for each patient’s EHR-documented home ZIP code. The ADI is a composite measure of community-level disadvantage and socioeconomic status (SES) based on 17 US census–based measures of education, housing, employment, and poverty.21  The ADI score served as a proxy for individual SES, with higher ADI scores correlating with higher community-level disadvantage and lower SES. We tabulated service utilization counts based on the previous 365 and 30 days from the first day of the current person-month. We included a total of 50 variables in the final predictive model (Supplemental Table 3). Model features were selected through an iterative process led by a team, including data scientists, an informaticist, and pediatricians.
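To make the feature windows concrete, the following sketch (with hypothetical table and column names; the actual 50-variable feature set is listed in Supplemental Table 3) counts encounters in trailing 365- and 30-day windows and joins a ZIP code–level ADI score:

```python
import pandas as pd

def add_utilization_and_adi_features(person_months, encounters, adi_by_zip):
    """Add trailing 365-/30-d utilization counts and neighborhood ADI to each person-month.

    encounters: DataFrame [patient_id, encounter_date, encounter_type]  # outpatient/ED/inpatient
    adi_by_zip: DataFrame [zip_code, adi_score]; person_months assumed to carry zip_code
    """
    out = person_months.copy()
    for window_days, suffix in [(365, "365d"), (30, "30d")]:
        merged = out[["patient_id", "month_start"]].merge(encounters, on="patient_id", how="left")
        in_window = (merged["encounter_date"] < merged["month_start"]) & (
            merged["encounter_date"] >= merged["month_start"] - pd.Timedelta(days=window_days)
        )
        counts = (
            merged[in_window]
            .groupby(["patient_id", "month_start", "encounter_type"])
            .size()
            .unstack(fill_value=0)
            .add_suffix(f"_count_{suffix}")
            .reset_index()
        )
        count_cols = [c for c in counts.columns if c.endswith(f"_count_{suffix}")]
        out = out.merge(counts, on=["patient_id", "month_start"], how="left")
        out[count_cols] = out[count_cols].fillna(0)
    # ADI score for the documented home ZIP code serves as the neighborhood SES proxy
    return out.merge(adi_by_zip, on="zip_code", how="left")
```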

To facilitate use of the model in clinical care, we developed an online dashboard (Fig 1; Tableau; Seattle, WA)22  to allow clinicians to visualize the model’s risk predictions. We created a workflow that identified all eligible patients on a monthly basis, applied the model to calculate 6-month hospitalization risk, and transferred data for all high-risk patients into the dashboard. The dashboard also displayed all model features at the individual patient level, categorized as sociodemographic, clinical, and health care utilization, so clinicians could see the contribution of each variable to summative risk scores. The layout and functionalities, including the ability to view patient-level details and query risk scores, were codesigned with clinicians to optimize real-world usability.
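A simplified version of that monthly scoring-and-export step might look like the following (the threshold default, column names, and file name are placeholders; Tableau connectivity details are omitted):

```python
def monthly_dashboard_extract(model, feature_table, risk_threshold=0.10):
    """Score every eligible patient for the current month and keep high-risk rows,
    including all model features, for the dashboard data source."""
    features = feature_table.drop(columns=["patient_id", "month_start"])
    scored = feature_table.copy()
    scored["predicted_risk"] = model.predict_proba(features)[:, 1]
    high_risk = scored[scored["predicted_risk"] >= risk_threshold]
    high_risk.to_csv("dashboard_extract.csv", index=False)  # read by the dashboard as a data source
    return high_risk
```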

FIGURE 1

Screenshot from online dashboard built for clinicians to visualize data model predictions.


Model Development and Evaluation

For eligible patients in the model development cohort, we divided included person-months between January 1, 2016 and September 30, 2018, into training and validation datasets; data were split in a 70:30 (training/validation) ratio at the patient level to maintain distinct data without overlap of training and validation datasets. Next, we analyzed eligible person-months between October 1, 2018 and September 30, 2019, as a testing cohort (Fig 2) followed by 6 months of follow-up through March 31, 2020, for outcomes analyses.

FIGURE 2

Data flow diagram. N refers to counts of individual patients in each sample.


We used gradient boosting machines (GBMs) to build the model. GBM is an ensemble machine learning algorithm that combines many shallow decision trees.23 As a tree-based algorithm, it is effective for modeling nonlinear and heterogeneous effects. Based on internal cross-validation, the optimal model consisted of 500 trees with a maximum depth of 3. We used validation data to compare the performance of LASSO-regularized logistic regression and GBM and found that GBM performed best. We evaluated the model’s performance on testing data using the area under the receiver operator curve (AUROC), area under the precision-recall curve (AUPRC), and calibration slope. To account for repeated measurements on individuals, we calculated monthly performance metrics and computed a weighted average based on sample size. Model parameters were not retrained beyond the initial training and validation period.
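The core modeling and evaluation steps could be reproduced roughly as below with scikit-learn; this is an illustrative sketch on synthetic data, not the authors’ pipeline (in the study, data were split at the patient level and metrics were computed per calendar month):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 50-feature EHR matrix with a rare outcome
X, y = make_classification(n_samples=10000, n_features=50, weights=[0.98], random_state=0)
# NOTE: the study split 70:30 at the patient level so a patient's months stay together
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.3, random_state=0)

gbm = GradientBoostingClassifier(n_estimators=500, max_depth=3, random_state=0)
lasso = LogisticRegression(penalty="l1", solver="liblinear")  # LASSO-regularized comparator

for name, model in [("GBM", gbm), ("LASSO", lasso)]:
    model.fit(X_train, y_train)
    p = model.predict_proba(X_valid)[:, 1]
    print(f"{name}: AUROC={roc_auc_score(y_valid, p):.3f}, AUPRC={average_precision_score(y_valid, p):.3f}")

# Per-month metrics on the test period would then be combined as a
# sample-size-weighted average, eg np.average(monthly_aurocs, weights=monthly_n).
```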

CDS Rule Development and Comparison

Translation of the model’s risk predictions into a CDS rule for identification of high-risk CCHN was informed by user-centered design methods.24,25 Because of the clinical relevance of care coordination (CC) to supporting CCHN, we considered CC as a use case and focused CDS rule development on the model’s potential to identify high-risk CCHN who could be prioritized for CC services. The first step in the CDS rule development process was partnership with primary care and health system clinicians to understand baseline CC workflows. Through this partnership, we learned that the number of CCHN in need of CC exceeded the workload capacity of existing clinical staff, underscoring the role of a CDS rule in informing allocation of limited CC services. Next, to align with the workload capacity of clinical staff prioritizing CC delivery to high-risk patients at the primary care test clinic, the model’s high-risk threshold for the CDS rule was set at ≥10% predicted risk for future hospitalization.

Performance of the model-based CDS rule was assessed using sensitivity, specificity, and precision (measured as positive predictive value [PPV]). To contextualize performance of the model-based CDS rule, we compared it with two other decision rules used in research and practice to identify high-risk patients in need of CC: (1) PMCA classification as a patient with complex chronic disease (tier 3) and (2) previous high hospital utilization, defined as ≥2 hospitalizations in the past 12 months. The PMCA tier 3 and previous high hospital utilization criteria served as alternative thresholds against which we compared the model’s high-risk threshold of ≥10% risk for future hospitalization. Model development was performed in Python 3.7.6. This work was reviewed and approved by our institutional review board.
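Given each rule’s binary high-risk flag and the observed 6-month hospitalization outcome, the comparison measures reduce to simple confusion-matrix arithmetic; a small, self-contained illustration with made-up data (not study data):

```python
import numpy as np

def cds_rule_metrics(flagged, hospitalized):
    """Prevalence of flagged patients, sensitivity, specificity, and PPV for one binary CDS rule."""
    flagged = np.asarray(flagged, dtype=bool)
    hospitalized = np.asarray(hospitalized, dtype=bool)
    tp = np.sum(flagged & hospitalized)
    fp = np.sum(flagged & ~hospitalized)
    fn = np.sum(~flagged & hospitalized)
    tn = np.sum(~flagged & ~hospitalized)
    return {
        "prevalence_flagged": flagged.mean(),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
    }

# Example: the model-based rule flags patients with predicted risk >= 0.10
rng = np.random.default_rng(0)
predicted_risk = rng.uniform(0, 0.3, size=5000)
hospitalized = rng.random(5000) < predicted_risk  # outcome more likely at higher risk
print(cds_rule_metrics(predicted_risk >= 0.10, hospitalized))
```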

The model development cohort assembled between January 1, 2016 and September 30, 2018, included 116 799 patients accounting for 2 353 262 person-months (Fig 2). To assess baseline characteristics of the study sample, we stratified a cross-section of the model development cohort (n = 94 720 patients as of September 30, 2018) into those with and without hospitalizations in the previous 12 months. Among all patients in this cross-section of the model development cohort, 2.2% (n = 2061) had ≥1 hospitalization in the past 12 months. In general, children with previous hospitalizations were more often younger, male, and publicly insured and had higher medical complexity (eg, more organ systems involved, higher prevalence of technology dependence) (Table 1).

TABLE 1

Baseline Characteristics of Model Development Cohort (as of September 30, 2018)

Hospitalized in Past 12 Mo (N = 2061) | Not Hospitalized in Past 12 Mo (N = 92 659) | Total (N = 94 720)
Age, y    
 Mean (SD) 5.5 (6.7) 9.9 (6) 9.8 (6) 
 Median (IQR) 1 (0–11) 10 (5–15) 10 (4–15) 
 Range (0–20) (0–20) (0–20) 
Sex, n (%)    
 Female 981 (47.6) 45 870 (49.5) 46 851 (49.5) 
 Male 1080 (52.4) 46 789 (50.5) 47 869 (50.5) 
Race/ethnicity, n (%)    
 Hispanic 259 (12.6) 9773 (10.5) 10 032 (10.6) 
 Non-Hispanic white 890 (43.2) 42 760 (46.1) 43 650 (46.1) 
 Non-Hispanic Black 648 (31.4) 24 576 (26.5) 25 224 (26.6) 
 Other 264 (12.8) 15 550 (16.8) 15 814 (16.7) 
Insurance type, n (%)    
 Private 977 (47.4) 54 587 (58.9) 55 564 (58.7) 
 Public 998 (48.4) 32 063 (34.6) 33 061 (34.9) 
 Other 86 (4.2) 6009 (6.5) 6095 (6.4) 
Neighborhood-level SES,^a n (%)    
 High SES 877 (42.6) 49 166 (53.1) 50 043 (52.8) 
 Medium SES 782 (37.9) 31 729 (34.2) 32 511 (34.3) 
 Low SES 402 (19.5) 11 764 (12.7) 12 166 (12.8) 
Hospitalizations past 12 mo^b    
 Mean (SD) 1.5 (1.4) — 0 (0.3) 
 Median (IQR) 1 (1–1) — 0 (0–0) 
 Range (1–21) — (0–21) 
Hospital days past 12 mo    
 Mean (SD) 14.8 (28.7) — 0.3 (4.8) 
 Median (IQR) 4 (2.2–11.9) — 0 (0–0) 
 Range (0.1–263.1) — (0–263.1) 
ED visits past 12 mo    
 Mean (SD) 1.2 (4.3) 0.1 (0.5) 0.2 (0.8) 
 Median (IQR) 0 (0–1) 0 (0–0) 0 (0–0) 
 Range (0–182) (0–17) (0–182) 
Hospitalizations with ICU stay, past 12 mo    
 Mean (SD) 0.5 (0.7) — 0 (0.1) 
 Median (IQR) 0 (0–1) — 0 (0–0) 
 Range (0–5) — (0–5) 
Total ICU days, past 12 mo    
 Mean (SD) 5.6 (17.4) — 0.1 (2.7) 
 Median (IQR) 0 (0–2.6) — 0 (0–0) 
 Range (0–218) — (0–218) 
Outpatient visits, past 12 mo^c    
 Mean (SD) 12.3 (16.2) 3.2 (4.9) 3.4 (5.5) 
 Median (IQR) 8 (4–15) 2 (1–4) 2 (1–4) 
 Range (0–184) (0–144) (0–184) 
Outpatient visit no-show count, past 12 mo^d    
 Mean (SD) 1.5 (2.6) 0.4 (1.1) 0.4 (1.2) 
 Median (IQR) 0 (0–2) 0 (0–0) 0 (0–0) 
 Range (0–21) (0–31) (0–31) 
Technology dependence,^e n (%)    
 Yes 54 (2.6) 85 (0.1) 139 (0.1) 
 No 2007 (97.4) 92 574 (99.9) 94 581 (99.9) 
Medical complexity level,^f n (%)    
 Without complex disease, tier 1 560 (27.2) 52 982 (57.2) 53 542 (56.5) 
 Noncomplex, chronic disease, tier 2 447 (21.7) 25 043 (27.0) 25 490 (26.9) 
 Complex, chronic disease, tier 3 1054 (51.1) 14 634 (15.8) 15 688 (16.6) 
Organ systems involved,^g n (%)    
 Cardiac 610 (29.6) 3557 (3.8) 4167 (4.4) 
 Pulmonary 456 (22.1) 10 334 (11.2) 10 790 (11.4) 
 Gastrointestinal 366 (17.8) 1187 (1.3) 1553 (1.6) 
 Renal 318 (15.4) 1367 (1.5) 1685 (1.8) 
 Other 1369 (66.4) 25 288 (27.3) 26 657 (28.1) 
Number of organ systems involved    
 Mean (SD) 2.8 (3) 0.6 (1.1) 0.6 (1.2) 
 Median (IQR) 2 (0–4) 0 (0–1) 0 (0–1) 
 Range (0–14) (0–15) (0–15) 

ED, emergency department; IQR, interquartile range; SES, socioeconomic status; —, not applicable.

^a State-level area deprivation index (ADI) scores used as proxy for neighborhood-level SES; high SES = ADI state rank 1–3; medium SES = ADI state rank 4–6; low SES = ADI state rank 7–10.

^b Hospitalization, ED visits, and ICU data limited to our health system.

^c Outpatient visits included all completed ambulatory clinical encounters within our health system.

^d No-shows defined as scheduled outpatient visits for which the patient did not arrive and the visit was not cancelled.

^e Technology dependence defined by the Feudtner et al12 Complex Chronic Conditions classification system applied to the preceding 12 mo of diagnosis codes.

^f Medical complexity level classified based on the PMCA v3.0.

^g Organ system defined by PMCA v3.0 codes.

Evaluation of the predictive model’s overall performance using the testing dataset demonstrated a weighted AUROC = 0.79 and a weighted AUPRC = 0.13 (Figs 3A,3B). Analysis of the testing dataset demonstrated that 1.7% (n = 456) of all children with primary care attributed to the test clinic site (n = 27 087) were ever hospitalized during the 18-month testing and outcomes assessment period (October 1, 2018–March 31, 2020). The model’s average sensitivity and positive predictive value were 24.9% and 19%, respectively (Table 2). To assess potential for health inequities, we evaluated model performance based on individual age, race, sex, and insurance status and found that the model performed similarly across all demographic strata (Supplemental Table 4).

FIGURE 3

(A) Area under receiver operator curve (AUROC) of hospitalization risk scores for primary care test clinic patients. (B) Area under precision recall curve (AUPRC) of hospitalization risk scores for primary care test clinic patients. AUROC calculated using data from 27 087 patients included in the primary care testing cohort as of 9/1/2019.

TABLE 2

Comparison of Performance of Data Model, PMCA, and Previous High Hospital Utilization to Identify High-Risk Patients

Measure^a | Data Model (Risk for Hospitalization Within 6 Mo ≥10%) | PMCA (Classified as Having Complex Chronic Disease) | Prior High Hospital Utilization (≥2 Hospitalizations in Past 12 Mo)
Prevalence of high-risk patients eligible for care coordination (%)^b | 0.84 (0.79–0.93) | 17.3 (17.1–17.8) | 0.55 (0.47–0.64)
Sensitivity (%) | 24.9 (20.5–28.7) | 52.4 (43.4–58.5) | 12.95 (9.7–15.3)
Positive predictive value (%) | 18.99 (13–22.8) | 1.96 (1.4–2.5) | 15.2 (10.4–21.6)
Specificity (%) | 99.3 (99.2–99.4) | 82.96 (82.5–83.2) | 99.5 (99.5–99.6)

CDS, clinical decision support; PMCA, Pediatric Medical Complexity Algorithm.

^a All values presented as weighted average (min, max) during the testing and outcome follow-up timeframe (October 1, 2018–September 1, 2019) for 27 087 patients in the testing cohort.

^b Prevalence was the proportion of patients in the testing cohort who were identified as high risk (as defined by each CDS rule, respectively, under evaluation) and eligible for care coordination.

A total of 27 087 patients with primary care attributed to the test clinic site were included in the testing cohort, accounting for 274 678 person-months. Within this testing cohort, 0.8% (n = 2322), 17.3% (n = 47 444), and 0.6% (n = 1524) of the included person-months were classified as high risk by the data model, PMCA classification, and previous high hospitalization use, respectively. In a cross-sectional comparison (as of September 1, 2019; Fig 4), 3856 (14%) of all patients attributed to the primary care test clinic were classified as high risk by at least 1 of the CDS rules. The PMCA score classified the most patients as high risk (n = 3785; 98% of all patients classified high risk by any of the 3 CDS rules), followed by the data model (n = 177; 4.6%) and the previous high hospital utilization rule (n = 102; 2.6%). A total of 86% of patients classified as high risk by the data model were also identified by the PMCA; 28% of high-risk patients classified by the data model were also identified by previous high hospital utilization.

FIGURE 4

Overlap of high-risk patients identified by data model, PMCA, and previous high hospital utilization. Data represent a cross-section of 3856 children attributed to the primary care test clinic as of September 1, 2019, to which the 3 clinical decision support (CDS) rules under evaluation were applied.


Comparative evaluation of the predictive accuracy of the three CDS rules is shown in Table 2. On average throughout the model testing period (October 1, 2018–March 31, 2020), 17.3% of all children were classified as high risk by PMCA, 0.5% by previous high hospital utilization, and 0.8% by the model. High-risk classification using a PMCA-based CDS had the highest sensitivity (52.4%) and lowest PPV (1.9%), whereas the model demonstrated moderate sensitivity (24.9%) and the highest PPV (19%). Compared with the model, CDS based on previous high hospital utilization demonstrated lower sensitivity (13%) and PPV (15%). CDS based on the PMCA classified 17.3% of children as having complex, chronic disease, but only 4.7% of those children were actually hospitalized in the next 6 months. Differences between high-risk classification and actual hospitalization rates by CDS based on the data model and previous high hospital utilization, respectively, were smaller (Supplemental Table 4).

We applied machine learning methods to the development and evaluation of a novel EHR-based clinical predictive model that identified CCHN at high risk for future hospitalization. Model evaluation demonstrated moderate to high predictive performance with an AUROC of 0.79. Approximately 1 in 7 children (14%) within a primary care population was classified as high risk by at least one of the three CDS rules. Among the three CDS rules evaluated, sensitivity was low (13%–52%), specificity was high (83%–99%), and precision was highest for the model (PPV 19%).

Performance of the model should be interpreted in the context of the model’s data source, predicted outcome, and other published predictive models. For example, although precision for our model was relatively low compared with PPV for typical laboratory diagnostic tests, the model’s precision was more than 11-fold higher than that of other EHR data–based predictive models (median PPV among 26 previously published models was 1.7%).17 The model’s discrimination was similar to that of other EHR data–based models predicting hospitalization (median C-statistic 0.71 for 10 published models)17 and comparable to published administrative data–based models predicting readmissions among hospitalized children.26,27 The low sensitivity of our model implies that a higher proportion of high-risk patients were “missed.” However, the accuracy of administrative and EHR data–based risk models for predicting hospitalizations is generally limited.17,28 These challenges are magnified for a clinically heterogeneous population like CCHN with varying hospitalization risks.

The numbers of patients identified and missed are only one part of the assessment of a model’s overall value. A number needed to benefit (NNB) framework is one approach to estimating a predictive model’s value in practice.29 For our model, one component of NNB, the number needed to screen (the number of patients flagged as high risk by the model to find one patient who will actually be hospitalized), was five. A second component of NNB, the number needed to treat (NNT), depends on the intervention linked to an implemented model’s predictions. In this case, we hypothesized the model could be applied to inform prioritization of care coordination for identified high-risk patients. If applied in practice, the NNT might be estimated at between 5 and 9 patients based on published trials of pediatric complex care coordination interventions.15,30 However, NNT estimates of care coordination for CCHN are imprecise because of the small number of trials and variation in child complex care interventions31; sites also might apply the model to interventions other than care coordination, thereby altering NNT estimates and limiting the ability to accurately estimate overall NNB for our model at this time. Furthermore, the costs of the model and intervention are site-specific factors for NNB and overall value that should be considered if the model is ultimately implemented in practice.
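For reference, the number needed to screen follows directly from the rule’s PPV (illustrative arithmetic only):

```python
ppv_model = 0.19            # PPV of the model-based CDS rule reported above
nns_model = 1 / ppv_model   # number needed to screen = 1 / PPV
print(round(nns_model))     # ~5 patients flagged per true future hospitalization
```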

Comparable complexity classification tools11  available for application as CDS rules, such as PMCA and previous high hospital utilization, are associated with health service use27,32 ; however, these tools were not designed to predict future hospitalization. The challenge with using PMCA alone to prospectively identify children at high risk for hospitalization was demonstrated in our study, in which for every 20 patients classified as high risk by the PMCA, one was actually hospitalized. Nonetheless, use of PMCA is expanding at the population level as a tool to identify children who might benefit from additional resources.14  Growth in its use along with PMCA’s established validity informed its selection as a pragmatic CDS tool to compare with our data model.

Although the model was not clinically implemented in this study, our plan is to apply model predictions as CDS to inform delivery of CC and evaluate the potential benefits of using this EHR data–based model approach. If applied clinically in the future to allocation of CC, higher relative precision gained by use of the model may translate into more patients correctly classified as high risk (true positives) who receive CC services. Because CC is resource intensive, a CDS rule with higher PPV may yield large population-level cost savings by more precisely matching limited CC services to CCHN at highest risk for negative outcomes. Future prospective testing of this approach is warranted to clarify the impacts of a model-based strategy in clinical practice.

This study had several strengths. First, the model was co-developed with clinicians to enhance usability and potential for clinical translation.33 Second, we developed the model using multiple years of EHR data and designed it to be broadly applicable to children across the complexity spectrum. In contrast to adults with complex needs, whose illnesses and health service use are often well established,34 childhood chronic illnesses often take time to become established (eg, genetic syndromes diagnosed only after years of symptoms and high health service use). Regular, monthly updates to our model’s source data (eg, diagnoses, encounters) aligned with this longer course of pediatric chronic illness and enabled the model to capture each child’s longitudinal health trajectory. Third, we accounted for health-related social needs via inclusion of ADI as a model variable. Although few pediatric studies have integrated measures like ADI within EHR-based models, recent studies demonstrated associations between multidimensional measures of neighborhood-level socioeconomic factors (eg, Child Opportunity Index) and health service outcomes,35,36 thus supporting relationships between neighborhood context and child health. We hypothesize that our model’s inclusion of medical and social risk factors contributed to more precise identification of children who were hospitalized (ie, the highest PPV among the three CDS rules evaluated). Finally, we conducted silent testing over an 18-month period to evaluate the model’s performance with real-world data before implementing the model within routine workflows.

Several limitations of this study should be acknowledged. First, generalizability of our findings is limited by the single-center, single-EHR study design. Operationalization of our site-specific EHR data demonstrated the potential value of the EHR data modeling approach, not necessarily the need to replicate the exact model and its features. More work is needed to adapt our approach and the model in other centers and EHRs and to explore augmenting the model with non-EHR data (eg, payer claims). Second, the ADI score relies on the home address documented in the EHR, which may be inaccurate because of the inherent “messiness” of EHR data,18 and may not fully reflect individual-level social needs. Individual-level adverse social determinants of health (SDH; eg, poverty, limited transportation) and adverse social conditions associated with poor health (eg, housing instability) are social risk factors37 identified via validated SDH screening instruments38; within EHR systems with universal SDH screening, these could be explored as incremental data model features that might offer a more nuanced assessment of social needs. Third, focusing model predictions on risk for future hospitalization has limitations. For example, our model predictions did not account for the preventability of hospitalizations, nor did they include health service use external to our center. Although acute care use is an important health service outcome and feasible to measure in EHRs, future adaptations of this data model should incorporate patient- and family-reported health outcomes. Finally, reliance on diagnoses and encounters as model features updated monthly, rather than real-time laboratory values and vital sign trends, limited detection of acute changes in health status. However, for many children with complex chronic conditions, multimorbidity and co-occurring chronic conditions become established over years39; this incremental natural course is better aligned with our model’s framework.

The data model is now being implemented in a real-world primary care setting to test the feasibility and impact of using it as a tool for prioritizing CC services for CCHN. We are also expanding access to the model’s predictions and visualizations (online dashboard) so that clinicians and population health staff across our health system can use the model as a care management tool. Future studies could explore the impact of dynamic changes in hospitalization risk (eg, at what risk level should care intensity be titrated upward or downward or stopped; should risk scores from multiple past time points be considered when deciding on intervention delivery; and what are the patient-level impacts of these approaches?).

A novel EHR data-based predictive model was developed and validated as a CDS tool to identify CCHN at high risk for future hospitalization. The overall favorable performance of the predictive model as a CDS rule highlighted potential opportunities for an EHR data–based risk modeling approach to help inform future population-level clinical initiatives.

We wish to acknowledge staff at Duke Children’s Primary Care and Duke University Health System Population Health Management Office, including Barbara Donadio, RN, BSN; Mary McKeveny, RN, BSN; Lauren Goslin, MSW, LCSW; Reneé Amos, MSW, LCSW; Regina Virgle, RN, BSN; Jodi Wert, CHW; Marlyn Wells, NBC-HWC; and Atalaysha Churchwell, PhD, for partnership on reviewing care coordination clinical workflows that was essential for developing a predictive model aligned with routine practice.

FUNDING: All phases of this study were supported by the Duke AI Health and Duke Learning Health Units Program. Dr Ming’s contributions were supported in part by the National Institutes of Health, National Heart, Lung, and Blood Institute (K12HL13830). The content is solely the responsibility of the authors and does not necessarily represent the official views of the funding organizations. The funding organizations had no role in the design, preparation, review, or approval of this paper.

CONFLICT OF INTEREST DISCLOSURES: The authors have indicated they have no potential conflicts of interest to disclose.

All authors have contributed to the analysis, design and oversight, abstract conception and drafting, statistical analysis, and/or editorial review. Drs Ming and Chung conceptualized and designed the study, secured funding, drafted the initial manuscript, reviewed and interpreted data, and reviewed and revised the manuscript. Dr Goldstein and Mss Zhao, Tang, and Rogers collected data, conducted statistical analyses, reviewed and interpreted data, and reviewed and revised the manuscript. Dr Economou-Zavlanos and Mr Stirling reviewed and interpreted data and reviewed and revised the manuscript. All authors approve the final manuscript as submitted and agree to be accountable for all aspects of the work.

COMPANION PAPER: A companion to this article can be found online at www.hosppeds.org/cgi/doi/10.1542/hpeds.2023-007224.

1. Blumenthal D, Chernof B, Fulmer T, Lumpkin J, Selberg J. Caring for high-need, high-cost patients - an urgent priority. N Engl J Med. 2016;375(10):909–911
2. National Academy for State Health Policy. National care coordination standards for children and youth with special health care needs.
3. Bethell CD, Blumberg SJ, Stein RE, Strickland B, Robertson J, Newacheck PW. Taking stock of the CSHCN screener: a review of common questions and current reflections. Acad Pediatr. 2015;15(2):165–176
4. Cohen E, Kuo DZ, Agrawal R, et al. Children with medical complexity: an emerging population for clinical and research initiatives. Pediatrics. 2011;127(3):529–538
5. Sandhu S, Ming DY, Crew C, et al. Identifying priorities to improve the system of care for children with complex health needs in North Carolina: process and outcomes of systematic stakeholder engagement. Acad Pediatr. 2022;22(6):1041–1048
6. Greenberg JO, Barnett ML, Spinks MA, Dudley JC, Frolkis JP. The “medical neighborhood”: integrating primary and specialty care for ambulatory patients. JAMA Intern Med. 2014;174(3):454–457
7. Council on Children with Disabilities and Medical Home Implementation Project Advisory Committee. Patient- and family-centered care coordination: a framework for integrating care for children and youth across multiple systems. Pediatrics. 2014;133(5):e1451–e1460
8. The Association of Maternal and Child Health Programs, National Academy for State Health Policy. Standards for systems of care for children and youth with special health care needs.
9. Cordeiro A, Davis RK, Antonelli R, et al. Care coordination for children and youth with special health care needs: national survey results. Clin Pediatr (Phila). 2018;57(12):1398–1408
10. McLellan SE, Mann MY, Scott JA, Brown TW. A blueprint for change: guiding principles for a system of services for children and youth with special health care needs and their families. Pediatrics. 2022;149(suppl 7):e2021056150C
11. Berry JG, Hall M, Cohen E, O’Neill M, Feudtner C. Ways to identify children with medical complexity and the importance of why. J Pediatr. 2015;167(2):229–237
12. Feudtner C, Feinstein JA, Zhong W, Hall M, Dai D. Pediatric complex chronic conditions classification system version 2: updated for ICD-10 and complex medical technology dependence and transplantation. BMC Pediatr. 2014;14:199
13. Simon TD, Cawthon ML, Stanford S, et al; Center of Excellence on Quality of Care Measures for Children with Complex Needs (COE4CCN) Medical Complexity Working Group. Pediatric medical complexity algorithm: a new method to stratify children by medical complexity. Pediatrics. 2014;133(6):e1647–e1654
14. Reuland CP, Collins J, Chiang L, et al. Oregon’s approach to leveraging system-level data to guide a social determinants of health-informed approach to children’s healthcare.
15. Mosquera RA, Avritscher EB, Samuels CL, et al. Effect of an enhanced medical home on serious illness and cost of care among high-risk children with chronic illness: a randomized clinical trial. JAMA. 2014;312(24):2640–2648
16. Remington PL, Catlin BB, Gennuso KP. The County Health Rankings: rationale and methods. Popul Health Metr. 2015;13:11
17. Goldstein BA, Navar AM, Pencina MJ, Ioannidis JP. Opportunities and challenges in developing risk prediction models with electronic health records data: a systematic review. J Am Med Inform Assoc. 2017;24(1):198–208
18. Goldstein BA, Cerullo M, Krishnamoorthy V, et al. Development and performance of a clinical decision support tool to inform resource utilization for elective operations. JAMA Netw Open. 2020;3(11):e2023547
19. Agency for Healthcare Research and Quality. Clinical decision support.
20. Sharma V, Ali I, van der Veer S, Martin G, Ainsworth J, Augustine T. Adoption of clinical risk prediction tools is limited by a lack of integration with electronic health records. BMJ Health Care Inform. 2021;28(1):e100253
21. Kind AJH, Buckingham WR. Making neighborhood-disadvantage metrics accessible - The Neighborhood Atlas. N Engl J Med. 2018;378(26):2456–2458
22. Stirling A, Tubb T, Reiff ES, et al. Identified themes of interactive visualizations overlayed onto EHR data: an example of improving birth center operating room efficiency. J Am Med Inform Assoc. 2020;27(5):783–787
23. Freund Y, Schapire RE. A decision-theoretic generalization of on-line learning and an application to boosting. J Comput Syst Sci. 1997;55(1):119–139
24. Dopp AR, Parisi KE, Munson SA, Lyon AR. Aligning implementation and user-centered design strategies to enhance the impact of health services: results from a concept mapping study. Implement Sci Commun. 2020;1:17
25. Lyon AR, Koerner K. User-centered design for psychosocial intervention development and implementation. Clin Psychol (New York). 2016;23(2):180–200
26. Leary JC, Price LL, Scott CER, Kent D, Wong JB, Freund KM. Developing prediction models for 30-day unplanned readmission among children with medical complexity. Hosp Pediatr. 2019;9(3):201–208
27. Feudtner C, Levin JE, Srivastava R, et al. How well can hospital readmission be predicted in a cohort of hospitalized children? A retrospective, multicenter study. Pediatrics. 2009;123(1):286–293
28. Kansagara D, Englander H, Salanitro A, et al. Risk prediction models for hospital readmission: a systematic review. JAMA. 2011;306(15):1688–1698
29. Liu VX, Bates DW, Wiens J, Shah NH. The number needed to benefit: estimating the value of predictive analytics in healthcare. J Am Med Inform Assoc. 2019;26(12):1655–1659
30. Coller RJ, Klitzner TS, Lerner CF, et al. Complex care hospital use and postdischarge coaching: a randomized controlled trial. Pediatrics. 2018;142(2):e20174278
31. Pordes E, Gordon J, Sanders LM, Cohen E. Models of care delivery for children with medical complexity. Pediatrics. 2018;141(Suppl 3):S212–S223
32. Leyenaar JK, Schaefer AP, Freyleue SD, et al. Prevalence of children with medical complexity and associations with health care utilization and in-hospital mortality. JAMA Pediatr. 2022;176(6):e220687
33. Goldstein BA, Carlson D, Bhavsar NA. Subject matter knowledge in the age of big data and machine learning. JAMA Netw Open. 2018;1(4):e181568
34. Schor EL, Cohen E. Apples and oranges: serious chronic illness in adults and children. J Pediatr. 2016;179:256–258
35. Krager MK, Puls HT, Bettenhausen JL, et al. The Child Opportunity Index 2.0 and hospitalizations for ambulatory care sensitive conditions. Pediatrics. 2021;148(2):e2020032755
36. Beck AF, Huang B, Wheeler K, Lawson NR, Kahn RS, Riley CL. The child opportunity index and disparities in pediatric asthma hospitalizations across one Ohio Metropolitan Area, 2011-2013. J Pediatr. 2017;190:200–206.e1
37. Alderwick H, Gottlieb LM. Meanings and misunderstandings: a social determinants of health lexicon for health care systems. Milbank Q. 2019;97(2):407–419
38. Sokol R, Austin A, Chandler C, et al. Screening children for social determinants of health: a systematic review. Pediatrics. 2019;144(4):e20191622
39. Thomson J, Hall M, Nelson K, et al. Timing of co-occurring chronic conditions in children with neurologic impairment. Pediatrics. 2021;147(2):e2020009217

Supplementary data