Children with complex health needs (CCHN), also referred to as children with medical complexity, are at high risk of illness requiring hospital care. Although they do not make up the majority of hospitalized children, they account for about 40% of all pediatric hospital charges.1 Each year, 13% to 20% of CCHN are hospitalized,2,3 and 50% to 80% of spending for this high-cost population goes toward hospital care.3–5
The focus on hospitalization as a key outcome for CCHN is naturally rooted in their growing prevalence in children’s hospitals,1 the substantial cost and disruption that hospitalization imposes on families and health systems, and the risk of healthcare-associated adverse events during hospitalization.6 Care models such as complex care, care coordination, and other integrative services aim to improve these outcomes, and programmatic investment may be offset by savings from reduced hospital use.7 However, determining which patients would benefit most from these limited resources (and even how “benefit” is defined) remains an ongoing challenge, which has produced a wide range of care models for this population.8
In this issue of Hospital Pediatrics, Ming et al offer a thought-provoking link across these concepts: they developed an electronic health record (EHR) model to predict all-cause pediatric hospitalization within 6 months and propose assigning care coordination services to patients at greatest risk.9 Variables predicting hospitalization included demographics, neighborhood socioeconomic disadvantage, diagnosis codes, and prior health service utilization. The researchers trained a gradient-boosting machine learning model on more than 100 000 pediatric patients and reported an area under the receiver operating characteristic curve (AUROC) of 0.79 and an area under the precision-recall curve (AUPRC) of 0.13, with high specificity and an average sensitivity and positive predictive value (PPV) of 25% and 19%, respectively. The model outperformed 2 other metrics resembling common complex care program eligibility criteria: prior high hospital utilization and the Pediatric Medical Complexity Algorithm. A proof-of-concept online dashboard gave clinicians a snapshot of patients’ admission risk scores along with clinically relevant explanations of each prediction.
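A gap between an AUROC of 0.79 and an area under the precision-recall curve of 0.13 is typical of rare outcomes. The Python sketch below reproduces the pattern on synthetic data; it is not the authors' model or data. It assumes a hypothetical cohort of 100 000 children with roughly 1% prevalence and normally distributed risk scores in which hospitalized children score 1.2 standard deviations higher on average (an arbitrary choice). AUROC is computed as the scaled Mann-Whitney U statistic, and the area under the precision-recall curve is estimated as average precision; both helper functions are written here for illustration and rely only on NumPy.

```python
import numpy as np


def auroc(scores: np.ndarray, labels: np.ndarray) -> float:
    """AUROC as the Mann-Whitney U statistic scaled to [0, 1].

    Equals the probability that a randomly chosen positive outranks a randomly
    chosen negative. Assumes continuous scores (ties are not handled).
    """
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    n_pos = int(labels.sum())
    n_neg = len(labels) - n_pos
    u_stat = ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)


def average_precision(scores: np.ndarray, labels: np.ndarray) -> float:
    """Precision-recall area estimated as average precision:
    the mean of precision evaluated at the rank of each positive case."""
    sorted_labels = labels[np.argsort(-scores)]  # highest risk first
    true_pos_so_far = np.cumsum(sorted_labels)
    precision_at_k = true_pos_so_far / np.arange(1, len(labels) + 1)
    return float(precision_at_k[sorted_labels == 1].mean())


# Hypothetical cohort: ~1% prevalence, positives shifted by 1.2 SD (arbitrary).
rng = np.random.default_rng(seed=0)
n_patients = 100_000
labels = (rng.random(n_patients) < 0.01).astype(int)
scores = rng.normal(loc=1.2 * labels, scale=1.0)

roc = auroc(scores, labels)
ap = average_precision(scores, labels)
print(f"prevalence: {labels.mean():.4f}")
print(f"AUROC: {roc:.3f}")
print(f"average precision: {ap:.3f}")
```

Under these assumptions, AUROC lands near 0.8 while average precision stays far lower. AUROC depends only on how the two groups' scores are ordered relative to each other and is unaffected by class balance, whereas precision is pulled down by the large number of negatives. A random ranking would score an AUROC of about 0.5 but an average precision of only about the prevalence, so the precision-recall figure is best read against that baseline.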
At least 2 critical overarching questions are raised by this article. (1) How well can CCHN hospitalizations be predicted from EHR data? (2) How can such predictions be put to use?
The authors astutely recognize the challenges of prediction in this population. In our era of machine-learning models and Big Data, reporting sensitivity, specificity, and AUROC works well at the population level but can be challenging at the individual level, where a measure of precision such as PPV may be necessary for bedside use.10 The striking discordance between the AUROC and the area under the precision-recall curve can be explained by the imbalance in the measured outcome: a ∼1% to 2% hospitalization rate in their cohort. In a hypothetical cohort of 100 000 patients, for example, much of the model’s accuracy comes from the >98 500 patients correctly identified as low risk who will not be hospitalized. Of the 840 patients flagged as at risk for future hospitalization, 160 would truly be hospitalized, whereas 680 would not (false positives). Meanwhile, 481 patients who are hospitalized would not have been flagged as high risk (false negatives). To the authors’ credit, this performance is as good as or better than that of other published models predicting hospital admission.
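The hypothetical cohort above can be laid out as a 2 × 2 confusion matrix. The sketch below takes only the counts stated in the text (160 true positives, 680 false positives, and 481 false negatives among 100 000 patients), derives the remaining cell, and recomputes the summary metrics. It also scores a trivial rule that flags no one, which, under these counts, posts a higher accuracy than the model. This shows why accuracy alone says little when the outcome is rare. The `ConfusionMatrix` class and its method names are illustrative, not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # flagged and hospitalized
    fp: int  # flagged, not hospitalized
    fn: int  # not flagged, hospitalized
    tn: int  # not flagged, not hospitalized

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def accuracy(self) -> float:
        return (self.tp + self.tn) / self.total

    def flagged_per_true_positive(self) -> float:
        # Patients flagged for each one truly hospitalized (equal to 1 / PPV)
        return (self.tp + self.fp) / self.tp


cohort = 100_000
tp, fp, fn = 160, 680, 481
cm = ConfusionMatrix(tp=tp, fp=fp, fn=fn, tn=cohort - tp - fp - fn)

# A trivial "flag no one" rule: every hospitalized child becomes a false negative
no_flags = ConfusionMatrix(tp=0, fp=0, fn=tp + fn, tn=cohort - tp - fn)

print(f"true negatives: {cm.tn}")                              # 98679
print(f"sensitivity: {cm.sensitivity():.3f}")                   # 0.250
print(f"specificity: {cm.specificity():.3f}")                   # 0.993
print(f"PPV: {cm.ppv():.3f}")                                   # 0.190
print(f"accuracy: {cm.accuracy():.4f}")                         # 0.9884
print(f"accuracy, flag no one: {no_flags.accuracy():.4f}")      # 0.9936
print(f"flagged per true positive: {cm.flagged_per_true_positive():.2f}")  # 5.25
```

The recomputed sensitivity (25%) and PPV (19%) match the averages reported for the model, and roughly 5 patients would be flagged for every child who is actually hospitalized.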
How one views the strength of these numbers depends greatly on the model’s intended application. Depending on model goals, a low PPV can be acceptable. For high-stakes outcomes paired with low-cost, low-risk interventions, such as sepsis screening in the emergency department, screening with a low PPV can still be beneficial.11 If many of the hospitalizations in the higher-risk group identified by the model can be prevented through intervention, the model could be valuable, even if that group represents only a subset of all future CCHN hospitalizations. However, allocating scarce resources or high-intensity interventions typically requires a high degree of precision to keep costs sustainable when scaling or maintaining a program. Moreover, if an intervention has unanticipated adverse effects (even rare ones), then applying it to a group that screens positive because of a low threshold but infrequently experiences the outcome could become problematic. And if the data used in screening tools introduce bias into population identification or risk attribution (e.g., if patients from one racial group are less likely to have accurate or complete data for an important variable), then eligibility for an intervention may inadvertently exacerbate health disparities.12
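The trade-off in this paragraph can be framed as a simple break-even calculation. The sketch below is a deliberately simplified illustration, not a method from the study. It assumes that each flagged patient costs a fixed amount to enroll, that the intervention prevents a fixed fraction of the hospitalizations that would otherwise occur among true positives, and that each prevented hospitalization has a fixed value. All dollar amounts and probabilities are hypothetical placeholders; only the 19% PPV comes from the study. The sketch ignores intervention harms, data bias, and benefits beyond hospitalization, all of which the paragraph raises.

```python
def break_even_ppv(cost_per_enrollee: float,
                   prob_prevented: float,
                   value_per_prevented: float) -> float:
    """Smallest PPV at which expected benefit per flagged patient covers its cost.

    Expected benefit per flagged patient = PPV * prob_prevented * value_per_prevented.
    Setting that equal to cost_per_enrollee and solving for PPV gives the threshold.
    """
    return cost_per_enrollee / (prob_prevented * value_per_prevented)


MODEL_PPV = 0.19  # average PPV reported for the model

# Hypothetical placeholder costs, prevention probabilities, and values
scenarios = {
    "low-cost outreach": break_even_ppv(
        cost_per_enrollee=100, prob_prevented=0.2, value_per_prevented=20_000),
    "intensive care coordination": break_even_ppv(
        cost_per_enrollee=3_000, prob_prevented=0.3, value_per_prevented=20_000),
}

for name, threshold in scenarios.items():
    verdict = "clears" if MODEL_PPV >= threshold else "falls short of"
    print(f"{name}: break-even PPV {threshold:.3f}; a PPV of {MODEL_PPV} {verdict} it")
```

Under these placeholder numbers, the same 19% PPV clears the bar for inexpensive outreach but falls well short for an intensive program. This mirrors the argument that the acceptable PPV depends on the cost and risk of whatever the flag triggers.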
Part of the reason that greater precision is so hard to achieve lies in the limitations of EHR data as model inputs and outputs. It is becoming increasingly apparent that the limited input variables in an EHR snapshot may not fully capture the factors that drive hospitalization for pediatric patients, particularly those with complex needs.7 Family comfort, resources, and capacity, as well as social determinants of health, are powerful predictors of admission or unanticipated readmission to the hospital.13,14 It is unfair to expect an EHR-based model to capture the nuance of these relationships when data on such constructs are not currently routinely accessible.
How do we improve the Number Needed to Screen and, by proxy, the Number Needed to Treat in EHR-based models? An incremental approach could focus first on an enriched population (e.g., ex-premature infants with chronic lung disease, patients receiving chronic ventilator support, or patients with severe traumatic brain injury). Narrowing the studied population may narrow the reasons for admission enough to allow deployment of increasingly precise models as modeling techniques mature. More granular data may also help by capturing variables such as prescription drug usage patterns, clinic phone calls or electronic messages that signal family concerns at home in real time, and other quantitative measures of child health, complexity, and family capacity.15 Integrating EHR data with novel data from sensors, applications, text messages, and family reports may improve model performance.16,17 Very few predictive models use longitudinal data, despite its ready availability in the EHR and its potential to capture worrisome trends.18 Natural language processing is also an increasing focus across healthcare predictive modeling: keyword tagging and sentiment analysis would also help fill knowledge gaps in EHR data about a patient’s condition.19 However, added data complexity brings new challenges, including bias, missing data, the curse of dimensionality (i.e., as the number of variables grows, the number of training examples needed to cover the search space grows exponentially), and the often bizarre, nonlinear relationships between variables.18
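The enrichment argument follows from Bayes' rule. At a fixed sensitivity and specificity, PPV rises with the prevalence of the outcome in the screened population, so fewer patients are flagged for each true case. The sketch below optimistically assumes that a sensitivity of 25% and a specificity of 99.3% (values implied by the hypothetical 100 000-patient cohort discussed earlier) carry over unchanged to enriched subgroups. In practice, a model would likely need retraining or recalibration for a new population, and its operating characteristics would shift. The subgroup prevalences are hypothetical.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Bayes' rule for P(hospitalized | flagged)."""
    p_flagged_and_hospitalized = sensitivity * prevalence
    p_flagged_not_hospitalized = (1 - specificity) * (1 - prevalence)
    return p_flagged_and_hospitalized / (
        p_flagged_and_hospitalized + p_flagged_not_hospitalized)


SENSITIVITY = 0.25   # implied by the hypothetical cohort (160 / 641)
SPECIFICITY = 0.993  # implied by the hypothetical cohort (98 679 / 99 359), rounded

# Hypothetical prevalences: general population vs. progressively enriched subgroups
for prevalence in (0.01, 0.05, 0.20):
    value = ppv(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {value:.2f}, "
          f"flagged per true case {1 / value:.1f}")
```

Holding the model's operating point fixed, moving from a 1% to a 20% outcome rate raises PPV several-fold. This is the arithmetic behind starting with an enriched population, provided the model's accuracy actually holds in that subgroup.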
Predicting future hospitalization risk has several potentially useful applications; one proposed by this study is the allocation of care coordination services. A gold-standard definition of CCHN remains elusive,20 as do the optimal criteria for enrollment in care coordination programs.8 If a primary goal of a program is to prevent hospitalizations, then it is natural to tie services to this risk. However, starting with a general population and applying machine learning to identify those at high risk for hospitalization rests on several assumptions: that the high-risk group represents children with complex health needs, that much of the identified risk is modifiable, and that those identified are well positioned to benefit from care coordination. The authors do observe that about 90% of their high-risk group overlapped with the complex chronic disease category of the Pediatric Medical Complexity Algorithm, suggesting that this approach may indeed be identifying primarily CCHN. Because hospitalization risk is not necessarily a stable construct,21 CCHN may dynamically enter and exit the high-risk group as clinical and social conditions change. A dilemma could therefore emerge if service eligibility waxes and wanes for individuals over relatively short time horizons.
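The instability described here becomes concrete when a fixed cutoff is applied to a score that is recomputed over time. In the minimal sketch below, the risk scores, the 6-month rescoring cadence, and the 0.20 cutoff are all hypothetical. It simply counts how often a child whose predicted risk hovers near the threshold would gain or lose eligibility.

```python
def eligibility_changes(risk_scores: list[float], threshold: float) -> int:
    """Count how often a fixed threshold flips eligibility between consecutive scoring windows."""
    eligible = [score >= threshold for score in risk_scores]
    # Each adjacent pair with different eligibility is one gain or loss of services
    return sum(prev != curr for prev, curr in zip(eligible, eligible[1:]))


# Hypothetical 6-month risk scores for one child whose risk hovers near the cutoff
child_scores = [0.18, 0.22, 0.17, 0.25, 0.19, 0.21]
flips = eligibility_changes(child_scores, threshold=0.20)
print(f"eligibility changed {flips} times across {len(child_scores)} windows")
```

In this example, small fluctuations of a few percentage points around the cutoff switch the child in and out of eligibility at every rescoring.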
Presumably, the benefits of care coordination extend beyond hospitalization risk. Although the answer is unknown, the authors’ proposed approach raises the question of whether starting with children at higher risk for hospitalization might also enhance gains beyond hospitalization. Future research should attempt to quantify the opportunity cost of not enrolling children who fall outside the high-risk category.
Although it is unknown how far EHR data can take us toward reliable and valid outcome prediction, the ample opportunities to build on these early successes give reason for optimism in continuing to advance novel uses of Big Data and analytics for this population. If we can reliably identify CCHN who would benefit from specific interventions, we empower clinicians to refer appropriate patients to appropriate resources and help health systems invest in resources that match those needs. Such a change will require incremental gains and expanded data sources. As this very interesting study illustrates, continued refinement of these techniques should help us optimize care for this vulnerable population.
Drs Munjal and Coller and Ms Fleischman drafted the commentary and reviewed it critically for important intellectual content, and all authors approved the final manuscript as submitted and agree to be accountable for all aspects of the work.
FUNDING: No external funding.
CONFLICT OF INTEREST DISCLOSURES: The authors have indicated they have no conflicts of interest relevant to this article to disclose.
COMPANION PAPER: A companion to this article can be found at www.hosppeds.org/cgi/doi/10.1542/hpeds.2022-006861.