The Surviving Sepsis Campaign International Guideline recommends systematic sepsis screening as standard of care for pediatric hospitals.1 The Children’s Hospital Association’s national Improving Pediatric Sepsis Outcomes collaborative supports the recommendation and highlights institutional screening practices at participating sites.2 In this edition of Hospital Pediatrics, Stephen et al present 2 companion articles describing a comprehensive approach to inpatient sepsis screening tailored to their institution’s electronic health record (EHR), patient population, and acute care system.3,4 The first article describes their prediction model development and prospective validation within their local acute care patient population. The authors use sound methodology incorporating available clinical data from the EHR to develop their prediction model. The second article describes the implementation of that prediction model as a clinical decision support tool within their acute care clinical setting using process improvement science.
These articles have several strengths, notably that they satisfactorily address the following previously described concerns about sepsis screening.5 First, the authors developed and validated their prediction model before implementation using patient populations and clinical teams agnostic to the model. Second, they trained the model using local clinical data from hospitalized children rather than applying a sepsis prediction model developed in emergency medicine or adult settings. Third, they trained their model on an established intention-to-treat case definition6 with criteria distinct from the covariates included in the model. Fourth, they implemented the model in their acute care environment as a tiered system with 2 distinct score thresholds triggering interventions of different magnitude, optimizing in situ sensitivity and specificity. Finally, they described their work using established reporting guidelines, TRIPOD for the model development and SQUIRE for the process improvement and implementation,7,8 to clearly communicate the rigor with which they performed the research. Their approach represents a major improvement over the existing literature on sepsis screening in acute care settings by accounting for some of the complexity inherent to acute care screening systems.
In addition to its strengths, this article offers the opportunity to discuss a primary challenge with the application of prediction models as continuous screening tests in acute care inpatient settings: the conspicuous misalignment of reported prediction model test characteristics, such as sensitivity, area under the curve, and positive predictive value, with the experience of hospitals postimplementation. Although many justify screening test use by citing impressive test characteristics, these values often have limited applicability to in vivo performance given the impossibility of mirroring complex acute care environments in a statistical model. There is a tension between (1) the limitations of many statistical approaches when faced with repeated measures and (2) the reality that clinicians may interact over time with multiple additional score values not considered in the analysis, depending on how scores display in the EHR. In a clinical environment where new EHR data populate frequently throughout the day, the sepsis score for an individual patient is continuously recalculated in vivo. However, the authors use a logistic regression model and define their explanatory variable as the maximum score calculated from the single most severe collection of covariates preceding the sepsis event. These covariates may have been collected from several distinct assessments over time, making it unlikely that this calculated value was ever displayed to the clinical team in vivo. This oversimplifies the data and analysis compared with what occurs in vivo, where a clinician’s decision-making is influenced by seeing numerous, potentially divergent, values in a short period (Fig 1). Panels B and C of Fig 1 show hypothetical scenarios in which statistical and in vivo score performance are likely to be similar. Panels D and E present scenarios in which clinical teams may interact with multiple sepsis alerts in vivo that are not tested in the validation data set. These observations may explain the discrepancies observed by many institutions between reported test characteristics and actual model performance in vivo, which is often burdened by high rates of false-positive screening. Consideration of statistical approaches that allow for time-varying covariates, such as time-dependent Cox proportional hazards models, might lessen this tension.
FIGURE 1. When applying a retrospective study design using a single maximum sepsis score as the explanatory variable, each of these in vivo clinical scenarios would appear identical in the in situ data set.
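To make this tension concrete, below is a minimal sketch, not the authors’ code, contrasting the two analytic framings: collapsing each admission to its maximum score (as we read the paper’s logistic design) versus a Cox model with time-varying covariates that retains every recalculated score. The toy data, column names, and use of the Python lifelines package are illustrative assumptions.

```python
# Contrast of two analytic framings for continuously recalculated sepsis scores.
# Requires: pandas, lifelines (pip install lifelines). Toy data for illustration.
import pandas as pd
from lifelines import CoxTimeVaryingFitter

# Long-format table: one row per score interval per admission. Each row covers
# (start, stop] hours into the admission, with the score in force over that
# interval; event = 1 on the interval in which the sepsis event occurred.
scores = pd.DataFrame({
    "admission_id": [1, 1, 1, 2, 2, 3, 3, 4],
    "start":        [0, 6, 12, 0, 8, 0, 10, 0],
    "stop":         [6, 12, 20, 8, 30, 10, 16, 24],
    "sepsis_score": [2, 5, 9, 3, 2, 4, 7, 1],
    "event":        [0, 0, 1, 0, 0, 0, 1, 0],
})

# Maximum-score framing: each admission is reduced to a single row, discarding
# every other value the clinical team may have seen in vivo.
per_admission = scores.groupby("admission_id").agg(
    max_score=("sepsis_score", "max"), event=("event", "max")
)

# Time-varying framing: every recalculated score stays in the risk set.
# A small penalizer keeps the fit stable on this tiny illustrative data set.
ctv = CoxTimeVaryingFitter(penalizer=0.1)
ctv.fit(scores, id_col="admission_id", event_col="event",
        start_col="start", stop_col="stop")
ctv.print_summary()
```

Because the time-varying model evaluates each score interval against the patients still at risk during that interval, its estimates reflect the stream of values clinicians actually encounter rather than a single retrospective maximum.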
Even with a statistical approach that likely overestimates the association between their sepsis score and sepsis cases, the authors report a positive predictive value between 5% and 14%, which translates to a number needed to alert of 7 to 20 to identify 1 sepsis case. Although they report high sensitivity and area under the curve, the positive predictive value may be the most important test characteristic when evaluating a model designed to run continuously over a period of days to identify a rare event. These low positive predictive values suggest their model often does not accurately identify patients with sepsis, risking alert fatigue and mistrust from frontline clinicians. Fortunately, the authors did not report substantial alert fatigue from their clinician survey data, possibly owing to their decision to apply a 48-hour alert suppression in vivo after each sepsis alert. This in vivo system does not align with their statistical design, and Fig 2 displays a hypothetical scenario where their prediction model would have correctly predicted the outcome in the data set while the clinical team interacted with different scores in vivo. Had they identified their implementation strategy a priori, the authors could have more closely approximated it statistically by defining their explanatory variable as the first sepsis score surpassing their alert threshold in the 48 hours preceding a sepsis event. This example again highlights the practical challenges and inherent statistical inaccuracies of prediction model development for continuous screening of hospitalized children.
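For readers checking the arithmetic, the number needed to alert (NNA) is simply the reciprocal of the positive predictive value:

$$\text{NNA} = \frac{1}{\text{PPV}}, \qquad \frac{1}{0.14} \approx 7.1, \qquad \frac{1}{0.05} = 20,$$

which yields the reported range of roughly 7 to 20 alerts for each true sepsis case.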
FIGURE 2. This chart compares the authors’ statistical approach in the data set to a possible clinical application using their implementation strategy with a 48-hour alert suppression after initial alert.
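The alternative specification suggested above can be made concrete. The following is a minimal sketch, with a hypothetical threshold and column names, that extracts the first score crossing the alert threshold in the 48 hours preceding a sepsis event, mirroring the in vivo 48-hour suppression rather than the admission-wide maximum.

```python
# Sketch of the suggested explanatory variable: the first score exceeding the
# alert threshold in the 48 hours before the sepsis event. The threshold and
# column names are hypothetical, not taken from the authors' implementation.
from typing import Optional

import pandas as pd

ALERT_THRESHOLD = 6   # hypothetical alert threshold
WINDOW_HOURS = 48     # mirrors the 48-hour alert suppression used in vivo

def first_alert_score(scores: pd.DataFrame, event_time: pd.Timestamp) -> Optional[float]:
    """Return the first score >= threshold in the 48 h before the event.

    `scores` has columns `recalc_time` (timestamp of each recalculation) and
    `sepsis_score`; returns None if no score crossed the threshold.
    """
    window = scores[
        (scores["recalc_time"] >= event_time - pd.Timedelta(hours=WINDOW_HOURS))
        & (scores["recalc_time"] < event_time)
    ].sort_values("recalc_time")
    alerts = window[window["sepsis_score"] >= ALERT_THRESHOLD]
    return None if alerts.empty else float(alerts["sepsis_score"].iloc[0])

scores = pd.DataFrame({
    "recalc_time": pd.to_datetime(
        ["2024-01-01 00:00", "2024-01-01 12:00", "2024-01-02 04:00"]
    ),
    "sepsis_score": [3, 7, 9],
})
# First alert (7.0), not the retrospective maximum (9): the value the clinical
# team would actually have acted on under a 48-hour suppression policy.
print(first_alert_score(scores, pd.Timestamp("2024-01-02 10:00")))
```

Matching the statistical explanatory variable to the score that actually triggers the alert would bring the validation test characteristics closer to what frontline teams experience.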
It is important to note that the authors did not demonstrate improved patient outcomes, despite the rigor with which they approached their work. They did not observe special cause variation in their analysis of sepsis-attributable emergency transfers, a rare event. They also did not provide data on the overall incidence of sepsis cases; any significant increase in nonsevere sepsis incidence after implementation of the screening system could represent misdiagnosis with associated overtreatment. Combined with the important practical and statistical concerns regarding the generalizability of their findings, their experience further demonstrates the challenges and enduring uncertainty regarding the utility of implementing these continuous screening tools when caring for hospitalized children.
Although we have highlighted some important limitations, we also applaud the authors for their excellent contribution to the evidence base surrounding sepsis screening of hospitalized children. By training the prediction model in the same clinical environment in which they implemented it, the authors avoided many limitations borne of variable clinical cultures, practices, and EHR use between institutions. In doing so, however, they also developed a custom-fit screening system with limited generalizability to other institutions, where varied clinical culture, practices, and EHR use may irreconcilably change the covariates included in the model. Experiences at multiple institutions have demonstrated that a “plug-and-play” approach fails, even with deliberate adaptation of outside tools to local health systems. Given the limited relevance of in situ test characteristics to prediction model implementation in acute care settings, as described above, we believe hospital systems, collaboratives, and EHR vendors must not view sepsis screening tools as 1-size-fits-all without standardization of clinical culture, practices, and EHR use across institutions. Recognizing the practical impossibility of this standardization, institutions aiming to apply clinical EHR data from acute care settings to sepsis screening systems must instead invest in (1) the research necessary to rigorously develop a homegrown prediction model trained with local clinical EHR data and (2) the process improvement effort necessary to maximize the value of the model within the local acute care system, including rigorous postimplementation monitoring of patient outcomes and intervention sustainability.
COMPANION PAPERS: Companions to this article can be found online at www.hosppeds.org/cgi/doi/10.1542/hpeds.2022-006964 and www.hosppeds.org/cgi/doi/10.1542/hpeds.2023-007218.
Drs Lockwood and Harrison both contributed to the drafting of this commentary representing their shared perspectives.
FUNDING: No external funding.
CONFLICT OF INTEREST DISCLOSURES: The authors have indicated they have no potential conflicts of interest to disclose.