For febrile infants, predictive models to detect bacterial infections are available, but clinical adoption remains limited by implementation barriers. There is a need for predictive models using widely available predictors. Thus, we previously derived 2 novel predictive models (machine learning and regression) by using demographic and clinical factors, plus urine studies. The objective of this study is to refine and externally validate the predictive models.
This is a retrospective cross-sectional study of infants initially evaluated at 1 pediatric emergency department from January 2011 to December 2018. Inclusion criteria were age 0 to 90 days, temperature ≥38°C, and documented gestational age and insurance type. To reduce potential biases, we re-derived the models from the derivation data without insurance status and tested the ability of the refined models to detect bacterial infections (ie, urinary tract infection, bacteremia, and meningitis) in the separate validation sample, calculating areas under the receiver operating characteristic curve, sensitivities, and specificities.
Of 1419 febrile infants (median age 53 days, interquartile range 32–69), 99 (7%) had a bacterial infection. The areas under the receiver operating characteristic curve of the machine learning and regression models were 0.92 (95% confidence interval [CI] 0.89–0.94) and 0.90 (0.86–0.93), respectively, compared with 0.95 (0.91–0.98) and 0.96 (0.94–0.98) in the derivation study. At a risk estimate threshold of 0.005, the sensitivity and specificity of the machine learning model were 98.0% (94.7%–100%) and 54.2% (51.5%–56.9%), and those of the regression model were 96.0% (91.5%–99.1%) and 50.0% (47.4%–52.7%).
Compared with the derivation study, the machine learning and regression models performed similarly. Findings suggest a clinical-based model can estimate bacterial infection risk. Future studies should prospectively test the models and investigate strategies to optimize clinical adoption.
Background
Prompting >500 000 clinical encounters each year,1,2 fever in young infants is a common clinical sign that may indicate a bacterial infection, including urinary tract infection (UTI), bacteremia, and meningitis.3 If undiagnosed, these infections can progress to sepsis and death. However, detection in febrile infants is challenging because infants may not demonstrate signs of severe illness.4 For decades, researchers have developed highly sensitive predictive models to identify infants at low risk of bacterial infections while avoiding missed infections.5–9 To enhance prediction, researchers have incorporated sophisticated biomarkers, such as procalcitonin, into diagnostic algorithms.10–12 However, clinical adoption of these models varies widely.13–15 This is likely due to implementation barriers, including the presence of multiple predictive models with different inclusion and exclusion criteria, clinician differences in risk tolerance/aversion, and challenges in obtaining predictor variables, such as serum biomarkers.16,17 In addition, most children seek care in community hospitals and outpatient clinics,18–20 which vary in their capability to perform onsite, timely testing for procalcitonin and other serum biomarkers.21 Thus, infants may receive disparate care depending on available institutional resources.
To address these problems, we selected widely available predictors from previous research,5,6,9,10,13,22–25 including demographic and clinical factors, plus urine studies, to derive bacterial infection predictive models using regression and machine learning methods.26 We derived these models with 1 overarching goal: to guide clinicians’ decision-making for febrile infants ≤90 days of age in diverse settings, including emergency departments (EDs) and outpatient clinics. These predictive models were as sensitive as, yet more specific than, the Rochester Low Risk model. As demonstrated previously,27 it is important to externally validate models to ensure that they perform reliably across diverse settings and do not miss infections before proceeding to prospective validation. External validation also afforded an opportunity to refine the models with respect to 1 specific feature: insurance status. We initially included insurance status as a predictor in the derivation model because it is often used as a proxy for socioeconomic status.28 Our preliminary work29 and other studies in children and adults30–32 indicate that low socioeconomic status is associated with bacterial infection risk. However, insurance status holds a dual role in the United States because it can be a proxy for both socioeconomic status and access to care. In addition, its effects may differ across states because of differences in Children’s Health Insurance Program eligibility and other local considerations. Thus, to reduce potential biases related to insurance status, the objective of this study was to both refine and externally validate the predictive models in a separate population.
Methods
Setting
We performed a retrospective cross-sectional study of all infants ≤90 days old with fever brought to 1 pediatric ED in the Northeast from January 1, 2011 to December 31, 2018. This population is distinct, both demographically and geographically, from the sample used in the derivation study. We obtained institutional review board approval for this study.
Study Design
Study team members used a standard protocol and manually reviewed a complete data extract of ED encounters to identify eligible infants. We iteratively updated the protocol, and study personnel met weekly to ensure that abstraction techniques were consistent. After abstraction, 1 study team member, blinded to the original abstraction, recoded a random 10% of predictor variable data to calculate interrater reliability. We used unweighted Cohen’s κ, weighted Cohen’s κ, and intraclass correlation coefficients for categorical, ordinal, and continuous predictor variables, respectively. Inclusion criteria consisted of (1) age 0 to 90 days old, (2) fever (ie, temperature ≥38°C) measured within 6 hours of ED arrival or reported by the caregiver from a measurement taken before arrival but during the current illness, (3) documented insurance type (eg, private), (4) documented gestational age, and (5) initial evaluation in the ED.
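The agreement statistics named above can be illustrated with a short sketch. The following Python is a minimal, hypothetical illustration (not the study's analysis code) of unweighted Cohen's κ for categorical predictors and weighted Cohen's κ for ordinal predictors. The function names are invented for illustration, and the choice between linear and quadratic disagreement weights is an assumption because the weighting scheme is not stated. Intraclass correlation for continuous predictors is not shown.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two raters coding the same items (categorical data)."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must code the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: proportion of items given the same category by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: products of each rater's marginal proportions, summed over categories.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_expected == 1:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return (p_observed - p_expected) / (1 - p_expected)


def weighted_kappa(
    rater_a: Sequence[Hashable],
    rater_b: Sequence[Hashable],
    categories: Sequence[Hashable],
    weights: str = "linear",
) -> float:
    """Weighted Cohen's kappa for ordinal data; `categories` must be listed in order."""
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")
    k = len(categories)
    if k < 2 or len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("need >= 2 ordered categories and paired, non-empty ratings")
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)
    # Joint distribution of (rater A category, rater B category).
    joint: List[List[float]] = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        joint[index[a]][index[b]] += 1 / n
    row = [sum(joint[i]) for i in range(k)]
    col = [sum(joint[i][j] for i in range(k)) for j in range(k)]
    power = 1 if weights == "linear" else 2

    def disagreement(i: int, j: int) -> float:
        # 0 for exact agreement, 1 for the most distant pair of ordered categories.
        return (abs(i - j) / (k - 1)) ** power

    observed = sum(disagreement(i, j) * joint[i][j] for i in range(k) for j in range(k))
    expected = sum(disagreement(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if expected == 0:
        raise ValueError("weighted kappa is undefined when expected disagreement is zero")
    return 1 - observed / expected
```

With only 2 categories, the linearly weighted κ reduces to the unweighted κ, which is a convenient sanity check when validating an implementation.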
Predictor Variables
We previously derived machine learning (ie, super learner) and regression predictive models to detect bacterial infections in young, febrile infants.26 To reduce potential biases related to insurance status, we used the derivation data to derive the super learner and regression models again, this time without insurance status. Thus, for this external validation study, we used 9 predictor variables for each refined model.26 Categorical predictors included sex, presence of a chronic medical condition (yes or no), appearance (well, ill, or not documented), cough status (yes, no, or unknown), and urinary tract inflammation (yes, no, or not ordered), defined as ≥5 white blood cells/high power field of unspun urine or positive (≥trace) leukocyte esterase.24 Two authors reviewed all records and categorized chronic medical conditions as present or absent on the basis of the likelihood that the condition could be related to a bacterial infection.26 We calculated interrater reliability and discussed discrepant chronic medical conditions until we reached a consensus. To distinguish ill from well appearance, we used keywords such as toxic, limp, inconsolable, ill-appearing, listless, lethargic, irritable, and unresponsive.26,33 A “Review of Systems” template, which noted the presence or absence of cough, was consistently completed in the ED per standard practice. Ordinal variables included age (days), caregiver report of gestational age at birth (weeks), and illness duration (any symptom) in days. We used 37.5 weeks for infants recorded only as “full term.”
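The categorical urinary tract inflammation predictor and the gestational age rule for infants recorded only as "full term" can be expressed as small encoding helpers. This is a hedged sketch under stated assumptions: the function names, the string labels treated as positive (≥trace) dipstick grades, and the input formats are hypothetical and are not taken from the study's code. The `le_only` flag mirrors the exploratory leukocyte esterase-only definition described in the Analysis section.

```python
from typing import Optional

# Study rule: infants recorded only as "full term" were assigned 37.5 weeks.
FULL_TERM_WEEKS = 37.5

# Hypothetical dipstick labels counted as positive (>= trace) leukocyte esterase.
POSITIVE_LE_GRADES = {"trace", "small", "moderate", "large", "1+", "2+", "3+"}


def urinary_tract_inflammation(
    wbc_per_hpf: Optional[float],
    leukocyte_esterase: Optional[str],
    le_only: bool = False,
) -> str:
    """Return 'yes', 'no', or 'not ordered' for the urinary tract inflammation predictor.

    Inflammation = >= 5 white blood cells/high power field of unspun urine
    or positive (>= trace) leukocyte esterase. None means the test was not done.
    With le_only=True, microscopy is ignored (leukocyte esterase-only definition).
    """
    if le_only:
        wbc_per_hpf = None
    if wbc_per_hpf is None and leukocyte_esterase is None:
        return "not ordered"
    pyuria = wbc_per_hpf is not None and wbc_per_hpf >= 5
    le_positive = (
        leukocyte_esterase is not None
        and leukocyte_esterase.strip().lower() in POSITIVE_LE_GRADES
    )
    return "yes" if pyuria or le_positive else "no"


def gestational_age_weeks(documented: str) -> float:
    """Parse a documented gestational age in weeks, mapping 'full term' to 37.5."""
    text = documented.strip().lower()
    if text == "full term":
        return FULL_TERM_WEEKS
    return float(text)  # assumes a numeric entry such as "36" or "38.5"
```

Keeping "not ordered" as its own level, rather than treating it as missing, matches the three-level coding described above.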
Outcome
The primary outcome was bacterial infection (UTI, bacteremia, or meningitis). We defined UTI as (1) a catheterized urine specimen that grew ≥10 000 CFU/mL of a pathogenic organism, (2) urinary tract inflammation (ie, ≥5 white blood cells/high power field of unspun urine or positive [≥trace] leukocyte esterase),5,10,11,34 and (3) clinical management of the organism as a pathogen.10,11,24,26 We defined bacteremia and meningitis as growth of a single pathogenic organism from blood and/or cerebrospinal fluid cultures that was treated clinically as a pathogen.26 On the basis of a combination of a priori categorization35 and review of the clinical course, 2 authors (a pediatric hospitalist and a pediatric infectious diseases physician) designated ambiguous culture results as pathogens or contaminants. We tracked infants for missed bacterial infections within 7 days of the index encounter through the electronic health record, which also captures data from the other major health system.
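The 3-part UTI definition and the composite outcome can be written as boolean rules. The sketch below is illustrative only: the argument and function names are hypothetical, and the determinations that an organism is pathogenic and was clinically managed as a pathogen are taken as inputs, because in the study they came from a priori categorization and chart review rather than from a formula.

```python
from typing import Dict, Optional


def meets_uti_definition(
    catheterized_specimen: bool,
    cfu_per_ml: Optional[float],
    pathogenic_organism: bool,
    wbc_per_hpf: Optional[float],
    leukocyte_esterase_positive: bool,
    managed_as_pathogen: bool,
) -> bool:
    """UTI = (1) catheterized culture >= 10 000 CFU/mL of a pathogen,
    (2) urinary tract inflammation, and (3) clinical management as a pathogen."""
    culture_positive = (
        catheterized_specimen
        and cfu_per_ml is not None
        and cfu_per_ml >= 10_000
        and pathogenic_organism
    )
    # Inflammation: >= 5 WBC/hpf of unspun urine or positive (>= trace) leukocyte esterase.
    inflammation = (wbc_per_hpf is not None and wbc_per_hpf >= 5) or leukocyte_esterase_positive
    # Requiring inflammation avoids labeling asymptomatic bacteriuria as UTI.
    return culture_positive and inflammation and managed_as_pathogen


def classify_outcomes(uti: bool, bacteremia: bool, meningitis: bool) -> Dict[str, bool]:
    """Primary outcome (any bacterial infection) and the IBI subset used in exploratory analyses."""
    return {
        "bacterial_infection": uti or bacteremia or meningitis,
        "invasive_bacterial_infection": bacteremia or meningitis,
    }
```

For example, a catheterized culture growing 100 000 CFU/mL of a pathogen without pyuria or positive leukocyte esterase would not count as a UTI under this definition.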
Analysis
We used descriptive statistics to characterize the demographic and clinical characteristics of the sample. We applied the refined regression and super learner models, which exclude insurance status as a predictor, to detect bacterial infections in this distinct sample. A super learner model is an ensemble learning method that uses cross-validation to combine several machine learning algorithms into a single predictive model designed to perform at least as well as the best individual algorithm.36 We used the “SuperLearner” R package (R Foundation for Statistical Computing, Vienna, Austria) with a library of candidate algorithms that included random forest, earth (multivariate adaptive regression splines), generalized additive models, and a generalized linear model.26 The output of each model is a bacterial infection risk estimate. We determined the area under the receiver operating characteristic curve (AUC) for each model and used prespecified risk estimate thresholds (0.005, 0.01, 0.03, and 0.05) to calculate sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and likelihood ratios. To further illustrate model performance, we calculated F1 scores, which are the harmonic mean of precision (PPV) and recall (sensitivity).37
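To make the evaluation concrete, the following is a minimal Python sketch (not the study's R code) that computes an AUC from risk estimates through its rank (Mann–Whitney) interpretation, together with the threshold-based diagnostic characteristics listed above. Treating a risk estimate at or above the threshold as "not low-risk" is an assumption, the function names are hypothetical, and confidence intervals are omitted because the interval method is not described here.

```python
import math
from typing import Dict, Sequence

PRESPECIFIED_THRESHOLDS = (0.005, 0.01, 0.03, 0.05)


def auc(risks: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a randomly chosen infected infant has a higher risk estimate
    than a randomly chosen uninfected infant (ties count 1/2)."""
    pos = [r for r, y in zip(risks, labels) if y == 1]
    neg = [r for r, y in zip(risks, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need infants both with and without the outcome")
    score = 0.0
    for p in pos:
        for q in neg:
            score += 1.0 if p > q else 0.5 if p == q else 0.0
    return score / (len(pos) * len(neg))


def diagnostics(risks: Sequence[float], labels: Sequence[int], threshold: float) -> Dict[str, float]:
    """Confusion-matrix metrics when risk >= threshold is classified as 'not low-risk'."""
    tp = fp = tn = fn = 0
    for r, y in zip(risks, labels):
        flagged = r >= threshold  # assumption: ties at the threshold count as not low-risk
        if flagged and y == 1:
            tp += 1
        elif flagged:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp) if tp + fp else math.nan,
        "npv": tn / (tn + fn) if tn + fn else math.nan,
        "lr_pos": sens / (1 - spec) if spec < 1 else math.inf,
        "lr_neg": (1 - sens) / spec if spec > 0 else math.inf,
        # Harmonic mean of PPV (precision) and sensitivity (recall); equals 2TP/(2TP+FP+FN).
        "f1": 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else math.nan,
    }
```

In use, one would call `auc(risks, labels)` once per model and `diagnostics(risks, labels, t)` for each `t` in `PRESPECIFIED_THRESHOLDS`. The pairwise AUC loop is quadratic, which is acceptable for a sample of about 1400 infants.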
Given the challenge of detecting bacterial infections and the varied approaches used to manage febrile infants, we performed 5 exploratory analyses. First, because infants with invasive bacterial infections (IBIs) (ie, bacteremia and/or bacterial meningitis) are more likely to experience adverse outcomes,38 we assessed each model’s ability to detect IBIs. Second, to align with the new Clinical Practice Guideline from the American Academy of Pediatrics and other studies,11,39,40 we evaluated each model’s performance in full-term, well-appearing infants aged 8 to 60 days. Third, we calculated diagnostic characteristics for well-appearing infants because previous models have excluded ill-appearing infants.12,39 Fourth, because neonates with bacterial infections do not reliably appear ill10 and are often considered “not low-risk,” we evaluated a 31- to 90-day approach: we hard-coded all infants ≤30 days old as not low-risk, applied the regression and machine learning models only to infants 31 to 90 days old, and then calculated diagnostic characteristics for the entire sample. Last, because leukocyte esterase is a sensitive, rapid, and readily available urinary marker for detecting UTIs,41 we repeated the derivation and validation analyses with leukocyte esterase as the sole marker of urinary tract inflammation. We used R to perform the statistical analysis. Data and code are available at: https://github.com/jmiahjones/pediatric-bi-val-2021.
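The 31- to 90-day approach combines a fixed age rule with the model's risk estimate. This is a minimal sketch under stated assumptions: the function names and the `(age, risk, outcome)` record layout are hypothetical, and classifying a risk at or above the threshold as not low-risk is an assumption. It shows how diagnostic characteristics are then computed over the entire sample, including infants ≤30 days old.

```python
from typing import Iterable, Tuple


def not_low_risk_31_to_90(age_days: int, risk: float, threshold: float) -> bool:
    """Infants <= 30 days old are hard-coded as not low-risk; older infants are
    classified by their model risk estimate (risk >= threshold is an assumption)."""
    if age_days <= 30:
        return True
    return risk >= threshold


def sensitivity_specificity(
    cohort: Iterable[Tuple[int, float, int]], threshold: float
) -> Tuple[float, float]:
    """Evaluate the combined approach over the entire sample of (age, risk, outcome) records."""
    tp = fn = tn = fp = 0
    for age_days, risk, infected in cohort:
        flagged = not_low_risk_31_to_90(age_days, risk, threshold)
        if infected and flagged:
            tp += 1
        elif infected:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)
```

By construction, no infection in an infant ≤30 days old can be missed, while every uninfected infant in that age group becomes a false positive. This is the trade-off behind the lower specificity of this approach for IBIs.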
Results
Overall, 17 042 infants ≤90 days old were evaluated in the ED from 2011 to 2018 (Fig 1), and 1419 infants met inclusion criteria (median age 53 days [interquartile range (IQR) = 32–69]) (Table 1). Characteristics of the derivation sample are also shown in Table 1. The Cohen’s κ statistic for designating chronic medical conditions was 0.85, and we reached a consensus on the final designation of each condition (Supplemental Table 4). For manually abstracted predictors, interrater reliability scores ranged from 0.96 to 0.99. Of the 99 (7%) infants with bacterial infections, 22 (1.6% of the sample) had an IBI. Escherichia coli and Streptococcus agalactiae were the most common organisms isolated (Supplemental Table 5).
Fig 1. Flow diagram of included and excluded infants. ED, emergency department.
Table 1. Subject Characteristics
Characteristic | Validation Study, n = 1419 | Derivation Study, n = 877 |
---|---|---|
Demographic characteristics | | |
Median age, d (IQR) | 53 (32–69) | 57 (38–73) |
Male | 717 (51) | 500 (57) |
Public insurance | 785 (55) | 801 (91) |
Race/ethnicity | | |
White | 910 (64) | 109 (12) |
Black | 318 (22) | 326 (37) |
Hispanic | 124 (9) | 424 (48) |
Clinical characteristics | | |
Maximum temperature, °C, median (IQR) | 38.5 (38.2–38.9) | 38.4 (38.2–38.9) |
Full-term | 1301 (92) | 801 (91) |
Chronic medical condition | 80 (6) | 29 (3) |
Ill-appearing | 252 (18) | 75 (9) |
Cough present | 611 (43) | 439 (50) |
Duration of symptoms, d, median (IQR) | 1 (0–2) | 1 (0–2) |
Lumbar puncture performed | 575 (41) | 313 (36) |
Hospitalized | 777 (55) | 549 (63) |
Outcome | | |
Bacterial infection | 99 (7) | 67 (8) |
Invasive bacterial infection | 22 (1.6) | 17 (1.9) |
Data presented as n (%) unless otherwise noted. IQR, interquartile range.
The AUCs of the super learner and regression models in detecting bacterial infections were 0.92 (95% confidence interval [CI] 0.89–0.94) and 0.90 (0.86–0.93), respectively (Supplemental Fig 2), compared with 0.96 (0.94–0.98) and 0.95 (0.91–0.98) in the derivation study.26 Table 2 shows the performance of the super learner and regression models at each prespecified bacterial infection risk estimate threshold. For example, at a risk estimate threshold of 0.005, the sensitivity and specificity of the super learner model were 98.0% (94.7%–100%) and 54.2% (51.5%–56.9%), respectively.
Table 2. Performance of Super Learner and Regression Models to Detect Bacterial Infections
Model and risk estimate threshold | F1 Score (×100) | Sensitivity (95% CI) | Specificity (95% CI) | PPV (95% CI) | NPV (95% CI) | + LR (95% CI) | − LR (95% CI) |
---|---|---|---|---|---|---|---|
Super learner | | | | | | | |
Risk estimate = 0.005 | 24.2 | 0.980 (0.947–1) | 0.542 (0.515–0.569) | 0.138 (0.113–0.166) | 0.997 (0.993–1) | 2.14 (2.01–2.28) | 0.04 (0–0.10) |
Risk estimate = 0.01 | 28.9 | 0.960 (0.916–0.991) | 0.648 (0.622–0.674) | 0.170 (0.140–0.204) | 0.995 (0.990–0.999) | 2.72 (2.51–2.96) | 0.06 (0.01–0.13) |
Risk estimate = 0.03 | 34.5 | 0.949 (0.903–0.990) | 0.733 (0.710–0.757) | 0.211 (0.175–0.253) | 0.995 (0.990–0.999) | 3.56 (3.24–3.95) | 0.07 (0.01–0.13) |
Risk estimate = 0.05 | 36.4 | 0.939 (0.888–0.981) | 0.759 (0.736–0.783) | 0.226 (0.187–0.270) | 0.994 (0.989–0.998) | 3.90 (3.50–4.36) | 0.08 (0.02–0.15) |
Regression | | | | | | | |
Risk estimate = 0.005 | 22.3 | 0.960 (0.915–0.991) | 0.500 (0.474–0.527) | 0.126 (0.102–0.193) | 0.994 (0.987–0.999) | 1.92 (1.79–2.05) | 0.04 (0–0.10) |
Risk estimate = 0.01 | 27.4 | 0.929 (0.870–0.977) | 0.636 (0.611–0.661) | 0.161 (0.132–0.193) | 0.992 (0.985–0.998) | 2.55 (2.32–2.78) | 0.11 (0.04–0.20) |
Risk estimate = 0.03 | 36.0 | 0.919 (0.858–0.970) | 0.761 (0.738–0.784) | 0.224 (0.183–0.268) | 0.992 (0.986–0.997) | 3.84 (3.43–4.30) | 0.11 (0.04–0.19) |
Risk estimate = 0.05 | 40.3 | 0.899 (0.835–0.955) | 0.808 (0.787–0.829) | 0.260 (0.214–0.310) | 0.991 (0.984–0.996) | 4.69 (4.15–5.35) | 0.13 (0.06–0.20) |
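CI, confidence interval; PPV, positive predictive value; NPV, negative predictive value; + LR, positive likelihood ratio; − LR, negative likelihood ratio. F1 score = (2 × PPV × Sensitivity)/(PPV + Sensitivity), shown × 100.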
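As a worked check on the F1 formula and the other point estimates, the super learner row of Table 2 at a threshold of 0.005 can be reproduced from confusion-matrix counts. The counts used below (97 true positives, 2 false negatives, 715 true negatives, 605 false positives) are not reported in the text. They are back-calculated from the reported sample size (1419), number of infections (99), sensitivity, and specificity, and should be read as an assumption. The helper name is hypothetical.

```python
def table_row(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Point estimates rounded as in Table 2 (F1 shown x 100)."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return {
        "f1_x100": round(100 * 2 * ppv * sens / (ppv + sens), 1),
        "sensitivity": round(sens, 3),
        "specificity": round(spec, 3),
        "ppv": round(ppv, 3),
        "npv": round(npv, 3),
        "lr_pos": round(sens / (1 - spec), 2),
        "lr_neg": round((1 - sens) / spec, 2),
    }


# Back-calculated (assumed) counts for the super learner at a risk threshold of 0.005.
row = table_row(tp=97, fn=2, tn=715, fp=605)
```

With these counts, every point estimate in the row (F1 24.2, sensitivity 0.980, specificity 0.542, PPV 0.138, NPV 0.997, + LR 2.14, − LR 0.04) is recovered, which also illustrates how a low PPV at a low prevalence (7%) pulls the F1 score down despite high sensitivity.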
In the exploratory analyses, sensitivity estimates for the super learner model were greater than those for the regression model. Thus, we report findings for the super learner model here and in Table 3; regression model results are shown in Supplemental Table 6. For the outcome of IBIs using the super learner model, we observed a sensitivity of 90.9% (77.3%–100%) and a specificity of 51.2% (48.6%–53.8%). For the remaining exploratory analyses, sensitivities and specificities to detect bacterial infections were at least 96% and 48%, respectively (Table 3). For IBIs, the 31- to 90-day approach had a sensitivity and specificity of 95.5% (84.2%–100%) and 43.3% (40.8%–45.9%), respectively (Table 3). Using a bacterial infection risk estimate threshold of 0.005, the regression model misclassified 3 infants with IBIs as low-risk (0.5%), whereas the super learner model and the 31- to 90-day approach (using the super learner model) misclassified 2 (0.3%) and 1 (0.2%) infants with IBIs as low-risk, respectively (Supplemental Table 7). Characteristics of misclassified infants are shown in Supplemental Table 8.
Table 3. Results of Exploratory Analyses, Super Learner Model
Subgroup and outcome | Sensitivity (95% CI) | Specificity (95% CI) | PPV (95% CI) | NPV (95% CI) | + LR (95% CI) | − LR (95% CI) |
---|---|---|---|---|---|---|
Entire sample | | | | | | |
IBIs | 0.909 (0.773–1) | 0.512 (0.486–0.538) | 0.028 (0.017–0.041) | 0.997 (0.993–1) | 1.86 (1.56–2.10) | 0.18 (0–0.45) |
Infants 8–60 d old, full-term, well-appearing | | | | | | |
Bacterial infections | 0.967 (0.917–1) | 0.507 (0.471–0.543) | 0.138 (0.107–0.172) | 0.995 (0.987–1) | 1.96 (1.81–2.13) | 0.07 (0–0.16) |
IBIs | 0.882 (0.720–1) | 0.479 (0.444–0.514) | 0.035 (0.019–0.054) | 0.995 (0.987–1) | 1.69 (1.37–1.97) | 0.25 (0–0.59) |
Well-appearing infants | | | | | | |
Bacterial infections | 0.976 (0.936–1) | 0.659 (0.630–0.686) | 0.178 (0.144–0.215) | 0.997 (0.993–1) | 2.86 (2.62–3.14) | 0.04 (0–0.10) |
IBIs | 0.846 (0.615–1) | 0.620 (0.591–0.648) | 0.024 (0.011–0.039) | 0.997 (0.993–1) | 2.24 (1.59–2.72) | 0.25 (0.00–0.63) |
31- to 90-day approach | | | | | | |
Bacterial infections | 0.980 (0.949–1) | 0.583 (0.557–0.609) | 0.150 (0.123–0.180) | 0.997 (0.993–1) | 2.35 (2.19–2.52) | 0.04 (0.00–0.09) |
IBIs | 0.955 (0.842–1) | 0.433 (0.408–0.459) | 0.026 (0.016–0.037) | 0.998 (0.995–1) | 1.68 (1.48–1.83) | 0.11 (0.00–0.36) |
Leukocyte esterase* | | | | | | |
Bacterial infections | 0.960 (0.915–0.991) | 0.484 (0.457–0.511) | 0.122 (0.100–0.147) | 0.994 (0.987–0.999) | 1.86 (1.74–1.99) | 0.08 (0.02–0.18) |
IBIs | 0.909 (0.773–1) | 0.459 (0.433–0.485) | 0.026 (0.015–0.037) | 0.997 (0.992–1) | 1.68 (1.41–1.89) | 0.20 (0–0.50) |
Diagnostic characteristics are shown using a bacterial infection risk estimate threshold of 0.005, except for the 31- to 90-day approach (bacterial infections), for which a threshold of 0.03 was used. CI, confidence interval; PPV, positive predictive value; NPV, negative predictive value; + LR, positive likelihood ratio; − LR, negative likelihood ratio; IBIs, invasive bacterial infections (bacteremia, bacterial meningitis).
* This model used leukocyte esterase alone as the marker of urinary tract inflammation, rather than pyuria or leukocyte esterase.
Two infants were diagnosed with bacterial infections within 7 days of the index encounter, both with urinary tract infections (Escherichia coli). One infant did not have a urinalysis performed at the index encounter, and the other had a normal urinalysis at the index encounter. No infants were diagnosed with IBIs within 7 days of the index encounter. Only 2.5% of predictor variables were coded as “unknown” (cough status) or “not ordered” (urinalysis), and there were no missing data.
Discussion
Findings in this validation study show that the AUCs of the super learner and regression models were similar to those in the derivation study, even after insurance status was removed as a predictor variable. Removing insurance status was important to reduce potential sources of bias related to its unclear and likely varied effects in different locations. Point estimates for the super learner model were higher than those for the regression model, possibly because, as an ensemble method, the super learner combines machine learning algorithms that model data in both linear and nonlinear ways, which may improve classification accuracy and generalizability.36,42,43 Results suggest that a clinical-based predictive model can accurately estimate bacterial infection risk in young infants with fever. Compared with the derivation sample, the validation sample included more ill-appearing infants and lower proportions of Hispanic infants and infants with public insurance,26 suggesting that the models may perform well across different settings and populations.
Compared with previous serum biomarker models,5–12,39 our model showed similar sensitivities and specificities to detect bacterial infections and IBIs in the entire sample and in the subsets of full-term, well-appearing infants 8 to 60 days old and well-appearing infants 0 to 90 days old. The 31- to 90-day approach also showed high sensitivity for bacterial infections and IBIs while avoiding missed infections in the first month of life, because all infants ≤30 days old were classified as not low-risk. Additionally, findings suggest that leukocyte esterase alone may be used as a marker of urinary tract inflammation. Prospective investigation is needed to confirm that the clinical-based model can accurately detect bacterial infections and IBIs in febrile infants and to directly compare its performance with that of current models.
Rather than identifying febrile infants at low risk for a bacterial infection, we sought to provide guidance to clinicians in any setting who evaluate and manage any young, febrile infant. If prospectively validated in an appropriately powered sample, this approach could reduce barriers to clinical adoption and address possible inequities by increasing access to highly sensitive predictive models in resource-limited settings. This paradigm shift is characterized by a few key features, which are highlighted below.
Previous studies have excluded ill-appearing and premature infants11 and have focused on a narrower age range (ie, ≤60 days old). This approach implies that all ill-appearing and premature infants are not low-risk for bacterial infections and merit a full evaluation, including blood, urine, and cerebrospinal fluid cultures. Although bacterial infection risk may be as high as 25% in ill-appearing infants,10 there may be a subgroup of these infants at intermediate or low risk. Identifying such a subgroup could alter management (eg, a subset of infants could be safely monitored in a hospital without antimicrobial therapy). By including ill-appearing infants, our clinical-based model is positioned to provide clinicians with a more personalized management strategy.
Even accurate predictive models often have limited clinical adoption, frequently because of unanticipated implementation barriers.16,17 Barriers typically refer to the ease or difficulty with which predictor variables can be obtained and are frequently measured in terms of the time needed for collection, financial costs, and pain or stress for the patient or caregiver.44 Current models to detect bacterial infections in febrile infants include at least 1 serum biomarker,5–12,39,40 which requires time to perform venipuncture and to deliver and process the specimen, ranging from minutes to hours and possibly delaying care.45,46
To improve clinical adoption, we omitted serum biomarkers, which may reduce the time needed to reach a decision. During encounters with febrile infants, clinicians could obtain a urinalysis from a bag specimen40 placed during triage, with reflex catheterization to obtain cultures for samples with evidence of urinary tract inflammation. Simultaneously, clinicians could quickly obtain other clinical history variables and assess the infant’s appearance. Once urinalysis results are available, clinicians could enter predictor information into the web-based risk calculator47 and obtain an accurate, highly sensitive bacterial infection risk estimate to inform clinical decisions. Other than an internet connection and a urinalysis, no other resources would be needed, and invasive, painful procedures could potentially be avoided for a subset of low-risk infants. The novel predictive model may be particularly relevant for outpatient clinics, where it could facilitate decisions about ED referral. If prospectively validated in an appropriately powered sample, the risk calculator could help clinicians guide the management of any young, febrile infant. In addition, our exploratory analysis suggests that urine dipstick results, specifically leukocyte esterase, may be used as a predictor to detect bacterial infections with high sensitivity. Although future studies should investigate the extent to which this approach should be used, this finding is promising because the clinical-based model may reduce wait times and could serve as an important decision support tool in resource-limited settings.
Each low-risk predictive model uses different inclusion/exclusion criteria, predictors, and thresholds,5,6,8,10,11 increasing clinicians’ cognitive load in choosing and interpreting a predictive model. In contrast, our clinical-based approach uses simple, intuitive inclusion criteria (ie, age 0–90 days; temperature ≥38°C) and readily available predictors, without the need for invasive and possibly challenging venipuncture or lumbar puncture procedures. There are also no predictor threshold values for clinicians to remember, because the approach considers all predictors across their entire range of values.
Every predictive model will miss a small proportion of bacterial infections. Similar to other studies,10 the super learner model missed an 11-day-old infant with meningitis, indicating that, if the objective is to avoid all missed cases of meningitis, it may be prudent to consider the 31- to 90-day approach, for which there were 0 missed cases of meningitis. Clinicians’ preferences vary along the spectrum from risk tolerance to risk aversion. Additionally, when confronted with an unfamiliar yet potentially fatal disease, caregivers may be more likely to defer to a clinician’s preferences without a clear understanding of the possible harms and benefits of management options. Although shared decision-making has so far remained elusive in this setting, the super learner model holds promise as an important tool to support it. Instead of evaluating individual predictors in isolation, the super learner model considers all predictor variables simultaneously, as clinicians do. Thus, no single predictor would automatically merit not-low-risk status, and predictor variable threshold values would be unnecessary. The output of the super learner model is a bacterial infection risk estimate, allowing clinician–caregiver dyads to adjust risk estimate thresholds (and therefore sensitivities/specificities) on the basis of their collective risk tolerance or risk aversion.
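The idea of letting clinician–caregiver dyads adjust the risk estimate threshold can be illustrated with the super learner point estimates from Table 2. In this hypothetical helper (not part of the published risk calculator), a minimum acceptable sensitivity stands in for the dyad's risk tolerance, and the threshold with the highest specificity that meets that floor is selected. It ignores confidence intervals and is an illustration only, not a decision tool.

```python
from typing import List, Tuple

# (risk estimate threshold, sensitivity, specificity): super learner point estimates, Table 2.
SUPER_LEARNER_TABLE2: List[Tuple[float, float, float]] = [
    (0.005, 0.980, 0.542),
    (0.01, 0.960, 0.648),
    (0.03, 0.949, 0.733),
    (0.05, 0.939, 0.759),
]


def choose_threshold(
    rows: List[Tuple[float, float, float]], min_sensitivity: float
) -> Tuple[float, float, float]:
    """Return the row with the highest specificity whose sensitivity meets the floor."""
    eligible = [row for row in rows if row[1] >= min_sensitivity]
    if not eligible:
        raise ValueError("no threshold achieves the requested sensitivity")
    return max(eligible, key=lambda row: row[2])
```

Under these assumptions, a more risk-averse dyad requiring a sensitivity of at least 0.97 would land on the 0.005 threshold, whereas a dyad accepting 0.90 would land on 0.05, trading sensitivity for roughly 22 percentage points of specificity.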
Limitations include retrospective data collection from a single site and reliance on clinician documentation; some variables may have been recorded inaccurately. However, we used a rigorous process of manual data abstraction to minimize discrepancies, observed high interrater reliability, and had no missing data. Additionally, results are similar to those obtained in the derivation study, which was performed at a separate site with a different electronic health record vendor, suggesting that biases in documentation may be low. Moreover, although we used the medical literature to select predictor variables, some included factors may not be important, and some excluded factors might improve prediction. Additional investigation may be needed to identify the optimal set of predictor variables. Next, we used a lower colony count threshold (10 000 CFU/mL) to define UTI than other studies.8,48 We did so because urine culture results at the derivation study site did not specify growth between 10 000 and 100 000 CFU/mL. Instead, we used an evidence-based alternative of 10 000 CFU/mL plus urinary tract inflammation to avoid misclassifying cases of asymptomatic bacteriuria.10,11,24,26 Future studies should examine whether diagnostic characteristics differ when UTI is defined with a higher colony count. Last, the models, including those in the exploratory analyses, should not yet be used for clinical purposes and require additional prospective evaluation.
In this retrospective validation study, the refined super learner and regression models performed similarly to the models in the derivation study. This study provides additional evidence that our clinical-based approach can accurately estimate bacterial infection risk in young, febrile infants. In addition, the performance of the super learner model is similar to that of previously published predictive models that use serum biomarkers.5–12,39 The super learner model may be well positioned to achieve high rates of clinical adoption and may have important implications for equity across sites of care, safely avoiding unnecessary interventions, and promoting a shared decision-making approach. Clinical-based and biomarker-based models (eg, the American Academy of Pediatrics Clinical Practice Guideline) may complement each other, because a biomarker-based predictive model may be preferred in some circumstances, whereas other situations may be better suited to a clinical-based model. Additional investigation is needed to determine whether the clinical-based model could enhance current biomarker-based models and the extent to which the 2 model types could be used in concert, perhaps as a tiered approach. Future studies should further refine predictor variables, prospectively test the super learner model and the 31- to 90-day approach to detect bacterial infections and IBIs, examine predictive capabilities in infants ≤60 days old, and investigate methods to optimize clinical adoption.
Acknowledgments
We thank Courtney Richfield, BS, MA, of the University of Rochester, Department of Pediatrics, for her assistance in formatting, proofing, and processing this manuscript. Special thanks to Dr. Lauren Solan, MD, MEd, Associate Professor of Pediatrics, University of Rochester, Rochester, New York, and Dr. Tina Sosa, MD, MSc, Assistant Professor of Pediatrics, University of Rochester, Rochester, New York, for their thoughtful suggestions and revisions.
Dr Yaeger conceptualized and designed the study, collected data, drafted the initial manuscript, and revised the manuscript; Mr Jones and Dr Ertefaie conducted the initial and final analyses and reviewed and revised the manuscript; Drs Caserta, van Wijngaarden, and Fiscella designed the study, assisted with analysis, and reviewed and revised the manuscript; and all authors approve the final manuscript as submitted and agree to be accountable for all aspects of the work.
FUNDING: The project described in this publication was supported by the University of Rochester CTSA award number KL2 TR001999 from the National Center for Advancing Translational Sciences of the National Institutes of Health. The funder/sponsor did not participate in any aspect of the work. Funded by the National Institutes of Health (NIH).
CONFLICT OF INTEREST DISCLOSURE: The authors have indicated they have no potential conflicts of interest relevant to this article to disclose.