BACKGROUND:

Recent decision rules for the management of febrile infants support the identification of infants at higher risk of serious bacterial infections (SBIs) without the performance of routine lumbar puncture. We derive and validate a model to identify febrile infants ≤60 days of age at low risk for SBIs using supervised machine learning approaches.

METHODS:

We conducted a secondary analysis of a multicenter prospective study performed between December 2008 and May 2013 of febrile infants. Our outcome was SBI (culture-positive urinary tract infection, bacteremia, and/or bacterial meningitis). We developed and validated 4 supervised learning models: logistic regression, random forest, support vector machine, and a single-hidden layer neural network.

RESULTS:

A total of 1470 patients were included (1014 >28 days old). One hundred thirty-eight (9.3%) had SBIs (122 urinary tract infections, 20 bacteremia, and 8 meningitis; 11 with concurrent SBIs). Using 4 features (urinalysis, white blood cell count, absolute neutrophil count, and procalcitonin), the random forest model demonstrated the highest specificity (74.9%; 95% confidence interval [CI]: 71.5%–78.2%), with a sensitivity of 98.6% (95% CI: 92.2%–100.0%) in the validation cohort. One patient with bacteremia was misclassified. Among 1240 patients who received a lumbar puncture, this model could have prevented 849 (68.5%) such procedures.

CONCLUSIONS:

We derived and internally validated a supervised learning model for the risk-stratification of febrile infants. Although computationally complex, lacking parameter cutoffs, and in need of external validation, this strategy may allow for reductions in unnecessary procedures, hospitalizations, and antibiotics while maintaining excellent sensitivity.

What’s Known on This Subject:

Decision rules using serum procalcitonin for the risk-stratification of well-appearing febrile infants ≤60 days of age for serious bacterial infection have been described. The diagnostic value of machine learning models for this purpose has not been reported.

What This Study Adds:

We evaluated machine learning algorithms for the identification of serious bacterial infections from a multicenter prospective study. The random forest model demonstrated the greatest accuracy, with a sensitivity of 99% and specificity of 75% in an internally validated cohort.

The emergency department (ED) management of febrile infants ≤60 days of age remains an area of ongoing investigation. Decision rules prioritize high sensitivity in the risk-stratification of infants for serious bacterial infections (SBIs; urinary tract infection [UTI], bacterial meningitis, and/or bacteremia).1–5 Recent rules report sensitivities >90% and specificities between 30% and 60%, without routine use of lumbar puncture.6–8 Some recent strategies use procalcitonin, a biomarker with high diagnostic accuracy for SBI.9,10

Although 10% of febrile infants ≤60 days of age have an SBI,7,11–13 a large proportion without SBI are classified as false-positives by past decision rules,1–3,6–8 leading to unnecessary lumbar punctures, hospitalizations, antimicrobial use, and parental anxiety.14,15 Historic criteria were developed by expert opinion.1–5,8 Some recent rules were developed by using methods such as logistic regression or recursive partitioning.6,7 The characteristics of machine learning–based models to classify infants with SBI remain unexplored. Supervised learning, in which computer algorithms are used to create models that map input parameters to a preassigned outcome, is one commonly performed application of machine learning.16 In pediatrics, such approaches have been used in several arenas, including the classification of patients with appendicitis17 and the ED triage of children.18 Machine learning approaches may demonstrate superior classification accuracy compared with conventional statistical techniques, as demonstrated in one investigation evaluating pediatric patients with traumatic head injury.19,20 Applied to febrile infants, these models may be able to maintain high sensitivity for the identification of infants with SBI while improving specificity.

We evaluated the test characteristics of machine learning models to predict outcomes of SBI among infants presenting to the ED with fever. We compared these models with previously established clinical prediction rules.6–8

We performed a secondary analysis of a multicenter prospective study performed between December 2008 and May 2013 by the Pediatric Emergency Care Applied Research Network (PECARN) in 26 EDs. The current study was performed by using a public use data set obtained from the parent study and was designated nonhuman subjects research by our institutional review board.21  Analyses were conducted by using R version 3.6.1 (R Foundation for Statistical Computing, Vienna, Austria).

Inclusion criteria of the parent study have been published.22 A convenience sample of patients ≤60 days of age with fever (≥38°C in the preceding 24 hours) evaluated in study site EDs was included. Patients were excluded from the parent study for clinical sepsis, recent antibiotic use, prematurity, preexisting medical conditions, or soft tissue infections. We additionally excluded encounters that were missing any of the following: blood or urine cultures, serum procalcitonin (retaining levels performed on the first or second day of admission), white blood cell count (WBC), absolute neutrophil count (ANC), or urinalysis. We excluded encounters lacking cerebrospinal fluid (CSF) cultures or a phone or medical record review performed 8 to 14 days after the encounter to verify that discharged patients without lumbar puncture did not have bacterial meningitis. We developed our model with procalcitonin, given the high diagnostic accuracy reported for this predictor of SBI.9,10,23,24 Similar inclusion criteria were used in the PECARN decision rule.7

Our outcome was SBI, defined as bacterial meningitis, bacteremia, and/or UTI. Cultures were reviewed by using standardized criteria from the parent study.25 UTI was defined as ≥1000 colony-forming units (CFUs) per milliliter from urine culture obtained via suprapubic aspiration, ≥50 000 CFU/mL from catheterization, or 10 000 to 49 999 CFU/mL from catheterization with a positive urinalysis (leukocyte esterase, nitrite, or >5 WBCs per high-power field).22 Candidate features included age, attending or fellow clinical suspicion for risk of SBI (dichotomized as 1%–10% and >10%), sex, fever location (in enrolling ED or elsewhere), WBC, ANC, serum procalcitonin, and urinalysis.
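As an illustration, the colony-count thresholds above can be expressed as a short classification function. This is a hypothetical sketch in Python (the study's analyses were performed in R); the function and argument names are our own.

```python
def is_uti(method, cfu_per_ml, urinalysis_positive):
    """Classify a urine culture as UTI under the study definition (a sketch).

    method: "suprapubic" or "catheterization" (collection method)
    cfu_per_ml: colony-forming units per milliliter from culture
    urinalysis_positive: leukocyte esterase, nitrite, or >5 WBC/hpf
    """
    if method == "suprapubic":
        # any growth >=1000 CFU/mL from suprapubic aspiration counts
        return cfu_per_ml >= 1_000
    if method == "catheterization":
        # >=50 000 CFU/mL alone is sufficient
        if cfu_per_ml >= 50_000:
            return True
        # 10 000-49 999 CFU/mL requires a positive urinalysis
        return 10_000 <= cfu_per_ml <= 49_999 and urinalysis_positive
    return False
```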

We performed 2 steps to identify and retain features that were maximally predictive of the outcome while removing features that would limit accuracy. First, we performed univariate logistic regression to test for association of predictors with SBI, retaining variables with P < .10. Second, we performed principal component analysis (PCA), in which an orthogonal transformation was applied to linearly uncorrelate variables and identify features that discriminated positive from negative cases of SBI. In a PCA biplot, a small angle between vectors implies a positive correlation, a large angle suggests a negative correlation, and a 90° angle indicates no correlation between characteristics. The farther these vectors lie from the origin of the principal components, the greater the influence they carry in the PCA.26
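The geometric interpretation of the biplot can be made concrete: for standardized data, the cosine of the angle between two loading vectors approximates the correlation between the corresponding variables. The following is a minimal pure-Python sketch of that relationship (illustrative only; it is not the study's R code).

```python
import math

def pearson_r(x, y):
    # sample Pearson correlation, computed from scratch
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def biplot_angle_deg(x, y):
    # in a PCA biplot of standardized data, the angle between two
    # loading vectors satisfies cos(angle) ~= correlation(x, y):
    # r = 1 -> 0 degrees, r = 0 -> 90 degrees, r = -1 -> 180 degrees
    return math.degrees(math.acos(pearson_r(x, y)))
```

Under this approximation, strongly correlated features such as WBC and ANC would appear as nearly parallel vectors, consistent with the small angle described in the Results.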

We randomized patients into derivation and validation cohorts with equal partitions, balanced to ensure equal numbers of SBIs. We developed 4 supervised learning models.27 First, we performed multivariable logistic regression. Logistic regression uses a logistic function to model a binary outcome, generating coefficients that can be used to predict the logarithm of the odds of SBI. This model was optimized to obtain the lowest Akaike information criterion by using bidirectional stepwise regression. Second, we performed random forest modeling, an ensemble method (in which multiple algorithms are combined) whereby numerous decision trees are developed on random subsets of features during model development; these trees are then run individually, and their results are averaged to calculate probabilities of an outcome. The hyperparameter (a parameter used to tune the training process) for the number of variables available for splitting at each tree node was derived from features identified during PCA and verified by using 10-fold cross-validation repeated 3 times. Third, we performed modeling with a support vector machine (SVM) using a radial kernel. SVM is an algorithm in which a boundary is created with the widest margins between each type of predetermined classification. Hyperparameters were identified by using 10-fold cross-validation. Fourth, we fitted a single-hidden layer neural network as a feed-forward multilayer perceptron. The neural network consists of 3 “layers” in which data are input, analyzed through a series of transformations, and then output as results. We used bootstrapped resampling performed 25 times to identify final values for the size and decay parameters. Analyses were performed by using the MASS (v7.3–51.4), randomForest (v4.6–14), e1071 (v1.7–2), and nnet (v7.3–12) packages.
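To make the random forest idea concrete, the following is a minimal pure-Python sketch of its three ingredients: bootstrap resampling, a random feature subset at each split, and averaging of per-tree probabilities. Depth-1 trees (stumps) are used for brevity. This is illustrative only; the study used the randomForest package in R with full decision trees and tuned hyperparameters.

```python
import random
from collections import Counter

def gini(labels):
    # Gini impurity of a set of 0/1 labels
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_stump(X, y, feat_idx):
    # exhaustively pick the (feature, threshold) split with lowest
    # weighted Gini impurity over the allowed feature subset
    best = None
    for j in feat_idx:
        for t in sorted(set(row[j] for row in X)):
            left = [y[i] for i in range(len(X)) if X[i][j] <= t]
            right = [y[i] for i in range(len(X)) if X[i][j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(X)
            if best is None or score < best[0]:
                p_left = sum(left) / len(left) if left else 0.0
                p_right = sum(right) / len(right) if right else 0.0
                best = (score, j, t, p_left, p_right)
    return best[1:]  # (feature, threshold, P(y=1 | left), P(y=1 | right))

def fit_forest(X, y, n_trees=25, mtry=1, seed=0):
    rng = random.Random(seed)
    forest = []
    n = len(X)
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]          # bootstrap sample
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        feats = rng.sample(range(len(X[0])), mtry)          # random feature subset
        forest.append(best_stump(Xb, yb, feats))
    return forest

def predict_proba(forest, row):
    # average the per-tree probabilities, as in random forest classification
    probs = [(pl if row[j] <= t else pr) for (j, t, pl, pr) in forest]
    return sum(probs) / len(probs)
```

The averaged probability output is what a misclassification-cost threshold would later be applied to.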

Models were assessed by using the validation cohort. We estimated the area under the receiver operating characteristic curve (AUROC). Predictions from logistic regression were converted to probabilities by using the inverse logit formula. Optimal thresholds were estimated for each model by using a misclassification cost algorithm in which the relative cost of false-negatives was 100 times greater than that of false-positives.7 For each model, we reported (with 95% confidence intervals [CIs]) parameters of diagnostic accuracy, the area under the precision-recall curve (AUPRC), and the F1 score. We additionally compared models with 3 recently published models: PECARN, Step-by-Step, and Aronson (Supplemental Table 6).6–8 For the Step-by-Step model, we determined a priori that a >10% physician suspicion of SBI met the criterion for “ill-appearing,” and we classified intermediate-risk patients as high risk.8 For the Aronson model, we used a score of ≥2 to classify patients into high- and low-risk groups.6 We assessed the lumbar punctures that may have been avoided through low-risk group identification by each model.
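The misclassification-cost step can be sketched as follows: each candidate threshold is scored as 100 × (false-negatives) + 1 × (false-positives), and the threshold with the lowest total cost is kept, which strongly favors sensitivity. A hypothetical Python sketch (function and argument names are ours):

```python
def choose_threshold(probs, labels, fn_cost=100, fp_cost=1):
    # scan the observed predicted probabilities as candidate thresholds;
    # classify p >= t as positive, and pick the minimum-cost threshold
    best_t, best_cost = None, None
    for t in sorted(set(probs)):
        fn = sum(1 for p, y in zip(probs, labels) if y == 1 and p < t)
        fp = sum(1 for p, y in zip(probs, labels) if y == 0 and p >= t)
        cost = fn_cost * fn + fp_cost * fp
        if best_cost is None or cost < best_cost:
            best_t, best_cost = t, cost
    return best_t
```

Because a missed SBI costs 100 times more than a false alarm, overlapping cases push the chosen threshold downward, trading specificity for sensitivity.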

We performed exploratory analyses using modified inclusion criteria. Patients with complete and interpretable culture data were considered for inclusion. Predictors of interest that were missing in >15% of cases were removed, because imputation on data with high proportions of missingness can be a source of bias.28 For remaining cases, we imputed missing values by using predictive mean matching. We then validated the highest-performing machine learning model (with a post hoc analysis in which imputation was performed for all missing data from patients with complete and interpretable cultures). Second, we performed an analysis using an outcome limited to bacterial meningitis and bacteremia. Third, we redeveloped the model with the highest accuracy by using a modified outcome in which patients diagnosed with UTI in the primary analysis by colony counts of 10 000 to 49 999 CFU/mL with a positive urinalysis were considered non-SBIs; the outcome otherwise remained SBI. Fourth, given the limited availability of procalcitonin in North America,29,30 we performed an analysis creating a machine learning model without this test. We additionally evaluated the performance of such a model using only objective measures.
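Predictive mean matching can be illustrated with a deliberately simplified one-predictor sketch: a regression is fit on complete cases, and each missing value is replaced by the observed value of the "donor" whose fitted value is closest to the missing case's prediction. The study used a full multivariable implementation; this pure-Python version is hypothetical and for intuition only.

```python
def pmm_impute(target, predictor):
    # one-predictor predictive mean matching (simplified sketch):
    # fit a least-squares line on complete cases, then for each missing
    # target borrow the observed value of the donor whose fitted value
    # is closest to the missing case's predicted value
    obs = [(p, t) for p, t in zip(predictor, target) if t is not None]
    n = len(obs)
    mx = sum(p for p, _ in obs) / n
    my = sum(t for _, t in obs) / n
    sxx = sum((p - mx) ** 2 for p, _ in obs)
    b = sum((p - mx) * (t - my) for p, t in obs) / sxx
    a = my - b * mx
    fitted = [(a + b * p, t) for p, t in obs]
    out = []
    for p, t in zip(predictor, target):
        if t is not None:
            out.append(t)
        else:
            pred = a + b * p
            donor = min(fitted, key=lambda ft: abs(ft[0] - pred))
            out.append(donor[1])  # impute an actually observed value
    return out
```

A design point of predictive mean matching is that imputed values are always real observed values, which keeps imputations within the plausible range of the data.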

Of 7335 encounters, 1470 were included (Fig 1). A total of 1014 patients (69.0%) were >28 days old. A total of 138 of 1470 (9.3%) patients had SBIs, including 122 (8.3%), 20 (1.4%), and 8 (0.5%) with UTI, bacteremia, and bacterial meningitis, respectively. Eleven (0.7%) had concurrent SBIs. A total of 1240 (84.4%) patients had a lumbar puncture performed, and 1050 (71.4%) were admitted.

FIGURE 1

Patient inclusion and randomization. aFor a sensitivity analysis, 4247 encounters with complete and interpretable cultures were used, and imputation was performed for cases with missing variables. CBC, complete blood count.


Predictors significantly associated with SBI were age, suspicion of SBI, sex, WBC, ANC, positive urinalysis, and procalcitonin (Table 1). In the evaluation of non-SBI cases on the PCA biplot (Supplemental Fig 4), age, urinalysis, and sex strongly influenced the second principal component (PC2), whereas WBC, ANC, and procalcitonin influenced the first principal component (PC1). In particular, ANC and WBC were positively correlated, with a small angle between their vectors in the PC1 dimension. For positive SBI cases, urinalysis, procalcitonin, physician impression of SBI, WBC, and ANC drove separation along PC1; age and sex influenced PC2. On the basis of the distances of the vectors from the origin and the correlations among variables on the PCA biplot, 4 variables (urinalysis, procalcitonin, ANC, and WBC) were retained for modeling.

TABLE 1

Differences Among Patients With and Without SBI

Variable    Patients Without SBI (n = 1332)    Patients With SBI (n = 138)    P
Age, d, median (IQR) 38 (26–48) 31 (20–45) .001a 
Male, n (%) 767 (57.6) 92 (66.7) .040a 
Elevated temperature in enrolling ED, n (%) 808 (60.7) 89 (64.5) .380 
Clinical suspicion for SBI, n (%)   <.001a 
 1%–10% 1262 (94.7) 114 (82.6)  
 >10% 70 (5.3) 24 (17.4)  
WBC, per μL, median (IQR) 9200 (6900–12 100) 14 500 (1500–18 000) <.001a 
ANC, per μL, median (IQR) 2970 (1910–4710) 7200 (5020–10 500) <.001a 
Positive urinalysis, n (%) 107 (8.0) 116 (84.1) <.001a 
Procalcitonin level, ng/mL, median (IQR) 0.20 (0.15–0.28) 0.74 (0.31–3.39) <.001a 

IQR, interquartile range.

a

P values significant by univariable binary logistic regression.

A total of 735 patients were randomized into each of the derivation and validation cohorts. Sixty-nine (9.4%) patients in each cohort had SBIs. The cohorts were similar with respect to predictor and outcome variables, with the exception of WBC (Table 2).

TABLE 2

Differences Among Patients in Derivation and Validation Cohorts

Variable    Derivation Cohort (n = 735)    Validation Cohort (n = 735)    P
Predictor
 Age, d, median (IQR) 37 (25–48) 38 (26–48) .664 
 Male, n (%) 423 (57.6) 436 (59.3) .491 
 Elevated temperature in enrolling ED, n (%) 443 (60.3) 454 (61.8) .556 
 Unstructured clinical suspicion for SBI, n (%)   1.000 
  1%–10% 688 (93.6) 688 (93.6)  
  >10% 47 (6.4) 47 (6.4)  
 WBC, per μL, median (IQR) 9200 (6600–12 300) 9700 (7300–13 000) .024a 
 ANC, per μL, median (IQR) 3180 (1940–5090) 3220 (2040–5180) .390 
 Positive urinalysis, n (%) 110 (15.0) 113 (15.4) .827 
 Procalcitonin level, ng/mL, median (IQR) 0.21 (0.16–0.33) 0.20 (0.15–0.29) .488 
Outcomes    
 Any SBIb 69 (9.4) 69 (9.4) 1.000 
 Positive urine culture 61 (8.3) 61 (8.3) 1.000 
 Positive blood culture 9 (1.2) 11 (1.5) .653 
 Positive CSF culture 4 (0.5) 4 (0.5) 1.000 

IQR, interquartile range.

a

P values significant by univariable binary logistic regression.

b

Some patients had concomitant SBIs. In the derivation cohort, there were 2 patients with bacteremia and meningitis and 3 patients with bacteremia and UTI. In the validation cohort, there were 2 patients with concomitant bacteremia and meningitis, 3 patients with UTI and bacteremia, and 1 patient with bacteremia, meningitis, and UTI.

The AUROCs of the models in the validation cohort were high; the highest was found in the random forest model (AUROC: 0.96; 95% CI: 0.93–0.98) (Fig 2). This model had the highest specificity on the receiver operating characteristic curve (74.9%; 95% CI: 71.5%–78.2%) (Table 3), as well as the highest AUPRC and F1 statistics (Table 4). The relative importance of features contained within the random forest model is provided in Fig 3. The random forest model demonstrated a sensitivity comparable to those of the PECARN, Aronson, and Step-by-Step models but with a specificity above the upper limit of the CI for each comparison model. Using the random forest model to classify infants into high- and low-risk groups revealed that 849 of 1240 (68.5%) patients who underwent a lumbar puncture could potentially have avoided the procedure. Applying the PECARN, Step-by-Step, and Aronson rules to this study sample would have resulted in 673 of 1240 (54.3%), 460 of 1240 (37.1%), and 319 of 1240 (25.7%) fewer lumbar punctures, respectively.
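The reported test characteristics are internally consistent: with 69 SBIs and 666 non-SBIs in the validation cohort, a sensitivity of 98.6% and specificity of 74.9% correspond to approximately 68 true-positives, 1 false-negative, 499 true-negatives, and 167 false-positives (counts reconstructed from the reported percentages, not taken from the source). A short Python sketch recovers the table's derived values from these counts:

```python
def diagnostics(tp, fn, tn, fp):
    # standard 2x2 diagnostic accuracy measures from confusion counts
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
        "lr_pos": sens / (1 - spec),    # positive likelihood ratio
        "lr_neg": (1 - sens) / spec,    # negative likelihood ratio
    }
```

Applying `diagnostics(68, 1, 499, 167)` recovers the PPV (28.9%), NPV (99.8%), and likelihood ratios (3.93 and 0.02) reported for the random forest model in Table 3.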

FIGURE 2

Receiver operator curves of all evaluated models on the validation cohort (N = 735). GLM, generalized logistic model.


TABLE 3

Model Characteristics Among the Derivation and Validation Cohorts

Method    Derivation Cohort    Validation Cohort
AUC    Sensitivity (%)    Specificity (%)    PPV (%)    NPV (%)    LR (+)    LR (−)    AUC    Sensitivity (%)    Specificity (%)    PPV (%)    NPV (%)    LR (+)    LR (−)
Stepwise logistic regression 0.95 (0.92–0.98) 98.6 (92.2–100.0) 49.2 (45.4–53.1) 16.7 (13.2–20.7) 99.7 (98.3–100.0) 1.94 (1.79–2.10) 0.03 (0.00–0.21) 0.95 (0.93–0.98) 100.0 (94.8–100.0) 50.2 (46.3–54.0) 17.2 (13.6–21.3) 100.0 (98.9–100.0) 2.01 (1.86–2.16) 0.00 (0.00) 
Random foresta 1.00 (0.99–1.00) 100.0 (94.8–100.0) 81.8 (78.7–84.7) 36.3 (29.5–43.6) 100.0 (99.3–100.0) 5.50 (4.68–6.47) 0.00 (—) 0.96 (0.93–0.98) 98.6 (92.2–100.0) 74.9 (71.5–78.2) 28.9 (23.2–35.2) 99.8 (98.9–100.0) 3.93 (3.44–4.50) 0.02 (0.00–0.14) 
SVM 0.94 (0.90–0.98) 97.1 (89.9–99.6) 47.6 (43.7–51.5) 16.1 (12.7–20.0) 99.4 (97.8–99.9) 1.85 (1.71–2.01) 0.06 (0.02–0.24) 0.93 (0.89–0.97) 97.1 (89.9–99.6) 52.4 (48.5–56.3) 17.4 (13.8–21.6) 99.4 (98.0–99.9) 2.04 (1.87–2.23) 0.06 (0.01–0.22) 
Single-hidden layer neural network 0.96 (0.94–0.99) 95.7 (87.8–99.1) 68.6 (64.9–72.1) 24.0 (19.1–29.5) 99.3 (98.1–99.9) 3.05 (2.70–3.45) 0.06 (0.02–0.19) 0.95 (0.93–0.97) 98.6 (92.2–100.0) 70.4 (66.8–73.9) 25.7 (20.5–31.4) 99.8 (98.8–100.0) 3.33 (2.95–3.76) 0.02 (0.00–0.14) 
PECARN Rule7  — 97.1 (89.9–99.6) 64.0 (60.2–67.6) 21.8 (17.3–26.9) 99.5 (98.3–99.9) 2.69 (2.42–3.01) 0.05 (0.01–0.18) — 98.6 (92.2–100.0) 60.2 (56.4–64.0) 20.4 (16.2–25.2) 99.8 (98.6–100.0) 2.48 (2.25–2.73) 0.02 (0.00–0.17) 
Step-by-Step8  — 94.2 (85.8–98.4) 67.4 (63.7–71.0) 23.0 (18.3–28.4) 99.1 (97.8–99.8) 2.89 (2.55–3.27) 0.09 (0.03–0.22) — 92.8 (83.9–97.6) 67.6 (63.9–71.1) 22.9 (18.1–28.2) 98.9 (97.5–99.6) 2.86 (2.52–3.25) 0.11 (0.05–0.25) 
Aronson6  — 98.6 (92.2–100.0) 31.7 (28.2–35.4) 13.0 (10.2–16.2) 99.5 (97.4–100.0) 1.44 (1.36–1.53) 0.05 (0.01–0.32) — 100.0 (94.8–100.0) 30.8 (27.3–34.4) 13.0 (10.3–16.2) 100.0 (98.2–100.0) 1.44 (1.37–1.52) 0.00 (0.00) 

Numbers in parentheses represent 95% CIs. AUC, area under the curve; LR (+), positive likelihood ratio; LR (−), negative likelihood ratio; NPV, negative predictive value; PPV, positive predictive value; —, not applicable.

a

Model with highest specificity in threshold analysis performed on validation cohort.

TABLE 4

AUPRC and F1 Scores for Evaluated Models

Method    Validation AUPRC    Validation F1 Score    Derivation AUPRC    Derivation F1 Score
Stepwise logistic regression 0.74 0.66 0.74 0.67 
Random forest 0.97 0.90 0.78 0.86 
SVM 0.79 0.64 0.71 0.69 
Single-hidden layer neural network 0.82 0.81 0.72 0.83 
FIGURE 3

Feature (variable) importance of the random forest model, as classified using the mean decrease in accuracy. Features with a large mean decrease in accuracy carry greater importance for classification of data.

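Mean decrease in accuracy is a permutation-based measure: a feature's values are shuffled, the model is rescored, and the resulting drop in accuracy is averaged over repeats. A hypothetical pure-Python sketch (the study used the randomForest R package's built-in measure); here `model` stands in for any fitted prediction function:

```python
import random

def accuracy(model, X, y):
    # fraction of rows the model classifies correctly
    return sum(model(row) == t for row, t in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_repeats=30, seed=0):
    # mean decrease in accuracy when each feature is shuffled in turn;
    # important features produce large drops, irrelevant ones produce ~0
    rng = random.Random(seed)
    base = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)  # break the feature-outcome association
            Xp = [row[:j] + [col[i]] + row[j + 1:] for i, row in enumerate(X)]
            drops.append(base - accuracy(model, Xp, y))
        importances.append(sum(drops) / n_repeats)
    return importances
```

Because the model is never refit, the measure reflects how much the fitted model actually relies on each feature for classification.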

No more than 4 SBIs were missed by any machine learning model. The random forest and logistic regression models each missed 1 patient. The SVM and neural network models each missed 4 cases (Table 5). The PECARN, Step-by-Step, and Aronson models had 3, 9, and 1 false-negatives, respectively.

TABLE 5

Missed (False-Negative) Cases of SBI Using Each Methodology

Source    Age, d/Sex    Unstructured Suspicion of SBI    Elevated Temperature in ED    WBC, Cells per μL    ANC, Cells per μL    Bands, Cells per μL    Urinalysis    Procalcitonin (ng/mL)    Pathogen    Missed Algorithms
Derivation 21/male High Yes 7730 1500 10 Positive 0.24 Escherichia coli UTI SVM 
Derivation 21/male Low No 8220 1320 Positive 0.2 Klebsiella pneumoniae UTI SVM 
Derivation 26/female Low Yes 7000 5390 3780 Negative 3.61 Group B Streptococcus bacteremia NN 
Derivation 27/male Low No 10 700 5960 Negative 0.24 E coli UTI Step-by-Step 
Derivation 36/male Low No 2300 920 280 Negative 0.16 Pseudomonas aeruginosa UTI PECARN 
LR 
Aronson 
Step-by-Step 
Derivation 37/female Low Yes 2940 260 260 Negative 1.77 Group B Streptococcus meningitis NN 
Derivation 52/female Low Yes 21 080 6750 Negative 0.42 E coli UTI NN 
Step-by-Step 
Derivation 55/female Low Yes 3800 2200 110 Negative 0.2 E coli UTI PECARN 
Step-by-Step 
Validation 30/male Low Yes 6700 2680 Negative 0.14 Enterobacter cloacae bacteremia PECARN 
RF 
NN 
Step-by-Step 
Validation 36/male Low Yes 12 710 5320 30 Negative 0.13 Staphylococcus aureus bacteremia SVM 
Step-by-Step 
Validation 42/female Low Yes 11 000 7260 33 Negative 0.23 Enterococcus UTI Step-by-Step 
Validation 50/male Low No 10 620 6370 Negative 0.27 Staphylococcus aureus bacteremia Step-by-Step 
Validation 50/male Low Yes 17 000 5950 Negative 0.19 Klebsiella pneumoniae UTI Step-by-Step 
Validation 54/male Low No 10 520 1580 110 Positive 0.17 E coli UTI SVM 

PECARN, rule reported by Kuppermann et al7 ; Aronson, rule developed by Aronson et al6 ; Step-by-Step, rule reported by Gomez et al.8  LR, logistic regression; RF, random forest; NN, neural network.

In our first exploratory analysis, we used imputation to test model performance on a cohort of patients that had missing data elements in <15% of cases, keeping other inclusion criteria the same. A total of 1537 encounters were included, with 158 (10.3%) having SBIs (Supplemental Table 7). Application of the random forest model to this data set resulted in one missed case, with high sensitivity and specificity (Supplemental Table 8). In our second exploratory analysis, the model performed similarly to results reported in the primary analysis when using an outcome limited to bacterial meningitis and bacteremia (Supplemental Table 9). In our third exploratory analysis using a modified definition for UTI, 17 UTIs with lower colony counts were reclassified as non-UTIs. Model performance was similar in this analysis as for the primary study (Supplemental Table 10). In a post hoc analysis, we performed imputation for all missing data, including procalcitonin, while using the full SBI definition. This model demonstrated a sensitivity of 94.9% (95% CI: 92.3%–96.8%) and specificity of 74.3% (95% CI: 72.9%–75.7%; Supplemental Table 11).

In creation of a model without procalcitonin, 3989 patients were included (Supplemental Fig 5). Three hundred ninety (9.8%) had an SBI (335 [8.4%] UTIs, 73 [1.8%] bacteremia, and 21 [0.5%] bacterial meningitis; 38 [1.0%] with concurrent infections). The variables of physician suspicion of SBI, age, sex, urinalysis, and ANC were selected from the PCA biplot. This model achieved a sensitivity of 98.0% (95% CI: 94.9%–99.4%) and specificity of 42.4% (95% CI: 40.1%–44.7%) (Supplemental Tables 12 and 13). In development of a random forest model including only objective measures (ANC, urinalysis, patient age, and sex, omitting the clinician suspicion for SBI), the specificity from the validation cohort was lower compared with the model that included clinical assessment (Supplemental Tables 14 and 15).

We developed machine learning models to risk-stratify infants ≤60 days of age for SBI. We identified a random forest model that demonstrated high sensitivity (99%) and specificity (75%) compared with previously published models.1–8,31 This model missed 1 of 138 patients with SBI (0.7%). Machine learning models for the risk-stratification of febrile infants demonstrated high accuracy and may help support clinical decision-making to minimize unnecessary hospitalizations, antibiotics, and lumbar punctures.

Kuppermann et al,7 in their recent model, demonstrated a sensitivity of 98% and a specificity of 60% using recursive partitioning. In our study, we used the public-use data set from that investigation, which may be a subset of the parent study. We also used slightly different inclusion criteria, such as the retention of procalcitonin levels performed within the first 2 days of admission. The Step-by-Step model, another recent predictive tool using ANC, urinalysis, procalcitonin, and C-reactive protein, demonstrated a sensitivity of 92% and specificity of 47%.8 A model reported by Aronson et al,6 for invasive bacterial infections, revealed a sensitivity of 99% and a specificity of 31%. Similar to these models, our model does not require CSF for risk-stratification. Our model also demonstrates superior test characteristics compared with older models, including the Philadelphia (sensitivity 97%–99%, specificity 39%–42%)2,31 and Rochester criteria (sensitivity 82%–97%, specificity 40%–50%).3,5,8

The use of supervised learning techniques carries the potential to decrease medical interventions. Applying the random forest rule, for example, may have allowed additional lumbar punctures to be avoided compared with previous rules.6–8 Although incremental, this reduction in procedures represents an area of improvement. In addition to fewer procedures, more accurate classification of lower-risk infants may result in reduced empirical antimicrobial therapy, hospitalizations, false-positive cultures, and nosocomial infection risk.32

Notably, one patient was classified as a false-negative in the random forest model with Enterobacter cloacae bacteremia. This patient was additionally misclassified as a false-negative in the PECARN and Step-by-Step rules.7,8  According to Kuppermann et al,7  a repeated blood culture before the administration of antibiotics was negative, and the patient had a benign hospitalization. This misclassification stresses a continued need to develop more robust methods in the identification of febrile infants with SBI.

The random forest model was able to achieve the highest diagnostic accuracy among investigated machine-learning models. This occurred while using similar variables to those used in previous decision rules. Although a strength of a recursive partitioning approach is that it can be used by bedside clinicians without calculations, it does not fully exploit the computational power available in modern health care settings. In contrast, the random forest algorithm is computationally intense but likely better able to leverage the data, particularly with respect to continuous variables. The unbalanced nature of this data set, in which 10% of patients had SBIs, may have contributed to this result. With unbalanced data, standard classifiers such as SVMs, logistic regression, and decision trees tend to bias toward the majority (or negative) class and are further limited by smaller sample sizes.33  In contrast, the random forest model, which provides predicted probabilities from multiple decision trees, may be better suited to these types of data.

Despite demonstrating superior test characteristics, our models carry important limitations compared with previously published guidelines. Unlike rules that provided parameter cutoffs,1–8  these models require the use of algorithms that are not intuitive. A sample tree, provided in Supplemental Fig 6, demonstrates one such decision tree made in the random forest model; the final model contains 5000 such trees, the results of which are aggregated to provide a final probability. Integration of these findings into health information systems, including through interactive web applications or electronic medical record systems, may mitigate this disadvantage. For example, a validated model was recently developed into a web application to support the real-time risk assessment of UTI in children 2 to 23 months of age.34  Other investigators have reported on the use of sepsis alerts to notify clinicians of at-risk patients using machine learning data.35–37  A model to identify at-risk febrile infants may allow for greater diagnostic accuracy by leveraging the computational resources available through integration with the electronic medical record. Importantly, before clinical deployment, validation of such a model is paramount. Models may demonstrate a decline in performance when tested on other data sets.38  This concern is especially relevant with respect to machine learning models.39 
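The aggregation described above can be sketched as follows. This is an illustrative toy (synthetic data, 50 trees rather than 5000, arbitrary features), showing how each tree's leaf-level class fraction is averaged into one probability per patient, in the manner of scikit-learn's random forest:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Illustrative only: a small forest standing in for the 5000-tree model.
X, y = make_classification(n_samples=400, n_features=4,
                           weights=[0.9, 0.1], random_state=1)
forest = RandomForestClassifier(n_estimators=50, random_state=1).fit(X, y)

patient = X[:1]  # one patient's feature vector

# Each constituent tree reports the positive-class fraction at the leaf
# this patient falls into; the forest's final probability is their mean.
per_tree = np.array([t.predict_proba(patient)[0, 1]
                     for t in forest.estimators_])
aggregated = forest.predict_proba(patient)[0, 1]
```

In a deployed model, this single aggregated probability per patient is what would be compared against a sensitivity-preserving threshold, which is why the approach lacks a bedside parameter cutoff.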

Our findings are subject to limitations, including the use of a convenience sample and reliance on cultures to define true-positives. However, rates of SBI in the parent study appear similar to those documented elsewhere, suggesting that this is not likely a major source of error.8,13  Models were developed on a relatively small data set. We excluded patients without procalcitonin results, which resulted in a large number of exclusions; a similar step was performed in the PECARN model.7  We did not use the exact inclusion criteria that were used in the PECARN model, nor the same derivation and validation cohorts, resulting in slightly different cohort sizes. Our models may be overfit and require external validation to assess generalizability.40  Despite these limitations, the findings from this analysis suggest that machine learning models have the potential to perform well in the identification of SBI among populations of well-appearing febrile infants.

We evaluated machine learning algorithms for the risk-stratification of well-appearing febrile infants ≤60 days old using data from a multicenter prospective study. Although external validation is needed, our findings suggest that ensemble machine learning algorithms maintain the high sensitivity of recently published decision tools while providing higher specificity. This, in turn, would allow more accurate identification of patients without disease, reducing invasive procedures, antimicrobial use, and hospitalizations.

Dr Ramgopal contributed to conceptualization and design of the study, methodology, investigation, formal analysis, and drafting of the initial manuscript; Dr Horvat contributed to methodology, formal analysis, and editing of the manuscript for intellectually important content; Dr Alpern contributed to conceptualization and design of the study, methodology, and editing of the manuscript for intellectually important content; Dr Yanamala contributed to formal analysis and editing of the manuscript for intellectually important content; and all authors approved the final manuscript as submitted and agree to be accountable for all aspects of the work.

FUNDING: Dr Horvat is sponsored by the Eunice Kennedy Shriver National Institute of Child Health and Human Development (1K23HD099331-01A1). Funded by the National Institutes of Health (NIH).

     
Abbreviations

  • ANC: absolute neutrophil count
  • AUPRC: area under the precision-recall curve
  • AUROC: area under the receiver operating characteristic curve
  • CFU: colony-forming unit
  • CI: confidence interval
  • CSF: cerebrospinal fluid
  • ED: emergency department
  • PC1: first principal component
  • PC2: second principal component
  • PCA: principal component analysis
  • PECARN: Pediatric Emergency Care Applied Research Network
  • SBI: serious bacterial infection
  • SVM: support vector machine
  • UTI: urinary tract infection
  • WBC: white blood cell count

References

1. Herr SM, Wald ER, Pitetti RD, Choi SS. Enhanced urinalysis improves identification of febrile infants ages 60 days and younger at low risk for serious bacterial illness. Pediatrics. 2001;108(4):866–871
2. Baker MD, Bell LM, Avner JR. Outpatient management without antibiotics of fever in selected infants. N Engl J Med. 1993;329(20):1437–1441
3. Jaskiewicz JA, McCarthy CA, Richardson AC, et al; Febrile Infant Collaborative Study Group. Febrile infants at low risk for serious bacterial infection--an appraisal of the Rochester criteria and implications for management. Pediatrics. 1994;94(3):390–396
4. Baskin MN, O’Rourke EJ, Fleisher GR. Outpatient treatment of febrile infants 28 to 89 days of age with intramuscular administration of ceftriaxone. J Pediatr. 1992;120(1):22–27
5. Dagan R, Powell KR, Hall CB, Menegus MA. Identification of infants unlikely to have serious bacterial infection although hospitalized for suspected sepsis. J Pediatr. 1985;107(6):855–860
6. Aronson PL, Shabanova V, Shapiro ED, et al; Febrile Young Infant Research Collaborative. A prediction model to identify febrile infants ≤60 days at low risk of invasive bacterial infection. Pediatrics. 2019;144(1):e20183604
7. Kuppermann N, Dayan PS, Levine DA, et al. A clinical prediction rule to identify febrile infants 60 days and younger at low risk for serious bacterial infections. JAMA Pediatr. 2019;173(4):342–351
8. Gomez B, Mintegi S, Bressan S, Da Dalt L, Gervaix A, Lacroix L; European Group for Validation of the Step-by-Step Approach. Validation of the “Step-by-Step” approach in the management of young febrile infants. Pediatrics. 2016;138(2):e20154381
9. Milcent K, Faesch S, Gras-Le Guen C, et al. Use of procalcitonin assays to predict serious bacterial infection in young febrile infants. JAMA Pediatr. 2016;170(1):62–69
10. Mahajan P, Grzybowski M, Chen X, et al. Procalcitonin as a marker of serious bacterial infections in febrile children younger than 3 years old. Acad Emerg Med. 2014;21(2):171–179
11. Ramgopal S, Walker LW, Nowalk AJ, Cruz AT, Vitale MA. Immature neutrophils in young febrile infants. Arch Dis Child. 2019;104(9):884–886
12. Mintegi S, Gomez B, Carro A, Diaz H, Benito J. Invasive bacterial infections in young afebrile infants with a history of fever. Arch Dis Child. 2018;103(7):665–669
13. Greenhow TL, Hung Y-Y, Herz AM, Losada E, Pantell RH. The changing epidemiology of serious bacterial infections in young infants. Pediatr Infect Dis J. 2014;33(6):595–599
14. De S, Tong A, Isaacs D, Craig JC. Parental perspectives on evaluation and management of fever in young infants: an interview study. Arch Dis Child. 2014;99(8):717–723
15. Aronson PL, Thurm C, Alpern ER, et al; Febrile Young Infant Research Collaborative. Variation in care of the febrile young infant <90 days in US pediatric emergency departments. Pediatrics. 2014;134(4):667–677
16. Alloghani M, Al-Jumeily D, Mustafina J, Hussain A, Aljaaf AJ. A systematic review on supervised and unsupervised machine learning algorithms for data science. In: Berry MW, Mohamed A, Yap BW, eds. Supervised and Unsupervised Learning for Data Science. Unsupervised and Semi-Supervised Learning. New York, NY: Springer International Publishing; 2019:3–22
17. Deleger L, Brodzinski H, Zhai H, et al. Developing and evaluating an automated appendicitis risk stratification algorithm for pediatric patients in the emergency department. J Am Med Inform Assoc. 2013;20(e2):e212–e220
18. Goto T, Camargo CA Jr, Faridi MK, Freishtat RJ, Hasegawa K. Machine learning-based prediction of clinical outcomes for children during emergency department triage. JAMA Netw Open. 2019;2(1):e186937
19. Bertsimas D, Dunn J, Steele DW, Trikalinos TA, Wang Y. Comparison of machine learning optimal classification trees with the pediatric emergency care applied research network head trauma decision rules. JAMA Pediatr. 2019;173(7):648–656
20. Kuppermann N, Holmes JF, Dayan PS, et al; Pediatric Emergency Care Applied Research Network (PECARN). Identification of children at very low risk of clinically-important brain injuries after head trauma: a prospective cohort study. Lancet. 2009;374(9696):1160–1170
21. PECARN. Study datasets. Available at: http://www.pecarn.org/studyDatasets/StudyDetails?studyID=20. Accessed November 5, 2017
22. Mahajan P, Kuppermann N, Suarez N, et al; Febrile Infant Working Group for the Pediatric Emergency Care Applied Research Network (PECARN). RNA transcriptional biosignature analysis for identifying febrile infants with serious bacterial infections in the emergency department: a feasibility study. Pediatr Emerg Care. 2015;31(1):1–5
23. Maniaci V, Dauber A, Weiss S, Nylen E, Becker KL, Bachur R. Procalcitonin in young febrile infants for the detection of serious bacterial infections. Pediatrics. 2008;122(4):701–710
24. Gomez B, Bressan S, Mintegi S, et al. Diagnostic value of procalcitonin in well-appearing young febrile infants. Pediatrics. 2012;130(5):815–822
25. Mahajan P, Ramilo O, Kuppermann N. Application of transcriptional signatures for diagnosis of febrile infants within the pediatric emergency care applied research network (PECARN): protocol number 022. 2012. Available at: http://pecarn.org/studyDatasets/documents/Biosignatures_Protocol_v2.3_9.24.2012.pdf. Accessed September 13, 2018
26. Gabriel KR. The biplot graphic display of matrices with application to principal component analysis. Biometrika. 1971;58(3):453–467
27. Lantz B. Machine Learning with R. 3rd ed. Birmingham, United Kingdom: Packt Publishing; 2019
28. Lee JH, Huber J Jr. Multiple imputation with large proportions of missing data: how much is too much? United Kingdom Stata Users’ Group Meeting 2011. Stata User’s Group. 2011;23
29. Fisher KA, Landyn V, Lindenauer PK, Walkey AJ. Procalcitonin test availability: a survey of acute care hospitals in Massachusetts. Ann Am Thorac Soc. 2017;14(9):1489–1491
30. Burstein B, Gravel J, Aronson PL, Neuman MI; Pediatric Emergency Research Canada (PERC). Emergency department and inpatient clinical decision tools for the management of febrile young infants among tertiary paediatric centres across Canada. Paediatr Child Health. 2019;24(3):e142–e154
31. Garra G, Cunningham SJ, Crain EF. Reappraisal of criteria used to predict serious bacterial illness in febrile infants less than 8 weeks of age. Acad Emerg Med. 2005;12(10):921–925
32. Leazer R, Erickson N, Paulson J, et al. Epidemiology of cerebrospinal fluid cultures and time to detection in term infants. Pediatrics. 2017;139(5):e20163268
33. López V, Fernández A, García S, Palade V, Herrera F. An insight into classification with imbalanced data: empirical results and current trends on using data intrinsic characteristics. Inf Sci (Ny). 2013;250:113–141
34. Shaikh N, Hoberman A, Hum SW, et al. Development and validation of a calculator for estimating the probability of urinary tract infection in young febrile children. JAMA Pediatr. 2018;172(6):550–556
35. Calvert JS, Price DA, Chettipally UK, et al. A computational approach to early sepsis detection. Comput Biol Med. 2016;74:69–73
36. Taylor RA, Pare JR, Venkatesh AK, et al. Prediction of in-hospital mortality in emergency department patients with sepsis: a local big data-driven, machine learning approach. Acad Emerg Med. 2016;23(3):269–278
37. Horng S, Sontag DA, Halpern Y, Jernite Y, Shapiro NI, Nathanson LA. Creating an automated trigger for sepsis clinical decision support at emergency department triage using machine learning. PLoS One. 2017;12(4):e0174708
38. Bleeker SE, Moll HA, Steyerberg EW, et al. External validation is necessary in prediction research: a clinical example. J Clin Epidemiol. 2003;56(9):826–832
39. Beam AL, Manrai AK, Ghassemi M. Challenges to the reproducibility of machine learning models in health care [published online ahead of print January 6, 2020]. JAMA. doi:10.1001/jama.2019.20866
40. Fratello M, Tagliaferri R. Decision trees and random forests. In: Ranganathan S, Nakai K, Schönbach C, Gribskov M, eds. Encyclopedia of Bioinformatics and Computational Biology: ABC of Bioinformatics. 2018:374–383

Competing Interests

POTENTIAL CONFLICT OF INTEREST: The authors have indicated they have no potential conflicts of interest to disclose.

FINANCIAL DISCLOSURE: The authors have indicated they have no financial relationships relevant to this article to disclose.

Supplementary data