The field of machine learning involves the study of algorithms that can be used to discover mathematical functions for classification and prediction. Increasing computational power and larger data sets have allowed for the detection of complex, multidimensional associations not discoverable with traditional statistical modeling techniques, such as logistic regression. In this issue of Pediatrics, Ramgopal et al1 apply a mix of traditional statistical techniques and machine learning to the vexing problem of the young infant (≤60 days) with fever. Several clinical decision rules have been developed to help the clinician decide which patients are at low risk of bacterial infection and can forgo lumbar puncture and/or hospital admission. The authors used a public use data set from the Pediatric Emergency Care Applied Research Network (PECARN) study of young infants with fever and tested several different models in an attempt to improve on PECARN’s own decision rule.2

The authors were indeed able to improve on the PECARN decision rule, raising specificity from 60% to 75% while maintaining near-perfect sensitivity. They accomplished this despite a small data set by machine learning standards, with only 138 patients with bacterial infection. The most successful model was a random forest incorporating urinalysis, procalcitonin, absolute neutrophil count, and white blood cell count. A random forest, like the classification and regression trees method used by PECARN, is a tree-based algorithm; it generates many trees (the authors chose to generate 5000), which are then aggregated to give a risk estimate for each patient. The random forest outperformed the other models as well as 2 recent external decision rules for young infants with fever, the Aronson rule3 and the European Step-by-Step rule.4 The Aronson rule used age, urinalysis, and absolute neutrophil count and achieved a sensitivity of 99% but a specificity of only 31%.3 The Step-by-Step rule, which adds procalcitonin and C-reactive protein and includes patients up to 90 days of age, has now been prospectively validated but has a sensitivity of only 92% and a specificity of 47%.4
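To make the mechanics concrete, the following is a minimal sketch, not the authors' code, of how a 5000-tree random forest produces a per-patient risk estimate in scikit-learn. The file name and column names (urinalysis_positive, procalcitonin, anc, wbc, bacterial_infection) are hypothetical stand-ins for the PECARN variables.

```python
# Minimal random forest risk model sketch (hypothetical data set and columns).
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

FEATURES = ["urinalysis_positive", "procalcitonin", "anc", "wbc"]  # hypothetical names

df = pd.read_csv("pecarn_febrile_infants.csv")  # hypothetical public-use extract
X, y = df[FEATURES], df["bacterial_infection"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# 5000 trees, mirroring the number reported in the commentary; each tree votes,
# and the aggregated vote fraction serves as a per-patient risk estimate.
forest = RandomForestClassifier(n_estimators=5000, random_state=0)
forest.fit(X_train, y_train)

risk = forest.predict_proba(X_test)[:, 1]  # estimated probability of bacterial infection
```

Each tree is grown on a bootstrap sample of patients with a random subset of predictors considered at each split, which is what allows the ensemble to capture interactions that a single tree or a logistic regression model might miss.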

Beyond improving existing prediction models, machine learning models may also be used to identify hypotheses that humans have not yet considered. For example, in the principal component analysis (PCA), physician judgment correlated well with procalcitonin levels, which may indicate that subjective clinician impressions are detecting something important that can be measured objectively. Machine learning models could also be combined with advanced laboratory methods to identify correlations between clinical phenotypes and the RNA biosignatures that PECARN has also identified.5 Modern electronic health records (EHRs) are replete with data and are now capable of executing real-time machine learning–based models, including dynamic risk analyses.
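As an illustration of the kind of exploratory analysis described above, here is a minimal PCA sketch, again with hypothetical column names (including a clinician_impression score), that inspects component loadings to see which variables move together; it is not the authors' actual PCA.

```python
# Sketch of inspecting PCA loadings (hypothetical data set and columns).
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("pecarn_febrile_infants.csv")  # hypothetical extract
cols = ["clinician_impression", "procalcitonin", "anc", "wbc", "age_days"]
X = StandardScaler().fit_transform(df[cols])  # standardize so no variable dominates

pca = PCA(n_components=2)
pca.fit(X)

# Loadings: rows are components, columns are the original variables.
# Variables with similar loadings on the same component tend to vary together,
# e.g., a clinician-impression score aligning with procalcitonin.
loadings = pd.DataFrame(pca.components_, columns=cols, index=["PC1", "PC2"])
print(loadings.round(2))
```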

The authors may have inadvertently reduced the power of their modeling by using assumptions from traditional statistics that may not be required for machine learning. First, they used bivariate analyses to eliminate potential variables from inclusion in model building. This might ignore information that is useful in complex modeling but does not manifest in bivariate analyses. Second, they eliminated cases with missing data. Missing data may provide useful information because, generally, data are not missing at random. Similarly, although PCA is helpful for visualizing relationships between variables, using PCA to eliminate variables from consideration may also result in a loss of predictive ability. This is because PCA is focused on explaining variation in the independent variables with minimal information loss, which may or may not improve prediction of the outcome of interest. The authors could also have tested gradient-boosting machines, another machine learning technique that often outperforms random forests.
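As a sketch of the alternative suggested here, the example below fits a gradient-boosting model with scikit-learn's HistGradientBoostingClassifier, which handles missing values natively, so incomplete cases need not be dropped. The data file and column names are the same hypothetical ones used above, and this is an illustration under those assumptions rather than a reanalysis.

```python
# Gradient-boosting sketch that keeps cases with missing laboratory values.
import pandas as pd
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import cross_val_score

df = pd.read_csv("pecarn_febrile_infants.csv")  # hypothetical extract
FEATURES = ["urinalysis_positive", "procalcitonin", "anc", "wbc"]
X, y = df[FEATURES], df["bacterial_infection"]

# NaN values are routed down a learned default branch at each split,
# so missingness itself can carry predictive signal instead of being discarded.
gbm = HistGradientBoostingClassifier(max_iter=500, learning_rate=0.05, random_state=0)

scores = cross_val_score(gbm, X, y, cv=5, scoring="roc_auc")
print(f"Cross-validated AUC: {scores.mean():.3f} ± {scores.std():.3f}")
```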

Some caveats are in order. First, although these analyses are promising, any prediction model, regardless of the underlying statistical techniques, requires prospective validation in separate target populations.6 Given that the data set used for this study came from large academic pediatric hospitals, validation should include community hospitals, where 80% of pediatric emergencies are treated. Second, these data were prospectively collected and were of research quality. It will be difficult to glean subtle clinical gestalt, such as clinician impression, from the EHR as currently structured. This may limit the capabilities of machine learning in direct application to the EHR, although there has been increasing success in the analysis of unstructured data.7–10 Third, the sensitivity analyses (but not the main model) reveal evidence of overfitting; there are large differences in performance between the derivation and validation samples. This overfitting suggests that performance in out-of-sample populations could be improved with further algorithm tuning. Finally, models should have face validity; that is, they should make sense to the clinician who will use them and explain them to families through shared decision-making. In this regard, it is helpful that the authors have provided a feature importance graph. This is functionally similar to providing the parameter estimates or adjusted odds ratios for a regression model, allowing the reader to interpret the relative importance of the variables contributing to the risk determination.
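Continuing from the random-forest sketch above, the snippet below generates one such feature importance view. Permutation importance is used here as an illustrative choice, because impurity-based importances can be biased toward high-cardinality predictors; it is not necessarily the method the authors used.

```python
# Feature importance plot for the forest fit in the earlier sketch
# (reuses forest, X_test, y_test, and FEATURES from that example).
import matplotlib.pyplot as plt
from sklearn.inspection import permutation_importance

# Shuffle each predictor in turn and measure how much model accuracy drops.
result = permutation_importance(forest, X_test, y_test, n_repeats=20, random_state=0)

plt.barh(FEATURES, result.importances_mean, xerr=result.importances_std)
plt.xlabel("Mean decrease in accuracy when permuted")
plt.title("Feature importance (illustrative)")
plt.tight_layout()
plt.show()
```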

Modern medicine and data science are courting, and they soon will be married. One important application will be decision rules, which provide practitioners with evidence-based approaches to difficult clinical decisions. In many cases, clinical decision rules will be improved by the inclusion of machine learning methods. In this article, the authors identify the potential to unite these sciences to assist clinicians in reducing the burden of unnecessary testing.

ABBREVIATIONS: EHR, electronic health record; PCA, principal component analysis; PECARN, Pediatric Emergency Care Applied Research Network

Opinions expressed in these commentaries are those of the authors and not necessarily those of the American Academy of Pediatrics or its Committees.

FUNDING: No external funding.

COMPANION PAPER: A companion to this article can be found online at www.pediatrics.org/cgi/doi/10.1542/peds.2019-4096.

References

1. Ramgopal S, Horvat CM, Yanamala N, Alpern ER. Machine learning to predict serious bacterial infections in young febrile infants. Pediatrics. 2020;146(3):e20194096
2. Kuppermann N, Dayan PS, Levine DA, et al; Febrile Infant Working Group of the Pediatric Emergency Care Applied Research Network (PECARN). A clinical prediction rule to identify febrile infants 60 days and younger at low risk for serious bacterial infections. JAMA Pediatr. 2019;173(4):342–351
3. Aronson PL, Shabanova V, Shapiro ED, et al; Febrile Young Infant Research Collaborative. A prediction model to identify febrile infants ≤60 days at low risk of invasive bacterial infection. Pediatrics. 2019;144(1):e20183604
4. Gomez B, Mintegi S, Bressan S, Da Dalt L, Gervaix A, Lacroix L; European Group for Validation of the Step-by-Step Approach. Validation of the “step-by-step” approach in the management of young febrile infants. Pediatrics. 2016;138(2):e20154381
5. Mahajan P, Kuppermann N, Mejias A, et al; Pediatric Emergency Care Applied Research Network (PECARN). Association of RNA biosignatures with bacterial infections in febrile infants aged 60 days or younger [published correction appears in JAMA. 2016;316(18):1924]. JAMA. 2016;316(8):846–857
6. Moons KGM, Altman DG, Reitsma JB, et al. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): explanation and elaboration. Ann Intern Med. 2015;162(1):W1–W73
7. Zhang X, Bellolio MF, Medrano-Gracia P, Werys K, Yang S, Mahajan P. Use of natural language processing to improve predictive models for imaging utilization in children presenting to the emergency department. BMC Med Inform Decis Mak. 2019;19(1):287
8. Stein JD, Rahman M, Andrews C, et al. Evaluation of an algorithm for identifying ocular conditions in electronic health record data. JAMA Ophthalmol. 2019;137(5):491–497
9. Ben Miled Z, Haas K, Black CM, et al. Predicting dementia with routine care EMR data. Artif Intell Med. 2020;102:101771
10. Roquette BP, Nagano H, Marujo EC, Maiorano AC. Prediction of admission in pediatric emergency department with deep neural networks and triage textual data. Neural Netw. 2020;126:170–177

Competing Interests

POTENTIAL CONFLICT OF INTEREST: Drs J.M. Chamberlain and Zorc report that they participated in the Pediatric Emergency Care Applied Research Network biosignatures study and contributed data to the data set used to create the Pediatric Emergency Care Applied Research Network decision rule and to the public use data set used for the article for which they are providing commentary; and Mr D.B. Chamberlain has indicated he has no potential conflicts of interest to disclose.

FINANCIAL DISCLOSURE: The authors have indicated they have no financial relationships relevant to this article to disclose.