OBJECTIVE

To develop a highly sensitive and specific blood biomarker panel that identifies febrile children with Kawasaki disease (KD).

METHODS

We tested blood samples from a single-center cohort of KD (n = 50) and control febrile children (n = 100) to develop a biomarker panel from 11 candidates selected for the clinical availability of their assays. We used machine learning with least absolute shrinkage and selection operator regression to incorporate the values of these 11 blood markers into a model, which provided a binary predictive risk score for KD determined with Youden’s index. We further reduced the model using least angle regression.

RESULTS

Using 10-fold cross-validation with least absolute shrinkage and selection operator regression on these 11 readouts plus patient age resulted in an area under the receiver operating characteristic curve of 0.94 (95% confidence interval [CI]: 0.90–0.98; P <.01). Using Youden’s index, which provided an optimal cutoff for a binary predictive risk score, 88 of 97 KD-negative patients were diagnosed negative, and 47 of 50 KD-positive patients were diagnosed positive, yielding a sensitivity of 0.94 (95% CI: 0.87–1.0) and specificity of 0.91 (95% CI: 0.85–0.96). Least angle regression reduced the final panel to 3 biomarkers: C-reactive protein, NT-proB-type natriuretic peptide, and thyroid hormone uptake. The predictive model then provided an area under the receiver operating characteristic curve of 0.92 (95% CI: 0.87–0.96; P <.001), with sensitivity and specificity of 86% each.

CONCLUSIONS

Machine learning identified a highly accurate diagnostic model for KD. The reduced model employs 3 biomarkers currently approved by regulatory bodies and performed on platforms commonly used by certified diagnostic laboratories.

Kawasaki disease (KD) is an acute systemic vasculitis affecting children and heralded by persistent fever and symptoms of inflammation. Coronary artery vasculitis resulting in dilation and aneurysm formation represents the most serious complication of KD.1 Accordingly, KD is the most common cause of acquired heart disease in children in first-world nations. The formation of giant aneurysms, which might not regress, can impose a lifelong impact on quality of life starting as young as infancy and early childhood. Appropriately, clinical practice guidelines, such as those proffered by the American Heart Association (AHA), recommend prompt recognition and treatment of KD.1 Randomized clinical trials reveal that an infusion of intravenous immunoglobulin (IVIG) dramatically reduces the risk of coronary artery disease when provided within the first 10 days of fever onset.2,3 The efficacy of IVIG, when provided beyond 10 days, remains unknown.

However, the practice of confirming a KD diagnosis within this time frame is challenging. In the United States, diagnostic criteria include 4 to 5 days of persistent fever along with the presence of 4 of 5 clinical criteria among erythematous rash, conjunctivitis, typical oral mucous changes, cervical lymphadenopathy, and extremity abnormalities, such as swollen hands and feet, or erythema of palms and soles.1  These symptoms are highly variable, and experts recognize that many KD diagnoses are delayed or missed.1,4,5  Accordingly, in 2004 and again in 2017, the AHA expanded the criteria to diagnose patients with an “incomplete diagnosis,” which includes patients with fever, 2 to 3 clinical criteria, and 3 supportive laboratory criteria.1,6  These criteria include elevated C-reactive protein, elevated white blood cell count, anemia, elevated liver enzymes, low plasma albumin, or elevated platelet count, as well as sterile pyuria. These biomarkers all change within days, often eluding physicians at the time of examination. Additionally, many of these signs and symptoms reveal substantial overlap with other childhood diseases, including Streptococcus, adenovirus, and other viral infections.1,5  In particular, cervical lymphadenopathy in children with KD is often initially misdiagnosed as lymphadenitis and treated with antibiotics, thus delaying treatment with IVIG and increasing the risk of coronary artery dilation and aneurysm formation.5,7 

Thus, KD often evades diagnosis by clinicians who are unfamiliar with or confused by the complex clinical algorithms.5  In addition, a KD diagnosis often requires multiple emergency department visits, even at tertiary centers with dedicated KD programs,8  causing considerable strain on families with a sick child. Practice sites with less KD familiarity are likely to have lower diagnostic success.5 

A clinically useful and easily accessible diagnostic biomarker or panel would overcome many of the challenges inherent in making a timely KD diagnosis. Previous attempts to develop KD biomarkers have used individual blood proteins, as opposed to algorithmic machine learning-derived panels of multiple proteins, and revealed limited specificity.9–11 Similar attempts to use multiple biomarkers have revealed limited success and required multiple, often complex measuring methods on various platforms.12,13 We used machine learning, a subset of artificial intelligence, to identify multiple biomarkers measured on a commercially viable single platform, which could be used as a model for diagnosing KD. Machine learning methods produce an algorithm or model, which, in turn, produces raw predictions from a number of inputs, such as blood protein concentrations. These predictions can be used as diagnostic or risk scores. Some of these models, if they use multiple inputs, have a “weight” for each input, which corresponds to the predictive importance of each input, and can provide insight into how the model generates its diagnostic score. We used data obtained from a commercially available single platform to develop a panel of biomarkers that are diagnostic for KD, and then further reduced the complexity of the model to accommodate a minimal number of biomarkers while maintaining accuracy. The score generated by this model can be transformed into an easily understood and accurate binary categorization for use by clinicians to diagnose KD.

All study procedures were approved by the hospital institutional review board (IRB). We used a derivation set of samples obtained from a tertiary Pacific Northwest children’s hospital, consisting of 50 diagnosed KD patients and a control cohort of 100 febrile children. The KD cases in the derivation set were diagnosed by using AHA criteria for either complete or incomplete KD diagnosis. All blood samples were obtained before treatment with IVIG. For the KD cohort, all patient samples were obtained after parental informed consent was given. Peer control samples were obtained under a separate IRB-approved protocol and a waiver of consent from patients seen in the emergency department with at least 2 to 3 days of fever who either did not qualify for KD by AHA criteria or received a confirmed alternate diagnosis that was responsible for their fever. Control patients were identified by monitoring the electronic emergency department admission board. Only those with blood samples obtained for evaluation were included. Because of IRB requirements, the only available clinical or demographic variables for control patients were their age and primary indication for emergency room admission. The ages of the patients in the 2 groups were normally distributed and were compared by using the Student’s t test.

Samples from both cohorts were either serum or plasma, depending on availability. The sample type for each patient was not determined completely at random: 2 of 50 KD patients (4%) had plasma samples, whereas 72 of 100 febrile controls (72%) had plasma samples (P <.001, Fisher’s exact test).
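The imbalance in sample type can be checked directly from the counts given above. The following is a minimal sketch of a 2-sided Fisher’s exact test written with the Python standard library only; the function name and the table layout (rows: KD and febrile controls; columns: plasma and serum) are illustrative assumptions and not the authors’ code, which was written in R.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )

# Counts from the text: 2 of 50 KD samples and 72 of 100 control samples were plasma
p_value = fisher_exact_two_sided(2, 48, 72, 28)
print(p_value < 0.001)
```

The small tolerance factor in the comparison guards against floating-point ties between equally likely tables.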

Samples were stored at −80°C until assayed. We performed an exploratory assay of the blood samples for 42 proteins selected for association with KD by using a Luminex 100/200 xMAP platform. In this exploratory assay, thyroxine-binding globulin (TBG) performed exceptionally well, so 3 additional, more commercially available (as compared with TBG) thyroid function assays (thyroid hormone uptake [TU], thyroid-stimulating hormone [TSH], and free T4 [FT4]) were performed with the Dimension Vista (Siemens) luminescent oxygen channeling assay. This resulted in a discovery panel of 45 total biomarkers (plus patient age; Supplemental Table 4).

Because biomarker readouts exhibit quantitative differences that are inherent with their sample type, we identified which biomarkers had readouts that appeared to be associated with the sample type, so they could be removed from the analysis. Thus, we could be as certain as possible that our diagnostic signal was not due to a difference in the sample type. We examined the statistical association of the readouts with the sample type among the febrile controls only (Wilcoxon rank test, Supplemental Table 5). All biomarkers with a Benjamini and Hochberg false discovery rate ≤0.2 were excluded from our analysis, resulting in the exclusion of 15 biomarkers.
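For readers unfamiliar with the Benjamini and Hochberg adjustment, the sketch below shows how a list of per-biomarker P values can be converted to false discovery rates, which can then be compared with the 0.2 threshold. The P values are hypothetical placeholders, not study data, and the function name is an assumption made for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values (FDR) in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity of the FDR
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical P values for the association of 4 biomarkers with sample type
example_p = [0.01, 0.04, 0.03, 0.005]
example_fdr = benjamini_hochberg(example_p)
print([round(q, 3) for q in example_fdr])
```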

We then reduced these biomarkers to a subset of 11 finalists for further analysis (Table 1) based on their commercial clinical availability. The final 11 markers were chosen to ensure the fastest uptake by the clinical market and had to be widely available to most commercial laboratories and medium to large hospitals. To meet this criterion, the markers had to be Food and Drug Administration (FDA) cleared and available on the leading diagnostic instrument platforms located in major laboratories and hospitals.

Although not every one of these biomarkers had a statistically significant association with KD in a univariate comparison, they were included for their potential to make a marginal predictive contribution in concert with other biomarkers. Patient age was retained in the analysis for the same reason. Three patients from the febrile control cohort were excluded because they had insufficient samples, providing a final combined cohort of 147 patients. All individual assay values for the KD patients were compared with febrile controls by using the Wilcoxon rank test.

The raw assay values tend to be exponentially distributed, so to prepare these features for analysis, they were transformed as follows for all patients: (1) values were log-transformed to more closely fit a Gaussian distribution, (2) outliers at or beyond 3 times the median absolute deviation, or greater than the upper limit of quantitation, were Winsorized (ie, clipped), and (3) values were rescaled to a 0-mean, unit-variance distribution. Lastly, any biomarker values less than the lower limit of quantitation were imputed as the limit of quantitation divided by 2. We first examined the diagnostic performance of this panel using 10-fold cross-validation (CV) with the least absolute shrinkage and selection operator (LASSO). Because 10-fold CV can exhibit a high variance with small sample sizes, we then confirmed the results of these analyses with 1000 iterations of Monte Carlo cross-validation (MCCV), with each iteration using a randomly selected 70/30 (training/test) split. We evaluated the models’ performance with the cross-validated area under the receiver operating characteristic curve (AUC). We made binary diagnostic calls (positive/negative) from these scores using a data-driven cutoff, determined by Youden’s index optimized for sensitivity versus specificity within each CV fold. We calculated additional operating characteristics, including sensitivity and specificity.
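The feature preparation described above might be sketched as follows for a single biomarker. Several details are assumptions made for illustration and are not stated in the text: imputation is applied before the logarithm, clipping is performed on the log scale around the median, and the sample standard deviation is used for rescaling. The function name, limits, and values are hypothetical.

```python
import math
import statistics

def preprocess_biomarker(values, lloq, uloq, mad_multiplier=3.0):
    """Impute, log-transform, Winsorize, and standardize one biomarker."""
    # Values below the lower limit of quantitation become LLOQ / 2 (assumed order)
    imputed = [lloq / 2 if v < lloq else v for v in values]
    # Values above the upper limit of quantitation are clipped to that limit
    capped = [min(v, uloq) for v in imputed]
    logged = [math.log(v) for v in capped]
    # Winsorize at median +/- 3 median absolute deviations (on the log scale)
    med = statistics.median(logged)
    mad = statistics.median([abs(x - med) for x in logged])
    low, high = med - mad_multiplier * mad, med + mad_multiplier * mad
    clipped = [min(max(x, low), high) for x in logged]
    # Rescale to zero mean and unit variance
    mean = statistics.fmean(clipped)
    sd = statistics.stdev(clipped)
    return [(x - mean) / sd for x in clipped]

# Hypothetical concentrations with one value below LLOQ and one above ULOQ
raw = [0.2, 1.5, 2.0, 2.4, 3.1, 4.0, 250.0]
z = preprocess_biomarker(raw, lloq=0.5, uloq=100.0)
print([round(v, 2) for v in z])
```

In a cross-validated analysis, the median, deviation, mean, and SD would normally be estimated on the training portion only and then applied to the held-out portion.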

We also analyzed the natural variable importance metrics for patient age and the 11 assay readouts used in the 2 cross-validation analyses. LASSO uses shrinkage as a form of feature selection, and we looked at which features were selected during each of the cross-validation folds. In addition, because all assay results were rescaled to share the same 0-mean, unit-variance distribution, the magnitudes of their coefficients were compared to assess their relative predictive importance.
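To illustrate how LASSO shrinkage acts as feature selection, the sketch below runs coordinate descent with the soft-thresholding operator on a tiny hypothetical data set. It minimizes a squared-error loss for simplicity; a diagnostic model for a binary outcome would typically use a logistic loss, so this is a simplified stand-in and not the authors’ implementation. All names and data are illustrative.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_sweeps=100):
    """Minimize (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            rho = 0.0
            norm = 0.0
            for i in range(n):
                # Prediction from every feature except feature j
                others = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - others)
                norm += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (norm / n)
    return beta

# Hypothetical standardized features: the outcome tracks feature 0, not feature 1
X_toy = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y_toy = [2.1, 1.9, -1.9, -2.1]
beta_toy = lasso_coordinate_descent(X_toy, y_toy, lam=0.5)
print(beta_toy)
```

Because the features share the same scale, the surviving coefficient magnitudes can be compared directly, and a coefficient of exactly 0 corresponds to a feature excluded from that fit.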

We then investigated the performance of a reduced panel as a proof of concept, selecting the biomarkers with least angle regression (see Supplemental Information), a machine learning method based on forward selection. This set of features constituted the final panel, which was used to train a new diagnostic model on all 147 patients using LASSO, and then evaluated with in-sample validation, in which the model is validated on the same cohort that was used to train the model.

This exploratory reduced panel was subject to further analyses of discrimination, assessing the panel’s calibration with the addition of each biomarker through the minimization of the Akaike14,15 or Bayesian information criteria and goodness-of-fit testing.16,17 To assess the upward bias inherent with in-sample validation, we also built a model with LASSO using all features (biomarker concentrations and age) in the panel for the combined cohort and evaluated it with in-sample validation. We compared results with our cross-validated results to provide an estimate of bias inherent in the model. All statistics were performed by using R software, version 4.1 (R Foundation for Statistical Computing). All P values are 2-sided, with a value <.05 considered significant.
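The information criteria mentioned above follow standard definitions: AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, where k is the number of fitted parameters, n is the number of patients, and L is the model likelihood. A minimal sketch with hypothetical predicted probabilities (not study data) shows how a lower value favors one model over another.

```python
import math

def bernoulli_log_likelihood(probabilities, labels):
    """Log-likelihood of binary labels given predicted probabilities."""
    return sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for p, y in zip(probabilities, labels)
    )

def aic(log_likelihood, k):
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    return k * math.log(n) - 2 * log_likelihood

# Hypothetical outputs of a 1-marker and a 2-marker model on 6 patients
labels = [1, 1, 0, 0, 1, 0]
probs_small = [0.7, 0.6, 0.4, 0.3, 0.6, 0.5]
probs_large = [0.9, 0.8, 0.2, 0.1, 0.8, 0.3]

ll_small = bernoulli_log_likelihood(probs_small, labels)
ll_large = bernoulli_log_likelihood(probs_large, labels)
# k counts the intercept plus one coefficient per marker
print(round(aic(ll_small, 2), 2), round(aic(ll_large, 3), 2))
print(round(bic(ll_small, 2, 6), 2), round(bic(ll_large, 3, 6), 2))
```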

Eighty percent of the KD patients (n = 40) qualified with complete criteria by AHA guidelines, whereas the remaining 10 were diagnosed as incomplete. Five patients (10%) were infants (age <1 year). KD cases had a mean (SD) age of 45.9 (31.7) months compared with 50.6 (37.1) months for controls (n = 100, P = .45). Diagnoses or primary indication for emergency room admission for the controls were highly variable, but ∼30% had laboratory tests or confirmed clinical findings of a bacterial or viral infection. Twenty percent of control patients had at least 1 clinical criterion for KD diagnosis listed as a primary indication. None of these patients, however, qualified for KD diagnosis.

Results of case-versus-control comparisons for patient age and each protein or biomarker in the full discovery panel are shown in Supplemental Table 4. Patient age plus the 11 final parameters that were selected for the analysis are shown in Table 1. Using 10-fold CV with LASSO on these 11 readouts plus patient age resulted in a cross-validated AUC of 0.94 (95% CI: 0.90–0.98; P <.001; Fig 1). With our cutoff determined by the optimal Youden’s index, 88 of 97 KD-negative patients were diagnosed as negative, and 47 of 50 KD-positive patients were diagnosed as positive, resulting in a cross-validated sensitivity of 0.94 (0.87–1.0) and a cross-validated specificity of 0.91 (0.85–0.96).
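The operating characteristics above follow directly from the reported counts. The sketch below recomputes them and also shows how a cutoff maximizing Youden’s index (J = sensitivity + specificity − 1) could be selected from a set of scores. The scores and labels in the second part are hypothetical, and the sketch searches a single data set rather than operating within each CV fold.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    return tp / (tp + fn), tn / (tn + fp)

def youden_cutoff(scores, labels):
    """Return (cutoff, J) maximizing J; scores >= cutoff are called positive."""
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutoff)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
        sens, spec = sensitivity_specificity(tp, fn, tn, fp)
        j = sens + spec - 1.0
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j

# Counts from the text: 47 of 50 KD cases positive, 88 of 97 controls negative
sens, spec = sensitivity_specificity(tp=47, fn=3, tn=88, fp=9)
print(round(sens, 2), round(spec, 2), round(sens + spec - 1.0, 3))

# Hypothetical risk scores (1 = KD, 0 = febrile control)
toy_scores = [0.1, 0.2, 0.3, 0.45, 0.5, 0.7, 0.8, 0.9]
toy_labels = [0, 0, 1, 0, 1, 1, 1, 1]
toy_cutoff, toy_j = youden_cutoff(toy_scores, toy_labels)
print(toy_cutoff, round(toy_j, 2))
```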

We examined which features were selected during each of the cross-validation analyses to assess their predictive importance. All of the variable importance metrics are enumerated in Table 2. The features that were eliminated in at least 1 fold include age (excluded from 9 folds), immunoglobulin-A (IgA; excluded from 7 folds), TSH (excluded from 1 fold), and FT4 (excluded from 9 folds). When the readout features are normalized to the same distribution, if we use the magnitude of the model coefficients as a surrogate for variable importance, the top 5 features (in decreasing order) are C-reactive protein (CRP), NT-proB-type natriuretic peptide (NT-proBNP), ST2, TBG, and immunoglobulin-M. These top 5 biomarker assays have revealed equivalent results in serum and plasma.

Our follow-up analysis using 1000 iterations of MCCV confirmed these results (Fig 1). This analysis resulted in a mean AUC of 0.93 (0.86–1.0; P <.001). Using a cutoff determined by the optimal Youden’s index results in a mean sensitivity of 0.93 (0.83–1.0) and a mean specificity of 0.87 (0.75–0.98). Only 2 features were selected in all 1000 iterations (CRP and NT-proBNP), and all features were selected at least 80% of the time except age, IgA, TSH, and FT4. As with our 10-fold cross-validated model, the top 5 features were (in decreasing order) CRP, NT-proBNP, ST2, TBG, and immunoglobulin-M.
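The MCCV procedure can be sketched as repeated random 70/30 splits, with the AUC computed on each held-out portion. In the example below, the single simulated feature and the mean-difference scoring rule are hypothetical stand-ins for the biomarker panel and the LASSO model, and 200 iterations are used instead of 1000 to keep the run short.

```python
import random

def auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def monte_carlo_cv(X, y, fit, predict, n_iter=1000, train_frac=0.7, seed=0):
    """Return the held-out AUC from each random training/test split."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    cut = int(round(train_frac * len(idx)))
    aucs = []
    for _ in range(n_iter):
        rng.shuffle(idx)
        train, test = idx[:cut], idx[cut:]
        test_labels = [y[i] for i in test]
        if len(set(test_labels)) < 2:
            continue  # AUC is undefined unless both classes are present
        model = fit([X[i] for i in train], [y[i] for i in train])
        scores = [predict(model, X[i]) for i in test]
        aucs.append(auc(scores, test_labels))
    return aucs

def fit_mean_difference(X, y):
    # Stand-in "model": sign and size of the difference in class means
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    return sum(pos) / len(pos) - sum(neg) / len(neg)

def predict_score(weight, x):
    return weight * x

# Simulated cohort with the same class sizes as the study (50 cases, 97 controls)
data_rng = random.Random(1)
X_sim = [data_rng.gauss(1.5, 1.0) for _ in range(50)] + [data_rng.gauss(0.0, 1.0) for _ in range(97)]
y_sim = [1] * 50 + [0] * 97
mccv_aucs = monte_carlo_cv(X_sim, y_sim, fit_mean_difference, predict_score, n_iter=200)
mean_auc = sum(mccv_aucs) / len(mccv_aucs)
print(round(mean_auc, 2))
```

Counting how often each feature receives a nonzero coefficient across iterations would give selection frequencies of the kind reported above.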

We sought to identify a reduced biomarker panel that might achieve a performance similar to that of our cross-validated analysis on the panel of 11 biomarkers plus patient age. Our least angle regression-based feature selection process identified an optimized final panel of 3 biomarkers: NT-proBNP, CRP, and TU. All 3 reveal individual significant differences between the KD and control cohorts (Table 1). The model trained from these 3 biomarkers yielded an in-sample AUC of 0.92 (0.87–0.96; P <.001; Fig 1). Using the optimal Youden’s index, we obtained an in-sample sensitivity of 0.86 (0.76–0.96) and an in-sample specificity of 0.86 (0.79–0.93). The incremental addition of variables improves discrimination while improving calibration for diagnosing the presence of KD, as indicated by a downward trend of Akaike and Bayesian information criteria, and with goodness of fit evidenced through Hosmer-Lemeshow testing (Table 3).

We assessed how much bias might be present by conducting in-sample validation on the entire combined cohort using the full panel of 11 assay readouts plus patient age. We created a model from all available data and compared this model’s in-sample performance with the AUCs observed in our cross-validated analyses.

With the final model trained on the entire cohort, all features were selected by the machine-learning method except age, IgA, and FT4. The in-sample AUC of this model was 0.97 (0.94–0.99), with an in-sample sensitivity of 0.92 (0.84–1.0) and specificity of 0.91 (0.85–0.96), compared with the cross-validated AUCs of 0.94 (10-fold CV) and 0.93 (MCCV).

We demonstrated that our investigative multibiomarker panel and algorithmically derived model serve to predict acute KD among febrile patients presenting to the ED with an AUC of 0.94 using 10-fold CV. This blood test was highly accurate for discriminating KD from other febrile illnesses in children.

Our exploratory analysis used a data-driven feature selection approach to reduce this panel to 3 FDA-cleared biomarkers that are widely available in commercial laboratories and large hospitals. The biomarkers selected by this process were NT-proBNP, CRP, and TU. This new model revealed clinically applicable performance, although these results were obtained via in-sample validation and, thus, will have some bias. To evaluate potential bias in our data, we repeated the in-sample analysis process with all 11 biomarkers and patient age. The in-sample AUC observed with this model was only slightly higher than that obtained using 10-fold CV on the same panel, which suggests that the bias is small.

Several studies have previously searched for either a specific single biomarker or a multiple-protein panel to diagnose KD. Various protein and cytokine multiplex arrays and shotgun liquid chromatography/mass spectrometry methods have been employed to identify candidate biomarkers. Additionally, the authors of 1 recent study targeted 2 neutrophil-derived proteins, myeloid-related protein 8/14 and human neutrophil elastase, in addition to CRP as a prospective biomarker panel. The 2 proteins have been identified as potential biomarkers for various autoinflammatory diseases.13 Those study authors evaluated each protein as a KD predictor, alone and combined with CRP, and showed various AUC values as high as 0.82, although with relatively poor specificity. Additionally, details regarding analyses for combining protein values are lacking. The 2 index neutrophil proteins are not currently cleared by the FDA or other regulatory bodies.

Tremoulet et al developed a biomarker panel with similar AUC values to ours using Random Forest models. However, the resulting panel required the use of an extended set of 16 clinically available proteins to achieve a performance that was equivalent to our 3-biomarker model.12  Random Forest models are complex ensembles of decision trees, which are more difficult to interpret and implement than the linear model resulting from LASSO. Their panel incorporated CRP but did not include NT-proBNP, which is generally elevated in KD.9,10 

We used FDA-cleared and widely available assays in a panel derived from an artificial intelligence-built algorithm. TBG fit well into the model, although a literature review revealed little if any direct linkage to KD. At least 1 study with just a few KD patients revealed an inverse correlation between free T3 and interleukin-6 levels during acute KD.18  These data supported a hypothesis that autoinflammatory diseases suppress thyroid hormone homeostasis. Accordingly, we performed further analyses with thyroid function assays, which are more commonly run (as compared with TBG) on most hospital laboratory platforms, including TSH, FT4, and TU. The latter fit well into our machine learning-derived model and can substitute for TBG. The TU assay is easily accessible and is performed in most certified laboratories, whereas the TBG assay is not widely available. Therefore, this panel and associated algorithm could more easily progress to commercial viability as all analytes in the panel are readily available now in most commercial labs and medium to large hospitals.

The limited cohort size and the absence of additional external validation cohort(s) represent the principal limitations in this discovery phase study. However, the FDA considers KD a “rare” and orphan disease. Enrollment in a timely manner is subject to the limitations of studying a rare disease. Nevertheless, our cohort sizes are comparable to or even exceed those employed by previous KD biomarker investigations.13 We used a population of convenience as controls. Because of IRB restrictions, we could store only limited demographic and clinical data. We are currently collecting patient samples and clinical data to validate our machine learning-derived model. In those future studies, we will test whether the incorporation of clinical and demographic variables used within AHA algorithms will strengthen the model. We also will refine the control group by requiring that each research patient exhibit at least 1 of the AHA clinical criteria, in addition to fever. Additionally, we will more closely evaluate whether specific confounding factors cause false-positive and false-negative results. Proper controls for KD studies, particularly biomarker studies, have been a topic of contention, although febrile patients presenting to the emergency department have been used previously.

Data in this manuscript were presented in abstract form at the Pediatric Academic Societies meeting 2021, and at the International Kawasaki Disease Symposium, 2021.

FUNDING: Funding was provided by a local grant from Seattle Children’s Research Institute. The content is solely the responsibility of the authors and does not necessarily represent the official views of the funding organization. The funding organization had no role in the design, preparation, review, or approval of this paper. Mr Magaret, Dr Barnes, Ms Peters, Ms Rao, and Ms Rhyne are all employees of Prevencio Inc.

CONFLICT OF INTEREST DISCLOSURES: The authors have indicated they have no potential conflicts of interest to disclose.

COMPANION PAPER: A companion to this article can be found online at www.hosppeds.org/cgi/doi/10.1542/hpeds.2022-007079.

Dr Portman conceptualized and designed the study, recruited participants, led data collection, analysis, and interpretation, and drafted the initial manuscript; Mr Magaret and Ms Rhyne contributed to the design of the study and analyzed and interpreted the data; Dr Barnes contributed to the design of the study; Ms Peters and Ms Rao participated in data collection, analysis, and interpretation; and all authors reviewed and revised the manuscript, approved the final manuscript as submitted, and agree to be accountable for all aspects of the work.

1. McCrindle BW, Rowley AH, Newburger JW, et al; American Heart Association Rheumatic Fever, Endocarditis, and Kawasaki Disease Committee of the Council on Cardiovascular Disease in the Young; Council on Cardiovascular and Stroke Nursing; Council on Cardiovascular Surgery and Anesthesia; and Council on Epidemiology and Prevention. Diagnosis, treatment, and long-term management of Kawasaki disease: a scientific statement for health professionals from the American Heart Association. Circulation. 2017;135(17):e927–e999
2. Newburger JW, Takahashi M, Beiser AS, et al. A single intravenous infusion of gamma globulin as compared with four infusions in the treatment of acute Kawasaki syndrome. N Engl J Med. 1991;324(23):1633–1639
3. Newburger JW, Takahashi M, Burns JC, et al. The treatment of Kawasaki syndrome with intravenous gamma globulin. N Engl J Med. 1986;315(6):341–347
4. Minich LL, Sleeper LA, Atz AM, et al; Pediatric Heart Network Investigators. Delayed diagnosis of Kawasaki disease: what are the risk factors? Pediatrics. 2007;120(6):e1434–e1440
5. Wilder MS, Palinkas LA, Kao AS, et al. Delayed diagnosis by physicians contributes to the development of coronary artery aneurysms in children with Kawasaki syndrome. Pediatr Infect Dis J. 2007;26(3):256–260
6. Newburger JW, Takahashi M, Gerber MA, et al; Committee on Rheumatic Fever, Endocarditis, and Kawasaki Disease, Council on Cardiovascular Disease in the Young, American Heart Association. Diagnosis, treatment, and long-term management of Kawasaki disease: a statement for health professionals from the Committee on Rheumatic Fever, Endocarditis, and Kawasaki Disease, Council on Cardiovascular Disease in the Young, American Heart Association. Pediatrics. 2004;114(6):1708–1733
7. Maric LS, Knezovic I, Papic N, et al. Risk factors for coronary artery abnormalities in children with Kawasaki disease: a 10-year experience. Rheumatol Int. 2015;35(6):1053–1058
8. Lo J, Gauvreau K, Baker AL, et al. Multiple emergency department visits for a diagnosis of Kawasaki disease: an examination of risk factors and outcomes. J Pediatr. 2021;232:127–132.e3
9. Dionne A, Dahdah N. A decade of NT-proBNP in acute Kawasaki disease, from physiological response to clinical relevance. Children (Basel). 2018;5(10):141
10. Dionne A, Meloche-Dumas L, Desjardins L, et al. N-terminal pro-B-type natriuretic peptide diagnostic algorithm versus American Heart Association algorithm for Kawasaki disease. Pediatr Int (Roma). 2017;59(3):265–270
11. Dahdah N, Siles A, Fournier A, et al. Natriuretic peptide as an adjunctive diagnostic test in the acute phase of Kawasaki disease. Pediatr Cardiol. 2009;30(6):810–817
12. Tremoulet AH, Dutkowski J, Sato Y, et al. Novel data-mining approach identifies biomarkers for diagnosis of Kawasaki disease. Pediatr Res. 2015;78(5):547–553
13. Zandstra J, van de Geer A, Tanck MWT, et al; EUCLIDS Consortium, PERFORM Consortium and UK Kawasaki Disease Genetics Study Network. Biomarkers for the discrimination of acute Kawasaki disease from infections in childhood. Front Pediatr. 2020;8:355
14. Glatting G, Kletting P, Reske SN, et al. Choosing the optimal fit function: comparison of the Akaike information criterion and the F-test. Med Phys. 2007;34(11):4285–4292
15. Li W, Nyholt DR. Marker selection by Akaike information criterion and Bayesian information criterion. Genet Epidemiol. 2001;21(Suppl 1):S272–S277
16. Codina P, Lupón J, Borrellas A, et al. Head-to-head comparison of contemporary heart failure risk scores. Eur J Heart Fail. 2021;23(12):2035–2044
17. Antunez Muiños PJ, López Otero D, Amat-Santos IJ, et al. The COVID-19 lab score: an accurate dynamic tool to predict in-hospital outcomes in COVID-19 patients. Sci Rep. 2021;11(1):9361
18. Hashimoto H, Igarashi N, Yachie A, et al. The relationship between serum levels of interleukin-6 and thyroid hormone during the follow-up study in children with nonthyroidal illness: marked inverse correlation in Kawasaki and infectious disease. Endocr J. 1996;43(1):31–38
