Several prediction models have been reported to identify patients with radiographic pneumonia, but none have been validated or broadly implemented into practice. We evaluated 5 prediction models for radiographic pneumonia in children.
We evaluated 5 previously published prediction models for radiographic pneumonia (Neuman, Oostenbrink, Lynch, Mahabee-Gittens, and Lipsett) using data from a single-center prospective study of patients 3 months to 18 years with signs of lower respiratory tract infection. Our outcome was radiographic pneumonia. We compared each model’s area under the receiver operating characteristic curve (AUROC) and evaluated their diagnostic accuracy at statistically derived cutpoints.
Radiographic pneumonia was identified in 253 (22.2%) of 1142 patients. When using model coefficients derived from the study dataset, AUROC ranged from 0.58 (95% confidence interval, 0.52–0.64) to 0.79 (95% confidence interval, 0.75–0.82). When using coefficients derived from original study models, 2 studies demonstrated an AUROC >0.70 (Neuman and Lipsett); this increased to 3 after deriving regression coefficients from the study cohort (Neuman, Lipsett, and Oostenbrink). Two models required historical and clinical data (Neuman and Lipsett), and the third additionally required C-reactive protein (Oostenbrink). At a statistically derived cutpoint of predicted risk from each model, sensitivity ranged from 51.2% to 70.4%, specificity 49.9% to 87.5%, positive predictive value 16.1% to 54.4%, and negative predictive value 83.9% to 90.7%.
Prediction models for radiographic pneumonia had varying performance. The 3 models with higher performance may facilitate clinical management by predicting the risk of radiographic pneumonia among children with lower respiratory tract infection.
Several prediction models (using clinical and laboratory data) have been described to identify patients with radiographic pneumonia among children with suspected lower respiratory tract infections. None have been externally validated, and thus are infrequently applied in practice.
We externally validated 5 prediction models for radiographic pneumonia in children using data from a prospective cohort study. Of these, 3 models (2 using clinical and physical examination characteristics, and another which includes C-reactive protein) demonstrated satisfactory performance.
Pneumonia is among the most common conditions encountered among children presenting to United States emergency departments (EDs).1 Although routine use of chest radiographs (CXR) for outpatient pneumonia is not recommended by the Pediatric Infectious Diseases Society and Infectious Diseases Society of America pediatric pneumonia guideline,2 they are performed for approximately 80% of patients with pneumonia in pediatric hospitals.3 Extensive use of CXR leads to increased radiation exposure, unclear or conflicting findings,4,5 patient or caregiver inconvenience,6 and increased health care costs.7 Furthermore, use of antibiotics for suspected pneumonia in children remains high, despite recommendations from professional societies.8–9 However, because the negative predictive value of CXR is high,10 antibiotics can be avoided in the case of normal imaging, which occurs substantially more often than an abnormal radiograph.
Accurate prediction models for children presenting to the ED may reduce unnecessary CXR use and promote antimicrobial stewardship. Previously published models incorporated historical, physical examination, and laboratory components to generate a predicted probability of disease. For example, if a model predicts that a patient has a high probability of pneumonia, CXR may be avoided unless a provider suspects an alternative pathology or disease complications, and empirical antibiotics may be used based on suspicion of bacterial disease. Conversely, if radiographic pneumonia is unlikely, then the provider may also choose to avoid obtaining a radiograph and consider alternative diagnoses.
Several clinical prediction models have been developed in children to assist in the prediction of radiographic pneumonia, either in isolation11–16 or with other bacterial infections,17–20 using a combination of historical, physical examination, and laboratory characteristics. Models typically perform worse in new populations than in the development cohort.21 Therefore, prediction models require external validation using distinct, high-quality data sources external to the original derivation cohort before implementation to establish their accuracy. We sought to validate previously published models for pediatric radiographic pneumonia using a prospective cohort of children presenting to the ED with suspected pneumonia.
Methods
Study Design
We performed a secondary analysis of a prospective cohort study, Catalyzing Ambulatory Research in Pneumonia Etiology and Diagnostic Innovations in Emergency Medicine (CARPE DIEM), which was conducted at Cincinnati Children’s Hospital Medical Center (CCHMC) ED between July 2013 and December 2017. The CCHMC ED is part of a tertiary care specialty pediatric hospital that evaluated an average of 61 990 pediatric encounters per year between 2013 and 2017, of which 823 per year (1.3%) had a diagnosis of pneumonia. The CCHMC and Ann & Robert H. Lurie Children’s Hospital Institutional Review Boards approved this study. A radiographic pneumonia prediction model was previously derived and published using data from CARPE DIEM.15
Patient Inclusion and Data Collection
Patients 3 months to 18 years of age were eligible if they had signs and symptoms of lower respiratory tract infection (defined based on previous work as new or different cough or sputum production, chest pain, dyspnea, tachypnea, or abnormal auscultatory findings)22 and underwent CXR for clinical suspicion of community-acquired pneumonia (CAP). We excluded patients with a recent (≤14 days) hospitalization, history of aspiration, or medically complex conditions (eg, immunodeficiency, chronic corticosteroid use, chronic lung disease, malignancy, sickle cell disease, congenital heart disease, tracheostomy use, and neuromuscular disorders impacting respiration). Potential patients were prospectively enrolled by research coordinators who used a computerized ED tracking board and who then collaborated with the treating physician to confirm eligibility criteria before enrollment. Research coordinators obtained informed consent from caregivers, and assent from children ≥11 years old. Medical history was collected from patients and guardians by the research coordinators, and physical examination data were collected by the clinical care team.
Outcome Measures
CXRs were independently interpreted by 2 radiologists masked to clinical information. Radiologists classified CXRs into 1 of 4 categories: (1) normal lungs, (2) definite atelectasis, (3) atelectasis versus pneumonia, and (4) definite pneumonia. We defined our primary outcome, radiographic pneumonia, as an interpretation of atelectasis versus pneumonia or definite pneumonia. We included equivocal radiographs in our outcome measure based on previous literature, which suggests that most clinicians prescribe antibiotics in these cases.23
Models Tested
We evaluated 5 previously published models for pneumonia: Mahabee-Gittens et al,13 Lynch et al,11 Neuman et al,14 Oostenbrink et al,12 and Lipsett et al.16 These models were chosen as they used an outcome of radiographic pneumonia, in contrast to other models that included pneumonia as a part of a composite variable defining serious bacterial infections.17–20 Models were derived from prospective studies, defined radiographic pneumonia as the outcome, and had similar inclusion criteria, although age ranges varied (Table 1). When validating a model with a narrower age range, the validation data set was limited to the appropriate age range for each associated model.
Decision Rule, Year | Age Included, y | Pneumonia Prevalence in Derivation Study, % (n/N) | Historical Variables | Physical Examination or Laboratory Variables
---|---|---|---|---
Lynch et al,11 2004 | 1–16 | 35.7 (204/571) | None | Fever (≥38°C), decreased breath sounds, auscultatory crackles, tachypnea^a
Mahabee-Gittens et al,13 2005 | 2 mo–5 | 8.6 (44/510) | Age >12 mo | Respiratory rate ≥50, oxygen saturation ≤96, nasal flaring
Neuman et al,14 2011 | 0–18 | 16.4 (422/2574) | Difficulty breathing, chest pain, duration of fever (classified as none, ≤72 h, and >72 h), and duration of cough (classified as none, ≤72 h, and >72 h) | Wheezing, respiratory distress, tachypnea at triage^a, retractions, grunting, focal or decreased breath sounds, rales (diffuse or focal), focal rales, focal wheeze, fever at triage (≥38°C), oxygen saturation at triage (classified as 97% to 100%, 93% to 96%, ≤92%)
Oostenbrink et al,12 2013 | 1 mo–16 | Three study populations: 15.4 (78/504), 13.8 (58/420), 7.3 (27/366) | None | Ill appearance, tachypnea^a, oxygen saturation <94%, CRP
Lipsett et al,16 2021 | 3 mo–18 | 17.4 (206/1181) | Age, fever at home | Triage oxygen saturation, fever in the ED (≥38°C), rales, wheeze
CRP, C-reactive protein; ED, emergency department; WHO, World Health Organization.
^a For Neuman, tachypnea was defined as a respiratory rate of >60 breaths per min for age <2 y, >50 breaths per min for age 2 to 4.9 y, >30 breaths per min for age 5 to 9.9 y, and >24 breaths per min for age 10 to 21.9 y. For Oostenbrink, WHO24 cutoffs were used, and for Lynch, Pediatric Risk of Admission25 criteria were used.
We matched variables within the validation dataset to those in the derivation studies. We categorized continuous variables using the same criteria as each derivation study. For the classification of tachypnea, we used the definition provided by each model: Neuman et al used age-specific thresholds, Oostenbrink used World Health Organization criteria,24 Lynch et al used Pediatric Risk of Admission criteria,25 and Mahabee-Gittens used a criterion of >50 respirations per minute. For the variable of “ill appearance” described by Neuman and Oostenbrink, we used a variable for general appearance based on examination by an ED clinician, classified into 4 categories (well, mildly ill or distressed, moderately ill or distressed, and severely ill or distressed), and designated patients who were mildly, moderately, or severely ill or distressed as “ill appearing.”12,14
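The age-specific tachypnea thresholds used for the Neuman model can be encoded as a small lookup function. This is an illustrative Python sketch (the study analysis was performed in R), using the threshold values stated in the table footnote:

```python
def tachypnea_neuman(age_years: float, resp_rate: float) -> bool:
    """Age-specific tachypnea thresholds (breaths per minute) as defined
    for the Neuman model; ages are assumed to fall within the study range."""
    if age_years < 2:
        return resp_rate > 60
    if age_years < 5:
        return resp_rate > 50
    if age_years < 10:
        return resp_rate > 30
    return resp_rate > 24
```

Analogous helpers could encode the World Health Organization and Pediatric Risk of Admission definitions used for the other models.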
Analysis
We imputed missing data via multiple imputation by chained equations, creating 5 imputed data sets over which all subsequent results were averaged.26 We generated predicted probabilities for each of the previously published models under 2 methods: (1) using the values of the regression coefficients as originally published (“coefficients as published”), and (2) using the CARPE DIEM dataset to estimate new regression coefficients for the variables included in each model (“coefficients derived from data”). For 3 models,11,13,14 model intercepts were not published. For the “coefficients as published” analysis, we estimated intercepts for these models by fixing the regression coefficients of the included variables at their published values and estimating the intercepts on the CARPE DIEM data set.
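The intercept-estimation step can be sketched as follows: with the slope coefficients fixed, the linear predictor X·β enters the model as a per-patient offset, leaving a one-parameter logistic likelihood that a short Newton iteration maximizes. The coefficients and data below are synthetic and illustrative only, not the published values of any model (and the study itself used R rather than Python):

```python
import numpy as np

def fit_intercept_with_offset(y, offset, n_iter=50):
    """Maximum-likelihood intercept of a logistic model whose slope
    coefficients are fixed: their contribution X @ beta is a fixed
    per-observation offset, so only the intercept b0 is estimated."""
    b0 = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(b0 + offset)))
        gradient = np.sum(y - p)            # d loglik / d b0
        hessian = -np.sum(p * (1.0 - p))    # d2 loglik / d b0^2
        b0 -= gradient / hessian            # Newton update
    return b0

# Synthetic demonstration with hypothetical "published" coefficients
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
published_beta = np.array([0.8, -0.5, 1.2])   # illustrative values only
offset = X @ published_beta
true_p = 1.0 / (1.0 + np.exp(-(-1.0 + offset)))  # true intercept -1.0
y = rng.binomial(1, true_p)

b0_hat = fit_intercept_with_offset(y, offset)
pred_prob = 1.0 / (1.0 + np.exp(-(b0_hat + offset)))  # "coefficients as published" predictions
```

The second method (re-estimating all coefficients on the validation data) is an ordinary logistic regression fit and is omitted here.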
We conducted receiver operating characteristic analyses of the predicted probabilities generated by each model and calculated area under the receiver operating characteristic curve (AUROC) with 95% confidence intervals (95% CI). We constructed calibration graphs of the predicted probabilities against the observed prevalence of pneumonia in both continuous and decile-categorized formats.27,28 We identified optimal cutoffs for the predicted probabilities from each model using the Euclidean distance method29 and compared diagnostic accuracy statistics (sensitivity, specificity, positive and negative predictive values, positive and negative likelihood ratios).
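The discrimination measure and cutpoint selection described above can be sketched in a few lines. This illustrative Python implementation (the study analysis used R) computes a rank-based AUROC and selects the probability cutoff whose ROC point lies closest to the perfect-classifier corner (0, 1), ie, the Euclidean distance method:

```python
import numpy as np

def auroc(y_true, y_prob):
    """Rank-based AUROC (equivalent to the Mann-Whitney U statistic);
    ties in y_prob are not handled in this sketch."""
    order = np.argsort(y_prob)
    ranks = np.empty(len(y_prob))
    ranks[order] = np.arange(1, len(y_prob) + 1)
    n_pos = int(np.sum(y_true == 1))
    n_neg = len(y_true) - n_pos
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def optimal_cutoff_euclidean(y_true, y_prob):
    """Cutoff minimizing the Euclidean distance from the ROC point
    (FPR, 1 - TPR distance to the ideal corner at FPR = 0, TPR = 1)."""
    best, best_dist = None, np.inf
    for c in np.unique(y_prob):
        pred = y_prob >= c
        tpr = pred[y_true == 1].mean()      # sensitivity
        fpr = pred[y_true == 0].mean()      # 1 - specificity
        dist = np.hypot(fpr, 1.0 - tpr)
        if dist < best_dist:
            best, best_dist = c, dist
    return best
```

Sensitivity, specificity, and predictive values then follow directly from the 2×2 table formed at the selected cutoff.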
Given the large proportion of CARPE DIEM patients with missing C-reactive protein (CRP) data, we repeated our validation analyses in only those patients with an observed (ie, not imputed) CRP to validate the Oostenbrink model. Additionally, given potential differences in CAP etiology by age, we evaluated the performance of all models in the subset of children <5 years of age. Analyses were performed with R, version 4.0.3 (R Foundation for Statistical Computing, Vienna, Austria).
Results
Patient Inclusion
Among the 1142 enrolled patients, the median age was 3.3 years (interquartile range, 1.4–7.1 years), and 54% were male. Radiographic pneumonia was found in 253 (22%) patients (203 with definite pneumonia and 50 with pneumonia versus atelectasis). Characteristics of the cohort, with rates of missing data before imputation, are provided in Table 2.
Characteristic | Summary (N = 1142), Number (%) or Median [IQR]
---|---
Demographic | |
Age | 3.3 [1.4–7.1] |
Male sex | 622 (54) |
Historical | |
Fever | 996 (87) |
Days of fever | 2 [1–4] |
Cough | 1099 (96) |
Difficulty breathing | 930 (81) |
Fully immunized | 1062 (93) |
Days of illness | 4 [2–7] |
Vomiting | 585 (51) |
Wheezing | 737 (65) |
Rapid breathing | 848 (74) |
Rhinorrhea | 949 (83) |
Chest pain | 350 (31) |
Abdominal pain | 362 (32) |
Decreased oral intake | 714 (63) |
Decreased urine output | 117 (10) |
Smoke exposure | 482 (42) |
Pneumonia history | 251 (22) |
Past pneumonia hospitalization | 101 (40) |
Asthma | 365 (32) |
Physical examination | |
Temperature (degrees Celsius) | 37.6 [37–38.3] |
Respiratory rate | 36 [28–48]
Heart rate | 142 [123–160]
Systolic blood pressure | 114 [105–123]
Oxygen saturation | 96 [94–98] |
Retractions | 488 (43)
Grunting | 78 (7) |
Nasal flaring | 127 (12) |
Head nodding | 34 (3) |
Abdominal pain | 104 (10) |
Crackles or rales | |
None | 761 (69) |
Focal | 240 (22) |
Diffuse | 107 (10) |
Rhonchi | |
None | 715 (64) |
Focal | 83 (7) |
Diffuse | 311 (28) |
Wheezing | |
None | 776 (70) |
Focal | 38 (3) |
Diffuse | 296 (27) |
Decreased breath sounds | |
None | 729 (66) |
Focal | 257 (23) |
Diffuse | 123 (11) |
CRP (mg/L) | 5.2 [1.1–6.7] |
Missing data were present for the following variables (n): immunization status (5), heart rate (1), systolic blood pressure (85), oxygen saturation (38), retractions (31), grunting (34), nasal flaring (38), head nodding (33), abdominal pain (82), crackles (34), rhonchi (33), wheezing (32), decreased breath sounds (33), and CRP (685).
Outcome of Radiographic Pneumonia
The models by Neuman and Lipsett included the whole CARPE DIEM cohort. The remaining 3 models were applied only to subsets of the cohort because of the different age ranges used in the original studies. The percentage of patients with radiographic pneumonia was similar in the validation cohorts for the models of Neuman, Lipsett, Oostenbrink, and Lynch (22% to 24%) but was lower in the validation cohort of the model of Mahabee-Gittens (12%). Supplemental Table 4 describes differences in the variables included in each model between patients with and without radiographic pneumonia within their respective age-based validation cohorts.
Model Characteristics
Using the coefficients as published, the highest AUROC was demonstrated by the models of Neuman and Lipsett (0.72, 95% CI 0.68–0.75 for both models, Fig 1). The AUROCs for the remaining 3 models were substantially lower than the Neuman and Lipsett models. When utilizing coefficients estimated from the CARPE DIEM dataset, the model of Neuman exhibited the highest AUROC (0.79, 95% CI 0.75–0.82) followed by Lipsett (0.76, 95% CI 0.73–0.80). The AUROC for the model of Oostenbrink increased substantially when coefficients were estimated from CARPE DIEM (from 0.53 [95% CI 0.49–0.57] to 0.72 [95% CI 0.68–0.77]), whereas the models of Lynch and Mahabee-Gittens exhibited modest increases.
In the coefficients-as-published analysis, the model of Neuman demonstrated a sensitivity of 70.0% and specificity of 65.4%, and the models of Lynch and Lipsett demonstrated high sensitivity (83.0% and 81.7%, respectively) with lower specificity (30.0% and 52.6%, respectively). The Oostenbrink model exhibited lower sensitivity (63.4%) and specificity (49.8%). The model of Mahabee-Gittens classified nearly the entire cohort (670 of 725 cases, or 92%) as positive for pneumonia, resulting in high sensitivity (95.3%) but low specificity (8.0%). With coefficients derived from the CARPE DIEM cohort, 1 or more performance characteristics improved for each model (Table 3).
Model | Sensitivity (As Published) | Specificity (As Published) | PPV (As Published) | NPV (As Published) | LR+ (As Published) | LR− (As Published) | Sensitivity (Derived) | Specificity (Derived) | PPV (Derived) | NPV (Derived) | LR+ (Derived) | LR− (Derived)
---|---|---|---|---|---|---|---|---|---|---|---|---
Neuman, et al14 | 70.0 | 65.4 | 36.5 | 88.4 | 2.02 | 0.46 | 69.6 | 77.1 | 46.3 | 89.9 | 3.03 | 0.39 |
Oostenbrink, et al12 | 63.4 | 49.8 | 26.3 | 82.9 | 1.26 | 0.73 | 52.8 | 87.5 | 54.4 | 86.8 | 4.23 | 0.54 |
Lynch, et al11 | 83.0 | 30.0 | 27.7 | 84.5 | 1.19 | 0.57 | 70.4 | 49.9 | 31.3 | 83.9 | 1.41 | 0.59 |
Mahabee-Gittens, et al13 | 95.3 | 8.0 | 12.2 | 92.7 | 1.04 | 0.58 | 51.2 | 64.0 | 16.1 | 90.7 | 1.42 | 0.76 |
Lipsett, et al16 | 81.7 | 52.6 | 33.0 | 90.9 | 1.72 | 0.35 | 60.1 | 79.9 | 45.9 | 87.5 | 2.98 | 0.50 |
LR+, positive likelihood ratio; LR−, negative likelihood ratio; NPV, negative predictive value; PPV, positive predictive value.
None of the models calibrated well using the coefficients as published: each predicted a higher risk of pneumonia than was observed, and the observed risk did not increase with predicted risk (Supplemental Fig 3). Calibration for all models substantially improved when the coefficients were derived from CARPE DIEM data. Model performance at intervals of predicted risk is provided in Fig 2. The model of Neuman calibrated best among the 5 models, followed by the model of Lipsett. A notable feature of the Lynch, Mahabee-Gittens, and, to a lesser extent, Oostenbrink models was the limited range of predicted probabilities exhibited by each when applied to the validation dataset.
In an additional analysis, externally validating the Oostenbrink model among the 432 CARPE DIEM patients with CRP data available, the AUROC of originally published coefficients was 0.55 (95% CI 0.49–0.60), which improved to 0.75 (95% CI 0.70–0.80) when using coefficients derived from the CARPE DIEM dataset. Model discriminatory performance declined slightly when analyses were limited to children <5 years of age (Supplemental Fig 4) and in the performance at an optimally-selected cutpoint (Supplemental Table 5).
Discussion
We externally evaluated 5 previously published prediction models for radiographic pneumonia using a prospective cohort of children with suspected CAP. Of these, the models reported by Neuman, Lipsett, and Oostenbrink demonstrated the highest performance, with AUROCs >0.7 after fitting model variables to the validation dataset. No model provided clear discrimination between patients with and without radiographic pneumonia at a single cutoff.
The Neuman, Lipsett, and Oostenbrink models demonstrated the highest performance in this external validation, though the variables used in these models differ. The Neuman model requires extensive data, including historical and physical examination variables.14 Some of these variables, such as chest pain, may be difficult to ascertain in younger patients. Others, such as those relating to accessory muscle use and auscultatory findings, may be challenging to assess (eg, in patients with high body mass index) and may be subject to limitations of interrater reliability.30 The model reported by Lipsett retained excellent performance with fewer clinical variables (age, presence of fever, wheeze, rales, and oxygen desaturation). Fewer clinical data are required by the Oostenbrink model, though this model requires a laboratory test (CRP), and 1 of its measures (ill appearance) is inherently subjective.12 Reliance on a blood biomarker raises challenges with respect to institutional availability of point-of-care testing, time and cost, and the discomfort of venipuncture. Furthermore, applying the Oostenbrink model in a less ill-appearing population may be challenging, as blood testing is infrequently performed in these patients.
Predictive performance was weaker for the other 2 models we studied. The model by Mahabee-Gittens had limited performance even after refitting with coefficients derived from our dataset. This is likely because the Mahabee-Gittens cohort was limited to younger patients (2 months–5 years), in whom radiographic pneumonia is less common.13 Additionally, the prediction of radiographic pneumonia among younger children is challenging because of the high incidence of bronchiolitis and viral pneumonia, which have similar clinical manifestations but variable findings on CXR.31 This combination of factors underscores the challenges in developing a prediction model for pediatric pneumonia limited to young children.
By externally validating decision models for radiographic pneumonia using a population of patients distinct in time and place but with inclusion criteria similar to those of the derivation studies, this study demonstrates the limited generalizability of some of the studied models.32 A decline in performance is generally expected when externally validating a model in a population distinct from the derivation cohort.21 The Mahabee-Gittens model, for example, had a decline in the AUROC from the derivation study (0.81; 95% CI 0.75–0.87) compared with our validation study (0.58; 95% CI 0.52–0.64). In contrast, the model by Lipsett demonstrated an increase in AUROC from the original derivation study (0.71; 95% CI 0.67–0.75) to the present external validation (0.76; 95% CI 0.73–0.80), suggesting that this model may have better transportability. As such, this model, which carries the benefit of having only 6 variables and does not require laboratory biomarkers, may be most beneficial in clinical practice. If the predicted probability of disease is low, then other disease states may be considered, and radiography may be avoided. Alternatively, if the predicted probability is high, the patient may be treated for pneumonia without confirmatory chest radiography.
Many clinical prediction models in emergency medicine, such as those used for young febrile infants for serious bacterial infection33 or for children with head trauma for clinically important traumatic brain injury,34 are used in a “1-way” fashion to identify patients at low risk of an outcome but generally cannot be used to identify those at high risk. None of the radiographic pneumonia models we examined provided satisfactory discrimination between patients with and without radiographic pneumonia at a single threshold using a binary cutoff. This challenge, additionally noted in the CARPE DIEM rule,15 means that no clear, singular cutoff can be used to characterize patients as low- or high-risk in a dichotomous fashion to make decisions about CXR or empirical antimicrobial use. This likely relates to the heterogeneity of pneumonia presentation among children, differences in disease etiology, and variability in radiograph interpretation. At the statistically derived cutoffs, the Oostenbrink model demonstrated the greatest ability to “rule in” pneumonia with the highest specificity and positive likelihood ratio. The Neuman model demonstrated moderate ability to rule in but the highest performance to rule out, as reflected by the sensitivity and negative likelihood ratio. The internally derived CARPE DIEM radiographic pneumonia prediction model contains 3 variables: duration of fever, focal decreased breath sounds, and age.15 Evaluated on this subset of patients, this model demonstrated an AUROC of 0.81 (95% CI 0.66–0.84). At an optimally selected cutpoint, this model demonstrated a sensitivity of 73.1% and a specificity of 75.8%, though this model has not yet been externally validated. None of the rules demonstrated a positive likelihood ratio greater than 5 or a negative likelihood ratio less than 0.2, thresholds generally suggestive of a substantial change from pretest to posttest probability.
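The relationship between a likelihood ratio and the pretest-to-posttest shift follows from the odds form of Bayes’ theorem. As a worked example using numbers reported above, applying the derived-coefficient Oostenbrink LR+ of 4.23 to the cohort prevalence of 22.2% as the pretest probability yields a posttest probability close to that model’s positive predictive value:

```python
def posttest_probability(pretest_p: float, likelihood_ratio: float) -> float:
    """Bayes' theorem in odds form: posttest odds = pretest odds x LR."""
    pretest_odds = pretest_p / (1.0 - pretest_p)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)

# Cohort prevalence (22.2%) as the pretest probability, with the
# derived-coefficient Oostenbrink LR+ of 4.23:
risk = posttest_probability(0.222, 4.23)  # ~0.55, near the reported PPV of 54.4%
```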
Although performance of the models declined slightly when examined in children younger than 5 years of age, most of these models were not derived specifically in this age group, so a decline in performance is expected.
Prediction rules for pneumonia are not widely used in clinical practice. The reasons for their lack of use may include the lack of external validation, inability to sufficiently differentiate patients with and without radiographic pneumonia with a single threshold, and challenges related to implementation of best practice regarding defining diagnosis and etiology. Successful implementation would potentially decrease overuse of CXR3 and antibiotics35 and facilitate a decrease in unnecessary practice variation.36 We have taken the initial step to externally validate several of these previously derived models. A prediction rule may be implemented in multiple ways. Online calculators, such as one recently described for urinary tract infection risk stratification in young febrile children,37 apply the logistic regression formula to entered patient data and return a predicted probability to the user, allowing dynamic risk stratification. Alternatively, a model may be directly embedded into the electronic medical record, similar to efforts that have been described with pediatric sepsis.38 These predicted probabilities may in turn guide management decisions (eg, a decision to perform CXR in children at intermediate risk, or a decision regarding use of antibiotics without CXR in children at very low or high risk). Finally, a future step may be to evaluate the validity of these models in other settings, including for children seen in primary care offices or for admitted patients.
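Such a calculator is, at its core, the logistic regression formula applied to user-entered predictors. The following sketch uses a hypothetical intercept and coefficients for illustration only; they are not the published values of any model discussed here:

```python
import math

# Hypothetical intercept and coefficients, for illustration only --
# NOT the published values of any model discussed in this article.
INTERCEPT = -2.0
COEFS = {"fever": 0.9, "focal_rales": 1.4, "hypoxia": 1.1}

def predicted_probability(features: dict) -> float:
    """Apply the logistic formula p = 1 / (1 + exp(-(b0 + sum(bi * xi))))
    to user-entered binary predictors, as an online calculator would."""
    logit = INTERCEPT + sum(COEFS[name] * float(x) for name, x in features.items())
    return 1.0 / (1.0 + math.exp(-logit))

risk = predicted_probability({"fever": 1, "focal_rales": 1, "hypoxia": 0})  # ~0.57
```

An EHR-embedded version would compute the same quantity from structured fields rather than manual entry.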
Our study has limitations. Certain variables, such as ill appearance in the Oostenbrink model, were not objectively defined. In addition, most of the studied models did not report the complete data required for external validation (ie, model coefficients and intercepts),11,12,14 requiring additional steps to estimate model performance on the study data. As such, we used 2 analytical approaches toward external validation. We performed multiple imputation for missing data in the CARPE DIEM dataset; this was particularly important in validating the Oostenbrink model, for which a large proportion of patients had missing CRP. However, the performance of the Oostenbrink model when limited to those with an observed CRP was similar to the primary analysis performed with imputation. Additionally, all the models studied were developed for ED use and were validated on data from another pediatric ED; therefore, the applicability of these models to other settings remains unknown.
Implementation of well-validated models can inform decisions regarding CXR and antibiotic use, particularly when CXR is not easily available. By providing a probability of radiographic pneumonia based on clinical factors, clinicians can integrate an evidence-based risk estimate into their clinical decision-making. In this study, 3 models demonstrated superior performance during validation.
Dr Ramgopal designed the study, interpreted the data, and drafted the initial manuscript; Drs Navanandan, Cotter, Ambroggio, Shah, and Ruddy conceptualized the study, designed the data collection instruments and participated in data collection, interpreted the results, and reviewed and revised the manuscript; Dr Lorenz conducted the statistical analyses and reviewed and revised the manuscript; Dr Florin conceptualized the study, designed the data collection instruments and participated in data collection, interpreted the results, reviewed and revised the manuscript, and supervised the study; and all authors approved the final manuscript as submitted and agree to be accountable for all aspects of the work.
This study externally validates 5 prediction models for pediatric radiographic pneumonia.
FUNDING: This study was supported by the National Institutes of Health and National Institute of Allergy and Infectious Diseases (K23AI121325 and R03AI147112 to T.A.F. and K01AI125413 to L.A.), the Gerber Foundation (to T.A.F.), and the National Institutes of Health, National Center for Research Resources, and Cincinnati Center for Clinical and Translational Science and Training (5KL2TR000078 to T.A.F.). The funders did not have any role in study design, data collection, statistical analysis, or manuscript preparation. Funded by the National Institutes of Health (NIH).
CONFLICT OF INTEREST DISCLOSURES: The authors have indicated they have no financial relationships relevant to this article.