Context: Clinical sign algorithms are a key strategy to identify young infants at risk of mortality.
Objective: To synthesize the evidence on the accuracy of clinical sign algorithms to predict all-cause mortality in young infants aged 0–59 days.
Data Sources: MEDLINE, Embase, CINAHL, Global Index Medicus, and Cochrane CENTRAL Registry of Trials.
Study Selection: Studies evaluating the accuracy of infant clinical sign algorithms to predict mortality.
Data Extraction: We used Cochrane methods for study screening, data extraction, and risk of bias assessment. We determined certainty of evidence using Grading of Recommendations Assessment, Development and Evaluation.
Results: We included 11 studies examining 26 algorithms. Three studies from non-hospital/community settings examined sign-based checklists (n = 13). Eight hospital-based studies validated regression models (n = 13), which were administered as weighted scores (n = 8), regression formulas (n = 4), and a nomogram (n = 1). One checklist from India had a sensitivity of 98% (95% CI: 88%–100%) and specificity of 94% (93%–95%) for predicting sepsis-related deaths. However, external validation in Bangladesh showed very low sensitivity of 3% (0%–10%) with specificity of 99% (99%–99%) for all-cause mortality (ages 0–9 days). For hospital-based prediction models, area under the curve (AUC) ranged from 0.76 to 0.93 (n = 13). The Score for Essential Neonatal Symptoms and Signs had an AUC of 0.89 (0.84–0.93) in the derivation cohort for mortality, and external validation showed an AUC of 0.83 (0.83–0.84).
Limitations: Heterogeneity of algorithms and lack of external validation limited the evidence.
Conclusions: Clinical sign algorithms may help identify at-risk young infants, particularly in hospital settings; however, overall certainty of evidence is low with limited external validation.
Globally, nearly 2.3 million neonates die each year,1 with 80% of deaths occurring in Sub-Saharan Africa and South Asia.2 A significant proportion of these deaths are due to potentially preventable causes, such as prematurity, infections, and birth asphyxia.3 The timely identification of severe illness among high-risk young infants aged 0 to 59 days in the community, as well as in health care facilities, is crucial for accurate diagnosis and the initiation of appropriate management to prevent mortality and other negative health outcomes.4 Clinical signs in young infants are often challenging to detect and nonspecific; however, they may be the first indication of a sick infant, particularly in settings in which there is limited access to laboratory diagnostics and advanced monitoring.5
A key strategy to identify infants at risk for severe illness or mortality in low- and middle-income countries (LMICs) is through algorithms using infant clinical signs.5 The application of these tools may vary on the basis of the point of care of infant assessment, health system setting, level of health care, availability of laboratory or other testing, and provider awareness of these clinical decision tools. In the nonhospital-based or community setting, the World Health Organization (WHO) Integrated Management of Childhood Illness clinical sign checklist is used by frontline health workers to identify infants requiring immediate treatment with antibiotics and referral to higher-level care and provides a scalable approach to enhance early diagnosis and management of serious illness leading to death.5
In high-income countries (HICs), in which laboratory and imaging investigations are often more accessible and are integrated into clinical practice, clinical sign-based algorithms typically serve a supplementary role, primarily in triaging infants for mortality risk.6 In these settings, algorithms are often integrated into a broader diagnostic framework, assisting in the early identification of infants who might be at a higher risk of critical and fatal outcomes.7 In hospital-based settings, algorithms that are derived from regression modeling and incorporate more vital signs have been tested and validated.6–8 These models may analyze electronic health records and patient monitoring data to detect patterns in clinical presentations and stratify risk.6–8 Conversely, in LMICs, the reliance on clinical sign-based algorithms is not merely a matter of convenience but a critical necessity because of the constrained access to diagnostic testing.
It is critical to examine the predictive accuracy and performance of infant clinical sign algorithms to identify infants at the highest mortality risk. To our knowledge, no previous systematic reviews have assessed the accuracy of clinical sign algorithms for predicting death among young infants aged 0 to 59 days. To inform the WHO Guideline Development Group for Young Infant Sepsis, our objective was to systematically review the evidence on the accuracy of algorithms including infant clinical signs to predict mortality among young infants. The review aimed to answer the following population, index test, comparator, outcome, timing, and setting question: Among young infants aged 0 to 59 days at presentation, what is the accuracy of infant clinical sign algorithms to predict mortality from any cause by 59 days of life in any setting?
Methods
The systematic review protocol was registered prospectively with PROSPERO (CRD42023431387). In this article, we followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses9 and Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis checklists,10 as outlined in Supplemental Table S1. In this review, “infant clinical sign algorithms” were defined as tools used in medical decision-making on the basis of 2 or more clinical signs or symptoms from an infant’s history or physical examination. The main outcome of interest was mortality among young infants by 59 days of life.
Search Strategy
A medical research librarian (CGW) conducted a search across multiple databases, including MEDLINE, Embase, CINAHL, Global Index Medicus, and the Cochrane CENTRAL Registry of Trials on May 8, 2023. The search strategy encompassed terms related to neonates or infants, specific individual clinical symptoms and signs and sepsis, mortality, and diagnostic accuracy measures (Supplemental Table S2). A search of systematic reviews on the diagnostic accuracy of infant clinical signs for predicting death in neonates was also conducted on January 15, 2024 (Supplemental Table S3). Additional studies were identified by hand-searching the bibliographies of relevant systematic reviews.
Study Selection
Study screening was performed independently by 2 reviewers in Covidence, first by examining titles and abstracts, followed by a full-text review. Disagreements were adjudicated and resolved by a third reviewer.
Inclusion and Exclusion Criteria
Studies were included if they (1) evaluated algorithms or models that included at least 2 postnatal infant clinical signs assessed by physical examination or by maternal recall or history, (2) contained either a primary or subgroup analysis of infants assessed between 0 and 59 days, (3) reported all-cause or cause-specific mortality up to 59 days of life, and (4) reported at least 1 diagnostic accuracy statistic (ie, sensitivity, specificity, positive or negative predictive value, or likelihood ratio), model calibration (calibration plot, slope, Hosmer–Lemeshow statistic), or discrimination measure (C-statistic, D-statistic, area under the curve [AUC], logrank). We excluded (1) studies restricted to specialized populations (eg, infants with congenital heart disease), (2) review articles, conference proceedings, study protocols, case reports, and commentaries, (3) studies without a primary or subgroup analysis of young infants aged 0 to 59 days, (4) studies of algorithms with no postnatal signs (ie, Apgar scores or gestational age [GA] at birth only), (5) algorithms including biomarkers, or (6) studies of algorithms that reported signs only feasible for settings with resources for advanced clinical care (ie, ventilation, arterial blood gases, pH, continuous blood pressure/heart rate monitoring).
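The diagnostic accuracy statistics named in inclusion criterion (4) all derive from the same 2×2 cross-tabulation of algorithm result against death. The sketch below shows those standard definitions; the class name, function name, and counts are hypothetical and are not taken from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Cross-tabulation of algorithm result against the mortality outcome."""
    tp: int  # algorithm positive, infant died
    fp: int  # algorithm positive, infant survived
    fn: int  # algorithm negative, infant died
    tn: int  # algorithm negative, infant survived


def accuracy_statistics(t: TwoByTwo) -> dict[str, float]:
    """Return the standard accuracy statistics for one 2x2 table.

    Assumes every denominator is non-zero; a real analysis would need to
    handle empty cells explicitly.
    """
    sensitivity = t.tp / (t.tp + t.fn)
    specificity = t.tn / (t.tn + t.fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": t.tp / (t.tp + t.fp),
        "npv": t.tn / (t.tn + t.fn),
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }


# HYPOTHETICAL counts for illustration only.
example = TwoByTwo(tp=40, fp=60, fn=10, tn=890)
stats = accuracy_statistics(example)
```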
Data Extraction and Management
Data were independently extracted by 2 reviewers into a predesigned form in Excel based on the Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modeling Studies checklist.11,12 Data were extracted on the study design, setting, population characteristics, number of infants, number of death events, timing and classification of death, model characteristics (candidate predictors, model building, statistical methods, calibration measures, discrimination measures, validation),11,13 and diagnostic accuracy (sensitivity, specificity, positive or negative predictive value, positive or negative likelihood ratio). Model characteristics were extracted for both the derivation data set (population/cohort used to develop the model) and the validation data set (cohort in which the model was validated). We also extracted data on model validation methods. Internal validation was defined as the examination of model performance within a random subset of available data within the same cohort (population or geography) or a nonrandom subset of available data. Temporal validation (validation in the same population in a different time period) was considered internal validation. External validation was defined as an evaluation of model performance in a different population.14 When data were available on both derivation and validation data sets, model performance was assessed according to the validation cohort. We categorized AUC values as follows: excellent discrimination, AUC of ≥0.90; good discrimination, AUC of 0.80 to 0.89; fair discrimination, AUC of 0.70 to 0.79; and poor discrimination, AUC <0.70.15 Discrepancies in the extracted data were adjudicated and resolved by a third reviewer.
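The AUC categories above can be written as a small lookup. This is a minimal sketch: the function name is hypothetical, and it assumes the bands are contiguous (for example, a value of 0.895 would fall in "good"), whereas the text states the bands to 2 decimal places.

```python
def categorize_auc(auc: float) -> str:
    """Map an AUC to the discrimination category used in this review."""
    if not 0.0 <= auc <= 1.0:
        raise ValueError(f"AUC must lie between 0 and 1, got {auc}")
    if auc >= 0.90:
        return "excellent"
    if auc >= 0.80:
        return "good"
    if auc >= 0.70:
        return "fair"
    return "poor"


# AUC values quoted in the abstract of this review.
for value in (0.93, 0.83, 0.76):
    print(value, categorize_auc(value))
```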
Risk of Bias Assessment
Quality Assessment of Prognostic Accuracy Studies was used to assess the risk of bias (ROB) in 5 key domains: patient selection, index test, outcome, flow and timing, and analysis.16 Each domain was evaluated for ROB and applicability concerns by 2 independent reviewers, and disagreements were resolved by a third. Full details of the ROB assessment are shown in Supplemental Fig S1.
Synthesis of Evidence
We used the Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach for diagnostic tests and strategies to evaluate the certainty of evidence (COE) for studies in which the authors reported diagnostic accuracy (sensitivity, specificity) or discrimination (AUC) parameters, preferentially grading validation data.17,18 Criteria used to assess and grade ROB, indirectness, inconsistency, and imprecision are shown in Supplemental Table S4.
Algorithms for infant signs were categorized by the statistical approach used to develop the model as follows: (1) checklist (presence of a minimum number of clinical signs) or (2) prediction model with multivariable regression formulas. Models were additionally categorized by clinical presentation format as follows: checklist, nomogram, weighted score or score chart summed to calculate a total score, and regression formulas, including variables for different signs and risk factors. Studies were not suitable for pooling because of heterogeneity in settings and populations and in the algorithms, measures, and checklists evaluated; therefore, graphical or statistical methods for detecting small-study effects could not be performed.
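To make the difference between two of these presentation formats concrete, the sketch below contrasts a weighted score (points summed to a total) with a regression formula (a logistic model returning a predicted probability). All sign names, points, the intercept, and the coefficients are hypothetical and do not correspond to any model in the included studies.

```python
import math

# HYPOTHETICAL points for a weighted score (illustration only).
SCORE_POINTS = {"hypothermia": 2, "respiratory_distress": 3, "lethargy": 4}

# HYPOTHETICAL logistic regression intercept and coefficients (illustration only).
LOGIT_INTERCEPT = -4.0
LOGIT_COEFFICIENTS = {"hypothermia": 0.7, "respiratory_distress": 1.1, "lethargy": 1.5}


def weighted_score(signs: dict[str, bool]) -> int:
    """Sum the points of every sign that is present."""
    return sum(points for sign, points in SCORE_POINTS.items() if signs.get(sign, False))


def predicted_risk(signs: dict[str, bool]) -> float:
    """Apply the logistic formula: risk = 1 / (1 + exp(-(intercept + sum of coefficients)))."""
    linear_predictor = LOGIT_INTERCEPT + sum(
        coef for sign, coef in LOGIT_COEFFICIENTS.items() if signs.get(sign, False)
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))


infant = {"hypothermia": True, "lethargy": True}
print(weighted_score(infant), round(predicted_risk(infant), 3))
```

A weighted score trades some precision for bedside usability, whereas a regression formula preserves the fitted coefficients but generally needs a calculator or software.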
Results
Study Characteristics
A total of 6701 publications were identified from databases, and 83 were identified from bibliography searches of 7 systematic reviews and additional hand-searching. After the removal of duplicate records, 5683 abstracts were screened, of which 650 underwent a full-text review, and 11 studies met inclusion criteria (Fig 1).19–29 Characteristics of the included studies are shown in Table 1 with detailed descriptions in Supplemental Table S5. The reasons for exclusion at the full-text stage are provided in Supplemental Table S6. Overall ROB assessments are shown in Fig 2, with study-level assessments available in Supplemental Fig S1. The authors of the 11 included studies reported on 26 different algorithms, of which 13 were Integrated Management of Childhood Illness-like sign checklists,19–21 and the remaining 13 were regression-based prediction models. Of the prediction models, the presentation format for clinicians was a nomogram chart (n = 1),22 weighted scores or score chart (n = 8),23–27 or regression equations or formulas (n = 4).21,28,29 Three studies were conducted in nonhospital settings,19–21 and 8 studies were conducted in hospital/NICU-based settings.22–29 Eight studies were conducted in LMICs,19–22,25,26,28,29 2 were conducted in HICs,23,24 and 1 multicenter study spanned both HICs and LMICs.27 Sample sizes ranged from 116 to 53 909, with a total of 115 040. The median sample size was 3567, and the interquartile range was 10 660.
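The sample size summary can be checked against the per-study totals listed in Table 1. The sketch below recomputes the total, median, and interquartile range; the quartile method is an assumption, because the interpolation convention used by the authors is not stated. Python's "inclusive" method gives a value close to the reported 10 660.

```python
import statistics

# Total sample size per included study, as listed in Table 1.
SAMPLE_SIZES = {
    "Bang 2005": 3567,
    "Darmstadt 2011": 6924,
    "Khan 2020": 19927,
    "Hailemeskel 2022": 456,
    "Lee 2001": 1723,
    "Lee 2013": 17075,
    "Mediratta 2020": 1085,
    "Singhi 1995": 116,
    "Russell 2023": 3204,
    "Aluvaala 2021": 7054,
    "Tuti 2022": 53909,
}

sizes = sorted(SAMPLE_SIZES.values())
total = sum(sizes)
median = statistics.median(sizes)

# ASSUMPTION: "inclusive" quartiles; other conventions give a different IQR.
q1, _, q3 = statistics.quantiles(sizes, n=4, method="inclusive")
iqr = q3 - q1

print(total, median, iqr)
```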
| Author (y) | Study Setting | Participants | Mortality Follow-Up Period | Sample Size | Reference Standard | Name of Model/Algorithm | Infant Clinical Signs | Description of Model/Algorithm |
|---|---|---|---|---|---|---|---|---|
| Nonhospital algorithms | | | | | | | | |
| Bang 2005 | India (Gadchiroli community) | Home visits, all neonates born in 39 villages eligible; infants aged 0–28 d | Within 0–28 d of life | 3567 | Sepsis-attributed neonatal death (neonatologist assigned) | SEARCH Checklist 1 | Cry weak or stopped, sucking reduced or stopped,* limbs loose,* infant was cold,* vomiting* or abdominal distension, umbilical sepsis, chest indrawing | Checklist: Any 2 of 7 signs |
| | | | | 2804 | | SEARCH Checklist 2 | Cry weak or stopped, sucking reduced or stopped,* limbs loose,* infant was cold,* vomiting* or abdominal distension, umbilical sepsis | Checklist: Any 2 of 6 signs |
| | | | | | | SEARCH Checklist 3 | Cry weak or stopped, sucking reduced or stopped,* limbs loose,* infant was cold,* vomiting* or abdominal distension, umbilical sepsis, respiratory rate ≥60 | Checklist: Any 2 of 7 signs |
| | | | | | | SEARCH Checklist 4 | Cry weak or stopped, sucking reduced or stopped,* limbs loose,* infant was cold,* vomiting* or abdominal distension, umbilical sepsis, grunt | Checklist: Any 2 of 7 signs |
| | | | | | | SEARCH Checklist 5 | Cry weak or stopped, sucking reduced or stopped,* limbs loose,* infant was cold,* vomiting* or abdominal distension, umbilical sepsis, grunt or chest indrawing | Checklist: Any 2 of 7 signs |
| | | | | | | SEARCH Checklist 6 | Cry weak or stopped, sucking reduced or stopped,* limbs loose,* infant was cold,* vomiting* or abdominal distension, umbilical sepsis, respiratory rate ≥60 or chest indrawing | Checklist: Any 2 of 7 signs |
| Darmstadt 2011 | Bangladesh (Mirzapur community) | Home visits among intervention arm of cohort; infants aged 0–9 d | Within 10 d of life | 6924 | Neonatal mortality | Projahnmo Revised | Convulsion, RR (70/min), severe chest indrawing, severe fever (T >101 F), severe hypothermia (T <95.5 F), weak, abnormal, or absent cry, unconscious, lethargic/less than normal movement, not able to feed or suck at all, severe skin infection, umbilical erythema | Checklist: Any 1 of 11 signs |
| | | | | | | Projahnmo Modification F | Severe chest indrawing, moderate to severe fever (T = 100 F), moderate to severe hypothermia (T <97.5 F), lethargic or less than normal movement, history of feeding problems, jaundice | Checklist: Any 1 of 6 signs |
| | | | | | | YIS-2 | History of convulsion, RR (60/min), severe chest indrawing present, fever (T >99.5 F), hypothermia (T <95.9 F), lethargic or less than normal movement, history of feeding problems | Checklist: Any 1 of 7 signs |
| | | | | | | YIS-2 Modification Z | History of convulsion, severe chest indrawing present, moderate to severe fever (T = 100 F), moderate to severe hypothermia (T <97.5 F), lethargic/less than normal movement, history of feeding problems, jaundice | Checklist: Any 1 of 7 signs |
| | | | | | | SEARCH Checklist 1 | Stopped sucking, weak or no cry, limbs becoming limp, vomiting or abdominal distension, infant cold to touch, severe chest indrawing, umbilical infection | Checklist: Any 2 of 7 signs |
| | | | | | | SEARCH Checklist 1 Modification | Stopped sucking, weak or no cry, limbs becoming limp, vomiting or abdominal distension, infant cold to touch, severe chest indrawing, umbilical infection | Checklist: Any 1 of 7 signs |
| Khan 2020 | Bangladesh (Gaibandha and Rangpur Districts) | All live-born infants in cohort cluster-randomized trial of home visits who lived >48 h; infants aged 2–28 d | Within 2–28 d of life | 19 927 (Derivation: 14 944 Validation: 4983) | Neonatal death (verbal autopsy) | Khan Checklist | Lethargy, cyanosis, non-cephalic presentation, trouble suckling | Checklist: Any 1 of 4 signs |
| | | | | | | Khan Model 1 | Birth wt, GA, lethargy, cyanosis, non-cephalic presentation and trouble suckling | 6-sign regression formula |
| | | | | | | Khan Model 2 | GA, non-cephalic presentation, lethargy, trouble suckling | 4-sign regression formula |
| Hospital-based algorithms | | | | | | | | |
| Hailemeskel 2022 | Ethiopia (South Gondar Zone) | Preterm infants in NICU in 4 public hospitals; infants aged 0–3 d | Within 72 h of life | 456 | Death | Hailemeskel nomogram for clinical risk prediction | GA, respiratory distress syndrome, multiple neonates, low birth weight, and kangaroo mother care | Nomogram |
| Lee 2001 | Canada | All outborn infants transported from community hospitals to 15 tertiary NICUs; infants aged 0–28 d | Within 7 d of NICU admission, or total NICU mortality | 1723 (Derivation: 1115 Validation: 608) | Mortality | TRIPS-I Score | Temperature, respiratory status, blood pressure, and response to noxious stimuli | 4-sign weighted score |
| | | | | | | TRIPS-I Modification | Temperature, respiratory status, blood pressure, and response to noxious stimuli, GA, small for GA, 5-min Apgar score <7, cesarean delivery | |
| Lee 2013 | Canada | All outborn and inborn infants admitted to 8 tertiary NICUs; infants aged 0–28 d | Within 7 d of NICU admission, or total NICU mortality | 17 075 (Derivation: 11 383 Validation: 5692) | Mortality | TRIPS-II Score | Temperature, respiratory status, blood pressure, and response to noxious stimuli | 4-sign weighted score (updated TRIPS-I weighting) |
| | | | | | | TRIPS-II Modification | Temperature, respiratory status, blood pressure, and response to noxious stimuli, GA, small for GA, 5 min Apgar <7 and cesarean section | |
| Mediratta 2020 | Ethiopia (Gondar) | Admitted neonates in university NICU; infants aged 0–28 d | Within 28 d | 1085 (Derivation: 812 Validation: 246) | Mortality in NICU | Mediratta Neonatal Mortality Score | Admission level of consciousness, admission respiratory distress, GA, and birth weight | 4-sign weighted score |
| Singhi 1995 | India (Chandigarh) | Pediatric emergency of Nehru Hospital, infants aged 0–60 d | Within 60 d of life | 116 | Serious illness, bacteremia, or death (culture-confirmed) | Singhi Score | Consciousness, feeding, hydration, color, consolability, facial expression | 6-sign weighted score |
| Russell 2023 | Bangladesh, China, India, Thailand, Vietnam, Kenya, South Africa, Uganda, Italy, Greece, Brazil | Admitted infants treated with antibiotics for new episode of sepsis at 19 hospital sites (secondary and tertiary referral hospital) in 11 countries; infants aged 0–60 d | Within 28 d after enrollment | 3204 (Derivation: 2726 Validation: 478) | Sepsis death (culture-confirmed) | NeoSep Severity Score | Birth weight, GA, hospitalization duration, congenital anomalies, level of respiratory support, clinical signs (abnormal temp, abdominal distension, lethargy/reduced movement, difficulty feeding, evidence of shock) | 10-sign weighted score |
| | | | | | | NeoSep Recovery Score | Cyanosis, level of respiratory support, clinical signs (abnormal temp, abdominal distension, lethargy/reduced movement, difficulty feeding, evidence of shock) | 7-sign time-varying (daily) weighted score |
| Aluvaala 2021 | Kenya (Nairobi) | Neonatal unit admissions to a large urban maternity hospital; infants aged 0–28 d | Within hospital neonatal unit stay (most occurred in the first week of life) | 7054 (Derivation: 5427 Validation: 1627) | All-cause mortality | SENSS Score | Difficulty feeding, convulsions, indrawing, central cyanosis, and floppy/inability to suck, as assessed at admission, birth weight (category), sex | 7-sign regression formula |
| Tuti 2022 | Kenya | Newborns admitted to NBUs in 16 hospitals; infants aged 0–28 d | Within hospital neonatal unit stay | 53 909 | All-cause mortality | SENSS Score | Difficulty feeding, convulsions, indrawing, central cyanosis, and floppy/inability to suck, as assessed at admission, birth weight (category), sex | 7-sign regression formula |
NBUs, newborn units; NR, not reported; RR, respiratory rate; T, Temperature.
* Historical sign ascertained by maternal report.
Methodologic Quality of Included Studies
Summary ROB assessments are shown in Fig 2, with individual study ROB assessments shown in the Supplemental Fig S1. Among the 11 studies, 6 studies had no serious ROB,20,21,24,27–29 4 had serious ROB,22,23,25,26 and 1 study had very serious ROB.19 In the participant selection domain, 1 study had serious ROB because of the case-control design.25 Three studies had serious ROB for the index test assessment, primarily because of the failure to prespecify the index test threshold.19,22,26 For the outcome and mortality assessment, 1 study had uncertain ROB, with the remaining studies having low risk.19 For participant flow and timing, most studies were low-risk, with 1 study having uncertain risk due to loss to follow-up.27 For the analysis domain, 1 study had high ROB due to failure to analyze the full cohort,19 and 1 study was high-risk because of missing data.23 For model applicability across domains, all studies had low ROB.
Checklists
Three nonhospital, community-based studies (Bang 2005, Darmstadt 2011, Khan 2020; n = 30 418) validated 13 checklists to screen for infants with the presence of any 1 or 2 of a range of signs that were equally weighted.19–21 Checklists included 4 to 11 signs, most commonly temperature, respiratory and feeding status, and level of consciousness. The Khan 4-sign checklist also included a risk factor of non-cephalic presentation. The details of individual clinical signs used in each checklist are provided in Table 1.
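An equally weighted checklist reduces to counting how many listed signs are present and comparing that count with a threshold. The sketch below uses the 7 signs of SEARCH Checklist 1 from Table 1; the identifier names and the function are hypothetical shorthand, not the wording of the original instrument.

```python
# Signs of SEARCH Checklist 1 as listed in Table 1 (identifiers are shorthand).
SEARCH_CHECKLIST_1 = frozenset({
    "cry_weak_or_stopped",
    "sucking_reduced_or_stopped",
    "limbs_loose",
    "infant_cold",
    "vomiting_or_abdominal_distension",
    "umbilical_sepsis",
    "chest_indrawing",
})


def checklist_positive(signs_present: set[str], checklist: frozenset[str], minimum_signs: int) -> bool:
    """Return True when at least `minimum_signs` checklist signs are present.

    Signs that are not on the checklist are ignored; every listed sign
    carries the same weight.
    """
    return len(signs_present & checklist) >= minimum_signs


one_sign = {"cry_weak_or_stopped"}
two_signs = {"cry_weak_or_stopped", "chest_indrawing"}

# "Any 2 of 7" rule versus the "any 1 of 7" modification.
print(checklist_positive(one_sign, SEARCH_CHECKLIST_1, 2))
print(checklist_positive(one_sign, SEARCH_CHECKLIST_1, 1))
print(checklist_positive(two_signs, SEARCH_CHECKLIST_1, 2))
```

Lowering the threshold from 2 signs to 1 can only flag the same infants or more, which is why such modifications raise sensitivity and lower specificity.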
In a field trial in Gadchiroli, India (Society for Education Action and Research in Community Health [SEARCH]), Bang et al derived and internally validated several checklist-based algorithms to predict sepsis deaths among neonates aged 0 to 28 days (see Table 1 for signs included in each checklist).19 SEARCH checklist 1, which required any 2 of 7 signs, had a sensitivity of 98% (95% confidence interval [CI]: 88% to 100%) and specificity of 94% (95% CI: 93% to 95%) among 3567 neonates.19 The remaining algorithms were developed in a subset of 2804 newborns in a different time period (April 1996–October 1999). Sensitivity and specificity were 81% (95% CI: 58% to 95%) and 96% (95% CI: 95% to 96%) for checklist 2, 86% (95% CI: 64% to 97%) and 95% (95% CI: 94% to 95%) for checklist 3, 91% (95% CI: 70% to 99%) and 95% (95% CI: 94% to 95%) for checklist 4, 95% (95% CI: 76% to 100%) and 94% (95% CI: 93% to 95%) for checklist 5, and 95% (95% CI: 76% to 100%) and 94% (95% CI: 94% to 95%) for checklist 6, respectively. All SEARCH algorithms reported in Bang et al had very low COE for both sensitivity and specificity (Table 2).19
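Because the outcome is rare, high sensitivity and specificity do not by themselves imply a high positive predictive value. The sketch below applies Bayes' rule to the rounded figures reported for SEARCH checklist 1 (sensitivity 0.98, specificity 0.94, outcome prevalence 0.01 in Table 2). The result is an illustration built from rounded inputs, not a value reported by Bang et al, and the function name is hypothetical.

```python
def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Bayes' rule: P(death | screen positive)."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Rounded values for SEARCH checklist 1; illustrative only.
ppv = positive_predictive_value(sensitivity=0.98, specificity=0.94, prevalence=0.01)
print(f"Approximate PPV: {ppv:.2f}")
```

Under these rounded inputs, most screen-positive infants would survive, which is expected for a referral-oriented screening tool applied to a rare outcome.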
Author (y) of Study | Algorithm | Infant Age (d) | Outcome | N Participants in Analysis | Prevalence of Mortality | Sensitivity (95% CI) | Specificity (95% CI) | AUC (95% CI) | COE Sensitivity | COE Specificity | COE AUC |
---|---|---|---|---|---|---|---|---|---|---|---|
Nonhospital algorithms | | | | | | | | | | | |
Bang 2005 | SEARCH Checklist 1 | 0–28 | Sepsis Death | 3567 | 0.01 | 0.98 (0.88–1.00) | 0.94 (0.93–0.95) | — | ⨁◯◯◯ Very low | ⨁◯◯◯ Very low | — |
Bang 2005 | SEARCH Checklist 2 | 0–28 | Sepsis Death | 2804 | 0.00 | 0.81 (0.58–0.95) | 0.96 (0.95–0.96) | — | ⨁◯◯◯ Very low | ⨁◯◯◯ Very low | — |
Bang 2005 | SEARCH Checklist 3 | 0–28 | Sepsis Death | 2804 | 0.00 | 0.86 (0.64–0.97) | 0.95 (0.94–0.95) | — | ⨁◯◯◯ Very low | ⨁◯◯◯ Very low | — |
Bang 2005 | SEARCH Checklist 4 | 0–28 | Sepsis Death | 2804 | 0.00 | 0.91 (0.70–0.99) | 0.95 (0.94–0.95) | — | ⨁◯◯◯ Very low | ⨁◯◯◯ Very low | — |
Bang 2005 | SEARCH Checklist 5 | 0–28 | Sepsis Death | 2804 | 0.00 | 0.95 (0.76–1.00) | 0.94 (0.93–0.95) | — | ⨁◯◯◯ Very low | ⨁◯◯◯ Very low | — |
Bang 2005 | SEARCH Checklist 6 | 0–28 | Sepsis Death | 2804 | 0.00 | 0.95 (0.76–1.00) | 0.94 (0.94–0.95) | — | ⨁◯◯◯ Very low | ⨁◯◯◯ Very low | — |
Darmstadt 2011 | Projahnmo Revised | 0–9 | Death | 6924 | 0.06 | 0.58 (0.46–0.70) | 0.95 (0.94–0.95) | — | ⨁⨁◯◯ Low | ⨁⨁⨁◯ Moderate | — |
Darmstadt 2011 | Projahnmo Modification F | 0–9 | Death | 6924 | 0.07 | 0.58 (0.46–0.70) | 0.93 (0.93–0.94) | — | ⨁◯◯◯ Very low | ⨁⨁⨁◯ Moderate | — |
Darmstadt 2011 | YIS-2 | 0–9 | Death | 6924 | 0.07 | 0.57 (0.44–0.68) | 0.93 (0.92–0.94) | — | ⨁◯◯◯ Very low | ⨁⨁⨁◯ Moderate | — |
Darmstadt 2011 | YIS-2 Modification Z | 0–9 | Death | 6924 | 0.07 | 0.58 (0.46–0.70) | 0.93 (0.93–0.94) | — | ⨁◯◯◯ Very low | ⨁⨁⨁◯ Moderate | — |
Darmstadt 2011 | SEARCH Checklist 1 | 0–9 | Death | 6924 | 0.00 | 0.03 (0.00–0.10) | 0.99 (0.99–0.99) | — | ⨁⨁◯◯ Low | ⨁⨁⨁◯ Moderate | — |
Darmstadt 2011 | SEARCH Checklist 1 Modification | 0–9 | Death | 6924 | 0.02 | 0.16 (0.08–0.27) | 0.98 (0.98–0.98) | — | ⨁⨁◯◯ Low | ⨁⨁⨁◯ Moderate | — |
Khan 2020 | Khan Checklist | 2–28 | Death | 4983* | 0.01* | 0.62* (CI NR) | 0.60* (CI NR) | — | ⨁⨁◯◯ Low | ⨁⨁◯◯ Low | — |
Khan 2020 | Khan Model 1 | 2–28 | Death | 4983* | 0.01* | — | — | 0.80* (0.73–0.87) | — | — | ⨁⨁◯◯ Low |
Khan 2020 | Khan Model 2 | 2–28 | Death | 4983* | 0.01* | — | — | 0.74* (0.66–0.81) | — | — | ⨁⨁◯◯ Low |
Hospital algorithms | | | | | | | | | | | |
Hailemeskel 2022 | Hailemeskel Nomogram | 0–3 | Death | 456 | 0.29 | 0.77* (0.69–0.84) | 0.95* (0.93–0.98) | 0.93* (0.90–0.95) | ⨁◯◯◯ Very low | ⨁⨁◯◯ Low | * |
Lee 2001 | TRIPS-I Score | “Neonatal” (d NR) | 7-d Death | 608* | — | — | — | 0.83* (CI NR) | — | — | ⨁◯◯◯ Very low |
Lee 2001 | TRIPS-I Score | “Neonatal” (d NR) | Total NICU Death | 608* | — | — | — | 0.76* (CI NR) | — | — | ⨁◯◯◯ Very low |
Lee 2001 | TRIPS-I Modification | “Neonatal” (d NR) | 7-d Death | 608* | — | — | — | 0.91* (CI NR) | — | — | ⨁◯◯◯ Very low |
Lee 2001 | TRIPS-I Modification | “Neonatal” (d NR) | Total NICU Death | 608* | — | — | — | 0.85* (CI NR) | — | — | ⨁◯◯◯ Very low |
Lee 2013 | TRIPS-II Score | “Neonatal” (d NR) | 7-d Death | 5692* | — | — | — | 0.90* (CI NR) | — | — | ⨁⨁◯◯ Low |
Lee 2013 | TRIPS-II Score | “Neonatal” (d NR) | Total NICU Death | 5692* | — | — | — | 0.87* (CI NR) | — | — | ⨁⨁◯◯ Low |
Lee 2013 | TRIPS-II Modification | “Neonatal” (d NR) | 7-d Death | 5692* | — | — | — | 0.91* (CI NR) | — | — | ⨁⨁◯◯ Low |
Lee 2013 | TRIPS-II Modification | “Neonatal” (d NR) | Total NICU Death | 5692* | — | — | — | 0.90* (CI NR) | — | — | ⨁⨁◯◯ Low |
Mediratta 2020 | Mediratta Neonatal Mortality Score | 0–28 | Death | 812 | 0.26 | 0.81 (CI NR) | 0.80 (CI NR) | — | ⨁◯◯◯ Very low | ⨁◯◯◯ Very low | — |
Mediratta 2020 | Mediratta Neonatal Mortality Score | 0–28 | Death | 246* | 0.50* | — | — | 0.85* (0.80–0.89) | — | — | ⨁◯◯◯ Very low |
Singhi 1995 | Singhi Score | 0–60 | Death | 116 | 0.09 | 0.80 (CI NR) | 0.89 (CI NR) | — | ⨁◯◯◯ Very low | ⨁◯◯◯ Very low | — |
Russell 2023 | NeoSep Severity Score | 0–59 | Sepsis Death | 478* | 0.11* | — | — | 0.76* (0.69–0.82) | — | — | ⨁⨁◯◯ Low |
Russell 2023 | NeoSep Recovery Score | 0–59 | Sepsis Death | 478* | 0.11* | 0.87* (0.60–0.98) | 0.76* (0.71–0.79) | 0.85* (0.78–0.93) | ⨁◯◯◯ Very low | ⨁⨁⨁◯ Moderate | ⨁⨁◯◯ Low |
Aluvaala 2021 | SENSS Score | “Neonatal” (d NR) | Death | 1627* | 0.01* | — | — | 0.89* (0.84–0.93) | — | — | ⨁⨁⨁◯ Moderate |
Tuti 2022 | SENSS Score | “Neonatal” (d NR) | Death | 53 909* | 0.14* | — | — | 0.83** (0.83–0.84) | — | — | ⨁⨁⨁◯ Moderate |
—/NR, not reported.
*Internal validation statistics (N, prevalence, sensitivity, specificity, AUC) given when available.
**External validation statistics (N, prevalence, sensitivity, specificity, AUC).
Darmstadt et al validated several previously published checklists, including the SEARCH algorithms, and derived new algorithms in rural Mirzapur, Bangladesh, for predicting all-cause neonatal deaths within the first 10 days of life among 6924 neonates (Table 1).20 The Projahnmo revised 11-sign checklist had a sensitivity of 58% (95% CI: 46% to 70%, low COE) and a specificity of 95% (95% CI: 94% to 95%, moderate COE) for predicting death.20 A modified 6-sign version of Projahnmo (Modification F) had the same sensitivity of 58% (95% CI: 46% to 70%, very low COE) and a specificity of 93% (95% CI: 93% to 94%, moderate COE).20 The Young Infants Signs-2 (YIS-2), a 7-sign checklist, had a sensitivity of 57% (95% CI: 44% to 68%, very low COE) and a specificity of 93% (95% CI: 92% to 94%, moderate COE).20 Modification Z of the YIS-2, which changed the temperature thresholds and added jaundice, yielded a sensitivity of 58% (95% CI: 46% to 70%, very low COE) and a specificity of 93% (95% CI: 93% to 94%, moderate COE).20 When SEARCH checklist 1 was validated in this Bangladeshi cohort, its sensitivity for identifying 10-day all-cause mortality was much lower than in the original study in India (3%; 95% CI: 0% to 10%, low COE), with a specificity of 99% (95% CI: 99% to 99%, moderate COE).20 Darmstadt et al also applied a modification of the original SEARCH checklist 1 that required only 1 of 7 signs and found a sensitivity of 16% (95% CI: 8% to 27%, low COE) and a specificity of 98% (95% CI: 98% to 98%, moderate COE; Table 2).20
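To see what such sensitivity and specificity estimates imply for an individual screening result, they can be combined with the mortality prevalence through Bayes' rule. The sketch below is illustrative arithmetic only: the function name `predictive_values` is an assumption, and the inputs are the Projahnmo revised point estimates and the prevalence listed in Table 2. The resulting predictive values are not figures reported by the study authors.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) implied by sensitivity, specificity, and prevalence."""
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie between 0 and 1")
    true_pos = sensitivity * prevalence               # deaths flagged
    false_neg = (1 - sensitivity) * prevalence        # deaths missed
    true_neg = specificity * (1 - prevalence)         # survivors cleared
    false_pos = (1 - specificity) * (1 - prevalence)  # survivors flagged
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Point estimates for the Projahnmo revised checklist taken from Table 2.
ppv, npv = predictive_values(sensitivity=0.58, specificity=0.95, prevalence=0.06)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")  # PPV = 0.43, NPV = 0.97
```

Under these inputs, fewer than half of the infants flagged by the checklist would be expected to die, while a negative screen remains reassuring mainly because death is uncommon.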
In another nonhospital study in Gaibandha, Bangladesh, Khan et al developed and internally validated a 4-sign checklist using data from 14 944 infants aged 2 to 28 days enrolled in a cluster-randomized trial (n = 4983 in validation sample; Table 1).21 The checklist-based algorithm had a sensitivity of 62% (CI: not reported) and specificity of 60% (CI: not reported), with low COE for both sensitivity and specificity (Table 2).21
Prediction Models
Nomograms
In a hospital-based study in Gondar, Ethiopia, Hailemeskel et al developed a 5-sign weighted-score nomogram in a prospective cohort of 456 preterm neonates aged 0 to 3 days in public hospitals (Tables 1 and 3).22 The model was internally validated and had a sensitivity of 77% (95% CI: 69% to 84%, very low COE) and specificity of 95% (95% CI: 93% to 98%, low COE) for preterm mortality, with excellent discrimination (AUC 0.93, 95% CI: 0.90–0.95, low COE; Table 2).22
Author, y, Model Name | EPV or EPP | Selection (N) of Candidate and Final Predictors | N (%) and Handling of Missing Data | Calibration Measures (Calibration Plot, Slope, Hosmer–Lemeshow) | Discrimination Measures (C-statistic, D-statistic, AUC, logrank) | Diagnostic Accuracy (Sensitivity, Specificity, PPV, NPV) | Overall Measures (R2) | Type of Validation and Methods | Regression Equation |
---|---|---|---|---|---|---|---|---|---|
Hailemeskel 2022 Hailemeskel Nomogram for clinical risk prediction | NR | Candidate (5): Variable selection in multivariable model made using least absolute shrinkage; Final (5): Lasso regression | N (%): NR; Method: NR | Calibrated P = 0.52 | AUROC = 0.93 (0.90–0.95) | Se = 0.77; Sp = 0.95; False-positive rate = 0.04; Prediction accuracy = 0.90 | NR | Internal: validated using 2000 bootstrap replicates; External: NR | Linear predictor of the equation model = −1.54 + 2.66 ∗GA <32 wk + 0.77 ∗multiple pregnancy 4.4; ∗has no RDS, + 2.1; ∗has not gotten KMC, + 1.52; ∗low birth wt |
Lee 2001 TRIPS-I & TRIPS-I modification | NR | Candidate (4): Based on univariable associations; Final (4): Significance of coefficients and clinical judgment, stepwise selection | 226 (9); Method: NR | Calibration of TRIPS for prediction of 7-d mortality (against validation cohort of pretransport data): Hosmer–Lemeshow χ2 = 3.54 (df = 4), (P = .47) | Mortality within 7d of NICU admission: ROC (P): TRIPS-I = 0.83 (0.47); TRIPS-I mod. = 0.91 (0.67); Total NICU mortality: ROC (P): TRIPS-I = 0.76 (0.22); TRIPS-I mod = 0.85 (0.70) | NR for models | NR | Internal: Random split data; External: None | β-coefficients were used to wt each variable’s impact, with the sum of these weighted scores constituting the TRIPS Score; Temperature (°C): <36.1 or >37.6 = 8; 36.1–36.5 or 37.2–37.6 = 1; 36.6–37.1 = 0; Respiratory status: Severe (apnea, gasping, intubated) = 14; Moderate (RR >60/min &/or SpO2 <85) = 5; None (RR <60/min & SpO2 >85) = 0; Systolic BP (mm Hg): <20 = 26; 20–40 = 16; >40 = 0; Response to noxious stimuli: None, seizure, muscle relaxant= 17; Lethargic response, no cry = 6; Withdraws vigorously, cries = 0 |
Lee 2013 TRIPS-II & TRIPS-II modification | NR | Candidate (4): Based on univariable associations; Final (4): Significance of coefficients and clinical judgment, stepwise selection | 42; Method: NR | Hosmer–Lemeshow χ2 = 8.3 (df = 5) (P = .14); TRIPS-II predicted mortality well across the full range of GA groups and TRIPS-II scores. Overall, the total NICU mortality rate increases as TRIPS-II increases | Mortality within 7 d of NICU admission: ROC (P): TRIPS = 0.90 (0.30); TRIPS-II mod = 0.91 (0.19); Total NICU Mortality: ROC (P): TRIPS = 0.87 (0.33); TRIPS-II mod = 0.90 (0.54) | NR models | NR | Internal: Random split data; External: None | β-coefficients were used to wt each variable’s impact, with the sum of these weighted scores constituting the TRIPS-II score; Temperature (°C): <36.1 or >37.6 = 5; 36.1–37.6 = 0; Respiratory status: Severe = 23; Moderate or none = 0; Systolic BP (mm Hg): <30 = 13; 30–40 = 8; >40 = 0; Response to noxious stimuli: None, seizure, muscle relaxant= 13; Lethargic response = 5; Withdraws vigorously, cries = 0 |
Mediratta, 2020 Mediratta Neonatal Mortality Score | 10 | Candidate: (12) Based on univariable associations; Final (4): Stepwise selection | N (%): NR; Method: variables missing ≥15% of data excluded; Single imputation | Derivation data set: Calibration slope= 0.84; Hosmer–Lemeshow = 16.5 (P = .09); Validation data set: calibration slope = 0.85; Hosmer–Lemeshow = 17.0 (P = .07) | Derivation AUC: 0.88 (0.85–0.91); Validation AUC: 0.85 (0.80–0.89) | Se = 0.81; Sp = 0.80; PPV = 0.58; NPV = 0.83 | NR | Internal: Bootstrap; Temporal: Different time period at the same hospital | Each variable in model assigned point value from 0 to 16 based on β coefficients in the multivariate model. Cutoff value for the score corresponding to 50% probability of mortality was 12 |
Singhi 1995 Singhi Score | NR | Candidate (11): Pearson's correlation; Final (6): Value of abnormal findings graded on a 3-point scale (0, 1, and 2) in the ascending order of severity (multiple stepwise regression) | N (%): NR; Method: NR | NR | NR | An impaired consciousness level with a total score of >7 had Se = 0.80; Sp = 0.89; PPV = 0.52; NPV = 0.97 in derivation (validation NR) | Consciousness = 0.19; Feeding = 0.20; Hydration = 0.22; Color = 0.23; Consolability = 0.24; Facial expression = 0.26 | NR | NR |
Russell 2023 NeoSep Severity Score NeoSep Recovery Score | NR | NeoSep severity: Candidate predictors (10), Final model (10), backward elimination (exit P = .05); NeoSep Recovery: Candidate predictors (7), Final model (7) Forward selection (entry P = .05) | N (%): NR Method: Factors with missing values >10% excluded | NeoSep Hosmer–Lemeshow P = .53 | NeoSep Severity Derivation: C-statistic = 0.77 (0.75–0.80); Validation: C-statistic = 0.76 (0.69–0.82); NeoSep Recovery Derivation: AUROC = 0.82 (0.78–0.85); Validation: AUROC = 0.85 (0.78–0.93) | NeoSep Severity: a score ≥5 at baseline was associated with 28-d mortality >10% (exact NR); NeoSep Recovery: Score >4; Se = 0.87 (0.60–0.98); Sp = 0.76 (0.71–0.79) in validation | NR | 15% randomly selected sample per site was reserved for model validation | NeoSep Severity: a points-based risk score, in which each predictor of death is assigned a number of points developed from model coefficients; NeoSep Recovery: a points-based risk score derived similarly to the baseline severity score |
Khan 2020 Regression formulas (Model 1 & 2) | NR | Candidate (16): Based on univariable associations; Final (4): Statistical significance | 638 (3.2) Method: NR | Derivation Hosmer–Lemeshow: Khan Model 1 = 6.28 (P = .62); Khan Model 2 = 10.25 (P = .25); Validation Hosmer–Lemeshow: Khan Model 1 = 6.56 (P = .77); Khan Model 2 = 10.12 (P = .43) | Derivation AUC: Khan Model 1 = 0.80 (0.76–0.84); Khan Model 2 = 0.75 (0.71–0.80); Validation AUC: Khan Model 1 = 0.80 (0.73–0.87); Khan Model 2 = 0.74 (0.66–0.81) | NR for models | NR | Internal: Random split data; External: None | Khan Model 1: 1 + e 7.787 − 4.853 (birth weight ≤1.5 kg) −1.904 (birth weight >1.5 kg) −0.097 (GA) + 0.426 (lethargy) + 0.326 (cyanosis) + 0.72 (non-cephalic presentation) + 0.947 (poor suckling); Khan Model 2: e2.707 − 0.203 (GA) + 0.668 (lethargy) + 0.904 (non-cephalic presentation) + +1.381 (poor suckling) |
Aluvaala 2021 SENSS Score | Derivation EPV = 45 (445/10); Validation: 151 deaths and 1476 nonevents | Candidate (11): Selected on the basis of availability in clinical practice; Final (7): Logistic regression with no variable selection | Derivation: 0.2% to 16%; Validation: 0.1% to 14%; Method: Missing at random; Multiple imputation (chained equation approach) | Calibration intercept = −0.33 (−0.56 to −0.11) | Derivation C-statistic = 0.91 (0.89 to 0.93); External validation C-statistic = 0.89 (0.84 to 0.93) | NR | NR | Internal: bootstrapping; Temporal: applying the model coefficients obtained at derivation to temporal data | Linear predictor = −3.8583 + 5.7580 ∗ ELBW + 3.7082 ∗ VLBW + 0.9232 ∗ LBW − 0.4918 ∗ Macrosomia − 0.1336 ∗ Male + 1.3596 ∗ Difficulty feeding + 1.3977 ∗ Convulsion + 1.9790 ∗ Indrawing + 0.9584 ∗ Cyanosis + 1.6266 ∗ Floppy unable to suck |
Tuti 2022 SENSS Score | 10 predictor parameters | Candidate (7): N/A; Final (7): Existing SENSS model | Predictor missingness ranged from 1.19% to 14.63%; Method: Multiple imputation | After model updating, SENSS calibration intercept improved to 0.35 (0.32–0.38); Calibration slope improved to 1.029 (1.01–1.05); Brier score = 0.09 (0.08–0.10) | The C-statistic (discrimination) = 0.83 (0.83–0.84) | NR | 0.453 | External: validation on separate Kenyan cohort | Linear predictor = −3.8583 + 5.7580 * ELBW + 3.7082 * VLBW + 0.9232 * LBW − 0.4918 * macrosomia − 0.1336 * Male + 1.3596 * difficulty feeding + 1.3977 * convulsion + 1.9790 * indrawing + 0.9584 * cyanosis + 1.6266 * floppy unable to suck |
AUROC, area under the receiver operating characteristic curve; BP, blood pressure; df, degrees of freedom; ELBW, extremely low birthweight; EPV, events per variable; EPP, events per parameter; KMC, kangaroo mother care; LBW, low birthweight; Mod., modification; NPV, negative predictive value; NR, not reported; PPV, positive predictive value; R2, coefficient of determination; RDS, respiratory distress syndrome; ROC, receiver operating characteristic; RR, respiratory rate; Se, sensitivity; Sp, specificity; SpO2, saturation of peripheral oxygen; VLBW, very low birthweight.
Weighted Scores and Score Charts
All the studies that used weighted scores or score charts were hospital-based.23–27 Lee et al developed and internally validated the Transport Risk Index of Physiologic Stability (TRIPS) score among 1723 (608 in the internal validation sample) outborn newborns at 8 tertiary-level Canadian neonatal intensive care units (NICU; Tables 1 and 3). The TRIPS score predicted mortality within 7 days of NICU admission, with an AUC of 0.83 (CI: not reported; Table 2).23 A TRIPS score modification with additional variables, including small for GA, 5-minute Apgar score <7, and cesarean delivery, increased the AUC to 0.91 (CI: not reported).23 The TRIPS score and the TRIPS modification predicted total NICU mortality with AUCs of 0.76 (CI: not reported) and 0.85 (CI: not reported), respectively. All algorithms in Lee et al had very low COE.
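As an illustration of how a weighted score of this kind is applied at the bedside, the sketch below sums the TRIPS-I point values listed in Table 3. It is a minimal sketch, not the authors' implementation: the function name, the category labels (`"severe"`, `"moderate"`, `"none"`, `"lethargic"`, `"vigorous"`), and the handling of temperatures that fall between the tabulated bands (for example 36.55 °C, scored here as 0) are assumptions.

```python
# Point values transcribed from the TRIPS-I row of Table 3.
RESPIRATORY_POINTS = {"severe": 14, "moderate": 5, "none": 0}
RESPONSE_POINTS = {"none": 17, "lethargic": 6, "vigorous": 0}


def trips_score(temp_c: float, respiratory: str,
                systolic_bp: float, response: str) -> int:
    """Sum the four TRIPS-I components into a single score (0 to 65)."""
    # Temperature (degrees C): <36.1 or >37.6 = 8; 36.1-36.5 or 37.2-37.6 = 1.
    if temp_c < 36.1 or temp_c > 37.6:
        temp_points = 8
    elif temp_c <= 36.5 or temp_c >= 37.2:
        temp_points = 1
    else:
        temp_points = 0  # 36.6-37.1, plus untabulated in-between values

    # Systolic blood pressure (mm Hg): <20 = 26; 20-40 = 16; >40 = 0.
    if systolic_bp < 20:
        bp_points = 26
    elif systolic_bp <= 40:
        bp_points = 16
    else:
        bp_points = 0

    # Unknown category labels raise KeyError rather than scoring silently.
    return (temp_points + RESPIRATORY_POINTS[respiratory]
            + bp_points + RESPONSE_POINTS[response])


# Mildly warm, tachypneic, borderline BP, lethargic: 1 + 5 + 16 + 6
print(trips_score(37.3, "moderate", 30, "lethargic"))  # 28
```

Because the weights come from regression coefficients, a single severely abnormal component (such as systolic BP <20 mm Hg) contributes more to the total than several mildly abnormal ones.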
Subsequently, Lee et al developed TRIPS version II (TRIPS-II) to predict 7-day NICU mortality in a larger sample of 17 075 newborns (Tables 1 and 3).24 The TRIPS-II model had an AUC of 0.90 (CI: not reported) in the validation sample of 5692 infants (Table 2).24 Combining TRIPS-II with GA and other variables, such as small for GA, 5-minute Apgar score <7, and cesarean delivery, did not substantially improve discrimination (AUC 0.91 [CI: not reported]).24 The TRIPS-II score and the TRIPS-II modification predicted total NICU mortality with AUCs of 0.87 (CI: not reported) and 0.90 (CI: not reported), respectively. All algorithms in this second Lee et al study had low COE.
Mediratta et al derived and validated a 4-sign weighted Neonatal Mortality score for death during NICU admission in a retrospective case-control study in 1085 infants in Ethiopia (Tables 1 and 3).25 This score had sensitivity of 81% (CI: not reported) and specificity of 80% (CI: not reported) in the derivation sample of 812 infants. The Neonatal Mortality score had very low COE for both sensitivity and specificity (Table 2).25 In a temporal validation data set comprising a cohort of 246 infants from the same hospital in a different time period, the AUC was 0.85 (95% CI: 0.80–0.89, very low COE).25
Singhi et al developed and internally validated a 6-sign score among 116 infants aged 0 to 28 days presenting to a pediatric emergency department in India to predict death due to serious illness (Tables 1 and 3).26 The authors reported a sensitivity of 80% (CI: not reported) and specificity of 89% (CI: not reported) for classifying mortality as a result of serious illness.26 The Singhi score had very low COE for both sensitivity and specificity (Table 2).
Russell et al developed and validated the NeoSep Severity and NeoSep Recovery scores to predict neonatal mortality in a cohort of 3204 infants aged <60 days with clinical sepsis from 19 hospitals in 11 countries (Asia, Africa, Europe, and South America; Tables 1 and 3).27 The NeoSep Severity score is a 10-sign weighted score that had an AUC of 0.76 (95% CI: 0.69–0.82, low COE) for predicting neonatal mortality in the internal validation sample of 478 infants (Table 2).27 For the 7-sign, time-varying (daily) weighted NeoSep Recovery score, a score of ≥4 had a sensitivity of 87% (95% CI: 60% to 98%, very low COE) and specificity of 76% (95% CI: 71% to 79%, moderate COE), with good discriminatory ability (AUC 0.85, 95% CI: 0.78–0.93, low COE) for predicting neonatal mortality in the internal validation sample of 478 infants (Table 2).27
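The relationship between a score cutoff, the resulting sensitivity and specificity, and the cutoff-free AUC can be made concrete with a small sketch. Everything here is hypothetical: the toy scores and outcomes are invented for illustration and are not NeoSep data, and the helper names `sens_spec` and `auc` are assumptions. The AUC is computed as the probability that a randomly chosen infant who died has a higher score than a randomly chosen survivor, with ties counted as one half.

```python
from itertools import product


def sens_spec(scores, died, cutoff):
    """Sensitivity and specificity when `score >= cutoff` counts as positive."""
    pairs = list(zip(scores, died))
    tp = sum(1 for s, d in pairs if d and s >= cutoff)
    fn = sum(1 for s, d in pairs if d and s < cutoff)
    tn = sum(1 for s, d in pairs if not d and s < cutoff)
    fp = sum(1 for s, d in pairs if not d and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores, died):
    """AUC as the share of (death, survivor) pairs ranked correctly."""
    deaths = [s for s, d in zip(scores, died) if d]
    survivors = [s for s, d in zip(scores, died) if not d]
    wins = sum(1.0 if x > y else 0.5 if x == y else 0.0
               for x, y in product(deaths, survivors))
    return wins / (len(deaths) * len(survivors))


# Hypothetical toy data: eight infants, three of whom died.
scores = [1, 2, 2, 3, 4, 5, 6, 7]
died = [False, False, False, False, True, False, True, True]

se, sp = sens_spec(scores, died, cutoff=4)
print(f"Se = {se:.2f}, Sp = {sp:.2f}, AUC = {auc(scores, died):.2f}")
# Se = 1.00, Sp = 0.80, AUC = 0.93
```

Sensitivity and specificity describe one operating point on the score, whereas the AUC summarizes ranking performance across all possible cutoffs, which is why studies reporting only an AUC cannot be compared directly with those reporting only sensitivity and specificity.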
Regression Formulas
Khan et al also developed and validated a 6-sign prediction model and a 4-sign prediction model, both expressed as logistic regression formulas, in a sample of 14 944 infants (4983 in the internal validation sample; Tables 1 and 3).21 The 6-sign model incorporated birth weight, GA, lethargy, cyanosis, non-cephalic presentation, and trouble suckling, and it demonstrated good discriminatory ability for predicting neonatal death, with an AUC of 0.80 (95% CI: 0.73–0.87) in the validation cohort.21 A simplified version of the equation in the same study, which excluded birth weight and cyanosis, had fair discrimination, with an AUC of 0.74 (95% CI: 0.66–0.81) in the validation set.21 Both Khan regression formula models had low COE.
Aluvaala et al derived and externally validated the Score for Essential Neonatal Symptoms and Signs (SENSS) in a large maternity hospital in Nairobi, Kenya on a sample of 7054 neonates (1627 infants in the internal validation sample; Tables 1 and 3).28 The score is a 7-sign multivariable regression formula and the AUC for temporal internal validation was 0.89 (95% CI: 0.84–0.93, moderate COE; Table 2).28
Tuti et al externally validated and updated the SENSS score to predict all-cause in-hospital neonatal mortality among 53 909 infants in a large multicenter study using retrospectively collected routine clinical data from 16 hospitals in Kenya (Tables 1 and 3).29 The score had an AUC of 0.83 (95% CI: 0.83–0.84, moderate COE; Table 2).29 The calibration of the original SENSS model was poor, as reflected by the calibration intercept and slope reported by the authors.29
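For a regression-formula algorithm such as SENSS, a predicted risk is obtained by passing the linear predictor in Table 3 through the logistic function. The sketch below assumes the standard logistic regression form, p = 1 / (1 + e^(−LP)), and treats every predictor as a 0/1 indicator; the function name and dictionary keys are hypothetical. Given the poor calibration of the original coefficients on external validation, the output is illustrative only and not a clinically usable risk estimate.

```python
import math

# Coefficients transcribed from the SENSS rows of Table 3.
SENSS_INTERCEPT = -3.8583
SENSS_COEFFICIENTS = {
    "elbw": 5.7580,
    "vlbw": 3.7082,
    "lbw": 0.9232,
    "macrosomia": -0.4918,
    "male": -0.1336,
    "difficulty_feeding": 1.3596,
    "convulsion": 1.3977,
    "indrawing": 1.9790,
    "cyanosis": 0.9584,
    "floppy_unable_to_suck": 1.6266,
}


def senss_probability(present_predictors) -> float:
    """Predicted mortality risk for the set of predictors that are present.

    Assumes binary (0/1) predictors and a standard logistic link.
    """
    present = set(present_predictors)
    unknown = present - SENSS_COEFFICIENTS.keys()
    if unknown:
        raise ValueError(f"unknown predictors: {sorted(unknown)}")
    linear_predictor = SENSS_INTERCEPT + sum(
        SENSS_COEFFICIENTS[name] for name in present
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))


baseline = senss_probability([])  # no predictors present
distressed = senss_probability(["indrawing", "cyanosis"])
print(f"baseline risk = {baseline:.3f}, with indrawing and cyanosis = {distressed:.3f}")
```

Model updating of the kind Tuti et al performed typically adjusts the intercept and slope of this linear predictor so that predicted risks match the mortality observed in the new population, while leaving the ranking of infants (and hence the AUC) largely unchanged.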
Discussion
Early and accurate identification of infants at the highest risk of mortality is the critical first step required to deliver evidence-based interventions to avert death. In this systematic review, we identified 11 studies in which the authors reported on 26 clinical sign algorithms to identify young infants at risk for mortality between 0 and 59 days of life. Algorithm formats ranged from simple checklists, most often used at the community level, to regression formulas used in neonatal intensive care settings. The algorithms included 4 to 11 signs, including GA, birth weight, temperature abnormality, feeding difficulty, level of consciousness and respiratory distress. Overall, all studies were of very low to moderate COE, and only 2 algorithms were externally validated in 4 studies.19,20,28,29
An adaptation of the maternal “Three Delays Model”30 outlines key time points at which timely interventions are critical to reduce neonatal and infant morbidity and mortality as follows: (1) the recognition of danger signs and decision to seek care, (2) reaching an appropriate source of care, and (3) obtaining adequate and appropriate treatment. The algorithms included in this review can be implemented at these different stages of the Three Delays Model continuum, from home to transport to hospital to inpatient care.31 During home visits or at primary health facilities, identifying high-mortality risk infants may allow for interventions including urgent referral to hospital or initiating empirical antibiotics to cover possible sepsis. We identified 13 checklists including signs and symptoms feasible for frontline health workers that were developed and validated at the community level to identify infants at high-mortality risk during home visits.19,20 These checklists tended to rely on signs ascertained by history and postnatal physical examination and did not include birth history, risk factors, or measures such as birth weight or GA. The SEARCH algorithms had high sensitivity and specificity for predicting sepsis-specific death, as identified by a neonatologist, in the original study in which they were derived and internally validated. However, in the external validation cohort of Darmstadt et al in Bangladesh, the sensitivity was substantially lower. This marked difference in performance may have been due to the different outcome (ie, all-cause mortality as opposed to sepsis-specific mortality in the original Bang et al study), different age group (0–10 days in Darmstadt et al versus 0–28 days in Bang et al), different setting, different population and population-to-health worker ratio, and different epidemiological characteristics among the study neonates.
Among the other community-level sign-based checklists, none had adequate sensitivity (≥80%), but all had high specificity (>90%) for predicting mortality.20,21 The high specificity suggests that young infants who survive will commonly have a negative test result based on the checklists’ criteria and will be correctly identified as surviving in nonhospital settings. However, the low sensitivity suggests that the checklists may fail to identify a large number of infants who die. With the inclusion of birth weight and GA, the regression formulas developed in the Khan study for community-level use had better performance.21 Small size at birth (preterm, low birth weight, or small for GA) contributes to half of neonatal deaths globally.32 In a machine learning model recently developed by the Global Network, birth weight was the strongest predictor of neonatal mortality,33 although this study was excluded from the current review because the model did not include postnatal clinical signs. In LMICs, birth weight and GA are often unavailable because antenatal care is limited and many deliveries occur at home. Alternative methods of clinically estimating GA using anthropometric, physical, and neuromuscular signs may allow for more feasible and accurate GA estimation in low-resource settings.34
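The trade-off described above can be made concrete with a small calculation. The following is a minimal sketch in Python; the counts are hypothetical and chosen only to illustrate a checklist with high specificity and low sensitivity. They are not data from any included study, and the function name is our own.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from the four cells of a 2x2 table.

    tp: infants who died and were flagged by the checklist
    fn: infants who died but were not flagged
    tn: infants who survived and were not flagged
    fp: infants who survived but were flagged
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one death and one survivor")
    sensitivity = tp / (tp + fn)  # share of deaths correctly flagged
    specificity = tn / (tn + fp)  # share of survivors correctly not flagged
    return sensitivity, specificity


# Hypothetical cohort (illustration only): 1000 infants, 30 deaths.
# The checklist flags 12 of the 30 deaths and 50 of the 970 survivors.
sens, spec = sensitivity_specificity(tp=12, fn=18, tn=920, fp=50)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
# prints: sensitivity=0.40, specificity=0.95
```

In this hypothetical cohort, the checklist correctly reassures most families of surviving infants but misses 18 of the 30 infants who die, which is the pattern the low-sensitivity, high-specificity checklists would produce.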
The identification of high-mortality risk among outborn infants transported to NICUs may be useful to prepare interventions and personnel resources at the NICU while the infants are en route. The TRIPS score consisting of 4 physical examination signs is also feasible in the nonhospital setting before accessing an appropriate source of care. The addition of perinatal risk factors (GA, 5-min Apgar score, and cesarean delivery) increased the TRIPS algorithm’s discriminatory value. These risk factors may therefore be important predictors of mortality in outborn infants who are in the process of being transported to appropriate places of care.
The hospital-based algorithms contained certain signs more applicable to the hospital or NICU settings, including respiratory status and support, vital signs such as temperature and blood pressure, and, in some cases, kangaroo mother care and evidence of shock. At least 1 of the prediction models developed in each of these LMIC studies demonstrated good discrimination, with 1 prediction model (Hailemeskel score) demonstrating excellent discrimination. The SENSS score was externally validated and had good discriminatory value with moderate COE. Thus, using only 4 to 10 clinical signs without the support of laboratory investigations in resource-limited settings, algorithms still achieved good to excellent discriminatory value in predicting young infants at risk for future death. The hospital- and NICU-based infant clinical sign algorithms may therefore hold promise for rapid bedside identification of infants at high mortality risk in low-resource settings. Once identified, timely hospital interventions may be implemented, including the rewarming of hypothermic infants, intravenous hydration, septic workups, timely antibiotic administration, and escalation in respiratory support when available. Algorithms may also help prioritize resource allocation to infants at highest risk of death.
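Discrimination, as summarized by the AUC, can be read as the probability that a randomly chosen infant who died received a higher score than a randomly chosen infant who survived. The sketch below computes the AUC directly from that definition; the scores are hypothetical and the function name is our own, so it illustrates the concept and is not the analysis used in any included study.

```python
def auc(scores_died, scores_survived):
    """AUC as the share of (died, survived) pairs ranked correctly.

    A pair counts 1 if the infant who died has the higher score,
    0.5 if the two scores are tied, and 0 otherwise.
    """
    if not scores_died or not scores_survived:
        raise ValueError("both groups must be non-empty")
    concordant = 0.0
    for d in scores_died:
        for s in scores_survived:
            if d > s:
                concordant += 1.0
            elif d == s:
                concordant += 0.5
    return concordant / (len(scores_died) * len(scores_survived))


# Hypothetical risk scores (illustration only).
died = [7, 5, 4]
survived = [1, 2, 4, 3, 0]
print(round(auc(died, survived), 3))
```

An AUC of 0.5 corresponds to ranking no better than chance, and 1.0 to every infant who died scoring higher than every infant who survived. The pairwise loop is quadratic in cohort size, which is acceptable for illustration but not for large datasets.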
After an infant is admitted to the hospital or NICU, their clinical status may continue to change on a daily basis. The NeoSep Recovery score was a time-varying prediction model allowing for an infant’s risk estimate to be updated as new information becomes available.27 This time-varying model demonstrated higher predictive accuracy compared with the baseline model (NeoSep Severity score).27 During hospitalization, time-varying models may better reflect evolving patient clinical trajectories and dynamic decision-making in clinical practice.35,36
In LMICs, the presentation format, practical application, and feasibility of use of regression formulas are important considerations. Weighted scores (or score charts) and nomograms are simpler ways of applying prediction models than regression formulas or equations. Although both approaches predict outcomes from regression models with multiple variables, weighted scores are more accessible: because each variable is assigned a numeric value, health care providers can compute the result by summing points from a chart or with a simple calculation. In contrast, regression formulas require more complex calculation, which may be prone to error if done by hand and may therefore necessitate digital tools such as a web app or online calculator. This requirement often poses a challenge in LMIC settings because of limited digital infrastructure, unreliable internet connections, and power outages. Consequently, despite the potential for marginally superior accuracy with regression formulas, the operational feasibility of weighted scores may make them a pragmatic choice in LMIC contexts.
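The difference between the two presentation formats can be illustrated with a short Python sketch. All coefficients, the intercept, and the points-per-unit scaling below are hypothetical values chosen for illustration; they are not taken from any model in this review, and rounding scaled coefficients is only one common way of deriving integer points.

```python
import math

# Hypothetical log-odds coefficients (illustration only, not from any study).
INTERCEPT = -4.0
COEFFICIENTS = {"lethargy": 1.6, "cyanosis": 1.1, "trouble_suckling": 0.9}


def predicted_risk(signs):
    """Regression-formula format: inverse logit of the linear predictor."""
    linear_predictor = INTERCEPT + sum(
        COEFFICIENTS[name] for name, present in signs.items() if present
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def weighted_points(coefficients, points_per_unit=2.0):
    """Weighted-score format: scale each coefficient and round to an integer."""
    return {name: round(beta * points_per_unit) for name, beta in coefficients.items()}


def total_score(signs, points):
    """Sum the integer points for the signs that are present."""
    return sum(points[name] for name, present in signs.items() if present)


infant = {"lethargy": True, "cyanosis": False, "trouble_suckling": True}
points = weighted_points(COEFFICIENTS)
print(points)                       # {'lethargy': 3, 'cyanosis': 2, 'trouble_suckling': 2}
print(total_score(infant, points))  # 5
print(round(predicted_risk(infant), 3))
```

The integer total can be added up at the bedside from a printed chart, whereas the predicted risk requires an exponential. The rounding step is also where a weighted score may lose a small amount of accuracy relative to the full formula.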
Previous systematic reviews of prediction models for infant mortality have included laboratory tests as predictors (eg, blood gas, hematologic parameters, etc).37–39 In 2011, Medlock et al identified 41 prediction model development studies for prediction of mortality in very premature infants with fair to excellent discriminatory ability (AUCs ranging from 0.70 to 0.96).37 Our review excluded laboratory tests because the focus was on infant signs alone and to inform WHO guidelines. However, the AUC range across studies included in our review was similar to this previous review (ie, 0.76 to 0.93). This suggests that the discriminatory ability of prediction models that rely solely on infant clinical signs may not be inferior to models that include laboratory tests, and these clinical sign-based models have the advantage of being more feasible in different levels of the health system or community in LMIC settings.
There were several limitations of the current evidence, particularly the considerable heterogeneity and lack of external validation of the included algorithms and prediction models. Robust external validation is needed before the widespread use of such algorithms and scores. However, conducting external validation of prediction models is challenging in low-resource settings with limited data availability, particularly of input covariates such as GA or birth weight. Improving data collection in low-resource settings and external validation of existing algorithms should be prioritized and will aid in implementing high-performing models into clinical practice. The COE of the algorithms for which GRADE was performed was very low to moderate.
Conclusions
Algorithms leveraging infant clinical signs have demonstrated fair to excellent discriminatory value to predict young infant mortality in a range of settings, including LMICs. Risk prediction is instrumental in the early identification of critically ill infants, thereby expediting the initiation of targeted therapeutic interventions and the appropriate allocation of scarce resources, which is pivotal for young infant survival. Limited external validation impedes the translation of these algorithms into practical and feasible clinical decision tools. Improving data collection and management in low-resource settings may allow for the external validation of well-performing young infant mortality prediction algorithms.
Acknowledgments
This work is dedicated to our colleague and dear friend Rebecca E. Rosenberg (1977–2023), who passed away during the study and made critical contributions to data extraction and synthesis from the initial study conception. Becca’s wit and humor made us laugh at every meeting, and her own research and contributions to newborn health worldwide will be everlasting. Further, Yasir Shafiq has joined this research work under the framework of the International PhD in Global Health, Humanitarian Aid, and Disaster Medicine jointly organized by Università del Piemonte Orientale (UPO).
Mr Shafiq and Dr Fung conceptualized and designed the study, designed the data collection instruments, screened studies, collected data, conducted data analysis, and drafted the initial manuscript; Ms Driker screened studies, collected data, and conducted the data analysis; Dr Rosenberg screened studies, collected data, and extracted data; Ms Hussaini and Ms Adnan screened studies and collected data; Drs Rees and Mediratta screened studies, collected data, and assisted with interpretation of the results; Ms Wade designed the search strategies and conducted the searches across all databases; Dr Chou provided inputs on the methodology and presentation of the results; Dr Edmond conceptualized the study and provided inputs on the presentation of the results; Dr North conceptualized and designed the study and interpreted results; Dr Lee conceptualized and designed the study, conducted data extraction, collected data, and interpreted the results; and all authors reviewed and revised the manuscript, approved the final manuscript as submitted, and agreed to be accountable for all aspects of the work.
This systematic review has been registered at www.crd.york.ac.uk/prospero (identifier CRD42023431387).
FUNDING: Brigham and Women’s Hospital received funding from the World Health Organization (WHO) to complete this work. The sponsor commissioned the review for the guideline development group meeting for the development of WHO recommendations on the management of serious bacterial infection in young infants aged 0 to 59 days. The sponsor provided inputs on the presentation of the results and manuscript.
CONFLICT OF INTEREST DISCLOSURES: Karen Edmond is an employee of the sponsor, the WHO. Roger Chou is the GRADE methodologist for the WHO guidelines for the management of severe bacterial infections in infants aged 0 to 59 days. The remaining authors have indicated they have no potential conflicts of interest relevant to this article to disclose.
- AUC: area under the curve
- CI: confidence interval
- COE: certainty of evidence
- GA: gestational age
- GRADE: Grading of Recommendations Assessment Development and Evaluation
- HICs: high-income countries
- LMIC: low- and middle-income country
- ROB: risk of bias
- SEARCH: Society for Education Action and Research in Community Health
- SENSS: Score for Essential Neonatal Symptoms and Signs
- TRIPS: Transport Risk Index of Physiologic Stability
- TRIPS-II: TRIPS version II
- WHO: World Health Organization
- YIS-2: Young Infants Signs-2