CONTEXT

Clinical sign algorithms are a key strategy to identify young infants at risk of mortality.

OBJECTIVE

Synthesize the evidence on the accuracy of clinical sign algorithms to predict all-cause mortality in young infants aged 0–59 days.

DATA SOURCES

MEDLINE, Embase, CINAHL, Global Index Medicus, and Cochrane CENTRAL Registry of Trials.

STUDY SELECTION

Studies evaluating the accuracy of infant clinical sign algorithms to predict mortality.

DATA EXTRACTION

We used Cochrane methods for study screening, data extraction, and risk of bias assessment. We determined certainty of evidence using Grading of Recommendations Assessment, Development and Evaluation.

RESULTS

We included 11 studies examining 26 algorithms. Three studies from nonhospital/community settings examined sign-based checklists (n = 13). Eight hospital-based studies validated regression models (n = 13), which were administered as weighted scores (n = 8), regression formulas (n = 4), and a nomogram (n = 1). One checklist from India had a sensitivity of 98% (95% CI: 88%–100%) and specificity of 94% (93%–95%) for predicting sepsis-related deaths. However, external validation in Bangladesh showed very low sensitivity of 3% (0%–10%) with specificity of 99% (99%–99%) for all-cause mortality (ages 0–9 days). For hospital-based prediction models, area under the curve (AUC) ranged from 0.76 to 0.93 (n = 13). The Score for Essential Neonatal Symptoms and Signs had an AUC of 0.89 (0.84–0.93) in the derivation cohort for mortality, and external validation showed an AUC of 0.83 (0.83–0.84).

LIMITATIONS

Heterogeneity of algorithms and lack of external validation limited the evidence.

CONCLUSIONS

Clinical sign algorithms may help identify at-risk young infants, particularly in hospital settings; however, overall certainty of evidence is low with limited external validation.

Globally, nearly 2.3 million neonates die each year,1 with 80% of deaths occurring in Sub-Saharan Africa and South Asia.2 A significant proportion of these deaths are due to potentially preventable causes, such as prematurity, infections, and birth asphyxia.3 The timely identification of severe illness among high-risk young infants aged 0 to 59 days in the community, as well as in health care facilities, is crucial for accurate diagnosis and the initiation of appropriate management to prevent mortality and other negative health outcomes.4 Clinical signs in young infants are often challenging to detect and nonspecific; however, they may be the first indication of a sick infant, particularly in settings in which there is limited access to laboratory diagnostics and advanced monitoring.5 

A key strategy to identify infants at risk for severe illness or mortality in low- and middle-income countries (LMICs) is through algorithms using infant clinical signs.5 The application of these tools may vary on the basis of the point of care of infant assessment, health system setting, level of health care, availability of laboratory or other testing, and provider awareness of these clinical decision tools. In the nonhospital-based or community setting, the World Health Organization (WHO) Integrated Management of Childhood Illness clinical sign checklist is used by frontline health workers to identify infants requiring immediate treatment with antibiotics and referral to higher-level care and provides a scalable approach to enhance early diagnosis and management of serious illness leading to death.5 

In high-income countries (HICs), in which laboratory and imaging investigations are often more accessible and are integrated into clinical practice, clinical sign-based algorithms typically serve a supplementary role, primarily in triaging infants for mortality risk.6 In these settings, algorithms are often integrated into a broader diagnostic framework, assisting in the early identification of infants who might be at a higher risk of critical and fatal outcomes.7 In hospital-based settings, algorithms that are derived from regression modeling and incorporate more vital signs have been tested and validated.6–8 These models may analyze electronic health records and patient monitoring data to detect patterns in clinical presentations and stratify risk.6–8 Conversely, in LMICs, the reliance on clinical sign-based algorithms is not merely a matter of convenience but a critical necessity because of the constrained access to diagnostic testing.

It is critical to examine the predictive accuracy and performance of infant clinical sign algorithms to identify infants at the highest mortality risk. To our knowledge, no previous systematic reviews have assessed the accuracy of clinical sign algorithms for predicting death among young infants aged 0 to 59 days. To inform the WHO Guideline Development Group for Young Infant Sepsis, our objective was to systematically review the evidence on the accuracy of algorithms including infant clinical signs to predict mortality among young infants. The review aimed to answer the following population, index test, comparator, outcome, timing, and setting question: Among young infants aged 0 to 59 days at presentation, what is the accuracy of infant clinical sign algorithms to predict mortality from any cause by 59 days of life in any setting?

The systematic review protocol was registered prospectively with PROSPERO (CRD42023431387). In this article, we followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses9 and Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis checklists,10 as outlined in Supplemental Table S1. In this review, “infant clinical sign algorithms” were defined as tools used in medical decision-making on the basis of 2 or more clinical signs or symptoms from an infant’s history or physical examination. The main outcome of interest was mortality among young infants by 59 days of life.

A medical research librarian (CGW) conducted a search across multiple databases, including MEDLINE, Embase, CINAHL, Global Index Medicus, and the Cochrane CENTRAL Registry of Trials on May 8, 2023. The search strategy encompassed terms related to neonates or infants, specific individual clinical symptoms and signs and sepsis, mortality, and diagnostic accuracy measures (Supplemental Table S2). A search of systematic reviews on the diagnostic accuracy of infant clinical signs for predicting death in neonates was also conducted on January 15, 2024 (Supplemental Table S3). Additional studies were identified by hand-searching the bibliographies of relevant systematic reviews.

Study screening was performed independently by 2 reviewers in Covidence, first by examining titles and abstracts, followed by a full-text review. Disagreements were adjudicated and resolved by a third reviewer.

Studies were included if they (1) evaluated algorithms or models that included at least 2 postnatal infant clinical signs assessed by physical examination or by maternal recall or history, (2) contained either a primary or subgroup analysis of infants assessed between 0 and 59 days, (3) reported all-cause or cause-specific mortality up to 59 days of life, and (4) reported at least 1 diagnostic accuracy statistic (ie, sensitivity, specificity, positive or negative predictive value, or likelihood ratio), model calibration (calibration plot, slope, Hosmer–Lemeshow statistic), or discrimination measure (C-statistic, D-statistic, area under the curve [AUC], logrank). We excluded (1) studies restricted to specialized populations (eg, infants with congenital heart disease), (2) review articles, conference proceedings, study protocols, case reports, and commentaries, (3) studies without a primary or subgroup analysis of young infants aged 0 to 59 days, (4) studies of algorithms with no postnatal signs (ie, Apgar scores or gestational age [GA] at birth only), (5) algorithms including biomarkers, or (6) studies of algorithms that reported signs only feasible for settings with resources for advanced clinical care (ie, ventilation, arterial blood gases, pH, continuous blood pressure/heart rate monitoring).

Data were independently extracted by 2 reviewers into a predesigned form in Excel based on the Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modeling Studies checklist.11,12 Data were extracted on the study design, setting, population characteristics, number of infants, number of death events, timing and classification of death, model characteristics (candidate predictors, model building, statistical methods, calibration measures, discrimination measures, validation),11,13 and diagnostic accuracy (sensitivity, specificity, positive or negative predictive value, positive or negative likelihood ratio). Model characteristics were extracted for both the derivation data set (population/cohort used to develop the model) and the validation data set (cohort in which the model was validated). We also extracted data on model validation methods. Internal validation was defined as the examination of model performance within a random subset of available data within the same cohort (population or geography) or a nonrandom subset of available data. Temporal validation (validation in the same population in a different time period) was considered internal validation. External validation was defined as the evaluation of model performance in a different population.14 When data were available on both derivation and validation data sets, model performance was assessed according to the validation cohort. We categorized AUC values as follows: excellent discrimination, AUC of ≥0.90; good discrimination, AUC of 0.80 to 0.89; fair discrimination, AUC of 0.70 to 0.79; and poor discrimination, AUC <0.70.15 Discrepancies in the extracted data were adjudicated and resolved by a third reviewer.
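The AUC bands described above can be expressed as a small lookup. The following is a minimal Python sketch rather than anything used in the review; the function name `categorize_auc` is hypothetical. The published bands are stated to 2 decimal places, so the sketch assumes they are meant as contiguous intervals (for example, 0.895 is treated as good).

```python
def categorize_auc(auc: float) -> str:
    """Map an AUC value to the discrimination band used in this review.

    Assumption: the bands are contiguous, so any value in [0.80, 0.90)
    counts as "good" even if it lies between 0.89 and 0.90.
    """
    if not 0.0 <= auc <= 1.0:
        raise ValueError("AUC must lie between 0 and 1")
    if auc >= 0.90:
        return "excellent"
    if auc >= 0.80:
        return "good"
    if auc >= 0.70:
        return "fair"
    return "poor"


# AUC values quoted in the abstract of this review.
for value in (0.76, 0.83, 0.89, 0.93):
    print(value, categorize_auc(value))
```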

Quality Assessment of Prognostic Accuracy Studies was used to assess the risk of bias (ROB) in 5 key domains: patient selection, index test, outcome, flow and timing, and analysis.16 Each domain was evaluated for ROB and applicability concerns by 2 independent reviewers, and disagreements were resolved by a third. Full details on the ROB assessment are shown in Supplemental Fig S1.

We used the Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach for diagnostic tests and strategies to evaluate the certainty of evidence (COE) for studies in which the authors reported on diagnostic accuracy (sensitivity, specificity) or discrimination (AUC) parameters, preferentially grading validation data.17,18 Criteria used to assess and grade ROB, indirectness, inconsistency, and imprecision are shown in Supplemental Table S4.

Algorithms were categorized by the statistical approach used to develop them: (1) checklist (presence of a minimum number of clinical signs) or (2) prediction model based on multivariable regression. Models were additionally categorized by clinical presentation format: checklist, nomogram, weighted score or score chart summed to calculate a total score, and regression formula including variables for different signs and risk factors. Studies were not suitable for pooling because of heterogeneity in settings, populations, and the algorithms, measures, and checklists evaluated; therefore, graphical or statistical methods for detecting small sample effects could not be performed.

A total of 6701 publications were identified from databases, and 83 were identified from bibliography searches of 7 systematic reviews and additional hand-searching. After the removal of duplicate records, 5683 abstracts were screened, of which 650 underwent a full-text review, and 11 studies met inclusion criteria (Fig 1).19–29 Characteristics of the included studies are shown in Table 1 with detailed descriptions in Supplemental Table S5. The reasons for exclusion at the full-text stage are provided in Supplemental Table S6. Overall ROB assessments are shown in Fig 2, with study-level assessments available in Supplemental Fig S1. The 11 included studies reported on 26 different algorithms, of which 13 were Integrated Management of Childhood Illness-like sign checklists,19–21 and the remaining 13 were regression-based prediction models. The prediction models were presented to clinicians as a nomogram chart (n = 1),22 weighted scores or score charts (n = 8),23–27 and regression equations or formulas (n = 4).21,28,29 Three studies were conducted in nonhospital settings,19–21 and 8 studies were conducted in hospital/NICU-based settings.22–29 Eight studies were conducted in LMICs,19–22,25,26,28,29 2 were conducted in HICs,23,24 and 1 multicenter study spanned both HICs and LMICs.27 Sample sizes ranged from 116 to 53 909, with a total of 115 040. The median sample size was 3567, and the interquartile range was 10 660.
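The sample size summary can be checked against the per-study sample sizes listed in Table 1. The following is a minimal Python sketch, not the authors' analysis code. The quartile method is an assumption, because the review does not state how the interquartile range was computed; with linear interpolation over the observed data (`method="inclusive"`), the result is 10 660.5, close to the reported value.

```python
from statistics import median, quantiles

# Total sample size of each of the 11 included studies, as listed in Table 1.
sample_sizes = [3567, 6924, 19927, 456, 1723, 17075, 1085, 116, 3204, 7054, 53909]

total = sum(sample_sizes)
mid = median(sample_sizes)

# Assumption: quartiles by linear interpolation over the observed data.
# The review does not say which quartile definition was used.
q1, _, q3 = quantiles(sample_sizes, n=4, method="inclusive")
iqr = q3 - q1

print(f"total={total}, median={mid}, IQR={iqr}")
```

Other quartile definitions (for example, the `"exclusive"` default of `statistics.quantiles`) give a different interquartile range for the same data, so the match is suggestive rather than conclusive.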

FIGURE 1

PRISMA flow diagram of study selection.

TABLE 1

Characteristics of Included Studies

| Author (y) | Study Setting | Participants | Mortality Follow-Up Period | Sample Size | Reference Standard | Name of Model/Algorithm | Infant Clinical Signs | Description of Model/Algorithm |
Nonhospital algorithms
| Bang 2005 | India (Gadchiroli community) | Home visits, all neonates born in 39 villages eligible; infants aged 0–28 d | Within 0–28 d of life | 3567 | Sepsis-attributed neonatal death (neonatologist assigned) | SEARCH Checklist 1 | Cry weak or stopped, sucking reduced or stopped,* limbs loose,* infant was cold,* vomiting* or abdominal distension, umbilical sepsis, chest indrawing | Checklist: any 2 of 7 signs |
| | | | | 2804 | | SEARCH Checklist 2 | Cry weak or stopped, sucking reduced or stopped,* limbs loose,* infant was cold,* vomiting* or abdominal distension, umbilical sepsis | Checklist: any 2 of 6 signs |
| | | | | | | SEARCH Checklist 3 | Cry weak or stopped, sucking reduced or stopped,* limbs loose,* infant was cold,* vomiting* or abdominal distension, umbilical sepsis, respiratory rate ≥60 | Checklist: any 2 of 7 signs |
| | | | | | | SEARCH Checklist 4 | Cry weak or stopped, sucking reduced or stopped,* limbs loose,* infant was cold,* vomiting* or abdominal distension, umbilical sepsis, grunt | Checklist: any 2 of 7 signs |
| | | | | | | SEARCH Checklist 5 | Cry weak or stopped, sucking reduced or stopped,* limbs loose,* infant was cold,* vomiting* or abdominal distension, umbilical sepsis, grunt or chest indrawing | Checklist: any 2 of 7 signs |
| | | | | | | SEARCH Checklist 6 | Cry weak or stopped, sucking reduced or stopped,* limbs loose,* infant was cold,* vomiting* or abdominal distension, umbilical sepsis, respiratory rate ≥60 or chest indrawing | Checklist: any 2 of 7 signs |
| Darmstadt 2011 | Bangladesh (Mirzapur community) | Home visits among intervention arm of cohort; infants aged 0–9 d | Within 10 d of life | 6924 | Neonatal mortality | Projahnmo Revised | Convulsion, RR (70/min), severe chest indrawing, severe fever (T >101 F), severe hypothermia (T <95.5 F), weak, abnormal, or absent cry, unconscious, lethargic/less than normal movement, not able to feed or suck at all, severe skin infection, umbilical erythema | Checklist: any 1 of 11 signs |
| | | | | | | Projahnmo Modification F | Severe chest indrawing, moderate to severe fever (T = 100 F), moderate to severe hypothermia (T <97.5 F), lethargic or less than normal movement, history of feeding problems, jaundice | Checklist: any 1 of 6 signs |
| | | | | | | YIS-2 | History of convulsion, RR (60/min), severe chest indrawing present, fever (T >99.5 F), hypothermia (T <95.9 F), lethargic or less than normal movement, history of feeding problems | Checklist: any 1 of 7 signs |
| | | | | | | YIS-2 Modification Z | History of convulsion, severe chest indrawing present, moderate to severe fever (T = 100 F), moderate to severe hypothermia (T <97.5 F), lethargic/less than normal movement, history of feeding problems, jaundice | Checklist: any 1 of 7 signs |
| | | | | | | SEARCH Checklist 1 | Stopped sucking, weak or no cry, limbs becoming limp, vomiting or abdominal distension, infant cold to touch, severe chest indrawing, umbilical infection | Checklist: any 2 of 7 signs |
| | | | | | | SEARCH Checklist 1 Modification | Stopped sucking, weak or no cry, limbs becoming limp, vomiting or abdominal distension, infant cold to touch, severe chest indrawing, umbilical infection | Checklist: any 1 of 7 signs |
| Khan 2020 | Bangladesh (Gaibandha and Rangpur Districts) | All live-born infants in cohort cluster-randomized trial of home visits who lived >48 h; infants aged 2–28 d | Within 2–28 d of life | 19 927 (Derivation: 14 944; Validation: 4983) | Neonatal death (verbal autopsy) | Khan Checklist | Lethargy, cyanosis, non-cephalic presentation, trouble suckling | Checklist: any 1 of 4 signs |
| | | | | | | Khan Model 1 | Birth wt, GA, lethargy, cyanosis, non-cephalic presentation and trouble suckling | 6-sign regression formula |
| | | | | | | Khan Model 2 | GA, non-cephalic presentation, lethargy, trouble suckling | 4-sign regression formula |
Hospital-based algorithms
| Hailemeskel 2022 | Ethiopia (South Gondar Zone) | Preterm infants in NICU in 4 public hospitals; infants aged 0–3 d | Within 72 h of life | 456 | Death | Hailemeskel nomogram for clinical risk prediction | GA, respiratory distress syndrome, multiple neonates, low birth weight, and kangaroo mother care | Nomogram |
| Lee 2001 | Canada | All outborn infants transported from community hospitals to 15 tertiary NICUs; infants aged 0–28 d | Within 7 d of NICU admission, or total NICU mortality | 1723 (Derivation: 1115; Validation: 608) | Mortality | TRIPS-I Score | Temperature, respiratory status, blood pressure, and response to noxious stimuli | 4-sign weighted score |
| | | | | | | TRIPS-I Modification | Temperature, respiratory status, blood pressure, and response to noxious stimuli, GA, small for GA, 5-min Apgar score <7, cesarean delivery | |
| Lee 2013 | Canada | All outborn and inborn infants admitted to 8 tertiary NICUs; infants aged 0–28 d | Within 7 d of NICU admission, or total NICU mortality | 17 075 (Derivation: 11 383; Validation: 5692) | Mortality | TRIPS-II Score | Temperature, respiratory status, blood pressure, and response to noxious stimuli | 4-sign weighted score (updated TRIPS-I weighting) |
| | | | | | | TRIPS-II Modification | Temperature, respiratory status, blood pressure, and response to noxious stimuli, GA, small for GA, 5 min Apgar <7 and cesarean section | |
| Mediratta 2020 | Ethiopia (Gondar) | Admitted neonates in university NICU; infants aged 0–28 d | Within 28 d | 1085 (Derivation: 812; Validation: 246) | Mortality in NICU | Mediratta Neonatal Mortality Score | Admission level of consciousness, admission respiratory distress, GA, and birth weight | 4-sign weighted score |
| Singhi 1995 | India (Chandigarh) | Pediatric emergency of Nehru Hospital, infants aged 0–60 d | Within 60 d of life | 116 | Serious illness, bacteremia, or death (culture-confirmed) | Singhi Score | Consciousness, feeding, hydration, color, consolability, facial expression | 6-sign weighted score |
| Russell 2023 | Bangladesh, China, India, Thailand, Vietnam, Kenya, South Africa, Uganda, Italy, Greece, Brazil | Admitted infants treated with antibiotics for new episode of sepsis at 19 hospital sites (secondary and tertiary referral hospital) in 11 countries; infants aged 0–60 d | Within 28 d after enrollment | 3204 (Derivation: 2726; Validation: 478) | Sepsis death (culture-confirmed) | NeoSep Severity Score | Birth weight, GA, hospitalization duration, congenital anomalies, level of respiratory support, clinical signs (abnormal temp, abdominal distension, lethargy/reduced movement, difficulty feeding, evidence of shock) | 10-sign weighted score |
| | | | | | | NeoSep Recovery Score | Cyanosis, level of respiratory support, clinical signs (abnormal temp, abdominal distension, lethargy/reduced movement, difficulty feeding, evidence of shock) | 7-sign time-varying (daily) weighted score |
| Aluvaala 2021 | Kenya (Nairobi) | Neonatal unit admissions to a large urban maternity hospital; infants aged 0–28 d | Within hospital neonatal unit stay (most occurred in the first week of life) | 7054 (Derivation: 5427; Validation: 1627) | All-cause mortality | SENSS Score | Difficulty feeding, convulsions, indrawing, central cyanosis, and floppy/inability to suck, as assessed at admission, birth weight (category), sex | 7-sign regression formula |
| Tuti 2022 | Kenya | Newborns admitted to NBUs in 16 hospitals; infants aged 0–28 d | Within hospital neonatal unit stay | 53 909 | All-cause mortality | SENSS Score | Difficulty feeding, convulsions, indrawing, central cyanosis, and floppy/inability to suck, as assessed at admission, birth weight (category), sex | 7-sign regression formula |

NBUs, newborn units; NR, not reported; RR, respiratory rate; T, temperature.

* Historical sign ascertained by maternal report.

FIGURE 2

ROB Bar Charts (Quality Assessment of Prognostic Accuracy Studies), n = 11.

Summary ROB assessments are shown in Fig 2, with individual study ROB assessments shown in Supplemental Fig S1. Among the 11 studies, 6 had no serious ROB,20,21,24,27–29 4 had serious ROB,22,23,25,26 and 1 had very serious ROB.19 In the participant selection domain, 1 study had serious ROB because of the case-control design.25 Three studies had serious ROB for the index test assessment, primarily because of the failure to prespecify the index test threshold.19,22,26 For the outcome and mortality assessment, 1 study had uncertain ROB, with the remaining studies having low risk.19 For participant flow and timing, most studies were low-risk, with 1 study having uncertain risk because of loss to follow-up.27 For the analysis domain, 1 study had high ROB because of failure to analyze the full cohort,19 and 1 study was high-risk because of missing data.23 For model applicability across domains, all studies had low ROB.

Three nonhospital, community-based studies (Bang 2005, Darmstadt 2011, Khan 2020; n = 30 418) validated 13 checklists to screen for infants with the presence of any 1 or 2 of a range of signs that were equally weighted.19–21 Checklists included 4 to 11 signs, most commonly temperature, respiratory and feeding status, and level of consciousness. The Khan 4-sign checklist also included a risk factor of non-cephalic presentation. The details of individual clinical signs used in each checklist are provided in Table 1.
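The equally weighted "any k of n signs" rule behind these checklists can be illustrated in a few lines. The following is a minimal Python sketch under stated assumptions: the function name `screen_positive` is hypothetical, and the sign labels are shorthand for the SEARCH Checklist 1 entries in Table 1, not identifiers from any study instrument.

```python
# Shorthand labels for the 7 signs of SEARCH Checklist 1 (see Table 1).
SEARCH_CHECKLIST_1 = [
    "cry weak or stopped",
    "sucking reduced or stopped",
    "limbs loose",
    "infant cold",
    "vomiting or abdominal distension",
    "umbilical sepsis",
    "chest indrawing",
]


def screen_positive(observed_signs, checklist, min_signs):
    """Return True when at least `min_signs` checklist signs are present.

    Every sign carries the same weight; signs outside the checklist
    are ignored.
    """
    present = sum(1 for sign in checklist if sign in observed_signs)
    return present >= min_signs


# Hypothetical infant with 2 of the 7 signs: positive under "any 2 of 7".
infant = {"limbs loose", "chest indrawing"}
print(screen_positive(infant, SEARCH_CHECKLIST_1, min_signs=2))
```

Lowering `min_signs` from 2 to 1 corresponds to the "SEARCH Checklist 1 Modification" row in Table 1, which uses the same signs with an any-1-of-7 threshold.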

In a field trial in Gadchiroli, India (Society for Education Action and Research in Community Health [SEARCH]), Bang et al derived and internally validated several checklist-based algorithms to predict sepsis deaths among neonates aged 0 to 28 days (see Table 1 for signs included in each checklist).19 SEARCH checklist 1, which required any 2 of 7 signs, had a sensitivity of 98% (95% confidence interval [CI]: 88% to 100%) and specificity of 94% (95% CI: 93% to 95%) among 3567 neonates.19 The remaining algorithms were developed in a subset of 2804 newborns in a different time period (April 1996–October 1999); the sensitivity and specificity were 81% (95% CI: 58% to 95%) and 96% (95% CI: 95% to 96%) for SEARCH checklist 2, 86% (95% CI: 64% to 97%) and 95% (95% CI: 94% to 95%) for checklist 3, 91% (95% CI: 70% to 99%) and 95% (95% CI: 94% to 95%) for checklist 4, 95% (95% CI: 76% to 100%) and 94% (95% CI: 93% to 95%) for checklist 5, and 95% (95% CI: 76% to 100%) and 94% (95% CI: 94% to 95%) for checklist 6, respectively. All SEARCH algorithms reported in Bang et al had very low COE for both sensitivity and specificity (Table 2).19 
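For readers less familiar with these measures, sensitivity and specificity with 95% CIs can be derived from a 2 × 2 table of checklist result against death. The following is a minimal Python sketch: the counts are hypothetical and are not taken from Bang et al or any other included study, and the Wilson score interval is used as one common choice because the CI method of the original studies is not stated here.

```python
from math import sqrt


def wilson_ci(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def sensitivity_specificity(tp, fn, fp, tn):
    """Point estimates and Wilson intervals from 2 x 2 counts."""
    return {
        "sensitivity": (tp / (tp + fn), wilson_ci(tp, tp + fn)),
        "specificity": (tn / (tn + fp), wilson_ci(tn, tn + fp)),
    }


# Hypothetical counts: 50 deaths (45 flagged by the checklist) and
# 1000 survivors (940 not flagged).
result = sensitivity_specificity(tp=45, fn=5, fp=60, tn=940)
for name, (estimate, (low, high)) in result.items():
    print(f"{name}: {estimate:.2f} ({low:.2f} to {high:.2f})")
```

With few deaths, the sensitivity interval is much wider than the specificity interval, which is the same pattern seen in the SEARCH checklist estimates above.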

TABLE 2

Evidence Summary

| Author (y) of Study | Algorithm | Infant Age (d) | Outcome | N Participants in Analysis | Prevalence of Mortality | Sensitivity (95% CI) | Specificity (95% CI) | AUC | COE Sensitivity | COE Specificity | COE AUC |
Nonhospital algorithms
| Bang 2005 | SEARCH Checklist 1 | 0–28 | Sepsis Death | 3567 | 0.01 | 0.98 (0.88–1.00) | 0.94 (0.93–0.95) | — | ⨁◯◯◯ Very low | ⨁◯◯◯ Very low | — |
| | SEARCH Checklist 2 | | | 2804 | 0.00 | 0.81 (0.58–0.95) | 0.96 (0.95–0.96) | — | ⨁◯◯◯ Very low | ⨁◯◯◯ Very low | — |
| | SEARCH Checklist 3 | | | | 0.00 | 0.86 (0.64–0.97) | 0.95 (0.94–0.95) | — | ⨁◯◯◯ Very low | ⨁◯◯◯ Very low | — |
| | SEARCH Checklist 4 | | | | 0.00 | 0.91 (0.70–0.99) | 0.95 (0.94–0.95) | — | ⨁◯◯◯ Very low | ⨁◯◯◯ Very low | — |
| | SEARCH Checklist 5 | | | | 0.00 | 0.95 (0.76–1.00) | 0.94 (0.93–0.95) | — | ⨁◯◯◯ Very low | ⨁◯◯◯ Very low | — |
| | SEARCH Checklist 6 | | | | 0.00 | 0.95 (0.76–1.00) | 0.94 (0.94–0.95) | — | ⨁◯◯◯ Very low | ⨁◯◯◯ Very low | — |
| Darmstadt 2011 | Projahnmo Revised | 0–9 | Death | 6924 | 0.06 | 0.58 (0.46–0.70) | 0.95 (0.94–0.95) | — | ⨁⨁◯◯ Low | ⨁⨁⨁◯ Moderate | — |
| | Projahnmo Modification F | | | | 0.07 | 0.58 (0.46–0.70) | 0.93 (0.93–0.94) | — | ⨁◯◯◯ Very low | ⨁⨁⨁◯ Moderate | — |
| | YIS-2 | | | | 0.07 | 0.57 (0.44–0.68) | 0.93 (0.92–0.94) | — | ⨁◯◯◯ Very low | ⨁⨁⨁◯ Moderate | — |
| | YIS-2 Modification Z | | | | 0.07 | 0.58 (0.46–0.70) | 0.93 (0.93–0.94) | — | ⨁◯◯◯ Very low | ⨁⨁⨁◯ Moderate | — |
| | SEARCH Checklist 1 | | | | 0.00 | 0.03 (0.00–0.10) | 0.99 (0.99–0.99) | — | ⨁⨁◯◯ Low | ⨁⨁⨁◯ Moderate | — |
| | SEARCH Checklist 1 Modification | | | | 0.02 | 0.16 (0.08–0.27) | 0.98 (0.98–0.98) | — | ⨁⨁◯◯ Low | ⨁⨁⨁◯ Moderate | — |
| Khan 2020 | Khan Checklist | 2–28 | Death | 4983* | 0.01* | 0.62* (CI NR) | 0.60* (CI NR) | — | ⨁⨁◯◯ Low | ⨁⨁◯◯ Low | — |
| | Khan Model 1 | | | | | — | — | 0.80* (0.73–0.87) | — | — | ⨁⨁◯◯ Low |
| | Khan Model 2 | | | | | — | — | 0.74* (0.66–0.81) | — | — | ⨁⨁◯◯ Low |
Hospital algorithms
| Hailemeskel 2022 | Hailemeskel Nomogram | 0–3 | Death | 456 | 0.29 | 0.77* (0.69–0.84) | 0.95* (0.93–0.98) | 0.93* (0.90–0.95) | ⨁◯◯◯ Very low | ⨁⨁◯◯ Low | * |
| Lee 2001 | TRIPS-I Score | “Neonatal” (Days NR) | 7-d | 608* | — | — | — | 0.83* (CI NR) | — | — | ⨁◯◯◯ Very low |
| | | | Total NICU Death | | | — | — | 0.76* (CI NR) | — | — | ⨁◯◯◯ Very low |
| | TRIPS-I Modification | | 7-d | | | — | — | 0.91* (CI NR) | — | — | ⨁◯◯◯ Very low |
| | | | Total NICU Death | | | — | — | 0.85* (CI NR) | — | — | ⨁◯◯◯ Very low |
| Lee 2013 | TRIPS-II Score | “Neonatal” (d NR) | 7-d | 5692* | — | — | — | 0.90* (CI NR) | — | — | ⨁⨁◯◯ Low |
| | | | Total NICU Death | | | — | — | 0.87* (CI NR) | — | — | ⨁⨁◯◯ Low |
| | TRIPS-II Modification | | 7-d | | | — | — | 0.91* (CI NR) | — | — | ⨁⨁◯◯ Low |
| | | | Total NICU Death | | | — | — | 0.90* (CI NR) | — | — | ⨁⨁◯◯ Low |
| Mediratta 2020 | Mediratta Neonatal Mortality Score | 0–28 | Death | 812 | 0.26 | 0.81 (CI NR) | 0.80 (CI NR) | — | ⨁◯◯◯ Very low | ⨁◯◯◯ Very low | — |
| | | | | 246* | 0.50* | — | — | 0.85* (0.80–0.89) | — | — | ⨁◯◯◯ Very low |
| Singhi 1995 | Singhi Score | 0–60 | Death | 116 | 0.09 | 0.80 (CI NR) | 0.89 (CI NR) | — | ⨁◯◯◯ Very low | ⨁◯◯◯ Very low | — |
| Russell 2023 | NeoSep Severity Score | 0–59 | Sepsis Death | 478* | 0.11* | — | — | 0.76* (0.69–0.82) | — | — | ⨁⨁◯◯ Low |
| | NeoSep Recovery Score | | | | | 0.87* (0.60–0.98) | 0.76* (0.71–0.79) | 0.85* (0.78–0.93) | ⨁◯◯◯ Very low | ⨁⨁⨁◯ Moderate | ⨁⨁◯◯ Low |
| Aluvaala 2021 | SENSS Score | “Neonatal” (d NR) | Death | 1627* | 0.01* | — | — | 0.89* (0.84–0.93) | — | — | ⨁⨁⨁◯ Moderate |
| Tuti 2022 | SENSS Score | “Neonatal” (d NR) | Death | 53 909* | 0.14* | — | — | 0.83** (0.83–0.84) | — | — | ⨁⨁⨁◯ Moderate |

—/NR, not reported.

* Internal validation statistics (N, prevalence, sensitivity, specificity, AUC) given when available.

** External validation statistics (N, prevalence, sensitivity, specificity, AUC).

Darmstadt et al validated several previously published checklists, including the SEARCH algorithms, and derived new algorithms in rural Mirzapur, Bangladesh for predicting all-cause neonatal deaths within the first 10 days of life on 6924 neonates (Table 1).20 The Projahnmo revised 11-sign checklist had a sensitivity of 58% (95% CI: 46% to 70%, low COE) and a specificity of 95% (95% CI: 94% to 95%, moderate COE) for predicting death.20 A modified 6-sign version of Projahnmo (Modification F) had the same sensitivity of 58% (95% CI: 46% to 70%, very low COE) and specificity of 93% (95% CI: 93% to 94%, moderate COE).20 The Young Infants Signs-2 (YIS-2), a 7-sign checklist, had a sensitivity of 57% (95% CI: 44% to 68%, very low COE) and a specificity of 93% (95% CI: 92% to 94%, moderate COE).20 The modification of the YIS-2 (Modification Z) that modified temperature thresholds and included jaundice yielded a sensitivity of 58% (95% CI: 46% to 70%, very low COE) and specificity of 93% (95% CI: 93% to 94%, moderate COE).20 The SEARCH checklist 1 was validated in this study in Bangladesh and had a much lower sensitivity of 3% (95% CI: 0% to 10%, low COE) compared with the original study in India, as well as a specificity of 99% (95% CI: 99% to 99%, moderate COE) for identifying 10-day all-cause mortality.20 Darmstadt et al also applied a modification of the original SEARCH checklist 1 by requiring only 1 of 7 signs and found a sensitivity of 16% (95% CI: 8% to 27%, low COE) and specificity of 98% (95% CI: 98% to 98%, moderate COE; Table 2).20 
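The difference between the original SEARCH checklist 1 (any 2 of 7 signs) and the Darmstadt et al modification (any 1 of 7 signs) is a change in the positivity threshold of the same sign list. The Python sketch below illustrates that mechanism only. The sign names are placeholders (the actual signs are listed in Table 1), the 6-infant cohort is invented for illustration, and the function names are hypothetical rather than taken from any of the included studies.

```python
from typing import Iterable, Mapping, Tuple


def checklist_positive(signs: Mapping[str, bool], min_signs: int) -> bool:
    """Flag an infant when at least `min_signs` checklist signs are present."""
    return sum(1 for present in signs.values() if present) >= min_signs


def sensitivity_specificity(
    flagged: Iterable[bool], died: Iterable[bool]
) -> Tuple[float, float]:
    """Compute sensitivity and specificity from paired flags and outcomes."""
    tp = fn = tn = fp = 0
    for is_flagged, is_death in zip(flagged, died):
        if is_death:
            tp += is_flagged
            fn += not is_flagged
        else:
            fp += is_flagged
            tn += not is_flagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one death and one survivor")
    return tp / (tp + fn), tn / (tn + fp)


def make_infant(n_present: int, n_signs: int = 7) -> dict:
    """Build a placeholder record with the first `n_present` signs present."""
    return {f"sign_{i}": i <= n_present for i in range(1, n_signs + 1)}


# Invented toy cohort: number of signs present and whether the infant died.
toy_sign_counts = [0, 1, 2, 1, 0, 3]
toy_died = [False, True, True, False, False, True]
toy_cohort = [make_infant(n) for n in toy_sign_counts]

for threshold in (2, 1):
    flags = [checklist_positive(infant, threshold) for infant in toy_cohort]
    sens, spec = sensitivity_specificity(flags, toy_died)
    print(f"any {threshold} of 7 signs: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this toy cohort, lowering the threshold from 2 signs to 1 raises sensitivity and lowers specificity, which is the same direction of change reported for the SEARCH checklist 1 modification.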

In another nonhospital study in Gaibanda, Bangladesh, Khan et al developed and internally validated a 4-sign checklist using data from 14 944 infants aged 2 to 28 days enrolled in a cluster-randomized trial (n = 4983 in validation sample; Table 1).21 The checklist-based algorithm had a sensitivity of 62% (CI: not reported) and specificity of 60% (CI: not reported), with low COE for both sensitivity and specificity (Table 2).21 

Nomograms

In a hospital-based study in Gondar, Ethiopia, Hailemeskel et al developed a 5-sign weighted-score nomogram in a prospective cohort of 456 preterm neonates aged 0 to 3 days in public hospitals (Tables 1 and 3).22 The model was internally validated and had a sensitivity of 77% (95% CI: 69% to 84%, very low COE) and specificity of 95% (95% CI: 93% to 98%, low COE) for preterm mortality, with excellent discrimination (AUC 0.93, 95% CI: 0.90–0.95, low COE; Table 2).22 

TABLE 3

Characteristics of the Models Included in the Systematic Review

Author, y, Model Name | EPV or EPP | Selection (N) of Candidate and Final Predictors | N (%) and Handling of Missing Data | Calibration Measures (Calibration Plot, Slope, Hosmer–Lemeshow) | Discrimination Measures (C-statistic, D-statistic, AUC, logrank) | Diagnostic Accuracy (Sensitivity, Specificity, PPV, NPV) | Overall Measures (R2) | Type of Validation and Methods | Regression Equation
Hailemeskel 2022 Hailemeskel Nomogram for clinical risk prediction NR Candidate (5): Variable selection in multivariable model made using least absolute shrinkage; Final (5): Lasso regression N (%): NR; Method: NR Calibrated P = 0.52 AUROC = 0.93 (0.90–0.95) Se = 0.77; Sp = 0.95; False-positive rate = 0.04; Prediction accuracy = 0.90 NR Internal: validated using 2000 bootstrap replicates; External: NR Linear predictor of the equation model = −1.54 + 2.66 ∗GA <32 wk + 0.77 ∗multiple pregnancy 4.4; ∗has no RDS, + 2.1; ∗has not gotten KMC, + 1.52; ∗low birth wt 
Lee 2001 TRIPS-I & TRIPS-I modification NR Candidate (4): Based on univariable associations; Final (4): Significance of coefficients and clinical judgment, stepwise selection 226 (9); Method: NR Calibration of TRIPS for prediction of 7-d mortality (against validation cohort of pretransport data):
Hosmer–Lemeshow χ2 = 3.54 (df = 4), (P = .47) 
Mortality within 7d of NICU admission: ROC (P): TRIPS-I = 0.83 (0.47); TRIPS-I mod. = 0.91 (0.67); Total NICU mortality:
ROC (P): TRIPS-I = 0.76 (0.22); TRIPS-I mod = 0.85 (0.70) 
NR for models NR Internal: Random split data; External: None β-coefficients were used to wt each variable’s impact, with the sum of these weighted scores constituting the TRIPS Score; Temperature (°C): <36.1 or >37.6 = 8; 36.1–36.5 or 37.2–37.6 = 1; 36.6–37.1 = 0; Respiratory status: Severe (apnea, gasping, intubated) = 14; Moderate (RR >60/min &/or SpO2 <85) = 5; None (RR <60/min & SpO2 >85) = 0; Systolic BP (mm Hg): <20 = 26; 20–40 = 16; >40 = 0; Response to noxious stimuli: None, seizure, muscle relaxant= 17; Lethargic response, no cry = 6; Withdraws vigorously, cries = 0 
Lee 2013 TRIPS-II & TRIPS-II modification NR Candidate (4): Based on univariable associations; Final (4): Significance of coefficients and clinical judgment, stepwise selection 42; Method: NR Hosmer–Lemeshow χ2 = 8.3 (df = 5) (P = .14); TRIPS-II predicted mortality well across the full range of GA groups and TRIPS-II scores. Overall, the total NICU mortality rate increases as TRIPS-II increases Mortality within 7 d of NICU admission: ROC (P): TRIPS = 0.90 (0.30); TRIPS-II mod = 0.91 (0.19); Total NICU Mortality: ROC (P): TRIPS = 0.87 (0.33); TRIPS-II mod = 0.90 (0.54) NR models NR Internal: Random split data; External: None β-coefficients were used to wt each variable’s impact, with the sum of these weighted scores constituting the TRIPS-II score; Temperature (°C): <36.1 or >37.6 = 5; 36.1–37.6 = 0; Respiratory status: Severe = 23; Moderate or none = 0; Systolic BP (mm Hg): <30 = 13; 30–40 = 8; >40 = 0; Response to noxious stimuli: None, seizure, muscle relaxant= 13; Lethargic response = 5; Withdraws vigorously, cries = 0 
Mediratta, 2020 Mediratta Neonatal Mortality Score 10 Candidate: (12)
Based on univariable associations; Final (4): Stepwise selection 
N (%): NR; Method: variables missing ≥15% of data excluded; Single imputation Derivation data set: Calibration slope= 0.84; Hosmer–Lemeshow = 16.5 (P = .09); Validation data set: calibration slope = 0.85; Hosmer–Lemeshow = 17.0 (P = .07) Derivation AUC: 0.88 (0.85–0.91); Validation AUC: 0.85 (0.80–0.89) Se = 0.81; Sp = 0.80; PPV = 0.58; NPV = 0.83 NR Internal: Bootstrap; Temporal: Different time period at the same hospital Each variable in model assigned point value from 0 to 16 based on β coefficients in the multivariate model. Cutoff value for the score corresponding to 50% probability of mortality was 12 
Singhi 1995 Singhi Score NR Candidate (11): Pearson's correlation; Final (6): Value of abnormal findings graded on a 3-point scale (0, 1, and 2) in the ascending order of severity (multiple stepwise regression) N (%): NR; Method: NR NR NR An impaired consciousness level with a total score of >7 had Se = 0.80; Sp = 0.89; PPV = 0.52; NPV = 0.97 in derivation (validation NR) Consciousness = 0.19; Feeding = 0.20; Hydration = 0.22; Color = 0.23; Consolability = 0.24; Facial expression = 0.26 NR NR 
Russell 2023 NeoSep Severity Score NeoSep Recovery Score NR NeoSep severity: Candidate predictors (10), Final model (10), backward elimination (exit P = .05); NeoSep Recovery: Candidate predictors (7), Final model (7) Forward selection (entry P = .05) N (%): NR
Method: Factors with missing values >10% excluded 
NeoSep Hosmer–Lemeshow P = .53 NeoSep Severity Derivation:
C-statistic = 0.77 (0.75–0.80); Validation: C-statistic = 0.76 (0.69–0.82); NeoSep Recovery
Derivation: AUROC = 0.82 (0.78–0.85); Validation: AUROC = 0.85 (0.78–0.93) 
NeoSep Severity: a score ≥5 at baseline was associated with 28-d mortality >10% (exact NR); NeoSep Recovery: Score >4; Se = 0.87 (0.60–0.98); Sp = 0.76 (0.71–0.79) in validation NR 15% randomly selected sample per site was reserved for model validation NeoSep Severity: a points-based risk score, in which each predictor of death is assigned a number of points developed from model coefficients; NeoSep Recovery: a points-based risk score derived similarly to the baseline severity score 
Khan 2020 Regression formulas (Model 1 & 2) NR Candidate (16): Based on univariable associations; Final (4): Statistical significance 638 (3.2)
Method: NR 
Derivation Hosmer–Lemeshow:
Khan Model 1 = 6.28 (P = .62); Khan Model 2 = 10.25 (P = .25); Validation Hosmer–Lemeshow: Khan Model 1 = 6.56 (P = .77); Khan Model 2 = 10.12 (P = .43) 
Derivation AUC: Khan Model 1 = 0.80 (0.76–0.84); Khan Model 2 = 0.75 (0.71–0.80); Validation AUC:
Khan Model 1 = 0.80 (0.73–0.87); Khan Model 2 = 0.74 (0.66–0.81) 
NR for models NR Internal: Random split data; External: None Khan Model 1: 1 + e 7.787 − 4.853 (birth weight ≤1.5 kg) −1.904 (birth weight >1.5 kg) −0.097 (GA) + 0.426 (lethargy) + 0.326 (cyanosis) + 0.72 (non-cephalic presentation) + 0.947 (poor suckling); Khan Model 2: e2.707 − 0.203 (GA) + 0.668 (lethargy) + 0.904 (non-cephalic presentation) + +1.381 (poor suckling) 
Aluvaala 2021 SENSS Score Derivation EPV = 45 (445/10); Validation: 151 deaths and 1476 nonevents Candidate (11): Selected on the basis of availability in clinical practice; Final (7): Logistic regression with no variable selection Derivation: 0.2% to 16%; Validation: 0.1% to 14%; Method: Missing at random Multiple imputation (chained equation approach) Calibration intercept = −0.33 (−0.56 to −0.11) Derivation C-statistic = 0.91 (0.89 to 0.93); External validation C-statistic = 0.89 (0.84 to 0.93) NR NR Internal: bootstrapping; Temporal: applying the model coefficients obtained at derivation to temporal data Linear predictor = −3.8583 + 5.7580 ∗ ELBW + 3.7082 ∗ VLBW + 0.9232 ∗ LBW − 0.4918 ∗ Macrosomia − 0.1336 ∗ Male + 1.3596 ∗ Difficulty feeding + 1.3977 ∗ Convulsion + 1.9790 ∗ Indrawing + 0.9584 ∗ Cyanosis + 1.6266 ∗ Floppy unable to suck 
Tuti 2022 SENSS Score 10 predictor parameters Candidate (7): N/A; Final (7): Existing SENSS model Predictor missingness ranged from 1.19% to 14.63%; Method: Multiple imputation After model updating, SENSS calibration intercept improved to 0.35 (0.32–0.38); Calibration slope improved to 1.029 (1.01–1.05); Brier score = 0.09 (0.08–0.10) The C-statistic (discrimination) = 0.83 (0.83–0.84) NR 0.453 External: validation on separate Kenyan cohort Linear predictor = −3.8583 + 5.7580 * ELBW + 3.7082 * VLBW + 0.9232 * LBW − 0.4918 * macrosomia − 0.1336 * Male + 1.3596 * difficulty feeding + 1.3977 * convulsion + 1.9790 * indrawing + 0.9584 * cyanosis + 1.6266 * floppy unable to suck 

AUROC, area under the receiver operating characteristic curve; BP, blood pressure; df, degrees of freedom; ELBW, extremely low birthweight; EPV, events per variable; EPP, events per parameter; KMC, kangaroo mother care; LBW, low birthweight; Mod., modification; NPV, negative predictive value; NR, not reported; PPV, positive predictive value; R2, coefficient of determination; RDS, respiratory distress syndrome; ROC, receiver operating characteristic; RR, respiratory rate; Se, sensitivity; Sp, specificity; SpO2, saturation of peripheral oxygen; VLBW, very low birthweight.

Weighted Scores and Score Charts

All the studies that used weighted scores or score charts were hospital-based.23–27 Lee et al developed and internally validated the Transport Risk Index of Physiologic Stability (TRIPS) score among 1723 outborn newborns (608 in the internal validation sample) at 8 tertiary-level Canadian neonatal intensive care units (NICUs; Tables 1 and 3). The TRIPS score predicted mortality within 7 days of NICU admission, with an AUC of 0.83 (CI: not reported; Table 2).23 A TRIPS score modification with additional variables, including small for GA, 5-minute Apgar score <7, and cesarean delivery, increased the AUC to 0.91 (CI: not reported).23 The TRIPS score and the TRIPS modification predicted total NICU mortality with AUCs of 0.76 (CI: not reported) and 0.85 (CI: not reported), respectively. All algorithms in Lee et al had very low COE.
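The point values for the original TRIPS score are listed in Table 3, so the weighted-score format can be illustrated directly. The Python sketch below sums those points for one infant. The category labels ("severe", "moderate", "none", "lethargic", "vigorous") and the function name are hypothetical conveniences, and the handling of boundary values assumes measurements recorded to 1 decimal place. It is an illustration of how a weighted score is tallied, not a validated clinical tool.

```python
def trips_score(temp_c: float, respiratory: str, systolic_bp: float, response: str) -> int:
    """Sum the TRIPS point values reported in Table 3 (Lee 2001)."""
    # Temperature (deg C): <36.1 or >37.6 = 8; 36.6-37.1 = 0; otherwise 1
    # (the "otherwise" branch covers 36.1-36.5 and 37.2-37.6).
    if temp_c < 36.1 or temp_c > 37.6:
        temp_points = 8
    elif 36.6 <= temp_c <= 37.1:
        temp_points = 0
    else:
        temp_points = 1

    # Respiratory status: severe (apnea, gasping, intubated) = 14;
    # moderate (RR >60/min and/or SpO2 <85) = 5; none = 0.
    respiratory_points = {"severe": 14, "moderate": 5, "none": 0}[respiratory]

    # Systolic BP (mm Hg): <20 = 26; 20-40 = 16; >40 = 0.
    if systolic_bp < 20:
        bp_points = 26
    elif systolic_bp <= 40:
        bp_points = 16
    else:
        bp_points = 0

    # Response to noxious stimuli: none, seizure, or muscle relaxant = 17;
    # lethargic response, no cry = 6; withdraws vigorously, cries = 0.
    response_points = {"none": 17, "lethargic": 6, "vigorous": 0}[response]

    return temp_points + respiratory_points + bp_points + response_points


# Example: mildly abnormal temperature, moderate respiratory distress,
# systolic BP of 30 mm Hg, lethargic response.
print(trips_score(37.4, "moderate", 30, "lethargic"))
```

A dictionary lookup is used for the categorical items so that an unrecognized label raises a `KeyError` instead of silently scoring 0.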

Subsequently, Lee et al developed TRIPS version II (TRIPS-II) to predict 7-day NICU mortality in a larger sample of 17 075 newborns (Tables 1 and 3).24 The TRIPS-II model had an AUC of 0.90 (CI: not reported) in the validation sample of 5692 infants (Table 2).24 Combining TRIPS-II with GA and other variables, such as small for GA, 5-minute Apgar score <7, and cesarean delivery, did not substantially improve discrimination (AUC 0.91 [CI: not reported]).24 The TRIPS-II score and the TRIPS-II modification predicted total NICU mortality with AUCs of 0.87 (CI: not reported) and 0.90 (CI: not reported), respectively. All TRIPS-II algorithms had low COE.

Mediratta et al derived and validated a 4-sign weighted Neonatal Mortality score for death during NICU admission in a retrospective case-control study in 1085 infants in Ethiopia (Tables 1 and 3).25 This score had sensitivity of 81% (CI: not reported) and specificity of 80% (CI: not reported) in the derivation sample of 812 infants. The Neonatal Mortality score had very low COE for both sensitivity and specificity (Table 2).25 In a temporal validation data set comprising a cohort of 246 infants from the same hospital in a different time period, the AUC was 0.85 (95% CI: 0.80–0.89, very low COE).25 

Singhi et al developed and internally validated a 6-sign score among 116 infants aged 0 to 28 days presenting at pediatric emergency in India, predicting death due to serious illness (Tables 1 and 3).26 The authors of the study reported a sensitivity of 80% (CI: not reported) and specificity of 89% (CI: not reported) to classify mortality as a result of serious illness.26 The Singhi score had a very low COE for both sensitivity and specificity (Table 2).

Russell et al developed and validated the NeoSep Severity and NeoSep Recovery scores to predict neonatal mortality in a cohort of 3204 infants aged <60 days with clinical sepsis from 19 hospitals in 11 countries (Asia, Africa, Europe, and South America; Tables 1 and 3).27 The NeoSep Severity score is a 10-sign weighted score that had an AUC of 0.76 (95% CI: 0.69–0.82, low COE) to predict neonatal mortality in the internal validation sample of 478 infants (Table 2).27 Using the 7-sign time-varying (daily) weighted NeoSep Recovery score, a score of ≥4 had a sensitivity of 87% (95% CI: 60% to 98%, very low COE), a specificity of 76% (95% CI: 71% to 79%, moderate COE), and good discriminatory ability (AUC 0.85, 95% CI: 0.78–0.93, low COE) to predict neonatal mortality in the internal validation sample of 478 infants (Table 2).27

Regression Formulas

Khan et al also developed and validated a 6-sign prediction model and a 4-sign prediction model in the format of logistic regression formulas in a sample of 14 944 infants (4983 in the internal validation sample; Tables 1 and 3).21 The 6-sign model incorporated birth weight, GA, lethargy, cyanosis, non-cephalic presentation, and trouble suckling, and it demonstrated good discriminatory ability for predicting neonatal death with AUC of 0.80 (95% CI: 0.73–0.87) in the validation cohort.21 A more simplified version of the equation in the same study excluding birth weight and cyanosis had fair discrimination at AUC 0.74 (95% CI: 0.66–0.81) in the validation set.21 Both Khan regression formula models had low COE.

Aluvaala et al derived and externally validated the Score for Essential Neonatal Symptoms and Signs (SENSS) in a large maternity hospital in Nairobi, Kenya on a sample of 7054 neonates (1627 infants in the internal validation sample; Tables 1 and 3).28 The score is a 7-sign multivariable regression formula and the AUC for temporal internal validation was 0.89 (95% CI: 0.84–0.93, moderate COE; Table 2).28 

Tuti et al externally validated and updated the SENSS score to predict all-cause in-hospital neonatal mortality among 53 909 infants in a large multisite study using retrospectively collected routine clinical data from 16 hospitals in Kenya (Tables 1 and 3).29 The score had an AUC of 0.83 (95% CI: 0.83–0.84, moderate COE; Table 2).29 The calibration of the original SENSS model was poor, as reflected by the calibration intercept and slope reported by the authors.29 
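To show how a regression-formula algorithm such as SENSS is applied at the bedside, the Python sketch below evaluates the linear predictor reported in Table 3 and converts it to a predicted probability with the inverse logit, which is the standard transformation for a logistic regression model. The dictionary keys and function names are hypothetical, and the sketch assumes that all predictors are coded 0/1 and that the birth weight categories are mutually exclusive. Because Tuti et al reported poor calibration for the original coefficients, the resulting probabilities should be read as illustrative only.

```python
import math
from typing import Mapping

# Coefficients of the original SENSS linear predictor as reported in Table 3.
SENSS_INTERCEPT = -3.8583
SENSS_COEFFICIENTS = {
    "elbw": 5.7580,                   # extremely low birth weight
    "vlbw": 3.7082,                   # very low birth weight
    "lbw": 0.9232,                    # low birth weight
    "macrosomia": -0.4918,
    "male": -0.1336,
    "difficulty_feeding": 1.3596,
    "convulsion": 1.3977,
    "indrawing": 1.9790,
    "cyanosis": 0.9584,
    "floppy_unable_to_suck": 1.6266,
}


def senss_linear_predictor(predictors: Mapping[str, bool]) -> float:
    """Intercept plus the coefficients of every predictor that is present."""
    unknown = set(predictors) - set(SENSS_COEFFICIENTS)
    if unknown:
        raise KeyError(f"unknown predictors: {sorted(unknown)}")
    return SENSS_INTERCEPT + sum(
        SENSS_COEFFICIENTS[name] for name, present in predictors.items() if present
    )


def senss_probability(predictors: Mapping[str, bool]) -> float:
    """Inverse-logit transform of the linear predictor (assumed logistic model)."""
    return 1.0 / (1.0 + math.exp(-senss_linear_predictor(predictors)))


# Example: a low birth weight male infant with chest indrawing.
example = {"lbw": True, "male": True, "indrawing": True}
print(f"predicted probability: {senss_probability(example):.3f}")
```

Omitted predictors are treated as absent, so an empty mapping returns the baseline risk implied by the intercept alone.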

Early and accurate identification of infants at the highest risk of mortality is the critical first step required to deliver evidence-based interventions to avert death. In this systematic review, we identified 11 studies in which the authors reported on 26 clinical sign algorithms to identify young infants at risk for mortality between 0 and 59 days of life. Algorithm formats ranged from simple checklists, most often used at the community level, to regression formulas used in neonatal intensive care settings. The algorithms included 4 to 11 signs, including GA, birth weight, temperature abnormality, feeding difficulty, level of consciousness and respiratory distress. Overall, all studies were of very low to moderate COE, and only 2 algorithms were externally validated in 4 studies.19,20,28,29 

An adaptation of the maternal “Three Delays Model”30 outlines key time points at which timely interventions are critical to reduce neonatal and infant morbidity and mortality as follows: (1) the recognition of danger signs and decision to seek care, (2) reaching an appropriate source of care, and (3) obtaining adequate and appropriate treatment. The algorithms included in this review can be implemented at these different stages of the Three Delays Model continuum, from home to transport to hospital to inpatient care.31 During home visits or at primary health facilities, identifying high-mortality risk infants may allow for interventions including urgent referral to hospital or initiating empirical antibiotics to cover possible sepsis. We identified 13 checklists including signs and symptoms feasible for frontline health workers that were developed and validated at the community level to identify infants at high-mortality risk during home visits.19,20 These checklists tended to rely on signs ascertained by history and postnatal physical examination and did not include birth history, risk factors, or measures such as birth weight or GA. The SEARCH algorithms had high sensitivity and specificity for predicting sepsis-specific death, as identified by a neonatologist in the original study in which it was derived and internally validated. However, in the external validation cohort of Darmstadt et al in Bangladesh, the sensitivity was substantially lower. This marked difference in performance may have been due to the different outcome (ie, all-cause mortality as opposed to sepsis-specific mortality in the original Bang et al study), different age group (0–10 days in Darmstadt et al versus 0–28 days in Bang et al), different setting, different population and population-to-health worker ratio, and different epidemiological characteristics among the study neonates. 
Among the other community-level sign-based checklists, none had adequate sensitivity (≥80%), but all had high specificity (>90%) for predicting mortality.20,21 The high specificity suggests that young infants who survive will commonly have a negative test result based on the checklists’ criteria and will be correctly identified as surviving in nonhospital settings. However, the low sensitivity suggests that the checklists may fail to identify a large number of infants who die. With the inclusion of birth weight and GA, the regression formulas developed by Khan et al for community-level use had better performance.21 Small size at birth (preterm, low birth weight, or small for GA) contributes to half of neonatal deaths globally.32 In a machine learning model recently developed by the Global Network, birth weight was the strongest predictor of neonatal mortality,33 although this study was excluded from the current review because the model did not include postnatal clinical signs. In LMICs, birth weight and GA are often unavailable because antenatal care is limited, GA is unknown, and many deliveries occur at home. Alternative methods of clinically estimating GA using anthropometric, physical, and neuromuscular signs may allow for more feasible and accurate GA estimation in low-resource settings.34 
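The practical meaning of high specificity combined with low sensitivity can be made concrete with simple arithmetic. The Python sketch below converts a sensitivity and specificity into expected counts for a screened cohort. The sensitivity (58%) and specificity (95%) are those reported for the Projahnmo revised checklist; the cohort size of 1000 and the mortality prevalence of 3% are assumptions chosen purely for illustration, and the function name is hypothetical.

```python
def expected_screening_counts(
    n: int, prevalence: float, sensitivity: float, specificity: float
) -> dict:
    """Expected numbers of flagged and missed infants in a screened cohort."""
    deaths = n * prevalence
    survivors = n - deaths
    return {
        "deaths_flagged": deaths * sensitivity,            # true positives
        "deaths_missed": deaths * (1 - sensitivity),       # false negatives
        "survivors_flagged": survivors * (1 - specificity),  # false positives
        "survivors_cleared": survivors * specificity,      # true negatives
    }


# Assumed cohort of 1000 infants with an assumed 3% mortality prevalence,
# using the sensitivity and specificity reported for Projahnmo revised.
counts = expected_screening_counts(1000, 0.03, 0.58, 0.95)
for label, value in counts.items():
    print(f"{label}: {value:.1f}")
```

Under these assumptions, roughly 4 in 10 infants who die would not be flagged, even though about 95% of survivors are correctly cleared.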

The identification of high mortality risk among outborn infants transported to NICUs may be useful to prepare interventions and personnel resources at the NICU while the infants are en route. The TRIPS score, consisting of 4 physical examination signs, is also feasible in the nonhospital setting before an appropriate source of care is reached. The addition of perinatal risk factors (GA, 5-min Apgar score, and cesarean delivery) increased the TRIPS algorithm’s discriminatory value. These risk factors may therefore be important predictors of mortality in outborn infants who are in the process of being transported to appropriate places of care.

The hospital-based algorithms contained certain signs more applicable to the hospital or NICU settings, including respiratory status and support; vital signs, including temperature and blood pressure; and, in some cases, kangaroo mother care and evidence of shock. At least 1 of the prediction models developed in each of these LMIC studies demonstrated good discrimination, with 1 prediction model (the Hailemeskel score) demonstrating excellent discrimination. The SENSS score was externally validated and had good discriminatory value with moderate COE. Thus, using only 4 to 10 clinical signs without the support of laboratory investigations in resource-limited settings, algorithms still achieved good to excellent discriminatory value in predicting young infants at risk for future death. The hospital- and NICU-based infant clinical sign algorithms may therefore hold promise for rapid bedside identification of infants at high mortality risk in low-resource settings. Once identified, timely hospital interventions may be implemented, including the rewarming of hypothermic infants, intravenous hydration, septic workups, timely antibiotic administration, and escalation in respiratory support when available. Algorithms may also help prioritize resource allocation to infants at highest risk of death.

After an infant is admitted to the hospital or NICU, their clinical status may continue to change on a daily basis. The NeoSep Recovery score was a time-varying prediction model allowing for an infant’s risk estimate to be updated as new information becomes available.27 This time-varying model demonstrated higher predictive accuracy compared with the baseline model (NeoSep Severity score).27 During hospitalization, time-varying models may better reflect evolving patient clinical trajectories and dynamic decision-making in clinical practice.35,36 

In LMICs, the presentation format, practical application, and feasibility of use of regression models are important considerations. Weighted scores or score charts and nomograms are simpler ways of applying prediction models than regression formulas or equations. Although both approaches predict outcomes from regression models with multiple variables, weighted scores are more accessible: each variable is assigned a numeric value, so health care providers can compute the result by summing points from a chart or with a simple calculation. In contrast, regression formulas may require more complex calculation; they are prone to error if calculated by hand and may therefore necessitate digital tools such as a web app or online calculator. This requirement often poses a challenge in LMIC settings because of limited digital infrastructure, unreliable internet connections, and power outages. Consequently, despite the potential for marginally superior accuracy with regression formulas, the operational feasibility of weighted scores may make them a pragmatic choice in LMIC contexts.
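
The contrast between a regression formula and a weighted score can be sketched in a few lines of Python. Everything in this sketch is an assumption made for illustration: the three signs, the intercept, and the coefficients are hypothetical and do not come from any included study, and scaling each coefficient by the smallest one and rounding is only one common way of deriving integer points.

```python
import math

# Hypothetical log-odds coefficients (NOT taken from any included study).
INTERCEPT = -4.0
COEFFICIENTS = {
    "unable_to_feed": 1.9,
    "hypothermia": 0.6,
    "respiratory_distress": 1.1,
}


def predicted_risk(signs):
    """Regression formula: logistic transform of the linear predictor."""
    linear_predictor = INTERCEPT + sum(
        COEFFICIENTS[name] for name, present in signs.items() if present
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def points_table(coefficients):
    """Integer points: each coefficient divided by the smallest one, rounded."""
    base = min(abs(beta) for beta in coefficients.values())
    return {name: round(beta / base) for name, beta in coefficients.items()}


def weighted_score(signs, points):
    """Weighted score: add up the points for every sign that is present."""
    return sum(points[name] for name, present in signs.items() if present)


POINTS = points_table(COEFFICIENTS)

# Hypothetical infant with two of the three signs present.
infant = {"unable_to_feed": True, "hypothermia": False, "respiratory_distress": True}
risk = predicted_risk(infant)
score = weighted_score(infant, POINTS)
print(f"points table: {POINTS}")
print(f"regression formula risk = {risk:.3f}; weighted score = {score}")
```

The weighted score needs only addition of small integers, which can be done on paper, whereas the regression formula needs an exponential. The rounding step is also where a small loss of accuracy relative to the formula can arise.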

Previous systematic reviews of prediction models for infant mortality have included laboratory tests as predictors (eg, blood gas, hematologic parameters, etc).37–39 In 2011, Medlock et al identified 41 prediction model development studies for prediction of mortality in very premature infants with fair to excellent discriminatory ability (AUCs ranging from 0.70 to 0.96).37 Our review excluded laboratory tests because its focus was on infant clinical signs alone and its purpose was to inform WHO guidelines. However, the AUC range across studies included in our review (0.76 to 0.93) was similar to that of this previous review. This suggests that the discriminatory ability of prediction models that rely solely on infant clinical signs may not be inferior to that of models that include laboratory tests, and these clinical sign-based models have the advantage of being more feasible at different levels of the health system or community in LMIC settings.
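
For readers less familiar with the AUC values compared above, the AUC can be read as the probability that a randomly chosen infant who died received a higher score than a randomly chosen infant who survived, with ties counted as one half. The Python sketch below computes it by direct pairwise comparison. The scores are hypothetical and purely illustrative, and the function name `pairwise_auc` is an assumption of this sketch; it does not reproduce the method of any included study.

```python
def pairwise_auc(scores_died, scores_survived):
    """AUC as the share of (died, survived) pairs ranked correctly.

    A pair counts 1 if the infant who died has the higher score,
    0.5 if the scores are tied, and 0 otherwise.
    """
    if not scores_died or not scores_survived:
        raise ValueError("need at least one score in each group")
    concordant = 0.0
    for died in scores_died:
        for survived in scores_survived:
            if died > survived:
                concordant += 1.0
            elif died == survived:
                concordant += 0.5
    return concordant / (len(scores_died) * len(scores_survived))


# Hypothetical risk scores (illustrative only).
scores_died = [5, 3, 4]
scores_survived = [1, 2, 3, 0]

auc = pairwise_auc(scores_died, scores_survived)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to ranking no better than chance and 1.0 to perfect ranking, so the AUC summarizes discrimination only and does not indicate whether predicted risks are well calibrated.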

There were several limitations of the current evidence, particularly the considerable heterogeneity and lack of external validation of the included algorithms and prediction models. Robust external validation is needed before the widespread use of such algorithms and scores. However, conducting external validation of prediction models is challenging in low-resource settings with limited data availability, particularly of input covariates such as GA or birth weight. Improving data collection in low-resource settings and external validation of existing algorithms should be prioritized and will aid in implementing high-performing models into clinical practice. The COE of the algorithms for which GRADE was performed was very low to moderate.

Algorithms leveraging infant clinical signs have demonstrated fair to excellent discriminatory value to predict young infant mortality in a range of settings, including LMICs. Risk prediction is instrumental in the early identification of critically ill infants, thereby expediting the initiation of targeted therapeutic interventions and the appropriate allocation of scarce resources, which is pivotal for young infant survival. Limited external validation impedes the translation of these algorithms into practical and feasible clinical decision tools. Improving data collection and management in low-resource settings may allow for the external validation of well-performing young infant mortality prediction algorithms.

This work is dedicated to our colleague and dear friend Rebecca E. Rosenberg (1977–2023), who passed away during the study and made critical contributions to data extraction and synthesis from the initial study conception. Becca’s wit and humor made us laugh at every meeting, and her own research and contributions to newborn health worldwide will be everlasting. Further, Yasir Shafiq has joined this research work under the framework of the International PhD in Global Health, Humanitarian Aid, and Disaster Medicine jointly organized by Università del Piemonte Orientale (UPO).

Mr Shafiq and Dr Fung conceptualized and designed the study, designed the data collection instruments, screened studies, collected data, conducted data analysis, and drafted the initial manuscript; Ms Driker screened studies, collected data, and conducted the data analysis; Dr Rosenberg screened studies, collected data, and extracted data; Ms Hussaini and Ms Adnan screened studies and collected data; Drs Rees and Mediratta screened studies, collected data, and assisted with interpretation of the results; Ms Wade designed the search strategies and conducted the searches across all databases; Dr Chou provided inputs on the methodology and presentation of the results; Dr Edmond conceptualized the study and provided inputs on the presentation of the results; Dr North conceptualized and designed the study and interpreted results; Dr Lee conceptualized and designed the study, conducted data extraction, collected data, and interpreted the results; and all authors reviewed and revised the manuscript, approved the final manuscript as submitted, and agreed to be accountable for all aspects of the work.

This review has been registered at www.crd.york.ac.uk/prospero (identifier CRD42023431387).

FUNDING: Brigham and Women’s Hospital received funding from the World Health Organization (WHO) to complete this work. The sponsor commissioned the review for the guideline development group meeting for the development of WHO recommendations on the management of serious bacterial infection in young infants aged 0 to 59 days. The sponsor provided inputs on the presentation of the results and manuscript.

CONFLICT OF INTEREST DISCLOSURES: Karen Edmond is an employee of the sponsor, the WHO. Roger Chou is the GRADE methodologist for the WHO guidelines for the management of severe bacterial infections in infants aged 0 to 59 days. The remaining authors have indicated they have no potential conflicts of interest relevant to this article to disclose.

AUC: area under the curve
CI: confidence interval
COE: certainty of evidence
GA: gestational age
GRADE: Grading of Recommendations Assessment Development and Evaluation
HICs: high-income countries
LMIC: low- and middle-income country
ROB: risk of bias
SEARCH: Society for Education Action and Research in Community Health
SENSS: Score for Essential Neonatal Symptoms and Signs
TRIPS: Transport Risk Index of Physiologic Stability
TRIPS-II: TRIPS version II
WHO: World Health Organization
YIS-2: Young Infants Signs-2

1. United Nations Children’s Fund. UNICEF and partners in the UN Inter-Agency Group for Child Mortality Estimation. Levels and trends in Child Mortality Report 2022: estimates developed by the UN Inter-Agency Group for Child Mortality Estimation. Available at: https://data.unicef.org/resources/levels-and-trends-in-child-mortality/. Accessed December 15, 2023
2. Sharrow D, Hug L, You D, et al.; UN Inter-agency Group for Child Mortality Estimation and its Technical Advisory Group. Global, regional, and national trends in under-5 mortality between 1990 and 2019 with scenario-based projections until 2030: a systematic analysis by the UN Inter-agency Group for Child Mortality Estimation. Lancet Glob Health. 2022;10(2):e195–e206
3. Lawn JE, Bhutta ZA, Ezeaka C, Saugstad O. Ending preventable neonatal deaths: multicountry evidence to inform accelerated progress to the sustainable development goal by 2030. Neonatology. 2023;120(4):491–499
4. Bhutta ZA, Das JK, Bahl R, et al.; Lancet Newborn Interventions Review Group; Lancet Every Newborn Study Group. Can available interventions end preventable deaths in mothers, newborn babies, and stillbirths, and at what cost? Lancet. 2014;384(9940):347–370
5. World Health Organization. Child health and development | Strategy (IMCI). Available at: https://www.emro.who.int/child-health/imci-strategy/integrated-management-childhood-illness.html. Accessed December 15, 2023
6. Mansoor KP, Ravikiran SR, Kulkarni V, et al. Modified sick neonatal score (MSNS): a novel neonatal disease severity scoring system for resource-limited settings. Crit Care Res Pract. 2019;2019:9059073
7. Richardson DK, Corcoran JD, Escobar GJ, Lee SK. SNAP-II and SNAPPE-II: Simplified newborn illness severity and mortality risk scores. J Pediatr. 2001;138(1):92–100
8. Lah Tomulic K, Mestrovic J, Zuvic M, et al. Neonatal risk mortality scores as predictors for health-related quality of life of infants treated in NICU: a prospective cross-sectional study. Qual Life Res. 2017;26(5):1361–1369
9. Page MJ, McKenzie JE, Bossuyt PM, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. Int J Surg. 2021;88:105906
10. Collins GS, Reitsma JB, Altman DG, Moons KG; TRIPOD Group. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. Circulation. 2015;131(2):211–219
11. Moons KG, de Groot JA, Bouwmeester W, et al. Critical appraisal and data extraction for systematic reviews of prediction modelling studies: the CHARMS checklist. PLoS Med. 2014;11(10):e1001744
12. Whiting PF, Rutjes AW, Westwood ME, et al.; QUADAS-2 Group. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med. 2011;155(8):529–536
13. Debray TP, Damen JA, Snell KI, et al. A guide to systematic review and meta-analysis of prediction model performance. BMJ. 2017;356:i6460
14. Collins GS, Dhiman P, Ma J, et al. Evaluation of clinical prediction models (part 1): from development to external validation. BMJ. 2024;384:e074819
15. de Hond AAH, Steyerberg EW, van Calster B. Interpreting area under the receiver operating characteristic curve. Lancet Digit Health. 2022;4(12):e853–e855
16. Lee J, Mulder F, Leeflang M, et al. QUAPAS: an adaptation of the QUADAS-2 tool to assess prognostic accuracy studies. Ann Intern Med. 2022;175(7):1010–1018
17. Schünemann HJ, Mustafa RA, Brozek J, et al.; GRADE Working Group. GRADE guidelines: 21 part 2. Test accuracy: inconsistency, imprecision, publication bias, and other domains for rating the certainty of evidence and presenting it in evidence profiles and summary of findings tables. J Clin Epidemiol. 2020;122:142–152
18. Schünemann HJ, Oxman AD, Brozek J, et al.; GRADE Working Group. Grading quality of evidence and strength of recommendations for diagnostic tests and strategies. BMJ. 2008;336(7653):1106–1110
19. Bang AT, Bang RA, Reddy MH, et al. Simple clinical criteria to identify sepsis or pneumonia in neonates in the community needing treatment or referral. Pediatr Infect Dis J. 2005;24(4):335–341
20. Darmstadt GL, Baqui AH, Choi Y, et al.; Bangladesh Projahnmo-2 (Mirzapur) Study. Validation of a clinical algorithm to identify neonates with severe illness during routine household visits in rural Bangladesh. Arch Dis Child. 2011;96(12):1140–1146
21. Khan FA, Mullany LC, Wu LF, et al. Predictors of neonatal mortality: development and validation of prognostic models using prospective data from rural Bangladesh. BMJ Glob Health. 2020;5(1):e001983
22. Hailemeskel HS, Tiruneh SA. Development of a nomogram for clinical risk prediction of preterm neonate death in Ethiopia. Front Pediatr. 2022;10:877200
23. Lee SK, Zupancic JA, Pendray M, et al.; Canadian Neonatal Network. Transport risk index of physiologic stability: a practical system for assessing infant transport care. J Pediatr. 2001;139(2):220–226
24. Lee SK, Aziz K, Dunn M, et al. Transport Risk Index of Physiologic Stability, version II (TRIPS-II): a simple and practical neonatal illness severity score. Am J Perinatol. 2013;30(5):395–400
25. Mediratta RP, Amare AT, Behl R, et al. Derivation and validation of a prognostic score for neonatal mortality in Ethiopia: a case-control study. BMC Pediatr. 2020;20(1):238
26. Singhi S, Chaudhuri M. Functional and behavioral responses as marker of illness, and outcome in infants under 2 months. Indian Pediatr. 1995;32(7):763–771
27. Russell NJ, Stöhr W, Plakkal N, et al. Patterns of antibiotic use, pathogens, and prediction of mortality in hospitalized neonates and young infants with sepsis: a global neonatal sepsis observational cohort study (NeoOBS). PLoS Med. 2023;20(6):e1004179
28. Aluvaala J, Collins G, Maina B, et al. Prediction modelling of inpatient neonatal mortality in high-mortality settings. Arch Dis Child. 2020;106(5):449–454
29. Tuti T, Collins G, English M, Aluvaala J; Clinical Information Network. External validation of inpatient neonatal mortality prediction models in high-mortality settings. BMC Med. 2022;20(1):236
30. Thaddeus S, Maine D. Too far to walk: maternal mortality in context. Soc Sci Med. 1994;38(8):1091–1110
31. Save the Children. Applying the three delays model: improving access to care for newborns with danger signs. Available at: https://www.healthynewbornnetwork.org/hnn-content/uploads/Applying-the-three-delays-model_Final.pdf. Accessed March 1, 2024
32. Lawn JE, Ohuma EO, Bradley E, et al.; Lancet Small Vulnerable Newborn Steering Committee; WHO/UNICEF Preterm Birth Estimates Group; National Vulnerable Newborn Measurement Group; Subnational Vulnerable Newborn Measurement Group. Small babies, big risks: global estimates of prevalence and mortality for vulnerable newborns to accelerate change and improve counting. Lancet. 2023;401(10389):1707–1719
33. Shukla VV, Eggleston B, Ambalavanan N, et al. Predictive modeling for perinatal mortality in resource-limited settings. JAMA Netw Open. 2020;3(11):e2026750
34. Alliance for Maternal and Newborn Health Improvement (AMANHI) Gestational Age Study Group; Alliance for Maternal and Newborn Health Improvement (AMANHI) GA Study Group. Simplified models to assess newborn gestational age in low-middle income countries: findings from a multicountry, prospective cohort study. BMJ Glob Health. 2021;6(9):e005688
35. Wen B, Brals D, Bourdon C, et al. Predicting the risk of mortality during hospitalization in sick severely malnourished children using daily evaluation of key clinical warning signs. BMC Med. 2021;19(1):222
36. Plate JDJ, van de Leur RR, Leenen LPH, et al. Incorporating repeated measurements into prediction models in the critical care setting: a framework, systematic review and meta-analysis. BMC Med Res Methodol. 2019;19(1):199
37. Medlock S, Ravelli AC, Tamminga P, et al. Prediction of mortality in very premature infants: a systematic review of prediction models. PLoS One. 2011;6(9):e23441
38. Zeng Z, Shi Z, Li X. Comparing different scoring systems for predicting mortality risk in preterm infants: a systematic review and network meta-analysis. Front Pediatr. 2023;11:1287774
39. Mangold C, Zoretic S, Thallapureddy K, et al. Machine learning models for predicting neonatal mortality: a systematic review. Neonatology. 2021;118(4):394–405
This is an open access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits noncommercial distribution and reproduction in any medium, provided the original author and source are credited.
