CONTEXT:

Prediction models can be a valuable tool in performing risk assessment of mortality in preterm infants.

OBJECTIVE:

To summarize prognostic models for predicting mortality in very preterm infants and to assess their quality.

DATA SOURCES:

Medline was searched for all articles (up to June 2020).

STUDY SELECTION:

All developed or externally validated prognostic models for mortality prediction in liveborn infants born <32 weeks’ gestation and/or <1500 g birth weight were included.

DATA EXTRACTION:

Data were extracted by 2 independent authors. Risk of bias (ROB) and applicability assessments were performed by 2 independent authors using the Prediction model Risk of Bias Assessment Tool (PROBAST).

RESULTS:

One hundred forty-two models from 35 studies reporting on model development and 112 models from 33 studies reporting on external validation were included. ROB assessment revealed high ROB in the majority of the models, most often because of inadequate (reporting of) analysis. Internal and external validation were lacking in 41% and 96% of these models, respectively. Meta-analyses revealed an average C-statistic of 0.88 (95% confidence interval [CI]: 0.83–0.91) for the Clinical Risk Index for Babies score, 0.87 (95% CI: 0.81–0.92) for the Clinical Risk Index for Babies II score, and 0.86 (95% CI: 0.78–0.92) for the Score for Neonatal Acute Physiology Perinatal Extension II score.

LIMITATIONS:

Occasionally, an external validation study was included, but not the development study, because studies developed in the presurfactant era or general NICU population were excluded.

CONCLUSIONS:

Instead of developing additional mortality prediction models for preterm infants, the emphasis should be shifted toward external validation and subsequent adaptation of the existing prediction models.

Very preterm birth (<32 completed weeks’ gestation) and very low birth weight (<1500 g) are associated with increased mortality and neonatal morbidity and as such are considered a major challenge in perinatal health care.1 Very preterm birth occurs in 1.3% of all live births in developed regions.2 Despite this low prevalence, complications associated with preterm birth are responsible for 35% of the world’s annual neonatal deaths.3 Accurate risk assessment of postnatal death in very preterm infants can help caregivers and parents to decide whether and when to intervene in a pregnancy or to adjust postnatal care.4 Prediction models can be a helpful tool in performing such risk assessment.5,6

In a 2011 systematic review, Medlock et al7 reported on the availability of >40 prediction models to assess the risk of neonatal mortality in infants born very preterm. That review has 3 limitations. First, when the review was conducted, no standardized tool for risk of bias (ROB) assessment of prediction models was yet available. The recently published Prediction model Risk of Bias Assessment Tool (PROBAST)8,9 provides the opportunity to formally assess the ROB of newly published models as well as of the models that were identified by Medlock et al.7 Second, since its publication in 2011, the review has not been updated, whereas many development and validation studies of prognostic models for the prediction of mortality in preterm infants have been published. Third, Medlock et al7 excluded external validation studies, thereby limiting the information on external validity of identified models and the possibility to perform any quantitative analyses.

Therefore, the aim of this study was to update the systematic review of Medlock et al7 on prognostic models for predicting postnatal mortality in liveborn very preterm infants (PICOTS [population, index model, comparator model, outcome, timing, setting] framework presented in Supplemental Table 5) and to extend it by the addition of a ROB assessment using PROBAST, inclusion of studies externally validating existing models, and meta-analysis of model performance measures of the models most often validated.

To obtain an update of the systematic review by Medlock et al7  from 2011, the same search strategy was used. Medline was searched for all articles from May 2010 (last search date of Medlock et al7 ) up to June 2020 by using a search that followed the general form: “prediction model AND preterm AND infant AND mortality.” The detailed search strategy is shown in Supplemental Table 6. The current review is reported according to the Preferred Reporting Items for Systematic Reviews and Meta-analyses statement.10 

Inclusion and exclusion criteria were similar to the criteria used in Medlock et al.7  The criteria following the critical appraisal and data extraction for systematic reviews of prediction modelling studies (CHARMS) checklist11  are given in Table 1. In brief, all prognostic models that aim to predict mortality at any time point in infants born at <32 weeks and/or <1500 g birth weight were included. Studies were classified as pre- or postsurfactant on the basis of the authors’ report of surfactant use, whereafter presurfactant studies were excluded. In studies in which surfactant use was not reported, surfactant was assumed to be in routine use after 1990.

TABLE 1

Criteria to Guide the Literature Search Following the CHARMS Checklist

Item | Criteria
Prognostic versus diagnostic prediction model | Prognostic prediction models
Intended scope of the review | Purpose of the included models is to predict the probability of mortality, rather than to investigate a single specific risk factor.
Type of prediction modeling studies | Prediction model development studies with or without external validation and external model validation studies, in which researchers report at least 1 measure of model performance on preterm infants
Target population to whom the prediction model applies | Population of liveborn infants or admitted infants born at <32 wk gestational age and/or <1500 g birth wt; inclusion of prognostic models for gestational age– or birth wt–specific population; inclusion of studies in which researchers used a slightly broader definition of VLGA/BW or ELGA/BW; exclusion of models derived for a subpopulation with a specific disease or condition (eg, NEC); exclusion of models for general NICU population, unless separately reported performance for VLGA/BW infants; exclusion of studies from the presurfactant era
Outcome to be predicted | Outcome that the model predicts is mortality or survival. Studies in which authors report on models developed for combined outcome measures (eg, morbidity and mortality) are excluded.
Time span of prediction | Mortality at any time point
Intended moment of using the model | Models using liveborn infants as well as admitted infants will be included, so intended moment of using the model depends on this population.

NEC, necrotizing enterocolitis.

Titles and abstracts were independently screened by 2 authors each (P.E.v.B., P.A., W.O., and E.S.) and included if considered relevant. Before full text screening, all studies included in Medlock et al7  were added. In addition, Medlock was contacted for external validation studies that were excluded from their 2011 review, and these were added as well. Subsequently, full texts of the selected articles were screened in duplicate for final inclusion by 2 authors (P.E.v.B., P.A., W.O., and E.S.). Likewise, data extraction and ROB assessment were conducted in duplicate (P.E.v.B., P.A., W.O., and E.S.). In case of discrepancies, a third reviewer was involved to establish consensus.

Eligible articles were categorized into 2 groups: development studies and external validation studies, with separate data extraction forms for each group. Relevant items were extracted from each selected article by using the domains described in the CHARMS checklist, which included information on population, candidate predictors (only for development studies), outcome to be predicted, model development (only for development studies), and model performance.11 If an article described the development or external validation of multiple (existing) models, data were extracted separately for each model. Additionally, ROB and applicability assessments were performed by using PROBAST.8,9 PROBAST is organized into 4 domains (participants, predictors, outcome, and analysis) and contains a total of 20 signaling questions to facilitate structured judgment of ROB. Signaling questions are answered as “yes,” “probably yes,” “no,” “probably no,” or “no information.” A domain in which all signaling questions are answered as “yes” or “probably yes” should be judged as low ROB, whereas a “no” or “probably no” on 1 or more questions in a domain flags the potential for bias. Insufficient information on 1 or more questions might result in unclear ROB as well as in low or high ROB, depending on the judgment of the reviewers. Applicability of the study to the review question is assessed for 3 domains (participants, predictors, and outcome) and is rated as low, high, or unclear concern, with low concern regarding applicability if the review question and the study are a good match. To achieve consistent data extraction and ROB assessment, the standardized data extraction forms were piloted, modified, and finalized after discussion with all authors. The full list of final extracted items is available on request.
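The domain-level rule described above can be sketched in a few lines of Python. This is only an illustration of the default logic, not the procedure used by the reviewers: the function name `judge_domain` is hypothetical, and the sketch maps any "no information" answer to "unclear" and any "no" or "probably no" answer to "high", whereas PROBAST leaves both cases to reviewer judgment.

```python
from typing import Iterable

FAVOURABLE = {"yes", "probably yes"}
UNFAVOURABLE = {"no", "probably no"}
NO_INFO = {"no information"}


def judge_domain(answers: Iterable[str]) -> str:
    """Default ROB judgment for one PROBAST domain (hypothetical helper).

    Simplifying assumptions, labeled as such:
    - any "no"/"probably no" is treated as high ROB, although the tool only
      says it flags the potential for bias and reviewers may overrule it;
    - any remaining "no information" is treated as unclear ROB.
    """
    cleaned = [a.strip().lower() for a in answers]
    if not cleaned:
        raise ValueError("a domain needs at least one signaling question")
    allowed = FAVOURABLE | UNFAVOURABLE | NO_INFO
    for answer in cleaned:
        if answer not in allowed:
            raise ValueError(f"unexpected answer: {answer!r}")
    if any(a in UNFAVOURABLE for a in cleaned):
        return "high"
    if all(a in FAVOURABLE for a in cleaned):
        return "low"
    return "unclear"


if __name__ == "__main__":
    print(judge_domain(["yes", "probably yes"]))
    print(judge_domain(["yes", "no information"]))
    print(judge_domain(["probably no", "yes"]))
```

Encoding only the default keeps the reviewer override explicit: a judgment that departs from the function's output is a documented decision and not a silent one.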

Details of the protocol for this systematic review were registered on PROSPERO (ID CRD42019141434).12 In the protocol, prediction of mortality 1 year after birth was registered as the maximum time span of prediction, but in the final review, no such maximum was used: all articles on mortality were included, regardless of the time point at which mortality was predicted. Consequently, the current review provides a comprehensive list of prediction models for mortality. Furthermore, in the protocol, it was stated that the aim of the article was to give a narrative overview, but in the final review, a quantitative analysis was also added. During the selection process, it became apparent that certain models had been externally validated frequently, thereby allowing quantitative (meta-)analysis of model performance.

Results of development and external validation studies were summarized by using descriptive statistics. Prognostic models that were externally validated in at least 5 studies were analyzed quantitatively by using random effects meta-analyses. If researchers of a study performed multiple external validations of 1 model, the validation with characteristics most similar to the development study was used for meta-analysis. Furthermore, meta-analysis was performed in each subgroup separately, on the basis of whether the study population was extremely low gestational age or birth weight (ELGA/BW) (defined as a gestational age <28 weeks or birth weight <1000 g) or very low gestational age or birth weight (VLGA/BW) (applicable to all infants that were not ELGA/BW). Subgroup analysis was performed if at least 5 studies were included in a subgroup. If no C-statistic was reported despite presentation of the receiver operating characteristic curve, WebPlotDigitizer was used to reconstruct the curve and to calculate the area under the curve (ie, the C-statistic). 
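As an illustration of the last step, the area under a digitized receiver operating characteristic curve can be approximated with the trapezoidal rule. The sketch below assumes the digitized curve is available as (false-positive rate, true-positive rate) pairs. The function name and the coordinates are hypothetical, and the source does not state which integration rule was applied.

```python
def auc_from_roc_points(points):
    """Trapezoidal area under a ROC curve given (FPR, TPR) pairs.

    The end points (0, 0) and (1, 1) are added if missing. Points are
    sorted by FPR, then TPR, which assumes a monotone curve.
    """
    pts = set((float(x), float(y)) for x, y in points)
    for x, y in pts:
        if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
            raise ValueError("rates must lie between 0 and 1")
    pts |= {(0.0, 0.0), (1.0, 1.0)}
    ordered = sorted(pts)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(ordered, ordered[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0  # one trapezoid per segment
    return area


if __name__ == "__main__":
    # Hypothetical digitized coordinates, not taken from any included study.
    digitized = [(0.1, 0.6), (0.3, 0.8), (0.6, 0.95)]
    print(round(auc_from_roc_points(digitized), 4))
```

The more points are digitized along the curve, the closer this approximation comes to the C-statistic the original authors would have reported.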
Logit transformation for the C-statistics was used during meta-analyses to overcome the poor statistical properties of the normal distribution when the C-statistic was close to 0 or 1 or when sample sizes were relatively small.13 Between-study heterogeneity was quantified by using the I² statistic.14,15 A rough guide to interpretation of the I² statistic is as follows: 0% to 40% might not be important; 30% to 60% may represent moderate heterogeneity; 50% to 90% may represent substantial heterogeneity; 75% to 100% is considerable heterogeneity.16 Furthermore, 95% confidence intervals (CIs) were calculated to indicate the precision of the summary performance estimate, and 95% prediction intervals (PIs) were calculated to provide boundaries on the likely performance in future model validation studies that are comparable to the studies included in the meta-analysis and thus can be seen as an indication of model generalizability.17 In addition, we calculated the probability that the C-statistic of the validated models will be larger than 0.70 and 0.80 in future validation studies. All analyses were performed in R version 3.5.2.
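The quantities named in this paragraph can be made concrete with a minimal Python sketch of a random-effects meta-analysis on the logit scale. Several choices here are assumptions and not the authors' method, which was run in R and is not specified at this level of detail. The between-study variance uses the DerSimonian-Laird estimator. Standard errors are back-calculated from each study's reported 95% CI. Normal quantiles are used throughout, including for the prediction interval, where a t distribution with k − 2 degrees of freedom is the more usual choice and gives wider intervals when few studies are pooled. The input values are hypothetical.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96


def logit(p):
    return math.log(p / (1.0 - p))


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def pool_c_statistics(studies):
    """studies: list of (c_statistic, ci_lower, ci_upper), all in (0, 1)."""
    if len(studies) < 2:
        raise ValueError("need at least 2 studies")
    y = [logit(c) for c, lo, hi in studies]
    # Assumption: SE on the logit scale recovered from the reported 95% CI.
    se = [(logit(hi) - logit(lo)) / (2.0 * Z95) for c, lo, hi in studies]
    w = [1.0 / s ** 2 for s in se]
    mu_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - mu_fixed) ** 2 for wi, yi in zip(w, y))
    df = len(studies) - 1
    scale = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / scale)            # DerSimonian-Laird
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    w_re = [1.0 / (s ** 2 + tau2) for s in se]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_mu = math.sqrt(1.0 / sum(w_re))
    sd_new = math.sqrt(tau2 + se_mu ** 2)        # spread for a new study

    def prob_above(threshold):
        return 1.0 - NormalDist(mu, sd_new).cdf(logit(threshold))

    return {
        "c": inv_logit(mu),
        "ci": (inv_logit(mu - Z95 * se_mu), inv_logit(mu + Z95 * se_mu)),
        "pi": (inv_logit(mu - Z95 * sd_new), inv_logit(mu + Z95 * sd_new)),
        "tau2": tau2,
        "i2": i2,
        "p_above_0.70": prob_above(0.70),
        "p_above_0.80": prob_above(0.80),
    }


if __name__ == "__main__":
    # Hypothetical validation results: (C-statistic, 95% CI lower, upper).
    example = [(0.90, 0.86, 0.93), (0.78, 0.72, 0.83), (0.85, 0.80, 0.89)]
    for key, value in pool_c_statistics(example).items():
        print(key, value)
```

Back-transforming the interval limits with `inv_logit` keeps every reported bound between 0 and 1, which is the practical benefit of pooling on the logit scale.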

The initial search yielded 2159 unique articles, as shown in the flowchart in Fig 1. After title and abstract screening, 61 articles were provisionally selected for full text screening. All 41 articles identified by Medlock et al plus an additional 18 articles reporting on the external validation of existing models were added for full text screening. Out of those 120 articles, 59 articles, including 29 articles from Medlock et al, met the inclusion criteria and were selected for data extraction. The 30 articles from Medlock were excluded because the article did not concern an individual prediction model (n = 7), the study was performed in the presurfactant era (n = 9), the population was not applicable to our research question (n = 8), the outcome was not applicable to our research question (n = 2), no full text was available (n = 3), or the article was written in a foreign language (n = 1). From the 35 studies reporting on model development, 142 unique models were identified (Table 2). In the 33 studies reporting on external validation, 112 models were validated (Table 3). Of these 33 studies reporting on external validation, 23 studies were used for meta-analysis of the Clinical Risk Index for Babies (CRIB) (n = 15), CRIB-II (n = 12), and Score for Neonatal Acute Physiology Perinatal Extension (SNAPPE) II (n = 6) scores.

FIGURE 1

Flowchart of the study selection process.

TABLE 2

Studies in Which Authors Report on Model Development

Article | No. Models | Differences Between Models Caused by Differences in the Following | Inclusion Criterion | Timing of Death | % Mortality
Pishevar et al, 2020 (ref 47) |  | NA | GA <27 | Unclear | 17
Podda et al, 2018 (ref 48) |  | Modeling method | GA <30 and BW <1500 | Discharge from the hospital | 12
Oltman et al, 2018 (ref 49) |  | NA | GA <26 | 7 d | 20
Beltempo et al, 2018 (ref 33) |  | Timing of death; predictors | GA <29 | 7 d (3); discharge from the hospital (3) | 6/14
Cnattingius et al, 2017 (ref 50) |  | Predictors | GA <31 | 28 d | 
Koller-Smith et al, 2017 (ref 51) |  | Inclusion criterion | GA <32/BW <1500 | Discharge from the hospital | 9/7
Steurer et al, 2017 (ref 52) |  | Age at inclusion | GA <28 | 1 y | NR
Sullivan et al, 2016 (ref 53) |  | Predictors | BW <1500 | Discharge from the hospital | 
Jeschke et al, 2016 (ref 54) |  | NA | BW <1500 | 180 d | 11
Rüdiger et al, 2015 (ref 55) |  | Timing of death; predictors | GA <32 | 28 d (3); discharge from the hospital (3) | 11
Vincer et al, 2014 (ref 56) |  | NA | GA <30 | 28 d | 12
Ravelli et al, 2014 (ref 57) |  | NA | GA <32 | 28 d | 
Wu et al, 2014 (ref 58) |  | Predictors | BW <1500 | 7 d | 10
Manktelow et al, 2013 (ref 22) |  | NA | GA <32 | Discharge from the hospital | 
Dong et al, 2012 (ref 59) |  | NA | BW <1500 | Discharge from the hospital | 29
Ambalavanan et al, 2012 (ref 34) |  | Predictors | BW <1000 | Discharge from the hospital | 6–34
Lee et al, 2012 (ref 60) |  | Timing of death; predictors | GA <32 | 7 d (4); discharge from the hospital (4) | NR
Phillips et al, 2011 (ref 61) |  | NA | BW <1500 | Discharge from the hospital | 12
Schenone et al, 2010 (ref 62) |  | NA | GA <26 and BW <1397 | Discharge from the hospital | 35
Cole et al, 2010 (ref 63) |  | Predictors | GA <31 | Term age | 16–17
Gargus et al, 2009 (ref 64) |  | NA | BW <1000 | 18–22 mo | 34
Forsblad et al, 2008 (ref 65) |  | Inclusion criterion | GA = 23/GA = 24 | 180 d | 22
Zupancic et al, 2007 (ref 66) |  | Predictors | BW <1500 | Discharge from the hospital | 19/14
Forsblad et al, 2006 (ref 67) |  | NA | GA <25 | 180 d | 22
Evans et al, 2006 (ref 68) |  | Age at inclusion | GA <32 and BW <1500 | Discharge from the hospital | 
Marshall et al, 2005 (ref 69) |  | NA | BW <1500 | Discharge from the hospital | 27
Locatelli et al, 2005 (ref 70) |  | NA | BW <750 | 120 d | 49
Ambalavanan et al, 2005 (ref 45) | 10 | Age at inclusion; predictors | BW <1000 | Unclear | NR
Parry et al, 2003 (ref 19) |  | NA | GA <32 | Discharge from the hospital | NR
Janota et al, 2001 (ref 71) |  | Inclusion criterion; timing of death | GA <31 and BW <1500 | 28 d (2); discharge from the hospital (2) | 11/17
Ambalavanan et al, 2001 (ref 72) | 20 | Predictors; modeling method | BW <1000 | Discharge from the hospital | 34
Doyle et al, 2001 (ref 73) |  | NA | GA <27 | 5 y | 33
Pollack et al, 2000 (ref 74) | 10 | Predictors | BW <1500 | Discharge from the hospital | 14
Draper et al, 1999 (ref 75) |  | Inclusion criterion | GA <32 | Discharge from the hospital | 20/9
Zernikow et al, 1998 (ref 76) | 17 | Predictors; modeling method | GA <32 and BW <1500 | 28 d | 

The number in parentheses in the column “Timing of death” represents the number of models with this timing of death. BW, birth weight; GA, gestational age; NA, not applicable; NR, not reported.

TABLE 3

Studies in Which Authors Report on External Validation

Name of Model | Article | No. Studies | No. Models | C-Statistic Original Article | C-Statistics External Validations, Range
CRIB | International Neonatal Network (ref 18) | 15 (refs 19,22,61,74,77–86,94) | 16 | 0.90 | Presented in meta-analysis (Fig 5A)
CRIB-II | Parry et al, 2003 (ref 19) | 12 (refs 22,58,61,77–79,87–92) | 18 | 0.92 | Presented in meta-analysis (Fig 5B)
SNAPPE-II | Richardson et al, 2001 (ref 20) | 6 (refs 66,79,80,87,90,94) |  | 0.85 | Presented in meta-analysis (Fig 5C)
Apgar | Apgar, 1953 (ref 21) | 5 (refs 50,55,88,93,95) | 21 | NA | Presented in Fig 5D
NICHD calculator | Horbar et al, 1993 (ref 96) | 3 (refs 74,97,98) |  | 0.82 | 0.56–0.87
SNAP-II | Richardson et al, 2001 (ref 20) | 3 (refs 33,66,87) |  | NA | 0.68–0.82
SNAPPE | Richardson et al, 1993 (ref 99) | 3 (refs 74,80,86) |  | 0.92 | 0.79–0.93
SNAP | Richardson et al, 1993 (ref 100) | 1 (ref 86) |  | NA | 0.82
Other models | Podda et al, 2018 (ref 48) | 1 (ref 48) |  | 0.91 | 0.77–0.91
 | Tyson et al, 2008 (ref 101) | 2 (refs 48,102) |  | 0.75 | 0.67–0.83
 | BW + GA | 1 (ref 48) |  | NA | 0.72–0.89
 | Manktelow et al, 2013 (ref 22) | 1 (ref 48) |  | 0.86 | 0.69–0.86
 | Zupancic et al, 2007 (ref 66) | 1 (ref 48) |  | 0.85 | 0.76–0.90
 | Gray et al, 1992 (ref 103) | 1 (ref 58) |  | NA | 0.91–0.96
 | Draper et al, 1999 (ref 75) | 1 (ref 4) |  | NA | 0.82–0.92
 | Maier et al, 1997 (ref 104) | 1 (ref 82) |  | 0.86 | 0.82

In total, within 33 studies, 112 external validations were performed. The column named “Article” reflects the original article in which the model was published. The number of studies reflects the number of articles that were published presenting external validation of the model. A study might perform external validation of >1 model; therefore, the column total of number of studies exceeds 33. The number of models reflects the number of external validations of the model, which might exceed the number of studies because of multiple external validations of a model in 1 study when the model was applied in, for example, different populations or with different time spans of the outcome. NA, not available; NICHD, Eunice Kennedy Shriver National Institute of Child Health and Human Development.

Table 4 shows key characteristics of the study design, sample size, predictors, outcome, modeling method, and predictive performance of the included model development studies. The majority of the included studies originated from registry or retrospective cohorts (n = 31, 88%). Of all 142 models, 60 (42%) models used birth weight as their inclusion criterion, 52 (37%) models used gestational age as their inclusion criterion, and 30 (21%) models used both birth weight and gestational age as inclusion criteria. The number of participants used for developing the models varied from 57 to 29 180 (median 828), and the number of events ranged between 16 and 4448 (median 171). The median mortality rate was 12%, with an interquartile range of 9% to 23%. The number of events per variable (EPV) could be calculated for 118 (83%) models, ranged from 0 to 426 (median 10), and was <10 for 52% of these models. Although the majority of prediction models were focused on mortality during hospital admission (n = 70, 49%) and within 28 days after birth (n = 31, 22%), 7 other outcome measures were identified, including mortality before term age, or within 7, 120, or 180 days, 1 year, 18 to 22 months, and 5 years. The C-statistic varied from 0.70 to 0.95, with a similar range in subgroups for VLGA/BW and ELGA/BW. For 36 (25%) models, both discrimination and calibration were reported. Among the 40 models for which calibration was reported, 10 (25%) presented a calibration plot and the majority (n = 35, 88%) presented the resulting P value of a Hosmer–Lemeshow test. In total, 84 of the 142 models (59%) were internally validated, most often by using a random split of the data into development and validation data sets (n = 42, 50%) or cross-validation (n = 18, 21%). For 64 (45%) models, insufficient information was presented to allow calculation of individual risks.
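The EPV figures above follow from simple arithmetic, sketched here in Python. The source does not define the denominator. This sketch assumes it is the number of candidate predictors considered during model development, and the function names and example counts are hypothetical.

```python
def events_per_variable(n_events, n_candidate_predictors):
    """EPV = outcome events / candidate predictors (assumed definition)."""
    if n_candidate_predictors <= 0:
        raise ValueError("number of candidate predictors must be positive")
    if n_events < 0:
        raise ValueError("number of events cannot be negative")
    return n_events / n_candidate_predictors


def epv_category(epv):
    """Map an EPV value to the 3 categories reported in Table 4."""
    if epv < 10:
        return "EPV <10"
    if epv <= 20:
        return "EPV 10-20"
    return "EPV >20"


if __name__ == "__main__":
    # Hypothetical model: 60 deaths and 12 candidate predictors.
    epv = events_per_variable(60, 12)
    print(epv, epv_category(epv))
```

Note that the median EPV across models is not the ratio of the median number of events to the median number of candidate predictors, because EPV is computed per model before summarizing.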

TABLE 4

Characteristics of the Included Model Development Studies and External Validation Studies

Item and Categories  Development Studies  External Validation Studies
Per study n = 35 n = 33 
 Study design and study population   
  Years of publication, minimum–maximum 1998–2020 1994–2020 
  No. models per study 2 (1–6) 2 (1–3) 
  Data source   
   Registry 19 (54) 12 (36) 
   Retrospective cohort 12 (34) 10 (30) 
   Prospective cohort 2 (5.7) 9 (27) 
   Other 1 (2.9) 1 (3.0) 
   Unclear 1 (2.9) 1 (3.0) 
  Countryᵃ
   Europe 16 (46) 14 (42) 
   North America 16 (46) 6 (18) 
   Oceania 3 (8.6) 4 (12) 
   Asia 3 (8.6) 8 (24) 
   South America 1 (2.9) 2 (6.1) 
   Africa 0 (0.0) 1 (3.0) 
Per model n = 142 n = 112 
 Inclusion criteria   
  Birth wt only 60 (42) 40 (36) 
   ≤1000 g 36 (60) 6 (15) 
   ≤1500 g 24 (40) 34 (85) 
  Gestational age only 52 (37) 51 (46) 
   ≤28 wk 12 (23) 12 (24) 
   ≤32 wk 40 (77) 39 (76) 
  Birth wt and gestational age 30 (21) 21 (19) 
   ≤28 wk and ≤1000 g 1 (3.3) 7 (33) 
   ≤32 wk and ≤1500 g 29 (97) 14 (67) 
 Sample size   
  No. participants 828 (476–5808) 835 (232–3362) 
  No. events 171 (51–411) 66 (41–197) 
   Not reported 18 (13) 27 (24) 
  EPV 10 (4–64) NA 
   EPV <10 61 (52) NA 
   EPV 10–20 6 (4.2) NA 
   EPV >20 51 (43) NA 
   Not possible to calculate 24 (17) NA 
 Predictors   
  No. candidate predictors 12 (6–22) NA 
   Not reported 6 (3.9) NA 
  No. predictors in final model 7 (4–12) NA 
 Outcome   
  Mortality rate, % 12 (9–23) 12 (10–19) 
   Not reported 26 (18) 27 (24) 
  Time span of outcome   
   Discharge from the hospital 70 (49) 80 (71) 
   28 postnatal d 31 (22) 14 (13) 
   7 postnatal d 11 (7.7) 9 (8.0) 
   Term age 6 (4.2) 0 (0.0) 
   1 y of age 6 (4.1) 0 (0.0) 
   180 postnatal d 4 (2.8) 0 (0.0) 
   18–22 postnatal mo 1 (0.7) 0 (0.0) 
   120 postnatal d 1 (0.7) 0 (0.0) 
   5 y of age 1 (0.7) 0 (0.0) 
   2 y of age 0 (0.0) 1 (0.9) 
   2–3 y corrected age 0 (0.0) 5 (4.5) 
   Unclear 11 (7.7) 3 (2.7) 
 Modeling method and model presentation   
  Modeling method   
   Logistic regression 100 (70) NA 
   Neural networks 32 (23) NA 
   Other 4 (2.8) NA 
   Unclear 6 (4.2) NA 
  Model presentation   
   Final model presented, including intercept 45 (32) NA 
   Final model presented without intercept 17 (12) NA 
   Alternative presentation 16 (11) NA 
   Insufficient information to allow individual risk calculation 64 (45) NA 
 Predictive performance   
  Discrimination   
   C-statistic, range 0.70–0.95 0.56–0.97 
    ≤28 wk and ≤1000 g 0.71–0.89 0.56–0.95 
    ≤32 wk and ≤1500 g 0.70–0.95 0.67–0.97 
   Not reported 9 (6.3) 6 (5.4) 
  Calibrationᵃ
   Reported 40 (28) 25 (20) 
   Hosmer–Lemeshow 35 (88) 20 (80) 
   Calibration plot 10 (25) 6 (24) 
   Observed/expected ratio 2 (5.0) 0 (0.0) 
  Both discrimination and calibration reported 36 (25) 30 (25) 
 Internal validation   
  Internally validated models 84 (59) NA 
  Method of validationᵃ
   Random split of data 42 (50) NA 
   Cross-validation 18 (21) NA 
   Nonrandom split of data 20 (24) NA 
   Resampling 3 (3.6) NA 
   Other 2 (2.4) NA 

Numbers are presented as n (%) or median (quartiles 1–3), unless stated otherwise. If “missing” or “unclear” was not reported, it means characteristic was available for all studies or models. If percentage was calculated relative to specific characteristic or category instead of per study or model, numbers are indented. NA, not applicable.

a Percentages do not add up to 100% because studies and models might belong to >1 category.

Figure 2 summarizes all predictors included in the final models. Variables concerning size and maturity of the infant and variables concerning birth and delivery were most often included (in 76% and 63% of the final models, respectively).

FIGURE 2

Predictors included in the final development models. The bars reflect the percentage of the 142 models including this predictor; the number at the end of each bar reflects the absolute number of models including this predictor. The upper bar of each category shows the total number and percentage of models including a predictor in this category; the categories are then subdivided into lighter-color bars showing the specific predictors in that category. Models might have included >1 predictor of a category. BPD, bronchopulmonary dysplasia; CPAP, continuous positive airway pressure; Fio2, fraction of inspired oxygen; GA, gestational age; IVH, intraventricular hemorrhage; NEC, necrotizing enterocolitis; NICHD, Eunice Kennedy Shriver National Institute of Child Health and Human Development; PPHN, persistent pulmonary hypertension of the newborn; PPROM, preterm prelabor rupture of membranes; SNAP, Score for Neonatal Acute Physiology; TRIPS-II, Transport Risk Index of Physiologic Stability, version II.

Figure 3 shows a summary of ROB and applicability for all models. Across nearly all models, ROB related to outcome and predictors was considered low. ROB related to the participants domain was high in 14% of the models because of inappropriate inclusion and exclusion criteria of participants; for example, inclusion of <50% of the eligible infants or exclusion of infants who died later than the prediction horizon. By contrast, ROB related to the statistical analysis was high in every model, mostly because of inappropriate handling of missing data (100%), not presenting all relevant performance measures (96%), a low number of participants with the outcome in relation to the number of candidate predictors (45%), and no correction for overfitting when indicated (94%). In summary, the overall ROB was high across all models.

FIGURE 3

ROB and applicability assessment of developed models by using PROBAST.

The concern of the model not being applicable to our research question was high in 30% of the models, mainly because of inclusion of participants different from those in our research question (eg, studies excluding outborn infants).

Table 4 shows key characteristics of the study design, sample size, outcome, and predictive performance of the included model development studies. Although 112 external validations were performed in 33 articles, the majority of the 142 developed models (n = 136, 96%) had not been externally validated. For some models, an external validation study was included in this review, but not the original development study, because the model was developed in the presurfactant era or in a population not applicable to this review but was externally validated in a time period or population that was applicable. In total, 16 different models were externally validated (Table 3). Median mortality rate was 12% (interquartile range: 10%–19%), which was comparable to the mortality rate in the development studies. The C-statistic was reported for 106 (95%) models, with a range of 0.56 to 0.97. For 30 (25%) models, both discrimination and calibration were reported, with 6 (24%) models presenting calibration using a calibration plot and the majority presenting the resulting P value of a Hosmer–Lemeshow test (n = 20, 80%). Figure 4 shows a summary of ROB and applicability by domain. Across almost all models, ROB related to outcome, predictors, and participants was low. By contrast, ROB related to the analysis was high in almost all models, mostly because of inappropriate handling of missing data (95%) and not presenting a calibration plot (93%). This resulted in an overall high ROB for the validation of 108 (96%) models.

FIGURE 4

ROB and applicability assessment of externally validated models by using PROBAST.

The CRIB18  was validated most often (n = 16), followed by the CRIB-II19  (n = 12), the SNAPPE-II20  (n = 6), and the Apgar score21  (n = 5) (Table 3). However, the Apgar score was unsuitable for meta-analysis because of substantial heterogeneity across the external validations, caused by differences in moment of prediction, prediction horizon, and type of Apgar score (conventional, specified, or expanded score). Results from the studies on external validation of the Apgar score are presented without meta-analysis in Fig 5D.

FIGURE 5

A, Meta-analysis for the CRIB score, showing a forest plot with study-specific C-statistics, the average C-statistic (summary estimate), and the PI. Shown in the first row of the table are characteristics of the original development article on the CRIB score.18  The CRIB score includes 6 parameters: BW, gestation, congenital malformations, maximum base excess, minimum appropriate fraction of inspired oxygen, and maximum appropriate fraction of inspired oxygen in the first 12 hours. Although 16 studies externally validated the CRIB score, 1 study could not be used in the meta-analysis because the C-statistic was not presented, and 1 study could not be used because the 95% CI could not be calculated owing to missing information on the number of outcomes, leaving 14 studies for the meta-analysis. Subgroup analysis was not applicable because all studies were performed in a VLGA/BW population. B, Meta-analysis for the CRIB-II score, showing a forest plot with study-specific C-statistics, the average C-statistic (summary estimate), and the PI. Shown in the first row of the table are characteristics of the original development article on the CRIB-II score.19  The CRIB-II score includes 5 parameters: sex, BW, gestation, temperature at admission, and base excess. Subgroup analyses in VLGA/BW infants are shown in Supplemental Fig 6A. C, Meta-analysis for the SNAPPE-II score, showing a forest plot with study-specific C-statistics, the average C-statistic (summary estimate), and the PI. Shown in the first row of the table are characteristics of the original development article on the SNAPPE-II score.20  The SNAPPE-II score includes 9 parameters: mean blood pressure, lowest temperature, Po2/fraction of inspired oxygen ratio, lowest serum pH, multiple seizures, urine output, BW, small for GA, and Apgar score at 5 minutes. Subgroup analyses in VLGA/BW infants are shown in Supplemental Fig 6B. 
D, Results from all external validations of the Apgar score, without meta-analysis. “Conventional” refers to the original scoring system as introduced by Virginia Apgar21  in 1953, including 5 items: heart rate, respiratory effort, reflex irritability, muscle tone, and color. “Specified” refers to scoring the items of the conventional Apgar score independent of the interventions needed to achieve the condition. “Expanded” refers to scoring the interventions that are required to achieve a condition. “Combined” refers to scoring both the specified and expanded Apgar scores. BW, birth weight; GA, gestational age.

At meta-analysis, the estimated average C-statistics across the included studies were 0.88 (95% CI: 0.83–0.91, I² = 91%) for the CRIB score (Fig 5A), 0.87 (95% CI: 0.81–0.92, I² = 94%) for the CRIB-II score (Fig 5B), and 0.86 (95% CI: 0.78–0.92, I² = 90%) for the SNAPPE-II score (Fig 5C). The 95% PIs were 0.63–0.97, 0.59–0.97, and 0.60–0.96 for the CRIB, CRIB-II, and SNAPPE-II scores, respectively. Based on the forest plot in Fig 5A, the study of Asker et al was an outlier in comparison with the other studies and as such may be a major source of heterogeneity. Exclusion of this study lowered the I² to 69% and narrowed the 95% PI to 0.80–0.94. The probabilities that the scores would achieve a discrimination >0.7 and >0.8 in future validation studies were 93% and 78%, respectively, for the CRIB score, 92% and 78%, respectively, for the CRIB-II score, and 93% and 78%, respectively, for the SNAPPE-II score. A calibration plot was presented for 2 external validation studies of the CRIB score and for 1 external validation study of the CRIB-II score, showing poor and good calibration for the CRIB score19,22  and good calibration for the CRIB-II score.22  Subgroup analyses in a VLGA/BW population for the CRIB-II and SNAPPE-II scores showed similar results (Supplemental Fig 6).
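How a summary C-statistic, I², and a prediction interval of this kind can be obtained is illustrated by the minimal sketch below. It assumes that C-statistics are pooled on the logit scale (the scale discussed in reference 13) with a DerSimonian-Laird random-effects estimator and normal quantiles. The estimator used in this review is not stated in this section, so the estimator, the normal approximation for the PI, and all function names are assumptions made for illustration only.

```python
import math
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # two-sided 95% normal quantile


def logit(p):
    return math.log(p / (1.0 - p))


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def pool_c_statistics(c_stats, ci_lower, ci_upper):
    """Pool C-statistics on the logit scale (DerSimonian-Laird; an assumption)."""
    y = [logit(c) for c in c_stats]
    # Standard error recovered from the width of each 95% CI on the logit scale.
    se = [(logit(u) - logit(l)) / (2.0 * Z) for l, u in zip(ci_lower, ci_upper)]
    w = [1.0 / s ** 2 for s in se]

    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    df = len(y) - 1
    scale = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / scale) if scale > 0 else 0.0  # between-study variance
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0

    w_re = [1.0 / (s ** 2 + tau2) for s in se]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_mu = math.sqrt(1.0 / sum(w_re))
    pred_sd = math.sqrt(tau2 + se_mu ** 2)  # spread expected for a new study

    return {
        "pooled": expit(mu),
        "ci": (expit(mu - Z * se_mu), expit(mu + Z * se_mu)),
        "pi": (expit(mu - Z * pred_sd), expit(mu + Z * pred_sd)),
        "i2": i2,
        "mu": mu,
        "pred_sd": pred_sd,
    }


def prob_c_above(result, threshold):
    """Approximate probability that a future validation exceeds `threshold`."""
    return 1.0 - NormalDist(result["mu"], result["pred_sd"]).cdf(logit(threshold))
```

The PI is wider than the CI whenever between-study variance is present, which is why a pooled estimate with a narrow CI can still be compatible with poor discrimination in a new setting.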

In this systematic review, we summarized all available prognostic models for mortality prediction in liveborn very preterm infants. In total, 142 models from 35 studies on model development and 112 models from 33 studies on external validation were identified, revealing that there is an abundance of mortality risk prediction models for very preterm infants. ROB assessment showed high ROB in the majority of the models, most often because of inadequate (reporting of the) analysis. Furthermore, internal and external validation of these models is often lacking.

Four main methodologic flaws identified within the analysis domain need addressing. First, at development, 61 (52%) models had a number of participants with the outcome in relation to the number of candidate predictors (events per variable [EPV]) <10, resulting in high ROB according to PROBAST because of the risk of overfitting. With such a small EPV, it is recommended to account for overfitting and optimism to decrease the ROB,9  but this was rarely done in the included models. Historically, and as such in PROBAST, sample size considerations have been based on the EPV; however, it has recently been suggested to also take into account the total number of participants, the outcome incidence in the study population, and the expected predictive performance.23 
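As a small worked illustration of the EPV criterion described above, the sketch below divides the number of participants with the outcome by the number of candidate predictors and flags values below 10. The function names and the example numbers are hypothetical and are not taken from any included study.

```python
def events_per_variable(n_events, n_candidate_predictors):
    """EPV: participants with the outcome per candidate predictor.

    Candidate predictors are all predictors considered during development,
    not only those retained in the final model.
    """
    if n_candidate_predictors <= 0:
        raise ValueError("at least one candidate predictor is required")
    return n_events / n_candidate_predictors


def flags_overfitting_risk(epv, threshold=10.0):
    """True when the EPV falls below the conventional threshold of 10."""
    return epv < threshold


# Hypothetical cohort: 1000 infants, 120 deaths, 20 candidate predictors.
epv = events_per_variable(120, 20)
print(epv, flags_overfitting_risk(epv))
```

In this hypothetical cohort the EPV is 6, so the development study would be flagged even though only a handful of predictors might end up in the final model.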

Second, none of the included studies handled participants with missing data correctly according to PROBAST. Use of missing data as an exclusion criterion or excluding enrolled participants with any missing data from the analysis leads to biased associations and model performance.24–33  Therefore, multiple imputation is recommended to handle missing data because it leads to the least biased results with correct SEs and P values.24–29,31–33 

Third, information on both calibration and discrimination was presented for only 25% of the models. Calibration was most often assessed by using a Hosmer–Lemeshow test, whereas this statistical test indicates neither the presence nor the magnitude of any miscalibration and is known to be dependent on the sample size.9  Therefore, it is recommended to present a calibration plot instead, which unfortunately was hardly ever reported in the included articles.
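The information a calibration plot conveys can be sketched in a few lines: predicted risks are grouped, and the mean predicted risk in each group is set against the observed proportion of deaths. The sketch below also computes the observed/expected ratio listed in the table of model characteristics. Grouping into equal-sized risk groups and the function names are assumptions for illustration; smoothed calibration curves are a common alternative.

```python
def calibration_table(y_true, y_prob, n_groups=10):
    """Return (mean predicted risk, observed proportion, n) per risk group.

    Plotting observed against mean predicted risk for each group gives a
    grouped calibration plot; points on the diagonal indicate good calibration.
    """
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    rows = []
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not chunk:
            continue  # fewer observations than groups
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        rows.append((mean_pred, observed, len(chunk)))
    return rows


def observed_expected_ratio(y_true, y_prob):
    """Total observed events divided by the sum of predicted risks."""
    return sum(y_true) / sum(y_prob)
```

Unlike a single Hosmer–Lemeshow P value, the table shows where along the risk range predictions are too high or too low and by how much.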

Fourth, 84 of the 142 models (59%) were internally validated, most often by using a random split of the data into development and validation data sets (n = 42, 50%). However, this has been shown to be an inefficient use of data and an inadequate way to measure optimism.33,34  Instead, bootstrapping or cross-validation is recommended to quantify overfitting of the developed model and optimism in its predictive performance.35  Furthermore, the majority of the studies performing internal validation seemingly failed to replicate the exact model development procedure and may therefore still underestimate the actual optimism and overestimate the actual performance of their model.36,37 
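The bootstrap approach recommended above can be sketched as follows. The key point is that the whole development procedure, including any predictor selection, is repeated in every bootstrap sample. The `fit` argument and the toy `fit_best_single_predictor` procedure are hypothetical stand-ins for a real modeling pipeline and are not taken from any included study.

```python
import random


def c_statistic(y_true, y_score):
    """Proportion of event/non-event pairs ranked correctly (ties count 0.5)."""
    events = [s for y, s in zip(y_true, y_score) if y == 1]
    nonevents = [s for y, s in zip(y_true, y_score) if y == 0]
    concordant = sum(
        1.0 if e > n else 0.5 if e == n else 0.0
        for e in events for n in nonevents
    )
    return concordant / (len(events) * len(nonevents))


def fit_best_single_predictor(x, y):
    """Toy development procedure: keep the column with the highest apparent C."""
    best = max(range(len(x[0])), key=lambda j: c_statistic(y, [r[j] for r in x]))
    return lambda rows: [row[best] for row in rows]


def bootstrap_optimism(x, y, fit, n_boot=200, seed=0):
    """Optimism-corrected C-statistic.

    `fit(x, y)` must repeat the ENTIRE development procedure and return a
    function that maps rows of predictors to risk scores.
    """
    rng = random.Random(seed)
    n = len(y)
    apparent = c_statistic(y, fit(x, y)(x))
    optimism = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        xb = [x[i] for i in idx]
        yb = [y[i] for i in idx]
        if len(set(yb)) < 2:  # both outcomes are needed for a C-statistic
            continue
        model = fit(xb, yb)
        # Performance in the bootstrap sample minus performance in the original data.
        optimism.append(c_statistic(yb, model(xb)) - c_statistic(y, model(x)))
    mean_optimism = sum(optimism) / len(optimism) if optimism else 0.0
    return {
        "apparent": apparent,
        "optimism": mean_optimism,
        "corrected": apparent - mean_optimism,
    }
```

Because predictor selection happens inside `fit`, the optimism estimate reflects the selection step as well. Validating only the final, already-selected model would miss that source of optimism.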

Methodologic flaws identified within the ROB assessment of the participants domain included using a nested case-control design without correction for baseline risk, inclusion of <50% of the eligible infants, and exclusion of all infants who died after 7 days. Within the applicability assessment, issues raised included exclusion of outborn infants, a study conducted in a high-altitude NICU, and exclusion of all infants who died within 72 hours.

This review reveals that development of new prediction models for mortality in preterm infants is an ongoing practice. However, many models are of unknown value for daily practice because of lack of validation. Therefore, future emphasis should be shifted toward external validation and adaptation of existing prediction models, a recommendation that applies to the broader field of prediction modeling and has been made before.38,39  Ideally, these validation studies are performed by using prospectively collected data because validation studies have higher potential for ROB when participant data are from existing sources with data collected for a purpose other than validation or updating of prediction models. Subsequently, impact studies are warranted to quantify the effect of a prognostic model on physicians’ behavior and patient outcome.5 

In the majority of development studies, participants, predictors, and outcome were described sufficiently clearly and did not introduce bias. By contrast, high ROB occurred in the analysis section of practically all studies because of inappropriate analysis methods or omission of important statistical considerations. Moreover, for >40% of the models, information to allow others to correctly apply the models in new individuals (ie, information on predictors and coefficients of the final developed model including intercept) was insufficient. Improvements in studies on mortality risk prediction in very preterm infants are needed and can be achieved through better (reporting of) analyses. A first step in that direction would be better adherence to the TRIPOD (transparent reporting of a multivariable prediction model for individual prognosis or diagnosis) statement40  and consideration of PROBAST.8,9 

This review showed that variables concerning size and maturity of the infant (in 76% of the models), variables concerning birth and delivery (63%), and maternal variables (39%) were most often included. Specifically, gestational age, Apgar score, birth weight, sex, multiplicity, antenatal corticosteroids, and ethnicity were used as predictors in >40 models. This reveals the importance of these variables in mortality risk prediction in preterm infants. Nevertheless, because the vast majority of these models were considered of low quality and calibration of these models was not reported, their actual value in mortality risk prediction remains unclear.

At meta-analysis of the C-statistic, the CRIB, CRIB-II, and SNAPPE-II scores all revealed excellent performance (C-statistic >0.85), comparable to a recently published meta-analysis.41  However, considerable heterogeneity across the included studies was found (I² ≥ 90% for all models), which can originate from differences between study populations and study designs.42,43  Important characteristics of the included studies, including inclusion criteria, moment of prediction, and time span of the outcome, are shown in Fig 5 A–C and indicate substantial differences in study population. However, it is difficult to draw conclusions on the defining sources of heterogeneity, meaning further research will be necessary. Although the 3 models revealed great discriminative performance, information on calibration is largely lacking. To provide a complete and accurate judgment of the performance of these models, information on calibration, ideally by providing a calibration plot, will be needed.

Health care decisions for individual patients should be informed by the best available evidence. Systematic reviews summarizing large amounts of information are powerful tools to facilitate clinical decision-making and to identify gaps in our knowledge or room for improvement. In our article, we show a lack of evidence regarding the external validity of the majority of models, poor (reporting of) analyses, and absence of calibration plots in the majority of the models. The abundant availability of insufficiently validated models is not useful for clinical practice.44  In our systematic review, the extensive ROB assessment revealed that the model published by Manktelow et al22  had the highest quality among all 142 developed models. Furthermore, the external validity of the CRIB, CRIB-II, and SNAPPE-II models has been assessed often and shows good discriminative performance.18,20,45  Unfortunately, information on their calibration is still lacking. Based on the currently available evidence, we consider these 4 prediction models to have the highest potential for use in clinical practice. A first step would be to (again) externally validate these models, but now with a focus on calibration as well. Presenting discrimination will be sufficient when the aim is to distinguish high- and low-risk populations, but for individual prediction, information on calibration is essential. During such external validation, the original model may require an update, thereby addressing the potential issue of miscalibration associated with differences in mortality rate between the development and validation populations. Ideally, such external validations are followed by impact studies to quantify the effect of a prognostic model on physicians’ behavior and patient outcome.

Since Medlock et al7  published their systematic review of models for the prediction of mortality in very premature infants in 2011, only 1 systematic review, in Spanish, has been published.46  Major improvements of our review over both existing reviews are (1) the use of a standard tool for ROB assessment, which is an essential step in any systematic review8,9 ; (2) the inclusion of articles externally validating models and meta-analysis of the models most often validated, giving additional insight into their quality and value for clinical practice; and (3) the inclusion of the large number of prediction models published since 2011, showing the need for an update to provide a comprehensive overview of prediction models for mortality in very preterm infants.

However, this study has several limitations. First, for some models, an external validation study was included, but not the development study, because studies developed in the presurfactant era or in the general NICU population that included very preterm born infants but also infants born >32 weeks’ gestational age were excluded. Second, PROBAST is a recently developed tool using contemporary expertise and knowledge, which was applied to models of which some were developed and published decades ago. Information currently necessary for assessment of bias (eg, calibration) was often not reported, leading to high ROB in the analysis domain across all models. Third, the majority of the included studies originated from developed countries, making this review less applicable to developing countries. In future research, validating prediction models in developing countries might require more attention because there is much to be gained with respect to postnatal mortality in preterm infants.

There is an abundance of mortality risk prediction models for very preterm infants. Improvement in studies on mortality risk prediction in very preterm infants can be achieved through improved (reporting of) analyses. Many of the models are of unknown value for daily practice because of lack of external validation. Meta-analysis of the widely used CRIB, CRIB-II, and SNAPPE-II scores revealed good discriminative performance of these scores, but calibration is currently unknown. Instead of developing additional mortality prediction models for preterm infants, the emphasis should be shifted toward external validation and subsequent adaptation of the existing prediction models.

Dr van Beek designed the study, performed the literature search, conducted the study selection process, data extraction, and critical appraisal, analyzed the data, and wrote the first draft of the manuscript; Drs Andriessen and Onland conducted the study selection process, data extraction, and critical appraisal, provided critical feedback, and helped shape the research, analysis, and manuscript; Dr Schuit designed the study, conducted the study selection process, data extraction, and critical appraisal, provided critical feedback, helped shape the research, analysis, and manuscript, and supervised the project; and all authors approved the final manuscript as submitted and agree to be accountable for all aspects of the work.

FUNDING: Dr van Beek was supported by an unrestricted grant from Stichting Tiny & Anny van Doorne Fonds. The funding source had no role in the design, conduct, analyses, or reporting of the study or in the decision to submit the manuscript for publication. The other authors received no external funding.

     
  • CHARMS: critical appraisal and data extraction for systematic reviews of prediction modelling studies
  • CI: confidence interval
  • CRIB: clinical risk index for babies
  • ELGA/BW: extremely low gestational age or birth weight
  • EPV: events per variable
  • PI: prediction interval
  • PROBAST: prediction model risk of bias assessment tool
  • ROB: risk of bias
  • SNAPPE: score for neonatal acute physiology perinatal extension
  • VLGA/BW: very low gestational age or birth weight

1. Tucker J, McGuire W. Epidemiology of preterm birth. BMJ. 2004;329(7467):675–678
2. World Health Organization. Preterm birth. fact sheet: 2018. Available at: www.who.int/news-room/fact-sheets/detail/preterm-birth. Accessed December 20, 2019
3. Blencowe H, Cousens S, Oestergaard MZ, et al. National, regional, and worldwide estimates of preterm birth rates in the year 2010 with time trends since 1990 for selected countries: a systematic analysis and implications. Lancet. 2012;379(9832):2162–2172
4. Schuit E, Hukkelhoven CW, Manktelow BN, et al. Prognostic models for stillbirth and neonatal death in very preterm birth: a validation study. Pediatrics. 2012;129(1). Available at: www.pediatrics.org/cgi/content/full/129/1/e120
5. Moons KG, Altman DG, Vergouwe Y, Royston P. Prognosis and prognostic research: application and impact of prognostic models in clinical practice. BMJ. 2009;338:b606
6. Leushuis E, van der Steeg JW, Steures P, et al. Prediction models in reproductive medicine: a critical appraisal. Hum Reprod Update. 2009;15(5):537–552
7. Medlock S, Ravelli AC, Tamminga P, Mol BW, Abu-Hanna A. Prediction of mortality in very premature infants: a systematic review of prediction models. PLoS One. 2011;6(9):e23441
8. Wolff RF, Moons KGM, Riley RD, et al.; PROBAST Group. PROBAST: a tool to assess the risk of bias and applicability of prediction model studies. Ann Intern Med. 2019;170(1):51–58
9. Moons KGM, Wolff RF, Riley RD, et al. PROBAST: a tool to assess risk of bias and applicability of prediction model studies: explanation and elaboration. Ann Intern Med. 2019;170(1):W1–W33
10. Liberati A, Altman DG, Tetzlaff J, et al. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate healthcare interventions: explanation and elaboration. BMJ. 2009;339:b2700
11. Moons KG, de Groot JA, Bouwmeester W, et al. Critical appraisal and data extraction for systematic reviews of prediction modelling studies: the CHARMS checklist. PLoS Med. 2014;11(10):e1001744
12. van Beek P, Andriessen P, Onland W, Schuit E. Prognostic models for mortality in very preterm infants. PROSPERO 2019 CRD42019141434. Available at: https://www.crd.york.ac.uk/prospero/display_record.php?ID=CRD42019141434. Accessed November 18, 2020
13. Snell KI, Ensor J, Debray TP, Moons KG, Riley RD. Meta-analysis of prediction model performance across multiple studies: which scale helps ensure between-study normality for the C-statistic and calibration measures? Stat Methods Med Res. 2018;27(11):3505–3522
14. Higgins JP, Thompson SG. Quantifying heterogeneity in a meta-analysis. Stat Med. 2002;21(11):1539–1558
15. Debray TP, Damen JA, Snell KI, et al. A guide to systematic review and meta-analysis of prediction model performance. BMJ. 2017;356:i6460
16. Deeks JJ, Higgins J, Altman DG. Analysing Data and Undertaking Meta-Analyses. In: Higgins JPT, Green S, eds. Cochrane Handbook for Systematic Reviews of Interventions. London, United Kingdom: Cochrane Collaboration; 2011
17. Damen JA, Pajouheshnia R, Heus P, et al. Performance of the Framingham risk models and pooled cohort equations for predicting 10-year risk of cardiovascular disease: a systematic review and meta-analysis. BMC Med. 2019;17(1):109
18. The International Neonatal Network. The CRIB (clinical risk index for babies) score: a tool for assessing initial neonatal risk and comparing performance of neonatal intensive care units. [published correction appears in Lancet. 1993;342(8871):626]. Lancet. 1993;342(8865):193–198
19. Parry G, Tucker J, Tarnow-Mordi W; UK Neonatal Staffing Study Collaborative Group. CRIB II: an update of the clinical risk index for babies score. Lancet. 2003;361(9371):1789–1791
20. Richardson DK, Corcoran JD, Escobar GJ, Lee SK. SNAP-II and SNAPPE-II: simplified newborn illness severity and mortality risk scores. J Pediatr. 2001;138(1):92–100
21. Apgar V. A proposal for a new method of evaluation of the newborn infant. Curr Res Anest Anal. 1953;32(4):260–267
22. Manktelow BN, Seaton SE, Field DJ, Draper ES. Population-based estimates of in-unit survival for very preterm infants. Pediatrics. 2013;131(2). Available at: www.pediatrics.org/cgi/content/full/131/2/e425
23. Riley RD, Ensor J, Snell KIE, et al. Calculating the sample size required for developing a clinical prediction model. BMJ. 2020;368:m441
24. Schafer JL. Multiple imputation: a primer. Stat Methods Med Res. 1999;8(1):3–15
25. van Buuren S, Boshuizen HC, Knook DL. Multiple imputation of missing blood pressure covariates in survival analysis. Stat Med. 1999;18(6):681–694
26. White IR, Royston P, Wood AM. Multiple imputation using chained equations: issues and guidance for practice. Stat Med. 2011;30(4):377–399
27. Donders AR, van der Heijden GJ, Stijnen T, Moons KG. Review: a gentle introduction to imputation of missing values. J Clin Epidemiol. 2006;59(10):1087–1091
28. Janssen KJ, Donders AR, Harrell FE Jr., et al. Missing covariate data in medical research: to impute is better than to ignore. J Clin Epidemiol. 2010;63(7):721–727
29. Marshall A, Altman DG, Royston P, Holder RL. Comparison of techniques for handling missing covariate data within prognostic modelling studies: a simulation study. BMC Med Res Methodol. 2010;10:7
30. Sterne JA, White IR, Carlin JB, et al. Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. BMJ. 2009;338:b2393
31. Vergouwe Y, Royston P, Moons KG, Altman DG. Development and validation of a prediction model with missing predictor data: a practical approach. J Clin Epidemiol. 2010;63(2):205–214
32. Groenwold RH, White IR, Donders AR, Carpenter JR, Altman DG, Moons KG. Missing covariate data in clinical research: when and when not to use the missing-indicator method for analysis. CMAJ. 2012;184(11):1265–1269
33. Beltempo M, Shah PS, Ye XY, Afifi J, Lee S, McMillan DD; Canadian Neonatal Network Investigators. SNAP-II for prediction of mortality and morbidity in extremely preterm infants. J Matern Fetal Neonatal Med. 2019;32(16):2694–2701
34. Ambalavanan N, Carlo WA, Tyson JE, et al.; Generic Database; Subcommittees of the Eunice Kennedy Shriver National Institute of Child Health and Human Development Neonatal Research Network. Outcome trajectories in extremely preterm infants. Pediatrics. 2012;130(1). Available at: www.pediatrics.org/cgi/content/full/130/1/e115
35. Steyerberg EW, Harrell FE Jr. Prediction models need appropriate internal, internal-external, and external validation. J Clin Epidemiol. 2016;69:245–247
36. Steyerberg EW, Harrell FE, Borsboom GJ, Eijkemans MJ, Vergouwe Y, Habbema JD. Internal validation of predictive models: efficiency of some procedures for logistic regression analysis. J Clin Epidemiol. 2001;54(8):774–781
37. Austin PC, Steyerberg EW. Events per variable (EPV) and the relative performance of different strategies for estimating the out-of-sample validity of logistic regression models. Stat Methods Med Res. 2017;26(2):796–808
38. Kleinrouweler CE, Cheong-See FM, Collins GS, et al. Prognostic models in obstetrics: available, but far from applicable. Am J Obstet Gynecol. 2016;214(1):79–90.e36
39. Damen JA, Hooft L, Schuit E, et al. Prediction models for cardiovascular disease risk in the general population: systematic review. BMJ. 2016;353:i2416
40. Moons KG, Altman DG, Reitsma JB, et al. Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): explanation and elaboration. Ann Intern Med. 2015;162(1):W1–W73
41. McLeod JS, Menon A, Matusko N, et al. Comparing mortality risk models in VLBW and preterm infants: systematic review and meta-analysis. J Perinatol. 2020;40(5):695–703
42. Debray TP, Vergouwe Y, Koffijberg H, Nieboer D, Steyerberg EW, Moons KG. A new framework to enhance the interpretation of external validation studies of clinical prediction models. J Clin Epidemiol. 2015;68(3):279–289
43. Vergouwe Y, Moons KG, Steyerberg EW. External validity of risk models: use of benchmark values to disentangle a case-mix effect from incorrect coefficients. Am J Epidemiol. 2010;172(8):971–980
44. Wynants L, Van Calster B, Collins GS, et al. Prediction models for diagnosis and prognosis of covid-19 infection: systematic review and critical appraisal. BMJ. 2020;369:m1328
45. Ambalavanan N, Carlo WA, Bobashev G, et al.; National Institute of Child Health and Human Development Neonatal Research Network. Prediction of death for extremely low birth weight neonates. Pediatrics. 2005;116(6):1367–1373
46. Del Río R, Thió M, Bosio M, Figueras J, Iriondo M. Prediction of mortality in premature neonates. An updated systematic review [in Spanish]
.
An Pediatr (Barc)
.
2020
;
93
(
1
):
24
33
47
Pishevar
N
,
Fathi
O
,
Backes
CH
,
Shepherd
EG
,
Nelin
LD
.
Predicting survival in infants born at <27 weeks gestation admitted to an all referral neonatal intensive care unit: a pilot study
.
J Perinatol
.
2020
;
40
(
5
):
750
757
48
Podda
M
,
Bacciu
D
,
Micheli
A
,
Bellù
R
,
Placidi
G
,
Gagliardi
L
.
A machine learning approach to estimating preterm infants survival: development of the Preterm Infants Survival Assessment (PISA) predictor
.
Sci Rep
.
2018
;
8
(
1
):
13743
13746
49
Oltman
SP
,
Rogers
EE
,
Baer
RJ
, et al
.
Initial metabolic profiles are associated with 7-day survival among infants born at 22-25 weeks of gestation
.
J Pediatr
.
2018
;
198
:
194
200.e3
50
Cnattingius
S
,
Norman
M
,
Granath
F
,
Petersson
G
,
Stephansson
O
,
Frisell
T
.
Apgar score components at 5 minutes: risks and prediction of neonatal mortality
.
Paediatr Perinat Epidemiol
.
2017
;
31
(
4
):
328
337
51
Koller-Smith
LI
,
Shah
PS
,
Ye
XY
, et al.;
Australian and New Zealand Neonatal Network
;
Canadian Neonatal Network
;
Swedish Neonatal Quality Register
.
Comparing very low birth weight versus very low gestation cohort methods for outcome analysis of high risk preterm infants
.
BMC Pediatr
.
2017
;
17
(
1
):
166
52
Steurer
MA
,
Anderson
J
,
Baer
RJ
, et al
.
Dynamic outcome prediction in a socio-demographically diverse population-based cohort of extremely preterm neonates
.
J Perinatol
.
2017
;
37
(
6
):
709
715
53
Sullivan
BA
,
McClure
C
,
Hicks
J
,
Lake
DE
,
Moorman
JR
,
Fairchild
KD
.
Early heart rate characteristics predict death and morbidities in preterm infants
.
J Pediatr
.
2016
;
174
:
57
62
54
Jeschke
E
,
Biermann
A
,
Günster
C
, et al.;
Routine Data-Based Quality Improvement Panel
.
Mortality and major morbidity of very-low-birth weight infants in Germany 2008-2012: a report based on administrative data
.
Front Pediatr
.
2016
;
4
:
23
55
Rüdiger
M
,
Braun
N
,
Aranda
J
, et al.;
TEST-Apgar Study-Group
.
Neonatal assessment in the delivery room–trial to evaluate a specified type of Apgar (TEST-Apgar)
.
BMC Pediatr
.
2015
;
15
:
18
56
Vincer
MJ
,
Armson
BA
,
Allen
VM
, et al
.
An algorithm for predicting neonatal mortality in threatened very preterm birth
.
J Obstet Gynaecol Can
.
2015
;
37
(
11
):
958
965
57
Ravelli
AC
,
Schaaf
JM
,
Mol
BW
, et al
.
Antenatal prediction of neonatal mortality in very premature infants
.
Eur J Obstet Gynecol Reprod Biol
.
2014
;
176
:
126
131
58
Wu
PL
,
Lee
WT
,
Lee
PL
,
Chen
HL
.
Predictive power of serial neonatal therapeutic intervention scoring system scores for short-term mortality in very-low-birth-weight infants
.
Pediatr Neonatol
.
2015
;
56
(
2
):
108
113
59
Dong
Y
,
Yue
G
,
Yu
JL
.
Changes in perinatal care and predictors of in-hospital mortality for very low birth weight preterm infants
.
Iran J Pediatr
.
2012
;
22
(
3
):
326
332
60
Lee
SK
,
Aziz
K
,
Dunn
M
, et al.;
Canadian Neonatal Network
.
Transport Risk Index of Physiologic Stability, version II (TRIPS-II): a simple and practical neonatal illness severity score
.
Am J Perinatol
.
2013
;
30
(
5
):
395
400
61
Phillips
LA
,
Dewhurst
CJ
,
Yoxall
CW
.
The prognostic value of initial blood lactate concentration measurements in very low birthweight infants and their use in development of a new disease severity scoring system
.
Arch Dis Child Fetal Neonatal Ed
.
2011
;
96
(
4
):
F275
F280
62
Schenone
MH
,
Aguin
E
,
Li
Y
,
Lee
C
,
Kruger
M
,
Bahado-Singh
RO
.
Prenatal prediction of neonatal survival at the borderline viability
.
J Matern Fetal Neonatal Med
.
2010
;
23
(
12
):
1413
1418
63
Cole
TJ
,
Hey
E
,
Richmond
S
.
The PREM score: a graphical tool for predicting survival in very preterm births
.
Arch Dis Child Fetal Neonatal Ed
.
2010
;
95
(
1
):
F14
F19
64
Gargus
RA
,
Vohr
BR
,
Tyson
JE
, et al
.
Unimpaired outcomes for extremely low birth weight infants at 18 to 22 months
.
Pediatrics
.
2009
;
124
(
1
):
112
121
65
Forsblad
K
,
Källén
K
,
Marsál
K
,
Hellström-Westas
L
.
Short-term outcome predictors in infants born at 23-24 gestational weeks
.
Acta Paediatr
.
2008
;
97
(
5
):
551
556
66
Zupancic
JA
,
Richardson
DK
,
Horbar
JD
, et al
.
Revalidation of the score for neonatal acute physiology in the Vermont oxford network
.
Pediatrics
.
2007
;
119
(
1
):
156
67
Forsblad
K
,
Källén
K
,
Marsál
K
,
Hellström-Westas
L
.
Apgar score predicts short-term outcome in infants born at 25 gestational weeks
.
Acta Paediatr
.
2007
;
96
(
2
):
166
171
68
Evans
N
,
Hutchinson
J
,
Simpson
JM
,
Donoghue
D
,
Darlow
B
,
Henderson-Smart
D
.
Prenatal predictors of mortality in very preterm infants cared for in the Australian and New Zealand neonatal network
.
Arch Dis Child Fetal Neonatal Ed
.
2007
;
92
(
1
):
F34
F40
69
Marshall
G
,
Tapia
JL
,
D’Apremont
I
, et al.;
Grupo Colaborativo NEOCOSUR
.
A new score for predicting neonatal very low birth weight mortality risk in the NEOCOSUR South American Network
.
J Perinatol
.
2005
;
25
(
9
):
577
582
70
Locatelli
A
,
Roncaglia
N
,
Andreotti
C
, et al
.
Factors affecting survival in infants weighing 750 g or less
.
Eur J Obstet Gynecol Reprod Biol
.
2005
;
123
(
1
):
52
55
71
Janota
J
,
Stranák
Z
,
Statecná
B
,
Dohnalová
A
,
Sípek
A
,
Simák
J
.
Characterization of multiple organ dysfunction syndrome in very low birthweight infants: a new sequential scoring system
.
Shock
.
2001
;
15
(
5
):
348
352
72
Ambalavanan
N
,
Carlo
WA
.
Comparison of the prediction of extremely low birth weight neonatal mortality by regression analysis and by neural networks
.
Early Hum Dev
.
2001
;
65
(
2
):
123
137
73
Doyle
LW
;
Victorian Infant Collaborative Study Group
.
Outcome at 5 years of age of children 23 to 27 weeks’ gestation: refining the prognosis
.
Pediatrics
.
2001
;
108
(
1
):
134
141
74
Pollack
MM
,
Koch
MA
,
Bartel
DA
, et al
.
A comparison of neonatal mortality risk prediction models in very low birth weight infants
.
Pediatrics
.
2000
;
105
(
5
):
1051
1057
75
Draper
ES
,
Manktelow
B
,
Field
DJ
,
James
D
.
Prediction of survival for preterm births by weight and gestational age: retrospective population based study
.
BMJ
.
1999
;
319
(
7217
):
1093
1097
76
Zernikow
B
,
Holtmannspoetter
K
,
Michel
E
, et al
.
Artificial neural network for risk assessment in preterm neonates
.
Arch Dis Child Fetal Neonatal Ed
.
1998
;
79
(
2
):
F129
F134
77
Bührer
C
,
Metze
B
,
Obladen
M
.
CRIB, CRIB-II, birth weight or gestational age to assess mortality risk in very low birth weight infants?
Acta Paediatr
.
2008
;
97
(
7
):
899
903
78
De Felice
C
,
Del Vecchio
A
,
Latini
G
.
Evaluating illness severity for very low birth weight infants: CRIB or CRIB-II?
J Matern Fetal Neonatal Med
.
2005
;
17
(
4
):
257
260
79
Gagliardi
L
,
Cavazza
A
,
Brunelli
A
, et al
.
Assessing mortality risk in very low birthweight infants: a comparison of CRIB, CRIB-II, and SNAPPE-II
.
Arch Dis Child Fetal Neonatal Ed
.
2004
;
89
(
5
):
F419
F422
80
Zardo
MS
,
Procianoy
RS
.
Comparison between different mortality risk scores in a neonatal intensive care unit [in Portuguese]
.
Rev Saude Publica
.
2003
;
37
(
5
):
591
596
81
Brito
AS
,
Matsuo
T
,
Gonzalez
MR
,
de Carvalho
AB
,
Ferrari
LS
.
CRIB score, birth weight and gestational age in neonatal mortality risk evaluation [in Portuguese]
.
Rev Saude Publica
.
2003
;
37
(
5
):
597
602
82
Maier
RF
,
Caspar-Karweck
UE
,
Grauel
EL
,
Bassir
C
,
Metze
BC
,
Obladen
M
.
A comparison of two mortality risk scores for very low birthweight infants: clinical risk index for babies and Berlin score
.
Intensive Care Med
.
2002
;
28
(
9
):
1332
1335
83
Khanna
R
,
Taneja
V
,
Singh
SK
,
Kumar
N
,
Sreenivas
V
,
Puliyel
JM
.
The clinical risk index of babies (CRIB) score in India
.
Indian J Pediatr
.
2002
;
69
(
11
):
957
960
84
Kaaresen
PI
,
Døhlen
G
,
Fundingsrud
HP
,
Dahl
LB
.
The use of CRIB (clinical risk index for babies) score in auditing the performance of one neonatal intensive care unit
.
Acta Paediatr
.
1998
;
87
(
2
):
195
200
85
de Courcy-Wheeler
RH
,
Wolfe
CD
,
Fitzgerald
A
,
Spencer
M
,
Goodman
JD
,
Gamsu
HR
.
Use of the CRIB (clinical risk index for babies) score in prediction of neonatal mortality and morbidity
.
Arch Dis Child Fetal Neonatal Ed
.
1995
;
73
(
1
):
F32
F36
86
Rautonen
J
,
Makela
A
,
Boyd
H
,
Apajasalo
M
,
Pohjavuori
M
.
CRIB and SNAP: assessing the risk of death for preterm neonates
.
Lancet
.
1994
;
343
(
8908
):
1272
1273
87
Sotodate
G
,
Oyama
K
,
Matsumoto
A
,
Konishi
Y
,
Toya
Y
,
Takashimizu
N
.
Predictive ability of neonatal illness severity scores for early death in extremely premature infants [published online ahead of print February 25, 2020]
.
J Matern Fetal Neonatal Med
. doi:
88
Park
JH
,
Chang
YS
,
Ahn
SY
,
Sung
SI
,
Park
WS
.
Predicting mortality in extremely low birth weight infants: comparison between gestational age, birth weight, Apgar score, CRIB II score, initial and lowest serum albumin levels
.
PLoS One
.
2018
;
13
(
2
):
e0192232
89
Ezz-Eldin
ZM
,
Hamid
TA
,
Youssef
MR
,
Nabil
H-D
.
Clinical risk index for babies (CRIB II) scoring system in prediction of mortality in premature babies
.
J Clin Diagn Res
.
2015
;
9
(
6
):
SC08
-
SC11
90
Reid
S
,
Bajuk
B
,
Lui
K
,
Sullivan
EA
;
NSW and ACT Neonatal Intensive Care Units Audit Group, PSN
.
Comparing CRIB-II and SNAPPE-II as mortality predictors for very preterm infants
.
J Paediatr Child Health
.
2015
;
51
(
5
):
524
528
91
Greenwood
S
,
Abdel-Latif
ME
,
Bajuk
B
,
Lui
K
;
NSW and ACT Neonatal Intensive Care Units Audit Group
.
Can the early condition at admission of a high-risk infant aid in the prediction of mortality and poor neurodevelopmental outcome? A population study in Australia
.
J Paediatr Child Health
.
2012
;
48
(
7
):
588
595
92
Rastogi
PK
,
Sreenivas
V
,
Kumar
N
.
Validation of CRIB II for prediction of mortality in premature babies
.
Indian Pediatr
.
2010
;
47
(
2
):
145
147
93
Dalili
H
,
Sheikh
M
,
Hardani
AK
,
Nili
F
,
Shariat
M
,
Nayeri
F
.
Comparison of the combined versus conventional Apgar scores in predicting adverse neonatal outcomes
.
PLoS One
.
2016
;
11
(
2
):
e0149464
94
Asker
HS
,
Satar
M
,
Yıldızdaş
HY
, et al
.
Evaluation of score for neonatal acute physiology and perinatal extension II and clinical risk index for babies with additional parameters
.
Pediatr Int (Roma)
.
2016
;
58
(
10
):
984
987
95
Mori
R
,
Shiraishi
J
,
Negishi
H
,
Fujimura
M
.
Predictive value of Apgar score in infants with very low birth weight
.
Acta Paediatr
.
2008
;
97
(
6
):
720
723
96
Horbar
JD
,
Onstad
L
,
Wright
E
.
Predicting mortality risk for infants weighing 501 to 1500 grams at birth: a National Institutes of Health Neonatal Research Network report
.
Crit Care Med
.
1993
;
21
(
1
):
12
18
97
Yeo
KT
,
Safi
N
,
Wang
YA
, et al
.
Prediction of outcomes of extremely low gestational age newborns in Australia and New Zealand
.
BMJ Paediatr Open
.
2017
;
1
(
1
):
e000205
98
Boland
RA
,
Davis
PG
,
Dawson
JA
,
Doyle
LW
;
Victorian Infant Collaborative Study Group
.
Predicting death or major neurodevelopmental disability in extremely preterm infants born in Australia
.
Arch Dis Child Fetal Neonatal Ed
.
2013
;
98
(
3
):
F201
F204
99
Richardson
DK
,
Phibbs
CS
,
Gray
JE
,
McCormick
MC
,
Workman-Daniels
K
,
Goldmann
DA
.
Birth weight and illness severity: independent predictors of neonatal mortality
.
Pediatrics
.
1993
;
91
(
5
):
969
975
100
Richardson
DK
,
Gray
JE
,
McCormick
MC
,
Workman
K
,
Goldmann
DA
.
Score for neonatal acute physiology: a physiologic severity index for neonatal intensive care
.
Pediatrics
.
1993
;
91
(
3
):
617
623
101
Tyson
JE
,
Parikh
NA
,
Langer
J
,
Green
C
,
Higgins
RD
;
National Institute of Child Health and Human Development Neonatal Research Network
.
Intensive care for extreme prematurity–moving beyond gestational age
.
N Engl J Med
.
2008
;
358
(
16
):
1672
1681
102
Marrs
CC
,
Pedroza
C
,
Mendez-Figueroa
H
,
Chauhan
SP
,
Tyson
JE
.
Infant outcomes after periviable birth: external validation of the neonatal research network estimator with the BEAM trial
.
Am J Perinatol
.
2016
;
33
(
6
):
569
576
103
Gray
JE
,
Richardson
DK
,
McCormick
MC
,
Workman-Daniels
K
,
Goldmann
DA
.
Neonatal therapeutic intervention scoring system: a therapy-based severity-of-illness index
.
Pediatrics
.
1992
;
90
(
4
):
561
567
104
Maier
RF
,
Rey
M
,
Metze
BC
,
Obladen
M
.
Comparison of mortality risk: a score for very low birthweight infants
.
Arch Dis Child Fetal Neonatal Ed
.
1997
;
76
(
3
):
F146
-
NaN–F151
105
Rubin
DB
,
Schenker
N
.
Multiple imputation in health-care databases: an overview and some applications
.
Stat Med
.
1991
;
10
(
4
):
585
598

Competing Interests

POTENTIAL CONFLICT OF INTEREST: The authors have indicated they have no potential conflicts of interest to disclose.

FINANCIAL DISCLOSURE: The authors have indicated they have no financial relationships relevant to this article to disclose.
