CONTEXT

Prediction models can be a valuable tool in performing risk assessment of mortality in preterm infants.

OBJECTIVE

To summarize prognostic models for predicting mortality in very preterm infants and to assess their quality.

DATA SOURCES

Medline was searched for all articles (up to June 2020).

STUDY SELECTION

All developed or externally validated prognostic models for mortality prediction in liveborn infants born <32 weeks’ gestation and/or <1500 g birth weight were included.

DATA EXTRACTION

Data were extracted by 2 independent authors. Risk of bias (ROB) and applicability were assessed by 2 independent authors using the Prediction model Risk of Bias Assessment Tool (PROBAST).

RESULTS

One hundred forty-four models from 36 studies reporting on model development and 118 models from 34 studies reporting on external validation were included. ROB assessment revealed high ROB in the majority of the models, most often because of inadequate analysis or inadequate reporting of the analysis. Internal and external validation were lacking in 42% and 94% of these models, respectively. Meta-analyses revealed an average C-statistic of 0.88 (95% confidence interval [CI]: 0.83–0.91) for the Clinical Risk Index for Babies score, 0.87 (95% CI: 0.81–0.92) for the Clinical Risk Index for Babies II score, 0.86 (95% CI: 0.78–0.92) for the Score for Neonatal Acute Physiology Perinatal Extension II score, and 0.71 (95% CI: 0.61–0.79) for the NICHD model.

LIMITATIONS

Some external validation studies were included without their corresponding development study, because development studies from the presurfactant era or in a general NICU population were excluded.

CONCLUSIONS

Instead of developing additional mortality prediction models for preterm infants, the emphasis should shift toward external validation and subsequent adaptation of the existing prediction models.

Very preterm (<32 completed weeks’ gestation) and very low birth weight (<1500 g) infants are at increased risk of mortality and neonatal morbidity and as such represent a major challenge in perinatal health care.1 Very preterm birth occurs in 1.3% of all live births in developed regions.2 Despite this low prevalence, complications associated with preterm birth are responsible for 35% of the world’s annual neonatal deaths.3 Accurate assessment of the risk of postnatal death in very preterm infants can help caregivers and parents decide whether and when to intervene in a pregnancy or to adjust postnatal care.4 Prediction models can be a helpful tool for such risk assessment.5,6

In a 2011 systematic review, Medlock et al7 identified >40 prediction models for assessing the risk of neonatal mortality in infants born very preterm. That review can now be extended in 3 ways. First, when it was conducted, no standardized tool for risk of bias (ROB) assessment of prediction models was available. The recently published Prediction model Risk of Bias Assessment Tool (PROBAST)8,9 now makes it possible to formally assess the ROB of newly published models as well as of the models identified by Medlock et al.7 Second, the review has not been updated since its publication in 2011, whereas many development and validation studies of prognostic models for predicting mortality in preterm infants have since been published. Third, Medlock et al7 excluded external validation studies. This limited the information on the external validity of the identified models and precluded quantitative analyses.

Therefore, the aim of this study was to update the systematic review of Medlock et al7 on prognostic models for predicting postnatal mortality in liveborn very preterm infants (PICOTS [population, index model, comparator model, outcome, timing, setting] framework presented in Supplemental Table 5). A further aim was to extend that review in 3 ways: by adding a ROB assessment using PROBAST, by including studies that externally validated existing models, and by meta-analyzing the performance measures of the most frequently validated models.

To update the 2011 systematic review by Medlock et al,7 the same search strategy was used. Medline was searched for all articles from May 2010 (the last search date of Medlock et al7) up to June 2020, using a search of the general form “prediction model AND preterm AND infant AND mortality.” The detailed search strategy is shown in Supplemental Table 6. The current review is reported according to the Preferred Reporting Items for Systematic Reviews and Meta-analyses statement.10
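
Readers who want to rerun a simplified version of this search can build a query for the NCBI E-utilities `esearch` endpoint against PubMed, which includes Medline. This sketch rests on several assumptions and is not the review's actual strategy:

- The 4 concept terms are only the general form quoted above. The full strategy is in Supplemental Table 6.
- The exact date bounds (1 May 2010 to 30 June 2020) are assumed from "May 2010 up to June 2020."
- The helper name `build_search_url` is hypothetical.

The code only constructs the URL and does not send any request.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_search_url(concepts, mindate, maxdate, retmax=10000):
    """Build a PubMed esearch URL that ANDs the concept blocks within a publication-date window.

    `concepts` is a simplified stand-in for the full search strategy (hypothetical helper).
    """
    term = " AND ".join(f"({c})" for c in concepts)
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "pdat",  # assumption: filter on publication date
        "mindate": mindate,
        "maxdate": maxdate,
        "retmax": retmax,
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


url = build_search_url(
    ["prediction model", "preterm", "infant", "mortality"],
    mindate="2010/05/01",  # assumed start of the update window
    maxdate="2020/06/30",  # assumed end of the update window
)
print(url)
```

Building the URL separately from sending it keeps the sketch free of network access. It also makes the exact query string easy to record alongside the review protocol.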

Inclusion and exclusion criteria were similar to those used by Medlock et al.7 The criteria, structured according to the critical appraisal and data extraction for systematic reviews of prediction modelling studies (CHARMS) checklist,11 are given in Table 1. In brief, all prognostic models that aim to predict mortality at any time point in infants born at <32 weeks’ gestation and/or with a birth weight <1500 g were included. Studies were classified as pre- or postsurfactant on the basis of the authors’ report of surfactant use, and presurfactant studies were then excluded. For studies that did not report surfactant use, surfactant was assumed to be in routine use after 1990.
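
The two mechanical eligibility rules in this paragraph can be expressed as a small decision function:

- the population criterion (<32 weeks and/or <1500 g), and
- the surfactant-era classification.

This is a minimal sketch under these assumptions:

- The record fields are hypothetical.
- A study's population is summarized by its gestational age and birth weight cut-offs.
- The "after 1990" rule is applied to the start year of data collection, which the text does not specify.

Judgment calls from Table 1 are deliberately not encoded. These include accepting slightly broader VLGA/BW definitions and excluding disease-specific subpopulations.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StudyRecord:
    """Hypothetical extraction fields, for illustration only."""
    ga_cutoff_wk: Optional[float]    # population restricted to GA below this value; None if no GA limit
    bw_cutoff_g: Optional[float]     # population restricted to BW below this value; None if no BW limit
    surfactant_used: Optional[bool]  # as reported by the authors; None if not reported
    study_start_year: int            # assumption: reference year for the 1990 rule


def is_target_population(s: StudyRecord) -> bool:
    # "<32 weeks and/or <1500 g": either restriction on its own is sufficient.
    ga_ok = s.ga_cutoff_wk is not None and s.ga_cutoff_wk <= 32
    bw_ok = s.bw_cutoff_g is not None and s.bw_cutoff_g <= 1500
    return ga_ok or bw_ok


def is_postsurfactant(s: StudyRecord) -> bool:
    # Use the authors' report of surfactant use when it is available ...
    if s.surfactant_used is not None:
        return s.surfactant_used
    # ... otherwise assume routine surfactant use after 1990.
    return s.study_start_year > 1990


def is_eligible(s: StudyRecord) -> bool:
    return is_target_population(s) and is_postsurfactant(s)


print(is_eligible(StudyRecord(32, None, None, 1995)))  # GA <32, surfactant era assumed
print(is_eligible(StudyRecord(30, None, None, 1988)))  # before 1990, surfactant not reported
```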

TABLE 1

Criteria to Guide the Literature Search Following the CHARMS Checklist

Item | Criteria
Prognostic versus diagnostic prediction model | Prognostic prediction models
Intended scope of the review | The purpose of the included models is to predict the probability of mortality, rather than to investigate a single specific risk factor.
Type of prediction modeling studies | Prediction model development studies with or without external validation, and external model validation studies, in which researchers report at least 1 measure of model performance in preterm infants
Target population to whom the prediction model applies | Population of liveborn or admitted infants born at <32 wk gestational age and/or with <1500 g birth wt; inclusion of prognostic models for a gestational age– or birth wt–specific population; inclusion of studies in which researchers used a slightly broader definition of VLGA/BW or ELGA/BW; exclusion of models derived for a subpopulation with a specific disease or condition (eg, NEC); exclusion of models for the general NICU population, unless performance was reported separately for VLGA/BW infants; exclusion of studies from the presurfactant era
Outcome to be predicted | The outcome that the model predicts is mortality or survival. Studies in which authors report on models developed for combined outcome measures (eg, morbidity and mortality) are excluded.
Time span of prediction | Mortality at any time point
Intended moment of using the model | Models for liveborn infants as well as for admitted infants are included, so the intended moment of using the model depends on the population.

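NEC, necrotizing enterocolitis; ELGA/BW, extremely low gestational age or birth weight; VLGA/BW, very low gestational age or birth weight.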

Titles and abstracts were screened independently by 2 of the authors (P.E.v.B., P.A., W.O., and E.S.), and studies considered relevant were selected. Before full-text screening, all studies included by Medlock et al7 were added. In addition, Medlock was contacted to obtain the external validation studies that had been excluded from the 2011 review, and these were added as well. Subsequently, full texts of the selected articles were screened in duplicate for final inclusion by 2 authors (P.E.v.B., P.A., W.O., and E.S.). Data extraction and ROB assessment were likewise conducted in duplicate (P.E.v.B., P.A., W.O., and E.S.). In case of discrepancies, a third reviewer was involved to reach consensus.

Eligible articles were categorized into 2 groups, development studies and external validation studies, with a separate data extraction form for each group. Relevant items were extracted from each selected article by using the domains described in the CHARMS checklist:11

- population;
- candidate predictors (development studies only);
- outcome to be predicted;
- model development (development studies only); and
- model performance.

If an article described the development or external validation of multiple (existing) models, data were extracted separately for each model. Additionally, ROB and applicability were assessed by using PROBAST.8,9 PROBAST is organized into 4 domains (participants, predictors, outcome, and analysis) and contains a total of 20 signaling questions to facilitate a structured judgment of ROB. Signaling questions are answered as “yes,” “probably yes,” “no,” “probably no,” or “no information.” A domain in which all signaling questions are answered “yes” or “probably yes” should be judged as low ROB. A “no” or “probably no” answer to 1 or more questions in a domain flags the potential for bias. When information on 1 or more questions is insufficient, the domain may be rated as unclear, low, or high ROB, depending on the reviewers’ judgment. Applicability of the study to the review question is assessed for 3 domains (participants, predictors, and outcome) and is rated as low, high, or unclear concern; concern is low if the review question and the study are a good match. To achieve consistent data extraction and ROB assessment, the standardized data extraction forms were piloted, modified, and finalized after discussion with all authors. The full list of final extracted items is available on request.
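
The PROBAST rating logic described here can be sketched as two functions. One proposes a domain-level rating from the signaling-question answers; the other combines domain ratings into an overall judgment.

The overall rule comes from the published PROBAST guidance rather than from this article: high if any domain is high, low only if all domains are low, otherwise unclear. PROBAST also advises considering a downgrade when a model was developed without any validation; this sketch omits that step. PROBAST leaves "no" and "no information" answers to reviewer judgment, so the returned domain ratings are provisional defaults, not final judgments. The function names and example answers are hypothetical.

```python
YES_LIKE = {"yes", "probably yes"}
NO_LIKE = {"no", "probably no"}
VALID_ANSWERS = YES_LIKE | NO_LIKE | {"no information"}


def domain_rob(answers):
    """Provisional ROB rating for one PROBAST domain from its signaling-question answers."""
    unexpected = [a for a in answers if a not in VALID_ANSWERS]
    if unexpected:
        raise ValueError(f"unexpected answers: {unexpected}")
    if all(a in YES_LIKE for a in answers):
        return "low"
    if any(a in NO_LIKE for a in answers):
        # A "no"/"probably no" flags potential bias; reviewers may still judge the domain low.
        return "high"
    # Remaining case: at least one "no information" and no "no"-type answers.
    # Reviewer judgment decides; this sketch defaults to "unclear".
    return "unclear"


def overall_rob(domain_ratings):
    """Overall ROB following the PROBAST rule: any high -> high; all low -> low; else unclear."""
    if "high" in domain_ratings:
        return "high"
    if all(r == "low" for r in domain_ratings):
        return "low"
    return "unclear"


# Hypothetical answers for one model (list lengths 2 + 3 + 6 + 9 = 20 signaling questions).
answers = {
    "participants": ["yes", "probably yes"],
    "predictors": ["yes", "yes", "probably yes"],
    "outcome": ["yes"] * 6,
    "analysis": ["yes", "no", "no information", "yes", "probably no",
                 "yes", "yes", "no", "yes"],
}
ratings = {domain: domain_rob(a) for domain, a in answers.items()}
print(ratings, overall_rob(list(ratings.values())))
```

The aggregation rule makes a single high-ROB domain sufficient for a high overall rating. For this reason, one weak analysis domain dominates the overall judgment.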

Details of the protocol for this systematic review were registered on PROSPERO (ID CRD42019141434).12 In the protocol, mortality 1 year after birth was registered as the maximum time span of prediction. No such maximum was applied in the final review: all articles on mortality were included regardless of the time point at which mortality was predicted, so that the review provides a comprehensive overview of all available mortality prediction models. Furthermore, the protocol stated that the aim was a narrative overview, but a quantitative analysis was added in the final review. During the selection process, it had become apparent that certain models had been externally validated often enough to allow a quantitative (meta-)analysis of model performance.

Results of development and external validation studies were summarized by using descriptive statistics. Prognostic models that were externally validated in at least 5 studies were analyzed quantitatively by using random-effects meta-analyses. If a study reported multiple external validations of 1 model, the validation with characteristics most similar to the development study was used for meta-analysis. Meta-analyses were also performed separately for 2 subgroups:

- infants of extremely low gestational age or birth weight (ELGA/BW; gestational age <28 weeks or birth weight <1000 g); and
- infants of very low gestational age or birth weight (VLGA/BW; all infants who were not ELGA/BW).

Subgroup analysis was performed only if at least 5 studies were included in a subgroup. If no C-statistic was reported but a receiver operating characteristic (ROC) curve was presented, WebPlotDigitizer was used to reconstruct the curve and calculate the area under it (ie, the C-statistic).
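
Two mechanical steps in this paragraph are sketched below: assigning a validation study to the ELGA/BW or VLGA/BW subgroup, and computing a C-statistic from a digitized ROC curve. The sketch makes these assumptions:

- A study is summarized by its gestational age and birth weight cut-offs.
- The minimum of 5 studies per subgroup is applied as a simple count.
- The digitized coordinates are (1 − specificity, sensitivity) pairs on a 0 to 1 scale.
- The area is computed with the trapezoidal rule, which the text does not state. In this sketch, WebPlotDigitizer is assumed only to supply the coordinates.

All function names are hypothetical.

```python
from collections import Counter


def assign_subgroup(ga_cutoff_wk=None, bw_cutoff_g=None):
    """ELGA/BW if the population is restricted to <28 weeks or <1000 g; otherwise VLGA/BW."""
    if (ga_cutoff_wk is not None and ga_cutoff_wk <= 28) or (
        bw_cutoff_g is not None and bw_cutoff_g <= 1000
    ):
        return "ELGA/BW"
    return "VLGA/BW"


def subgroups_to_analyse(subgroup_labels, min_studies=5):
    """Keep only subgroups that contain at least `min_studies` validation studies."""
    counts = Counter(subgroup_labels)
    return {label for label, n in counts.items() if n >= min_studies}


def auc_from_roc_points(points):
    """Trapezoidal area under an ROC curve from digitized (FPR, TPR) points in [0, 1]."""
    if not points:
        raise ValueError("need at least one digitized point")
    pts = sorted(points)
    # Anchor the curve at (0, 0) and (1, 1) if the digitized points do not include them.
    if pts[0] != (0.0, 0.0):
        pts.insert(0, (0.0, 0.0))
    if pts[-1] != (1.0, 1.0):
        pts.append((1.0, 1.0))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


labels = [assign_subgroup(ga_cutoff_wk=27)] * 5 + [assign_subgroup(bw_cutoff_g=1500)] * 3
print(subgroups_to_analyse(labels))
print(auc_from_roc_points([(0.2, 0.6), (0.4, 0.8)]))
```

The trapezoidal rule slightly underestimates the area of a concave ROC curve when few points are digitized. Digitizing more points along the curve reduces this error.
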
C-statistics were logit transformed before meta-analysis. The sampling distribution of the C-statistic is poorly approximated by a normal distribution when the C-statistic is close to 0 or 1 or when sample sizes are relatively small.13 Between-study heterogeneity was quantified by using the I² statistic.14,15 As a rough guide:16

- 0% to 40% might not be important;
- 30% to 60% may represent moderate heterogeneity;
- 50% to 90% may represent substantial heterogeneity;
- 75% to 100% represents considerable heterogeneity.

Furthermore, 95% confidence intervals (CIs) were calculated to indicate the precision of the summary performance estimate. Ninety-five percent prediction intervals (PIs) were calculated to indicate the likely range of performance in future validation studies comparable to those included in the meta-analysis; the PI can thus be seen as an indication of model generalizability.17 In addition, we calculated the probability that the C-statistic of each validated model will exceed 0.70 and 0.80 in future validation studies. All analyses were performed in R version 3.5.2.
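
The meta-analytic steps in this paragraph can be illustrated with a minimal random-effects sketch on the logit scale. The article does not name the between-study variance estimator or the R package used. This example swaps in the DerSimonian–Laird estimator. It uses a normal quantile for the 95% PI; a t-distribution with k − 2 degrees of freedom is a common alternative that gives wider intervals.

The sketch assumes each study supplies a C-statistic and its standard error, and it converts that standard error to the logit scale with the delta method. It estimates the probability that the C-statistic exceeds a threshold (eg, 0.70 or 0.80) in a comparable future study from the normal predictive distribution. The input values in the usage lines are invented for illustration.

```python
import math
from statistics import NormalDist


def logit(p):
    return math.log(p / (1 - p))


def inv_logit(x):
    return 1 / (1 + math.exp(-x))


def meta_analyse_c(c_stats, se_c, level=0.95):
    """Random-effects meta-analysis of C-statistics on the logit scale (DerSimonian-Laird)."""
    k = len(c_stats)
    if k < 2 or len(se_c) != k:
        raise ValueError("need at least 2 studies with matching standard errors")
    # Logit-transform each C-statistic; delta method gives SE(logit c) = SE(c) / (c * (1 - c)).
    y = [logit(c) for c in c_stats]
    v = [(s / (c * (1 - c))) ** 2 for c, s in zip(c_stats, se_c)]
    # Inverse-variance (fixed-effect) weights, pooled mean and Cochran's Q.
    w = [1 / vi for vi in v]
    sum_w = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    df = k - 1
    # DerSimonian-Laird between-study variance, truncated at zero.
    tau2 = max(0.0, (q - df) / (sum_w - sum(wi ** 2 for wi in w) / sum_w))
    # I^2: proportion of total variability attributed to between-study heterogeneity.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    # Random-effects weights, pooled logit C-statistic and its standard error.
    w_re = [1 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_mu = math.sqrt(1 / sum(w_re))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    # SD of the logit C-statistic expected in a new, comparable validation study.
    sd_new = math.sqrt(tau2 + se_mu ** 2)
    return {
        "c": inv_logit(mu),
        "ci": (inv_logit(mu - z * se_mu), inv_logit(mu + z * se_mu)),
        "pi": (inv_logit(mu - z * sd_new), inv_logit(mu + z * sd_new)),
        "i2": i2,
        "tau2": tau2,
        "mu": mu,
        "sd_new": sd_new,
    }


def prob_c_above(result, threshold):
    """Probability that the C-statistic exceeds `threshold` in a comparable future study."""
    return 1 - NormalDist(result["mu"], result["sd_new"]).cdf(logit(threshold))


# Invented example inputs: C-statistic and its standard error per validation study.
res = meta_analyse_c([0.90, 0.84, 0.88, 0.79, 0.92], [0.02, 0.03, 0.02, 0.04, 0.015])
print(f"C = {res['c']:.3f}, 95% CI {res['ci'][0]:.3f}-{res['ci'][1]:.3f}, "
      f"95% PI {res['pi'][0]:.3f}-{res['pi'][1]:.3f}, I2 = {res['i2']:.0%}")
print(f"P(C > 0.70) = {prob_c_above(res, 0.70):.2f}, P(C > 0.80) = {prob_c_above(res, 0.80):.2f}")
```

All pooling happens on the logit scale, and results are back-transformed only at the end. The CI and PI can therefore be asymmetric around the pooled C-statistic, as in the intervals reported for CRIB, CRIB-II, SNAPPE-II, and NICHD.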

The initial search yielded 2159 unique articles, as shown in the flowchart in Fig 1. After title and abstract screening, 62 articles were provisionally selected for full-text screening. All 41 articles identified by Medlock et al, plus an additional 18 articles reporting on the external validation of existing models, were added for full-text screening. Of those 121 articles, 60 met the inclusion criteria and were selected for data extraction, including 29 of the articles obtained via Medlock et al. Of the 59 articles obtained via Medlock et al, 30 were excluded for the following reasons:

- the article did not concern an individual prediction model (n = 7);
- the study was performed in the presurfactant era (n = 9);
- the population was not applicable to our research question (n = 8);
- the outcome was not applicable to our research question (n = 2);
- no full text was available (n = 3); or
- the article was written in a foreign language (n = 1).

From the 36 studies reporting on model development, 144 unique models were identified (Table 2). In the 34 studies reporting on external validation, 118 models were validated (Table 3). Of these 34 external validation studies, 23 were used for meta-analysis of the following scores: the Clinical Risk Index for Babies (CRIB) (n = 15), CRIB-II (n = 12), Score for Neonatal Acute Physiology Perinatal Extension (SNAPPE) II (n = 6), and National Institute of Child Health and Human Development (NICHD) calculator (n = 5).
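
The flowchart counts above can be cross-checked with simple arithmetic. This sketch follows one reading of the text. In that reading, the 41 Medlock articles and the 18 external validation studies obtained from Medlock form a single pool of 59 articles. Under that reading, the listed exclusion reasons add up to the 30 excluded Medlock articles, and the remaining counts follow. Variable names are illustrative.

```python
# Counts as reported in the text; grouping them into a "Medlock pool" is an interpretation.
selected_after_screening = 62
medlock_review_articles = 41
medlock_external_validations = 18
included_total = 60
included_from_medlock = 29

full_text_screened = (selected_after_screening + medlock_review_articles
                      + medlock_external_validations)
medlock_pool = medlock_review_articles + medlock_external_validations
medlock_excluded = medlock_pool - included_from_medlock
new_included = included_total - included_from_medlock
new_excluded = selected_after_screening - new_included

exclusion_reasons = {
    "not an individual prediction model": 7,
    "presurfactant era": 9,
    "population not applicable": 8,
    "outcome not applicable": 2,
    "no full text available": 3,
    "foreign language": 1,
}

print(full_text_screened, medlock_excluded, sum(exclusion_reasons.values()),
      new_included, new_excluded)
```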

FIGURE 1

Flowchart of the study selection process.

Table 2.

Studies reporting on model development.

Article | No. models | Differences between models caused by differences in | Inclusion criterion | Timing of death | % mortality
Pishevar, 2020 [50] | | NA | GA <27 | Unclear | 17%
Rysavy, 2020 [24] | | Random intercept hospital variation | GA <26/BW <1000 | Discharge home | 37%
Podda, 2018 [51] | | Modelling method | GA <30 & BW <1500 | Discharge home | 12%
Oltman, 2018 [52] | | NA | GA <26 | 7 days | 20%
Beltempo, 2018 [53] | | Timing of death; predictors | GA <29 | 7 days (3)/discharge home (3) | 6%/14%
Cnattingius, 2017 [54] | | Predictors | GA <31 | 28 days | 7%
Koller-Smith, 2017 [55] | | Inclusion criterion | GA <32/BW <1500 | Discharge home | 9%/7%
Steurer, 2017 [56] | | Age at inclusion | GA <28 | 1 year | NR
Sullivan, 2016 [57] | | Predictors | BW <1500 | Discharge home | 9%
Jeschke, 2016 [58] | | NA | BW <1500 | 180 days | 11%
Rüdiger, 2015 [59] | | Timing of death; predictors | GA <32 | 28 days (3)/discharge home (3) | 11%
Vincer, 2014 [60] | | NA | GA <30 | 28 days | 12%
Ravelli, 2014 [61] | | NA | GA <32 | 28 days | 9%
Wu, 2014 [62] | | Predictors | BW <1500 | 7 days | 10%
Manktelow, 2013 [48] | | NA | GA <32 | Discharge home | 8%
Dong, 2012 [63] | | NA | BW <1500 | Discharge home | 29%
Ambalavanan, 2012 [64] | | Predictors | BW <1000 | Discharge home | 6%–34%
Lee, 2012 [65] | | Timing of death; predictors | GA <32 | 7 days (4)/discharge home (4) | NR
Phillips, 2011 [66] | | NA | BW <1500 | Discharge home | 12%
Schenone, 2010 [67] | | NA | GA <26 & BW <1397 | Discharge home | 35%
Cole, 2010 [68] | | Predictors | GA <31 | Term age | 16%–17%
Gargus, 2009 [69] | | NA | BW <1000 | 18–22 months | 34%
Forsblad, 2008 [70] | | Inclusion criterion | GA =23/GA =24 | 180 days | 22%
Zupancic, 2007 [71] | | Predictors | BW <1500 | Discharge home | 19%/14%
Forsblad, 2007 [72] | | NA | GA <25 | 180 days | 22%
Evans, 2006 [73] | | Age at inclusion | GA <32 & BW <1500 | Discharge home | 7%
Marshall, 2005 [74] | | NA | BW <1500 | Discharge home | 27%
Locatelli, 2005 [75] | | NA | BW <750 | 120 days | 49%
Ambalavanan, 2005 [76] | 10 | Age at inclusion; predictors | BW <1000 | Unclear | NR
Parry, 2003 [19] | | NA | GA <32 | Discharge home | NR
Janota, 2001 [77] | | Inclusion criterion; timing of death | GA <31 & BW <1500 | 28 days (2)/discharge home (2) | 11%/17%
Ambalavanan, 2001 [78] | 20 | Predictors; modelling method | BW <1000 | Discharge home | 34%
Doyle, 2001 [79] | | NA | GA <27 | 5 years | 33%
Pollack, 2000 [80] | 10 | Predictors | BW <1500 | Discharge home | 14%
Draper, 1999 [81] | | Inclusion criterion | GA <32 | Discharge home | 20%/9%
Zernikow, 1998 [82] | 17 | Predictors; modelling method | GA <32 & BW <1500 | 28 days | 9%

NA = not applicable; GA = gestational age; BW = birth weight; NR = not reported. The number in parentheses in the column “Timing of death” is the number of models with that timing of death.

Table 3.

Studies reporting on external validation.

Model | Article | No. studies | No. models | C-statistic original article | C-statistics external validations (range)
CRIB | International Neonatal Network, 1993 [18] | 15 [19, 23, 66, 80, 83–93] | 16 | 0.90 | Presented in meta-analysis (fig. 5A)
CRIB-II | Parry, 2003 [19] | 12 [23, 62, 66, 90–92, 94–99] | 18 | 0.92 | Presented in meta-analysis (fig. 5B)
SNAPPE-II | Richardson, 2001 [20] | 6 [71, 89, 90, 93, 96, 99] | | 0.85 | Presented in meta-analysis (fig. 5C)
NICHD | Tyson, 2008 [21] | 5 [24, 51, 100–102] | 13 | 0.75 | Presented in meta-analysis (fig. 5D)
Apgar | Apgar, 1953 [22] | 5 [54, 59, 98, 103, 104] | 21 | NA | Presented in figure 5E
SNAP-II | Richardson, 2001 [20] | 3 [53, 71, 99] | | NA | 0.68–0.82
SNAPPE | Richardson, 1993 [105] | 3 [80, 83, 89] | | 0.92 | 0.79–0.93
SNAP | Richardson, 1993 [106] | 1 [83] | | NA | 0.82
Other models | Podda, 2018 [51] | 1 [51] | | 0.91 | 0.77–0.91
 | BW+GA | 1 [51] | | NA | 0.72–0.89
 | Manktelow, 2013 [48] | 1 [51] | | 0.86 | 0.69–0.86
 | Zupancic, 2007 [71] | 1 [51] | | 0.85 | 0.76–0.90
 | Gray, 1992 [107] | 1 [62] | | NA | 0.91–0.96
 | Draper, 1999 [81] | 1 [4] | | NA | 0.82–0.92
 | Maier, 1997 [108] | 1 [87] | | 0.86 | 0.82
 | Horbar, 1993 [80] | 1 [80] | | 0.82 | 0.87
 | Rysavy, 2020-1 [24] | 1 [24] | | NA | 0.73
 | Rysavy, 2020-2 [24] | 1 [24] | | NA | 0.74

In total, 118 external validations were performed within 34 studies. The column “Article” gives the original paper in which the model was published. “No. studies” is the number of published papers presenting an external validation of the model; because a study might externally validate more than one model, the column total exceeds 34. “No. models” is the number of external validations of the model. It can exceed the number of studies when a model was validated more than once within a study, for example in different populations or with different time spans of the outcome. NA = not available.

Table 4 shows key characteristics of the study design, sample size, predictors, outcome, modeling method, and predictive performance of the included model development studies.

Most of the included studies used data from registries or retrospective cohorts (n = 32, 89%). Of all 144 models, 60 (42%) used birth weight as the inclusion criterion, 52 (36%) used gestational age, and 32 (22%) used both birth weight and gestational age. The number of participants used for developing the models varied from 57 to 29 180 (median 828), and the number of events ranged from 16 to 4448 (median 171). The median mortality rate was 13% (interquartile range 9%–28%). The number of events per variable (EPV) could be calculated for 120 (83%) models; it ranged from 0 to 426 (median 10) and was <10 for 51% of these models.

Most models predicted mortality before discharge home (n = 72, 50%) or within 28 days after birth (n = 31, 22%). Seven other time spans were also identified: mortality before term age; within 7, 120, or 180 days; and within 1 year, 18 to 22 months, or 5 years.

The C-statistic varied from 0.70 to 0.95, with a similar range in the VLGA/BW and ELGA/BW subgroups. Calibration was reported for 40 (28%) models, most often as the P value of a Hosmer–Lemeshow test (n = 35, 88%) and less often by using a calibration plot (n = 10, 25%). Both discrimination and calibration were reported for 36 (25%) models. In total, 84 of the 144 models (58%) were internally validated. The most common method was a random split of the data into development and validation data sets (n = 42, 50%), followed by a non-random split (n = 20, 24%) and cross-validation (n = 18, 21%). For 64 (44%) models, insufficient information was presented to allow calculation of individual risks.
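
Two quantities in this paragraph lend themselves to a short sketch. The first is the EPV and its categories as used in Table 4. The second is an individual's predicted risk from a logistic regression model, which cannot be calculated when the intercept or coefficients are not reported. The sketch rests on these assumptions:

- EPV is computed as events divided by candidate predictors. The text does not say whether candidate predictor parameters were counted instead.
- The 10–20 category includes both boundaries.
- The coefficients in the usage lines are hypothetical, not taken from any included model.

```python
import math


def events_per_variable(n_events, n_candidate_predictors):
    """EPV = outcome events / candidate predictors; None when either count is unavailable."""
    if n_events is None or not n_candidate_predictors:
        return None
    return n_events / n_candidate_predictors


def epv_category(epv):
    """Categories as in Table 4 (assumption: 10 and 20 both fall in the 10-20 band)."""
    if epv is None:
        return "not possible to calculate"
    if epv < 10:
        return "<10"
    if epv <= 20:
        return "10-20"
    return ">20"


def predicted_risk(intercept, coefficients, values):
    """Individual risk from a logistic model; the intercept is needed, not just the coefficients."""
    linear_predictor = intercept + sum(b * x for b, x in zip(coefficients, values))
    return 1 / (1 + math.exp(-linear_predictor))


# Hypothetical development cohort: 171 deaths and 12 candidate predictors.
epv = events_per_variable(171, 12)
print(epv, epv_category(epv))

# Hypothetical model: logit(risk) = -2.0 + 0.5 * x, evaluated at x = 4.
print(predicted_risk(-2.0, [0.5], [4.0]))
```

Leaving out the intercept shifts every linear predictor by a constant, so published coefficients alone cannot give absolute risks. This is why models presented without an intercept cannot be used to calculate individual risks.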

Table 4.

Characteristics of the included model development studies and external validation studies.

Item | Category | Development studies | External validation studies
Per study (total n = 44) | | N = 36 | N = 34
Study design and study population | | |
Years of publication (min–max) | | 1998–2020 | 1994–2020
Number of models per study | | 2 (1–6) | 2 (1–4)
Data source | Registry | 20 (56) | 13 (38)
| Retrospective cohort | 12 (33) | 10 (29)
| Prospective cohort | 2 (5.6) | 9 (27)
| Other | 1 (2.8) | 1 (2.9)
| Unclear | 1 (2.8) | 1 (2.9)
Country* | Europe | 16 (44) | 14 (41)
| North America | 17 (47) | 7 (21)
| Oceania | 3 (8.3) | 4 (12)
| Asia | 3 (8.3) | 8 (24)
| South America | 1 (2.8) | 2 (5.9)
| Africa | 0 (0.0) | 1 (2.9)
Per model (total n = 154) | | N = 144 | N = 118
Inclusion criteria | | |
Birth weight only | | 60 (42) | 40 (34)
  | ≤1000 g | 36 (60) | 6 (15)
  | ≤1500 g | 24 (40) | 34 (85)
Gestational age only | | 52 (36) | 51 (43)
  | ≤28 weeks | 12 (23) | 12 (24)
  | ≤32 weeks | 40 (77) | 39 (76)
Birth weight and gestational age | | 32 (22) | 27 (23)
  | ≤28 weeks/≤1000 g | 1 (9.4) | 13 (48)
  | ≤32 weeks/≤1500 g | 29 (91) | 14 (52)
Sample size | | |
Number of participants | | 828 (476–5,745) | 842 (267–3,378)
Number of events | | 171 (53–411) | 81 (43–197)
| Not reported | 18 (13) | 27 (23)
EPV | | 10 (4–68) | NA
  | EPV <10 | 61 (51) |
  | EPV 10–20 | 6 (4.2) |
  | EPV >20 | 53 (44) |
| Not possible to calculate | 24 (17) |
Predictors | | | NA
No. candidate predictors | | 12 (6–22) |
| Not reported | 6 (4.2) |
No. predictors in final model | | 7 (4–12) |
Outcome | | |
Mortality rate | | 13% (9%–28%) | 12% (10%–23%)
| Not reported | 26 (18) | 27 (23)
Time span of outcome | Discharge home | 72 (50) | 86 (73)
| 28 postnatal days | 31 (22) | 14 (12)
| 7 postnatal days | 11 (7.6) | 9 (7.6)
| Term age | 6 (4.2) | 0 (0.0)
| 1 year of age | 6 (4.1) | 0 (0.0)
| 180 postnatal days | 4 (2.8) | 0 (0.0)
| 18–22 postnatal months | 1 (0.7) | 0 (0.0)
| 120 postnatal days | 1 (0.7) | 0 (0.0)
| 5 years of age | 1 (0.7) | 0 (0.0)
| 2 years of age | 0 (0.0) | 1 (0.8)
| 2–3 years’ corrected age | 0 (0.0) | 5 (4.2)
| Unclear | 11 (7.6) | 3 (2.5)
Modelling method and model presentation | | | NA
Modelling method | Logistic regression | 102 (71) |
| Neural networks | 32 (22) |
| Other | 4 (2.8) |
| Unclear | 6 (4.2) |
Model presentation | Final model presented, including intercept | 45 (31) |
| Final model presented without intercept | 19 (13) |
| Alternative presentation | 16 (11) |
| Insufficient information to allow individual risk calculation | 64 (44) |
Predictive performance | | |
Discrimination | C-statistic range | 0.70–0.95 | 0.56–0.97
| ≤28 weeks/≤1000 g | 0.71–0.89 | 0.56–0.95
| ≤32 weeks/≤1500 g | 0.70–0.95 | 0.67–0.97
| Not reported | 11 (7.6) | 9 (7.6)
Calibration* | Reported | 40 (28) | 31 (26)
  | Hosmer–Lemeshow | 35 (88) | 20 (65)
  | Calibration plot | 10 (25) | 14 (45)
  | Observed-expected ratio | 2 (5.0) | 0 (0.0)
Both discrimination and calibration reported | | 36 (25) | 26 (22)
Internal validation | | | NA
Internally validated models | | 84 (58) |
  Method of validation* | Random split of data | 42 (50) |
  | Cross-validation | 18 (21) |
  | Non-random split of data | 20 (24) |
  | Resampling | 3 (3.6) |
  | Other | 2 (2.4) |

Numbers are presented as n (%) or median (Q1–Q3), unless stated otherwise. If no “not reported” or “unclear” category is shown, the characteristic was available for all studies/models. Some percentages are calculated relative to their parent characteristic or category rather than to all studies/models, and these rows are indented. They are the inclusion-criterion cut-offs (eg, ≤1000 g within “Birth weight only”), the EPV categories, the calibration methods, and the internal validation methods. *Percentages do not add up to 100% because studies/models might belong to more than one category. EPV = events per variable. NA = not applicable.

Figure 2 summarizes all predictors included in the final models. Variables concerning size and maturity of the infant and variables concerning birth and delivery were most often included (in 77% and 64% of the final models, respectively).

FIGURE 2

Predictors included in the final development models. The bars reflect the percentage of the 153 models including this predictor; the number at the end of each bar reflects the absolute number of models including this predictor. The upper bar of each category shows the total number and percentage of models including a predictor in this category; the lighter-color bars below it show the specific predictors within that category. Models might have included >1 predictor of a category. BPD, bronchopulmonary dysplasia; CPAP, continuous positive airway pressure; Fio2, fraction of inspired oxygen; GA, gestational age; IVH, intraventricular hemorrhage; NEC, necrotizing enterocolitis; NICHD, Eunice Kennedy Shriver National Institute of Child Health and Human Development; PPHN, persistent pulmonary hypertension of the newborn; PPROM, preterm prelabor rupture of membranes; SNAP, Score for Neonatal Acute Physiology; TRIPS-II, Transport Risk Index of Physiologic Stability, version II.

Figure 3 shows a summary of ROB and applicability for all models. Across nearly all models, ROB related to outcome and predictors was considered low. ROB in the participants domain was high in 14% of the models because of inappropriate inclusion and exclusion criteria; for example, including <50% of the eligible infants or excluding infants who died later than the prediction horizon. By contrast, ROB in the analysis domain was high in every model, mostly because of inappropriate handling of missing data (100%), not presenting all relevant performance measures (96%), a low number of participants with the outcome relative to the number of candidate predictors (51%), and no correction for overfitting when indicated (94%). Consequently, the overall ROB was high across all models.

FIGURE 3

ROB and applicability assessment of developed models by using PROBAST.

Concern about applicability to our research question was high for 29% of the models, mainly because the included participants differed from those defined in our research question (eg, studies excluding outborn infants).

Table 4 shows key characteristics of the study design, sample size, outcome, and predictive performance of the included external validation studies. Although 118 external validations were performed in 34 articles, the majority of the 144 developed models (n = 135, 94%) had not been externally validated. For some models, an external validation study was included in this review but the original development study was not, because the model was developed in the presurfactant era or in a population not applicable to this review, yet was externally validated in a time period or population that was applicable. In total, 18 different models were externally validated (Table 3). The median mortality rate was 12% (interquartile range: 10%–23%), comparable to the mortality rate in the development studies. The C-statistic was reported for 109 (92%) models, ranging from 0.56 to 0.97. Both discrimination and calibration were reported for 26 (22%) models; calibration was presented with a calibration plot for 14 (45%) models and, for the majority, as the P value of a Hosmer–Lemeshow test (n = 20, 65%). Figure 4 shows a summary of ROB and applicability by domain. Across almost all models, ROB related to outcome, predictors, and participants was low. By contrast, ROB related to the analysis was high in almost all models, mostly because of inappropriate handling of missing data (95%) and not presenting a calibration plot (92%). This resulted in an overall high ROB for the validation of 114 (97%) models.

FIGURE 4

ROB and applicability assessment of externally validated models by using PROBAST.

The CRIB score18 was validated most frequently (n = 15), followed by the CRIB-II19 (n = 12), the SNAPPE-II20 (n = 6), the NICHD model21 (n = 5), and the Apgar score22 (n = 5) (Table 3). However, the Apgar score was unsuitable for meta-analysis because of substantial heterogeneity across its external validations, caused by differences in the moment of prediction, the prediction horizon, and the type of Apgar score (conventional, specified, or expanded). Results from the external validation studies of the Apgar score are therefore presented without meta-analysis in Fig 5E.

FIGURE 5

A, Meta-analysis for the CRIB score, showing a forest plot with study-specific C-statistics, the average C-statistic (Summary Estimate), and the prediction interval. The first row of the table shows characteristics of the original development paper on the CRIB score.18 The CRIB score includes six parameters: birth weight, gestation, congenital malformations, maximum base excess, minimum appropriate fraction of inspired oxygen, and maximum appropriate fraction of inspired oxygen in the first 12 h. Although 16 studies externally validated the CRIB score, 1 study could not be used in the meta-analysis because the C-statistic was not presented, and 1 study could not be used because the 95% confidence interval could not be calculated due to missing information on the number of outcomes, resulting in 14 studies used for the meta-analysis. Subgroup analyses were not applicable, as all studies were performed in a VLBW/GA population. B, Meta-analysis for the CRIB-II score, showing a forest plot with study-specific C-statistics, the average C-statistic (Summary Estimate), and the prediction interval. The first row of the table shows characteristics of the original development paper on the CRIB-II score.19 The CRIB-II score includes five parameters: sex, birth weight, gestation, temperature at admission, and base excess. Results of subgroup analyses in VLBW/GA infants are shown in Figure 6a (online only). C, Meta-analysis for the SNAPPE-II score, showing a forest plot with study-specific C-statistics, the average C-statistic (Summary Estimate), and the prediction interval. The first row of the table shows characteristics of the original development paper on the SNAPPE-II score.20 The SNAPPE-II score includes nine parameters: mean blood pressure, lowest temperature, PO2/FiO2 ratio, lowest serum pH, multiple seizures, urine output, birth weight, small for gestational age, and Apgar score at 5 minutes.
D, Meta-analysis for the NICHD score, showing a forest plot with study-specific C-statistics, the average C-statistic (Summary Estimate), and the prediction interval. The first row of the table shows characteristics of the original development paper on the NICHD score.21 The NICHD score includes five parameters: gestational age, birth weight, infant sex, singleton birth, and antenatal steroids. E, Results from all external validations of the Apgar score, without meta-analysis. Conventional = original scoring system as introduced by Virginia Apgar in 1953,22 including five items: heart rate, respiratory effort, reflex irritability, muscle tone, and color; Specified = scoring the items of the conventional Apgar score independent of the interventions needed to achieve that condition; Expanded = scoring the interventions that are required to achieve a condition; Combined = scoring both the Specified and Expanded Apgar scores.

In the meta-analyses, the estimated average C-statistics across the included studies were 0.88 (95% CI: 0.83–0.91, I2 = 91%) for the CRIB score (Fig 5A), 0.87 (95% CI: 0.81–0.92, I2 = 94%) for the CRIB-II score (Fig 5B), 0.86 (95% CI: 0.78–0.92, I2 = 90%) for the SNAPPE-II score (Fig 5C), and 0.71 (95% CI: 0.61–0.79, I2 = 84%) for the NICHD model (Fig 5D). The 95% PIs were 0.63–0.97, 0.59–0.97, and 0.60–0.96 for the CRIB, CRIB-II, and SNAPPE-II scores, respectively. Based on the forest plot in Fig 5A, the study of Asker et al was an outlier compared with the other studies and may therefore be a major source of heterogeneity. Excluding this study lowered the I2 to 69% and narrowed the 95% PI to 0.80–0.94. The probabilities that the scores would achieve a C-statistic >0.7 and >0.8 in future validation studies were 93% and 78%, respectively, for the CRIB score; 92% and 78% for the CRIB-II score; 93% and 78% for the SNAPPE-II score; and 57% and 10% for the NICHD score. A calibration plot was presented for 2 external validation studies of the CRIB score, 1 of the CRIB-II score, and 1 of the NICHD score, showing poor and good calibration for the CRIB score,19,23 good calibration for the CRIB-II score,23 and moderate calibration for the NICHD score.24 Subgroup analyses in a VLGA/BW population for the CRIB-II and SNAPPE-II scores showed similar results (Supplemental Fig 6).
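To show how the quantities reported above (average C-statistic, 95% CI, I2, 95% PI, and the probability that a future validation exceeds a threshold) relate to one another, the sketch below pools study-specific C-statistics on the logit scale with a DerSimonian–Laird random-effects model. This is an illustrative assumption, not necessarily the estimation method used in this review, which may differ (eg, restricted maximum likelihood or a Bayesian model). The sketch also uses a normal approximation for the prediction interval, whereas a t-distribution with k − 2 degrees of freedom is often used. The function name and the input numbers are hypothetical.

```python
import math
from statistics import NormalDist


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def expit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def pool_c_statistics(c, lower, upper, thresholds=(0.7, 0.8)):
    """DerSimonian-Laird random-effects pooling of C-statistics on the logit scale.

    c, lower, upper: per-study C-statistics and the bounds of their 95% CIs.
    """
    z = NormalDist().inv_cdf(0.975)
    y = [logit(ci) for ci in c]
    # Variance on the logit scale, recovered from the width of each study's 95% CI
    v = [((logit(hi) - logit(lo)) / (2 * z)) ** 2 for lo, hi in zip(lower, upper)]
    w = [1.0 / vi for vi in v]
    k = len(y)

    # Cochran's Q around the fixed-effect (inverse-variance) mean
    y_fe = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - y_fe) ** 2 for wi, yi in zip(w, y))
    df = k - 1
    # Between-study variance tau^2 (moment estimator) and I^2
    scale = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / scale) if scale > 0 else 0.0
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0

    # Random-effects mean and its standard error
    w_re = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_mu = math.sqrt(1.0 / sum(w_re))

    # Approximate distribution of the logit C-statistic in a new setting
    new_setting = NormalDist(mu, math.sqrt(tau2 + se_mu ** 2))
    return {
        "c": expit(mu),
        "ci": (expit(mu - z * se_mu), expit(mu + z * se_mu)),
        "pi": (expit(new_setting.inv_cdf(0.025)), expit(new_setting.inv_cdf(0.975))),
        "i2": i2,
        "tau2": tau2,
        # Probability that a future validation study yields a C-statistic above each threshold
        "p_exceed": {t: 1.0 - new_setting.cdf(logit(t)) for t in thresholds},
    }


# Made-up inputs for illustration only (not data from the included studies)
result = pool_c_statistics(
    c=[0.85, 0.90, 0.80, 0.92],
    lower=[0.80, 0.86, 0.72, 0.89],
    upper=[0.89, 0.93, 0.86, 0.94],
)
print(result)
```

Pooling on the logit scale rather than the raw 0 to 1 scale keeps the pooled estimate and intervals within valid bounds, which is the motivation discussed by Snell et al.13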

In this systematic review, we summarized all available prognostic models for mortality prediction in liveborn very preterm infants. In total, 144 models from 36 studies on model development and 118 models from 34 studies on external validation were identified, revealing that there is an abundance of mortality risk prediction models for very preterm infants. ROB assessment showed high ROB in the majority of the models, most often because of inadequate (reporting of the) analysis. Furthermore, internal and external validation of these models is often lacking.

Four main methodologic flaws identified within the analysis domain need to be addressed. First, at development, 61 (51%) models had a number of participants with the outcome in relation to the number of candidate predictors (EPV) <10, resulting in high ROB according to PROBAST because of the risk of overfitting. With such a small EPV, it is recommended to account for overfitting and optimism to decrease the ROB,9 but this was scarcely done in the included models. Historically, and as such in PROBAST, sample size considerations have been based on the EPV; however, it has recently been suggested to also consider the total number of participants, the outcome incidence in the study population, and the expected predictive performance.25
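To make the sample-size discussion concrete, the sketch below computes the EPV and the three minimum-sample-size criteria for a binary outcome described by Riley et al25: (1) an expected uniform shrinkage factor of at least 0.9, (2) a difference of at most 0.05 between apparent and optimism-adjusted Nagelkerke R2, and (3) estimation of the overall outcome risk within ±0.05. The anticipated Cox–Snell R2 is an input the user must assume (eg, from a previous model), the function names are hypothetical, and the example values are illustrative only.

```python
import math


def events_per_variable(n_events: int, n_candidate_predictors: int) -> float:
    """EPV: number of outcome events divided by the number of candidate predictor parameters."""
    return n_events / n_candidate_predictors


def riley_min_sample_size(n_params: int, outcome_prop: float, r2_cs: float,
                          shrinkage: float = 0.9, delta: float = 0.05,
                          margin: float = 0.05) -> dict:
    """Minimum sample size for a binary-outcome prediction model (criteria 1-3).

    r2_cs is the anticipated Cox-Snell R-squared of the new model (an assumption).
    """
    phi = outcome_prop
    # Criterion 1: expected uniform shrinkage factor >= `shrinkage`
    n1 = n_params / ((shrinkage - 1) * math.log(1 - r2_cs / shrinkage))

    # Criterion 2: apparent and adjusted Nagelkerke R^2 differ by at most `delta`
    max_r2_cs = 1 - (phi ** phi * (1 - phi) ** (1 - phi)) ** 2
    s2 = r2_cs / (r2_cs + delta * max_r2_cs)
    n2 = n_params / ((s2 - 1) * math.log(1 - r2_cs / s2))

    # Criterion 3: overall outcome risk estimated within +/- `margin` (95% confidence)
    n3 = (1.96 / margin) ** 2 * phi * (1 - phi)

    n_required = math.ceil(max(n1, n2, n3))
    return {"n1": math.ceil(n1), "n2": math.ceil(n2), "n3": math.ceil(n3),
            "n_required": n_required, "events_required": math.ceil(n_required * phi)}


# Hypothetical example: 10 candidate parameters, 12% mortality, anticipated R2_CS = 0.15
print(events_per_variable(n_events=60, n_candidate_predictors=10))
print(riley_min_sample_size(n_params=10, outcome_prop=0.12, r2_cs=0.15))
```

The required sample size is the largest of the three criteria, so a model can satisfy EPV ≥10 and still be too small under criterion 1 when the anticipated R2 is modest.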

Second, none of the included studies handled participants with missing data correctly according to PROBAST. Using missing data as an exclusion criterion, or excluding enrolled participants with any missing data from the analysis, leads to biased estimates of associations and model performance.26–35 Therefore, multiple imputation is recommended to handle missing data, because it leads to the least biased results with correct SEs and P values.26–31,33–35
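Multiple imputation creates several completed data sets, fits the model in each, and then combines the results. The combining step (Rubin's rules) is sketched below for a single regression coefficient. The imputation step itself (eg, multiple imputation by chained equations29) is not shown, and the function name and per-imputation estimates are hypothetical.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine one parameter across m imputed data sets with Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)           # pooled point estimate
    w = mean(variances)               # average within-imputation variance
    b = variance(estimates)           # between-imputation variance (m - 1 denominator)
    t = w + (1 + 1 / m) * b           # total variance of the pooled estimate
    # Proportion of the total variance attributable to missing data
    prop_var_missing = (1 + 1 / m) * b / t
    # Rubin's degrees of freedom for the pooled estimate
    r = (1 + 1 / m) * b / w
    df = (m - 1) * (1 + 1 / r) ** 2 if r > 0 else float("inf")
    return {"estimate": q_bar, "se": math.sqrt(t), "df": df,
            "prop_var_missing": prop_var_missing}


# Hypothetical log odds ratios and their variances from m = 5 imputed data sets
pooled = pool_rubin([0.50, 0.55, 0.45, 0.52, 0.48], [0.010, 0.011, 0.009, 0.010, 0.010])
print(pooled)
```

The between-imputation term is what a complete-case or single-imputation analysis ignores; it widens the SE to reflect uncertainty about the missing values.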

Third, information on both calibration and discrimination was presented for only 25% of the models. Calibration was most often assessed by using a Hosmer–Lemeshow test, although this test indicates neither the presence nor the magnitude of any miscalibration and is known to depend on the sample size.9 Therefore, it is recommended to present a calibration plot instead, which unfortunately was hardly ever reported in the included articles.
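Unlike a single test P value, a calibration plot shows where and by how much predicted risks deviate from observed outcomes. The sketch below computes the points for such a plot by splitting infants into equal-sized groups of predicted risk and comparing mean predicted risk with observed mortality per group. It also computes the overall observed/expected (O/E) ratio. Using 10 groups is a common convention rather than a requirement, a smoothed calibration curve is often preferred in practice, and the function names and simulated data are hypothetical.

```python
import random


def calibration_points(pred, y, n_groups=10):
    """(mean predicted risk, observed proportion, group size) per group of predicted risk."""
    pairs = sorted(zip(pred, y))
    n = len(pairs)
    points = []
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not chunk:
            continue
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(o for _, o in chunk) / len(chunk)
        points.append((mean_pred, observed, len(chunk)))
    return points


def observed_expected_ratio(pred, y):
    """Calibration-in-the-large: observed deaths divided by the sum of predicted risks."""
    return sum(y) / sum(pred)


# Simulated illustration: true risks are half the predicted ones,
# ie, a model that systematically overestimates mortality
rng = random.Random(7)
pred = [rng.uniform(0.01, 0.6) for _ in range(2000)]
y = [1 if rng.random() < p / 2 else 0 for p in pred]
for mean_pred, observed, size in calibration_points(pred, y):
    print(f"predicted {mean_pred:.2f}  observed {observed:.2f}  (n={size})")
print("O/E ratio:", round(observed_expected_ratio(pred, y), 2))
```

Plotting observed against predicted per group (with the diagonal as reference) reveals both the direction and the size of miscalibration, which a Hosmer–Lemeshow P value does not.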

Fourth, 84 of the 144 models (58%) were internally validated, most often by randomly splitting the data into development and validation data sets (n = 42, 50%). However, random splitting has been shown to use the data inefficiently and to measure optimism inadequately.36,37 Instead, bootstrapping or cross-validation is recommended to quantify overfitting of the developed model and optimism in its predictive performance.38 Furthermore, the majority of the studies performing internal validation apparently did not replicate the exact model development procedure and may therefore still underestimate the actual optimism and overestimate the actual performance of their model.39,40
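The sketch below illustrates bootstrap optimism correction of the C-statistic in the style proposed by Harrell. In line with the point above, every step of model development, including predictor selection, is repeated inside each bootstrap sample. The development procedure (keep the single candidate predictor with the best apparent C-statistic, then fit a logistic regression by Newton–Raphson) is a deliberately simple, hypothetical stand-in for a real modeling strategy; all function names are illustrative and the data are simulated.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def c_statistic(risk, y):
    """Share of (death, survivor) pairs where the death has the higher risk; ties count 1/2."""
    deaths = [r for r, o in zip(risk, y) if o == 1]
    survivors = [r for r, o in zip(risk, y) if o == 0]
    hits = sum(1.0 if d > s else 0.5 if d == s else 0.0 for d in deaths for s in survivors)
    return hits / (len(deaths) * len(survivors))


def fit_logistic_1d(x, y, iters=15):
    """Logistic regression with an intercept and one slope, fitted by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            g0 += yi - p              # score for the intercept
            g1 += (yi - p) * xi       # score for the slope
            w = p * (1.0 - p)
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det < 1e-12:
            break                     # (near-)singular information matrix
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def develop(X, y):
    """Hypothetical development procedure: keep the candidate predictor with the highest
    apparent C-statistic and return a function that predicts risks for new rows."""
    best = None
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        b0, b1 = fit_logistic_1d(col, y)
        c = c_statistic([b0 + b1 * v for v in col], y)
        if best is None or c > best[0]:
            best = (c, j, b0, b1)
    _, j, b0, b1 = best
    return lambda rows: [sigmoid(b0 + b1 * row[j]) for row in rows]


def bootstrap_corrected_c(X, y, n_boot=100, seed=1):
    """Apparent C, average optimism, and optimism-corrected C."""
    rng = random.Random(seed)
    n = len(y)
    apparent = c_statistic(develop(X, y)(X), y)
    optimism = []
    while len(optimism) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        if len(set(yb)) < 2:
            continue                  # a sample needs both deaths and survivors
        model_b = develop(Xb, yb)     # repeat the WHOLE procedure, incl. selection
        optimism.append(c_statistic(model_b(Xb), yb) - c_statistic(model_b(X), y))
    mean_optimism = sum(optimism) / len(optimism)
    return apparent, mean_optimism, apparent - mean_optimism


# Simulated data: predictor 0 carries signal, predictors 1-4 are pure noise
rng = random.Random(3)
X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(120)]
y = [1 if rng.random() < sigmoid(-1.5 + row[0]) else 0 for row in X]
print(bootstrap_corrected_c(X, y, n_boot=50))
```

If the selection step were done once on the full data and only the final fit were bootstrapped, the optimism arising from choosing among candidate predictors would be missed.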

Methodologic flaws identified in the ROB assessment of the participants domain included the use of a nested case-control design without correction for baseline risk, inclusion of <50% of the eligible infants, and exclusion of all infants who died after 7 days. In the applicability assessment, issues included exclusion of outborn infants, a study conducted in a high-altitude NICU, and exclusion of all infants who died within 72 hours.

This review reveals that the development of new prediction models for mortality in preterm infants is an ongoing practice. However, many models are of unknown value for daily practice because they lack validation. Therefore, future emphasis should shift toward external validation and adaption of existing prediction models, a recommendation that applies to the broader field of prediction modeling and has been made before.41,42 Ideally, these validation studies use prospectively collected data, because validation studies have a higher potential for ROB when participant data come from existing sources collected for a purpose other than validating or updating prediction models. Subsequently, impact studies are warranted to quantify the effect of a prognostic model on physicians' behavior and patient outcome.5

In the majority of development studies, participants, predictors, and outcome were described sufficiently clearly and did not introduce bias. By contrast, high ROB occurred in the analysis domain of practically all studies because of inappropriate analysis methods or omission of important statistical considerations. Moreover, for almost 40% of the models, the information needed for others to correctly apply the models in new individuals (ie, the predictors and coefficients of the final developed model, including the intercept) was insufficient. Studies on mortality risk prediction in very preterm infants need to improve, and this can be achieved through better (reporting of) analyses. A first step in that direction would be better adherence to the TRIPOD (transparent reporting of a multivariable prediction model for individual prognosis or diagnosis) statement43 and consideration of PROBAST.8,9
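Why the intercept matters can be shown in a few lines. A logistic regression model converts an infant's predictor values into an absolute risk via risk = 1 / (1 + exp(-(b0 + b1*x1 + ... + bk*xk))). Without the intercept b0, users can still rank infants, because the ordering depends only on the coefficients, but they cannot recover absolute risks. The sketch assumes a logistic model, which is common but not universal among prediction models, and the coefficients, predictor names, and example infants are hypothetical, not taken from any model in this review.

```python
import math

# Hypothetical published model (illustrative coefficients only)
INTERCEPT = 4.0
COEFFICIENTS = {"gestational_age_weeks": -0.25, "male_sex": 0.3, "antenatal_steroids": -0.6}


def predicted_risk(infant: dict, intercept: float = INTERCEPT) -> float:
    """Absolute risk of death from a logistic model: 1 / (1 + exp(-linear predictor))."""
    lp = intercept + sum(beta * infant[name] for name, beta in COEFFICIENTS.items())
    return 1.0 / (1.0 + math.exp(-lp))


a = {"gestational_age_weeks": 25, "male_sex": 1, "antenatal_steroids": 0}
b = {"gestational_age_weeks": 30, "male_sex": 0, "antenatal_steroids": 1}

# With the published intercept: usable absolute risks
print(predicted_risk(a), predicted_risk(b))
# Intercept missing: the ranking of a and b is unchanged, but the absolute risks are wrong
print(predicted_risk(a, intercept=0.0), predicted_risk(b, intercept=0.0))
```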

This review showed that variables concerning size and maturity of the infant (in 77% of the models), variables concerning birth and delivery (64%), and maternal variables (41%) were most often included. Specifically, gestational age, Apgar score, birth weight, sex, multiplicity, antenatal corticosteroids, and ethnicity were each used as predictors in >40 models, underlining the importance of these variables in mortality risk prediction in preterm infants. Nevertheless, because the vast majority of these models were considered of low quality and their calibration was not reported, their actual value in mortality risk prediction remains unclear.

In the meta-analyses of the C-statistic, the CRIB, CRIB-II, and SNAPPE-II scores all showed excellent discriminative performance (C-statistic >0.85), comparable to a recently published meta-analysis.44 However, considerable heterogeneity across the included studies was found (I2 ≥ 90% for all three scores), which can originate from differences in study populations and study designs.45,46 Important characteristics of the included studies, including inclusion criteria, moment of prediction, and time span of the outcome, are shown in Fig 5 A–C and indicate substantial differences in study populations. However, it is difficult to draw conclusions on the main sources of heterogeneity, so further research is needed. Although the 3 scores showed excellent discriminative performance, information on their calibration is largely lacking. To provide a complete and accurate judgment of the performance of these models, information on calibration, ideally in the form of a calibration plot, is needed. Compared with these three scores, the NICHD model showed lower discriminative performance (average C-statistic 0.71), which may be related to its study population of extremely preterm infants.

Health care decisions for individual patients should be informed by the best available evidence. Systematic reviews summarizing large amounts of information are powerful tools to facilitate clinical decision-making but also to identify gaps in our knowledge or room for improvement. In this article, we show a lack of evidence regarding the external validity of the majority of models, poor (reporting of) analyses, and the absence of calibration plots for the majority of the models. An abundance of insufficiently validated models is not useful for clinical practice.47 In our systematic review, the extensive ROB assessment revealed that the model published by Manktelow et al48 had the highest quality among all 144 developed models. Furthermore, the external validity of the CRIB, CRIB-II, SNAPPE-II, and NICHD models has often been assessed and shows good discriminative performance.18–21 Unfortunately, information on their calibration is still lacking. Based on the currently available evidence, we consider these 5 prediction models to have the highest potential for use in clinical practice. A first step would be to (again) externally validate these models, this time also focusing on calibration. Presenting discrimination is sufficient when the aim is to distinguish high- and low-risk populations, but for individual risk prediction, information on calibration is essential. During such external validation, the original model may require updating, thereby addressing potential miscalibration arising from differences in mortality rate between the development and validation populations. Ideally, such external validations are followed by impact studies to quantify the effect of a prognostic model on physicians' behavior and patient outcome.
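One of the simplest ways to update a model for a different mortality rate is to re-estimate only its intercept in the validation population, keeping the original coefficients fixed (recalibration-in-the-large). The sketch below finds the intercept shift by Newton–Raphson so that the sum of updated predicted risks equals the observed number of deaths; this is maximum likelihood for a logistic model with the original linear predictor as an offset. The function names and simulated cohort are hypothetical, and richer updates (eg, also re-estimating a calibration slope) may be needed in practice.

```python
import math
import random


def sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def update_intercept(lp, y, iters=50, tol=1e-10):
    """Shift `a` such that sum(sigmoid(lp + a)) == sum(y); original coefficients kept fixed."""
    a = 0.0
    for _ in range(iters):
        p = [sigmoid(v + a) for v in lp]
        score = sum(y) - sum(p)                   # derivative of the log-likelihood in a
        info = sum(pi * (1 - pi) for pi in p)     # Fisher information for a
        step = score / info
        a += step
        if abs(step) < tol:
            break
    return a


# Simulated validation cohort whose true mortality is lower than the original model implies
rng = random.Random(11)
lp = [rng.gauss(-1.5, 1.0) for _ in range(3000)]   # original model's linear predictors
y = [1 if rng.random() < sigmoid(v - 0.7) else 0 for v in lp]

a = update_intercept(lp, y)
print("intercept shift:", round(a, 2))
print("O/E before:", round(sum(y) / sum(sigmoid(v) for v in lp), 2))
print("O/E after:", round(sum(y) / sum(sigmoid(v + a) for v in lp), 2))
```

Because only one parameter is re-estimated, this update needs far fewer outcome events than developing a new model, which fits the recommendation to adapt existing models rather than build new ones.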

Since Medlock et al7 published their systematic review of models for the prediction of mortality in very premature infants in 2011, only 1 systematic review, in Spanish, has been published.49 The main improvements of our review compared with both existing reviews are (1) the use of a standard tool for ROB assessment, which is an essential step in any systematic review8,9; (2) the inclusion of articles externally validating models and meta-analysis of the models most often validated, giving additional insight into their quality and value for clinical practice; and (3) the inclusion of the large number of prediction models published since 2011, which made an update necessary to provide a comprehensive overview of prediction models for mortality in very preterm infants.

However, this study also has several limitations. First, for some models, an external validation study was included but the development study was not, because studies developing models in the presurfactant era, or in the general NICU population that included very preterm infants but also infants born >32 weeks' gestational age, were excluded. Second, PROBAST is a recently developed tool reflecting contemporary expertise and knowledge, and it was applied to models of which some were developed and published decades ago. Information currently necessary for the assessment of bias (eg, calibration) was often not reported, leading to high ROB in the analysis domain across all models. Third, the majority of the included studies originated from developed countries, making this review less applicable to developing countries. In future research, validating prediction models in developing countries might require more attention, because there is much to be gained with respect to postnatal mortality in preterm infants.

There is an abundance of mortality risk prediction models for very preterm infants. Studies on mortality risk prediction in very preterm infants can be improved through better (reporting of) analyses. Many of the models are of unknown value for daily practice because of a lack of external validation. Meta-analysis of the widely used CRIB, CRIB-II, SNAPPE-II, and NICHD scores revealed good discriminative performance, but their calibration is currently unknown. Instead of developing additional mortality prediction models for preterm infants, the emphasis should be shifted toward external validation and subsequent adaption of the existing prediction models for mortality in preterm infants.

Dr van Beek designed the study, performed the literature search, conducted the study selection process, data extraction, and critical appraisal, analyzed the data, and wrote the first draft of the manuscript; Drs Andriessen and Onland conducted the study selection process, data extraction, and critical appraisal, provided critical feedback, and helped shape the research, analysis, and manuscript; Dr Schuit designed the study, conducted the study selection process, data extraction, and critical appraisal, provided critical feedback, helped shape the research, analysis, and manuscript, and supervised the project; and all authors approved the final manuscript as submitted and agree to be accountable for all aspects of the work.

FINANCIAL DISCLOSURE: The authors have indicated they have no financial relationships relevant to this article to disclose.

CHARMS

critical appraisal and data extraction for systematic reviews of prediction modelling studies

CI

confidence interval

CRIB

clinical risk index for babies

ELGA/BW

extremely low gestational age or birth weight

EPV

events per variable

PI

prediction interval

PROBAST

prediction model risk of bias assessment tool

ROB

risk of bias

SNAPPE

score for neonatal acute physiology perinatal extension

VLGA/BW

very low gestational age or birth weight

1. Tucker J, McGuire W. Epidemiology of preterm birth. BMJ. 2004;329(7467):675–678.
2. WHO. Preterm birth. Fact sheet: Reviewed February 2018. http://www.who.int/news-room/fact-sheets/detail/preterm-birth. Accessed December 20, 2019.
3. Blencowe H, Cousens S, Oestergaard MZ, et al. National, regional, and worldwide estimates of preterm birth rates in the year 2010 with time trends since 1990 for selected countries: A systematic analysis and implications. Lancet. 2012;379(9832):2162–2172.
4. Schuit E, Hukkelhoven CW, Manktelow BN, et al. Prognostic models for stillbirth and neonatal death in very preterm birth: A validation study. Pediatrics. 2012;129(1):120.
5. Moons KG, Altman DG, Vergouwe Y, Royston P. Prognosis and prognostic research: Application and impact of prognostic models in clinical practice. BMJ. 2009;338:b606.
6. Leushuis E, van der Steeg JW, Steures P, et al. Prediction models in reproductive medicine: A critical appraisal. Hum Reprod Update. 2009;15(5):537–552.
7. Medlock S, Ravelli AC, Tamminga P, Mol BW, Abu-Hanna A. Prediction of mortality in very premature infants: A systematic review of prediction models. PLoS One. 2011;6(9):e23441.
8. Wolff RF, Moons KGM, Riley RD, et al. PROBAST: A tool to assess the risk of bias and applicability of prediction model studies. Ann Intern Med. 2019;170(1):51–58.
9. Moons KGM, Wolff RF, Riley RD, et al. PROBAST: A tool to assess risk of bias and applicability of prediction model studies: Explanation and elaboration. Ann Intern Med. 2019;170(1):W1–W33.
10. Liberati A, Altman DG, Tetzlaff J, et al. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate healthcare interventions: Explanation and elaboration. BMJ. 2009;339:b2700.
11. Moons KG, de Groot JA, Bouwmeester W, et al. Critical appraisal and data extraction for systematic reviews of prediction modelling studies: The CHARMS checklist. PLoS Med. 2014;11(10):e1001744.
12. van Beek P, Andriessen P, Onland W, Schuit E. Prognostic models for mortality in very preterm infants. PROSPERO 2019 CRD42019141434. https://www.crd.york.ac.uk/prospero/display_record.php?ID=CRD42019141434.
13. Snell KI, Ensor J, Debray TP, Moons KG, Riley RD. Meta-analysis of prediction model performance across multiple studies: Which scale helps ensure between-study normality for the C-statistic and calibration measures? Stat Methods Med Res. 2018;27(11):3505–3522.
14. Higgins JP, Thompson SG. Quantifying heterogeneity in a meta-analysis. Stat Med. 2002;21(11):1539–1558.
15. Debray TP, Damen JA, Snell KI, et al. A guide to systematic review and meta-analysis of prediction model performance. BMJ. 2017;356:i6460.
16. Deeks JJ, Higgins J, Altman DG. Analysing data and undertaking meta-analyses, chapter 9. Cochrane Collaboration; 2011.
17. Damen JA, Pajouheshnia R, Heus P, et al. Performance of the Framingham risk models and pooled cohort equations for predicting 10-year risk of cardiovascular disease: A systematic review and meta-analysis. BMC Med. 2019;17(1):109–7.
18. The CRIB (clinical risk index for babies) score: A tool for assessing initial neonatal risk and comparing performance of neonatal intensive care units. The International Neonatal Network. Lancet. 1993;342(8865):193–198.
19. Parry G, Tucker J, Tarnow-Mordi W, UK Neonatal Staffing Study Collaborative Group. CRIB II: An update of the clinical risk index for babies score. Lancet. 2003;361(9371):1789–1791.
20. Richardson DK, Corcoran JD, Escobar GJ, Lee SK. SNAP-II and SNAPPE-II: Simplified newborn illness severity and mortality risk scores. J Pediatr. 2001;138(1):92–100.
21. Tyson JE, Parikh NA, Langer J, Green C, Higgins RD, National Institute of Child Health and Human Development Neonatal Research Network. Intensive care for extreme prematurity–moving beyond gestational age. N Engl J Med. 2008;358(16):1672–1681.
22. Apgar V. A proposal for a new method of evaluation of the newborn infant. Curr Res Anesth Analg. 1953;32(4):260–267.
23. Manktelow BN, Draper ES, Field DJ. Predicting neonatal mortality among very preterm infants: A comparison of three versions of the CRIB score. Arch Dis Child Fetal Neonatal Ed. 2010;95(1):F9–F13.
24. Rysavy MA, Horbar JD, Bell EF, et al. Assessment of an updated neonatal research network extremely preterm birth outcome model in the Vermont Oxford Network. JAMA Pediatr. 2020;174(5):e196294.
25. Riley RD, Ensor J, Snell KIE, et al. Calculating the sample size required for developing a clinical prediction model. BMJ. 2020;368:m441.
26. Schafer JL. Multiple imputation: A primer. Stat Methods Med Res. 1999;8(1):3–15.
27. Rubin DB, Schenker N. Multiple imputation in health-care databases: An overview and some applications. Stat Med. 1991;10(4):585–598.
28. van Buuren S, Boshuizen HC, Knook DL. Multiple imputation of missing blood pressure covariates in survival analysis. Stat Med. 1999;18(6):681–694.
29. White IR, Royston P, Wood AM. Multiple imputation using chained equations: Issues and guidance for practice. Stat Med. 2011;30(4):377–399.
30. Donders AR, van der Heijden GJ, Stijnen T, Moons KG. Review: A gentle introduction to imputation of missing values. J Clin Epidemiol. 2006;59(10):1087–1091.
31. Janssen KJ, Donders AR, Harrell FE, et al. Missing covariate data in medical research: To impute is better than to ignore. J Clin Epidemiol. 2010;63(7):721–727.
32. Marshall A, Altman DG, Royston P, Holder RL. Comparison of techniques for handling missing covariate data within prognostic modelling studies: A simulation study. BMC Med Res Methodol. 2010;10:7–7.
33. Sterne JA, White IR, Carlin JB, et al. Multiple imputation for missing data in epidemiological and clinical research: Potential and pitfalls. BMJ. 2009;338:b2393.
34. Vergouwe Y, Royston P, Moons KG, Altman DG. Development and validation of a prediction model with missing predictor data: A practical approach. J Clin Epidemiol. 2010;63(2):205–214.
35. Groenwold RH, White IR, Donders AR, Carpenter JR, Altman DG, Moons KG. Missing covariate data in clinical research: When and when not to use the missing-indicator method for analysis. CMAJ. 2012;184(11):1265–1269.
36. Steyerberg EW, Harrell FE, Borsboom GJ, Eijkemans MJ, Vergouwe Y, Habbema JD. Internal validation of predictive models: Efficiency of some procedures for logistic regression analysis. J Clin Epidemiol. 2001;54(8):774–781.
37. Austin PC, Steyerberg EW. Events per variable (EPV) and the relative performance of different strategies for estimating the out-of-sample validity of logistic regression models. Stat Methods Med Res. 2017;26(2):796–808.
38. Steyerberg EW, Harrell FE. Prediction models need appropriate internal, internal-external, and external validation. J Clin Epidemiol. 2016;69:245–247.
39. Castaldi PJ, Dahabreh IJ, Ioannidis JP. An empirical assessment of validation practices for molecular classifiers. Brief Bioinform. 2011;12(3):189–202.
40. Varma S, Simon R. Bias in error estimation when using cross-validation for model selection. BMC Bioinformatics. 2006;7:91–91.
41. Kleinrouweler CE, Cheong-See FM, Collins GS, et al. Prognostic models in obstetrics: Available, but far from applicable. Am J Obstet Gynecol. 2016;214(1):79–90.e36.
42. Damen JA, Hooft L, Schuit E, et al. Prediction models for cardiovascular disease risk in the general population: Systematic review. BMJ. 2016;353:i2416.
43. Moons KG, Altman DG, Reitsma JB, et al. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): Explanation and elaboration. Ann Intern Med. 2015;162(1):1.
44. McLeod JS, Menon A, Matusko N, et al. Comparing mortality risk models in VLBW and preterm infants: Systematic review and meta-analysis. J Perinatol. 2020;40(5):695-703.
45. Debray TP, Vergouwe Y, Koffijberg H, Nieboer D, Steyerberg EW, Moons KG. A new framework to enhance the interpretation of external validation studies of clinical prediction models. J Clin Epidemiol. 2015;68(3):279-289.
46. Vergouwe Y, Moons KG, Steyerberg EW. External validity of risk models: Use of benchmark values to disentangle a case-mix effect from incorrect coefficients. Am J Epidemiol. 2010;172(8):971-980.
47. Wynants L, Van Calster B, Collins GS, et al. Prediction models for diagnosis and prognosis of covid-19 infection: Systematic review and critical appraisal. BMJ. 2020;369:m1328.
48. Manktelow BN, Seaton SE, Field DJ, Draper ES. Population-based estimates of in-unit survival for very preterm infants. Pediatrics. 2013;131(2):425.
49. Del Rio R, Thio M, Bosio M, Figueras J, Iriondo M. Prediction of mortality in premature neonates. An updated systematic review. An Pediatr (Barc). 2020.
50. Pishevar N, Fathi O, Backes CH, Shepherd EG, Nelin LD. Predicting survival in infants born at. J Perinatol. 2020;40(5):750-757.
51. Podda M, Bacciu D, Micheli A, Bellu R, Placidi G, Gagliardi L. A machine learning approach to estimating preterm infants survival: Development of the preterm infants survival assessment (PISA) predictor. Sci Rep. 2018;8(1):13743.
52. Oltman SP, Rogers EE, Baer RJ, et al. Initial metabolic profiles are associated with 7-day survival among infants born at 22-25 weeks of gestation. J Pediatr. 2018;198:194-200.e3.
53. Beltempo M, Shah PS, Ye XY, et al. SNAP-II for prediction of mortality and morbidity in extremely preterm infants. J Matern Fetal Neonatal Med. 2019;32(16):2694-2701.
54. Cnattingius S, Norman M, Granath F, Petersson G, Stephansson O, Frisell T. Apgar score components at 5 minutes: Risks and prediction of neonatal mortality. Paediatr Perinat Epidemiol. 2017;31(4):328-337.
55. Koller-Smith LI, Shah PS, Ye XY, et al. Comparing very low birth weight versus very low gestation cohort methods for outcome analysis of high risk preterm infants. BMC Pediatr. 2017;17(1):166.
56. Steurer MA, Anderson J, Baer RJ, et al. Dynamic outcome prediction in a socio-demographically diverse population-based cohort of extremely preterm neonates. J Perinatol. 2017;37(6):709-715.
57. Sullivan BA, McClure C, Hicks J, Lake DE, Moorman JR, Fairchild KD. Early heart rate characteristics predict death and morbidities in preterm infants. J Pediatr. 2016;174:57-62.
58. Jeschke E, Biermann A, Gunster C, et al. Mortality and major morbidity of very-low-birth-weight infants in Germany 2008-2012: A report based on administrative data. Front Pediatr. 2016;4:23.
59. Rudiger M, Braun N, Aranda J, et al. Neonatal assessment in the delivery room: Trial to evaluate a specified type of Apgar (TEST-Apgar). BMC Pediatr. 2015;15:18.
60. Vincer MJ, Armson BA, Allen VM, et al. An algorithm for predicting neonatal mortality in threatened very preterm birth. J Obstet Gynaecol Can. 2015;37(11):958-965.
61. Ravelli AC, Schaaf JM, Mol BW, et al. Antenatal prediction of neonatal mortality in very premature infants. Eur J Obstet Gynecol Reprod Biol. 2014;176:126-131.
62. Wu PL, Lee WT, Lee PL, Chen HL. Predictive power of serial neonatal therapeutic intervention scoring system scores for short-term mortality in very-low-birth-weight infants. Pediatr Neonatol. 2015;56(2):108-113.
63. Dong Y, Yue G, Yu JL. Changes in perinatal care and predictors of in-hospital mortality for very low birth weight preterm infants. Iran J Pediatr. 2012;22(3):326-332.
64. Ambalavanan N, Carlo WA, Tyson JE, et al. Outcome trajectories in extremely preterm infants. Pediatrics. 2012;130(1):115.
65. Lee SK, Aziz K, Dunn M, et al. Transport risk index of physiologic stability, version II (TRIPS-II): A simple and practical neonatal illness severity score. Am J Perinatol. 2013;30(5):395-400.
66. Phillips LA, Dewhurst CJ, Yoxall CW. The prognostic value of initial blood lactate concentration measurements in very low birthweight infants and their use in development of a new disease severity scoring system. Arch Dis Child Fetal Neonatal Ed. 2011;96(4):275.
67. Schenone MH, Aguin E, Li Y, Lee C, Kruger M, Bahado-Singh RO. Prenatal prediction of neonatal survival at the borderline viability. J Matern Fetal Neonatal Med. 2010;23(12):1413-1418.
68. Cole TJ, Hey E, Richmond S. The PREM score: A graphical tool for predicting survival in very preterm births. Arch Dis Child Fetal Neonatal Ed. 2010;95(1):14.
69. Gargus RA, Vohr BR, Tyson JE, et al. Unimpaired outcomes for extremely low birth weight infants at 18 to 22 months. Pediatrics. 2009;124(1):112-121.
70. Forsblad K, Kallen K, Marsal K, Hellstrom-Westas L. Short-term outcome predictors in infants born at 23-24 gestational weeks. Acta Paediatr. 2008;97(5):551-556.
71. Zupancic JA, Richardson DK, Horbar JD, et al. Revalidation of the score for neonatal acute physiology in the Vermont Oxford Network. Pediatrics. 2007;119(1):156.
72. Forsblad K, Kallen K, Marsal K, Hellstrom-Westas L. Apgar score predicts short-term outcome in infants born at 25 gestational weeks. Acta Paediatr. 2007;96(2):166-171.
73. Evans N, Hutchinson J, Simpson JM, Donoghue D, Darlow B, Henderson-Smart D. Prenatal predictors of mortality in very preterm infants cared for in the Australian and New Zealand Neonatal Network. Arch Dis Child Fetal Neonatal Ed. 2007;92(1):34.
74. Marshall G, Tapia JL, D'Apremont I, et al. A new score for predicting neonatal very low birth weight mortality risk in the NEOCOSUR South American Network. J Perinatol. 2005;25(9):577-582.
75. Locatelli A, Roncaglia N, Andreotti C, et al. Factors affecting survival in infants weighing 750 g or less. Eur J Obstet Gynecol Reprod Biol. 2005;123(1):52-55.
76. Ambalavanan N, Carlo WA, Bobashev G, et al. Prediction of death for extremely low birth weight neonates. Pediatrics. 2005;116(6):1367-1373.
77. Janota J, Stranak Z, Statecna B, Dohnalova A, Sipek A, Simak J. Characterization of multiple organ dysfunction syndrome in very low birthweight infants: A new sequential scoring system. Shock. 2001;15(5):348-352.
78. Ambalavanan N, Carlo WA. Comparison of the prediction of extremely low birth weight neonatal mortality by regression analysis and by neural networks. Early Hum Dev. 2001;65(2):123-137.
79. Doyle LW; Victorian Infant Collaborative Study Group. Outcome at 5 years of age of children 23 to 27 weeks' gestation: Refining the prognosis. Pediatrics. 2001;108(1):134-141.
80. Pollack MM, Koch MA, Bartel DA, et al. A comparison of neonatal mortality risk prediction models in very low birth weight infants. Pediatrics. 2000;105(5):1051-1057.
81. Draper ES, Manktelow B, Field DJ, James D. Prediction of survival for preterm births by weight and gestational age: Retrospective population based study. BMJ. 1999;319(7217):1093-1097.
82. Zernikow B, Holtmannspoetter K, Michel E, et al. Artificial neural network for risk assessment in preterm neonates. Arch Dis Child Fetal Neonatal Ed. 1998;79(2):129.
83. Rautonen J, Makela A, Boyd H, Apajasalo M, Pohjavuori M. CRIB and SNAP: Assessing the risk of death for preterm neonates. Lancet. 1994;343(8908):1272-1273.
84. de Courcy-Wheeler RH, Wolfe CD, Fitzgerald A, Spencer M, Goodman JD, Gamsu HR. Use of the CRIB (clinical risk index for babies) score in prediction of neonatal mortality and morbidity. Arch Dis Child Fetal Neonatal Ed. 1995;73(1):32.
85. Kaaresen PI, Dohlen G, Fundingsrud HP, Dahl LB. The use of CRIB (clinical risk index for babies) score in auditing the performance of one neonatal intensive care unit. Acta Paediatr. 1998;87(2):195-200.
86. Khanna R, Taneja V, Singh SK, Kumar N, Sreenivas V, Puliyel JM. The clinical risk index of babies (CRIB) score in India. Indian J Pediatr. 2002;69(11):957-960.
87. Maier RF, Caspar-Karweck UE, Grauel EL, Bassir C, Metze BC, Obladen M. A comparison of two mortality risk scores for very low birthweight infants: Clinical risk index for babies and Berlin score. Intensive Care Med. 2002;28(9):1332-1335.
88. Brito AS, Matsuo T, Gonzalez MR, de Carvalho AB, Ferrari LS. CRIB score, birth weight and gestational age in neonatal mortality risk evaluation. Rev Saude Publica. 2003;37(5):597-602.
89. Zardo MS, Procianoy RS. Comparison between different mortality risk scores in a neonatal intensive care unit. Rev Saude Publica. 2003;37(5):591-596.
90. Gagliardi L, Cavazza A, Brunelli A, et al. Assessing mortality risk in very low birthweight infants: A comparison of CRIB, CRIB-II, and SNAPPE-II. Arch Dis Child Fetal Neonatal Ed. 2004;89(5):419.
91. De Felice C, Del Vecchio A, Latini G. Evaluating illness severity for very low birth weight infants: CRIB or CRIB-II? J Matern Fetal Neonatal Med. 2005;17(4):257-260.
92. Buhrer C, Metze B, Obladen M. CRIB, CRIB-II, birth weight or gestational age to assess mortality risk in very low birth weight infants? Acta Paediatr. 2008;97(7):899-903.
93. Asker HS, Satar M, Yildizdas HY, et al. Evaluation of score for neonatal acute physiology and perinatal extension II and clinical risk index for babies with additional parameters. Pediatr Int. 2016;58(10):984-987.
94. Rastogi PK, Sreenivas V, Kumar N. Validation of CRIB II for prediction of mortality in premature babies. Indian Pediatr. 2010;47(2):145-147.
95. Greenwood S, Abdel-Latif ME, Bajuk B, Lui K; NSW and ACT Neonatal Intensive Care Units Audit Group. Can the early condition at admission of a high-risk infant aid in the prediction of mortality and poor neurodevelopmental outcome? A population study in Australia. J Paediatr Child Health. 2012;48(7):588-595.
96. Reid S, Bajuk B, Lui K, Sullivan EA; NSW and ACT Neonatal Intensive Care Units Audit Group, PSN. Comparing CRIB-II and SNAPPE-II as mortality predictors for very preterm infants. J Paediatr Child Health. 2015;51(5):524-528.
97. Ezz-Eldin ZM, Hamid TA, Youssef MR, Nabil H. Clinical risk index for babies (CRIB II) scoring system in prediction of mortality in premature babies. J Clin Diagn Res. 2015;9(6):SC08-SC11.
98. Park JH, Chang YS, Ahn SY, Sung SI, Park WS. Predicting mortality in extremely low birth weight infants: Comparison between gestational age, birth weight, Apgar score, CRIB II score, initial and lowest serum albumin levels. PLoS One. 2018;13(2):e0192232.
99. Sotodate G, Oyama K, Matsumoto A, Konishi Y, Toya Y, Takashimizu N. Predictive ability of neonatal illness severity scores for early death in extremely premature infants. J Matern Fetal Neonatal Med. 2020:1-6.
100. Boland RA, Davis PG, Dawson JA, Doyle LW; Victorian Infant Collaborative Study Group. Predicting death or major neurodevelopmental disability in extremely preterm infants born in Australia. Arch Dis Child Fetal Neonatal Ed. 2013;98(3):201.
101. Marrs CC, Pedroza C, Mendez-Figueroa H, Chauhan SP, Tyson JE. Infant outcomes after periviable birth: External validation of the Neonatal Research Network estimator with the BEAM trial. Am J Perinatol. 2016;33(6):569-576.
102. Yeo KT, Safi N, Wang YA, et al. Prediction of outcomes of extremely low gestational age newborns in Australia and New Zealand. BMJ Paediatr Open. 2017;1(1):e000205. eCollection 2017.
103. Mori R, Shiraishi J, Negishi H, Fujimura M. Predictive value of Apgar score in infants with very low birth weight. Acta Paediatr. 2008;97(6):720-723.
104. Dalili H, Sheikh M, Hardani AK, Nili F, Shariat M, Nayeri F. Comparison of the combined versus conventional Apgar scores in predicting adverse neonatal outcomes. PLoS One. 2016;11(2):e0149464.
105. Richardson DK, Phibbs CS, Gray JE, McCormick MC, Workman-Daniels K, Goldmann DA. Birth weight and illness severity: Independent predictors of neonatal mortality. Pediatrics. 1993;91(5):969-975.
106. Richardson DK, Gray JE, McCormick MC, Workman K, Goldmann DA. Score for neonatal acute physiology: A physiologic severity index for neonatal intensive care. Pediatrics. 1993;91(3):617-623.
107. Gray JE, Richardson DK, McCormick MC, Workman-Daniels K, Goldmann DA. Neonatal therapeutic intervention scoring system: A therapy-based severity-of-illness index. Pediatrics. 1992;90(4):561-567.
108. Maier RF, Rey M, Metze BC, Obladen M. Comparison of mortality risk: A score for very low birthweight infants. Arch Dis Child Fetal Neonatal Ed. 1997;76(3):F146-1.
109. Horbar JD, Onstad L, Wright E. Predicting mortality risk for infants weighing 501 to 1500 grams at birth: A National Institutes of Health Neonatal Research Network report. Crit Care Med. 1993;21(1):12-18.

Competing Interests

POTENTIAL CONFLICT OF INTEREST: The authors have indicated they have no potential conflicts of interest to disclose.

Supplementary data