BACKGROUND AND OBJECTIVES

Research on outcomes of prematurity frequently examines neurodevelopment in the toddler years as an end point, but the age range at examination varies. We aimed to evaluate whether the corrected age (CA) at Bayley-III assessment is associated with rates of developmental delay in extremely preterm children.

METHODS

This retrospective cohort study included children born at <29 weeks’ gestation who were admitted to units participating in the Canadian Neonatal Network between 2009 and 2017. The primary outcomes were significant developmental delay (Bayley-III score <70 in any domain) and developmental delay (Bayley-III score <85 in any domain). To assess the association between CA at Bayley-III assessment and developmental delay, we compared outcomes between 2 groups of children: those assessed at 18–20 months’ CA and those assessed at 21–24 months’ CA.

RESULTS

Overall, 3944 infants were assessed at 18–20 months’ CA and 881 at 21–24 months’ CA. Compared with infants assessed at 18–20 months, those assessed at 21–24 months had higher odds of significant developmental delay (20.0% vs 12.5%; adjusted odds ratio, 1.75; 95% confidence interval [CI], 1.41–2.13) and developmental delay (48.9% vs 41.7%; adjusted odds ratio, 1.33; 95% CI, 1.11–1.52). Bayley-III composite scores were on average 3 to 4 points lower in infants evaluated at 21–24 months’ CA (eg, adjusted mean difference for language, 3.49; 95% CI, 2.33–4.66). In contrast, rates of cerebral palsy were similar between the groups (4.6% vs 4.7%).

CONCLUSIONS

Bayley-III assessments performed at 21–24 months’ CA were more likely to identify significant developmental delay than 18- to 20-month assessments in extremely preterm children.

What’s Known on This Subject:

The toddler neurodevelopmental evaluation is a key point of developmental surveillance and a common research end point. In clinical practice and research, a wide time window (18–24 months) around the target age is often allowed for this assessment to increase follow-up rates.

What This Study Adds:

In extremely preterm children, Bayley-III assessments at 21–24 months’ corrected age identified significant developmental delay more often than 18- to 20-month assessments. Assessment at a later age may identify developmental challenges more accurately as task complexity increases.

Outcomes research on extremely preterm infants frequently uses neurodevelopment in the toddler years as a primary end point because developmental challenges are more likely to be identified at this age.1,2 However, the age at which this assessment occurs varies between studies. A time window around the target age for assessment is given to facilitate scheduling and increase follow-up rates, yet development in toddlers changes rapidly from 1 month to the next. Standardized developmental assessments, such as the Bayley Scales of Infant and Toddler Development (Bayley), account for these changes using play tasks that evaluate language, cognitive, and motor skills and that become more complex with increasing age.3,4 One concern in studies allowing a wide age range for assessment is that children seen at a later age may be at a disadvantage compared with those seen earlier, because higher developmental expectations can unmask challenges not previously observed. Conversely, preterm children with latent developmental problems who are assessed too early may be missed. The timing of assessment may therefore influence the identification of developmental delays, which matters at the individual child level to direct early intervention and at the macro level for establishing time trends in outcomes and benchmarking between units.5

As part of the Canadian Neonatal Follow-Up Network (CNFUN) protocol, preterm children born at <29 weeks’ gestation are assessed between 18 and 24 months’ corrected age (CA). This variation in the timing of assessment allowed us to investigate whether the age at which the Bayley was administered was associated with rates of developmental delay. We hypothesized that rates of developmental delay would be higher in children seen later within the 18- to 24-month CA window.

We conducted a retrospective cohort study using data from the Canadian Neonatal Network (CNN) and CNFUN. Neonatal data were abstracted from neonatal medical records by trained personnel according to standardized definitions and transmitted to the CNN Coordinating Centre at the Maternal-Infant Care Research Centre in Toronto, Ontario, as previously described.6 Neurodevelopmental outcome data for survivors were collected as part of the CNFUN, a collaboration between neonatal and follow-up programs across Canada that includes a national standardized assessment between 18 and 24 months’ CA. The CNFUN does not make recommendations on the timing of this assessment within the 18- to 24-month window. CNFUN data are linked to CNN data, as previously described.7 Data collection for the CNN/CNFUN is approved by either the research ethics boards or hospital quality improvement committees at each site. Approval for this project was obtained from the McGill University Health Centre research ethics board and the Executive Committees of the CNN and CNFUN.

Preterm neonates born at <29 weeks’ gestation and admitted to participating CNN units between April 1, 2009, and December 31, 2017, were eligible for inclusion. December 31, 2017, was chosen as the end of the study period because follow-up rates for children born in 2018 were affected by the COVID-19 pandemic. Infants evaluated before 17 months, 16 days, or after 24 months, 15 days, of CA were excluded. Because of resource limitations, many sites have incomplete data entry for the CNFUN. The study cohort was therefore restricted to sites with neurodevelopmental outcome data available for ≥50% of survivors over the entire study period (15 of 26 participating sites across Canada). We chose a 50% threshold, rather than the 70%–80% thresholds used in other studies, because we did not expect attrition bias to have a major impact on our study question.
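As a minimal sketch of the site-level inclusion rule, the Python snippet below keeps sites whose follow-up rate over the whole study period is at least 50%. The `Site` record, its field names, and the example counts are hypothetical illustrations, not CNN/CNFUN variables or study data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Site:
    # Hypothetical per-site summary; field names are illustrative, not CNFUN variables.
    name: str
    survivors: int          # eligible survivors over the entire study period
    with_outcome_data: int  # survivors with 18- to 24-month outcome data


def follow_up_rate(site: Site) -> float:
    """Proportion of survivors with neurodevelopmental outcome data."""
    return site.with_outcome_data / site.survivors


def included_sites(sites: List[Site], threshold: float = 0.50) -> List[Site]:
    """Keep sites whose follow-up rate is at or above the threshold (>=50% in the study)."""
    return [s for s in sites if follow_up_rate(s) >= threshold]


# Hypothetical example sites (not study data).
sites = [Site("A", 400, 310), Site("B", 250, 100), Site("C", 180, 90)]
print([s.name for s in included_sites(sites)])
```

Here site B (40% follow-up) is dropped, while site C sits exactly at the 50% threshold and is kept because the rule is "≥50%".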

As part of the CNFUN standardized protocol, infants born <29 weeks’ gestational age (GA) undergo neurodevelopmental assessment at 18–24 months’ CA. This includes a neurologic examination for signs of cerebral palsy and assessment of hearing and visual function. Children also undergo evaluation with the Bayley, which yields language, cognitive, and motor composite scores. For the 2009–2017 birth cohort, the third edition of the Bayley (Bayley-III) was administered. Scaled scores are derived from raw scores and represent a child’s performance compared with their same-age peers grouped according to corrected months and days as follows: 17 months, 16 days, to 18 months, 15 days; 18 months, 16 days, to 19 months, 15 days; and so on. Composite scores for each of the included domains are based on sums of scaled scores and have a general population mean of 100 and an SD of 15.3 
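Because composite scores are normed to a mean of 100 and an SD of 15, the cutoffs used in this study (<85 and <70) correspond to 1 and 2 SD below the population mean. The sketch below assumes a normal approximation to the composite score distribution (an assumption; the published norms are tabulated rather than a fitted normal curve) and uses Python's `statistics.NormalDist` to express a score in SD units and to give the share of a single domain expected below each cutoff in the normative population.

```python
from statistics import NormalDist

# Normal approximation to a Bayley-III composite score in the normative population
# (mean 100, SD 15). Treating the tabulated norms as exactly normal is an assumption.
COMPOSITE = NormalDist(mu=100, sigma=15)


def sd_units(score: float) -> float:
    """Distance of a composite score from the population mean, in SD units."""
    return (score - COMPOSITE.mean) / COMPOSITE.stdev


def expected_share_below(cutoff: float) -> float:
    """Expected proportion of the normative population scoring below the cutoff in one domain."""
    return COMPOSITE.cdf(cutoff)


for cutoff in (85, 70):
    print(f"<{cutoff}: {sd_units(cutoff):+.0f} SD, ~{expected_share_below(cutoff):.1%} of norms")
```

These are single-domain figures (about 15.9% below 85 and 2.3% below 70). Because the study outcomes count a score below the cutoff in any of 3 correlated domains, the corresponding normative share for "any domain" would be higher.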

The study’s primary outcomes were significant developmental delay, defined as a Bayley-III score <70 in any domain, and developmental delay, defined as a Bayley-III score <85 in any domain.8 Secondary outcomes were the individual domain components of significant developmental delay and developmental delay, and cerebral palsy.9 Finally, we evaluated the composite outcomes of neurodevelopmental impairment (Bayley-III score <85 in any domain, cerebral palsy with a Gross Motor Function Classification System [GMFCS] level ≥I, unilateral or bilateral visual impairment, or sensorineural/mixed hearing loss) and significant neurodevelopmental impairment (Bayley-III score <70 in any domain, cerebral palsy with a GMFCS level ≥III, bilateral blindness, or the need for a hearing aid or cochlear implant).7,10
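These outcome definitions are rule-based, so they can be written as a small classification function. The sketch below is illustrative only: the `Assessment` record and its field names are hypothetical rather than CNFUN variables, GMFCS levels are coded 1–5 (I–V) with `None` meaning no cerebral palsy, and missing-data handling is ignored.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Assessment:
    # Hypothetical record layout; field names are illustrative only.
    language: int                  # Bayley-III composite scores
    cognitive: int
    motor: int
    gmfcs_level: Optional[int]     # None = no cerebral palsy; otherwise 1-5 (GMFCS I-V)
    visual_impairment: bool        # unilateral or bilateral
    bilateral_blindness: bool
    sensorineural_or_mixed_hearing_loss: bool
    hearing_aid_or_implant: bool


def any_domain_below(a: Assessment, cutoff: int) -> bool:
    """True if any of the 3 composite scores is strictly below the cutoff."""
    return min(a.language, a.cognitive, a.motor) < cutoff


def classify(a: Assessment) -> Dict[str, bool]:
    has_cp = a.gmfcs_level is not None          # GMFCS >= I, ie, any cerebral palsy
    severe_cp = has_cp and a.gmfcs_level >= 3   # GMFCS III-V
    dd = any_domain_below(a, 85)
    sdd = any_domain_below(a, 70)
    return {
        "developmental_delay": dd,
        "significant_developmental_delay": sdd,
        "cerebral_palsy": has_cp,
        "ndi": dd or has_cp or a.visual_impairment or a.sensorineural_or_mixed_hearing_loss,
        "significant_ndi": sdd or severe_cp or a.bilateral_blindness or a.hearing_aid_or_implant,
    }


# Hypothetical child: language score 80, GMFCS level I cerebral palsy, no sensory deficits.
child = Assessment(80, 95, 100, 1, False, False, False, False)
print(classify(child))
```

For this hypothetical child the function flags developmental delay, cerebral palsy, and neurodevelopmental impairment, but neither of the "significant" outcomes, since no score is <70 and the GMFCS level is below III.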

Small for gestational age (SGA) was defined as birth weight below the 10th percentile for GA and sex.11 Bronchopulmonary dysplasia (BPD) was defined as the receipt of supplemental oxygen or any ventilatory support at 36 weeks’ postmenstrual age12; severe brain injury as grade 3 intraventricular hemorrhage or higher or persistent periventricular echogenicity; necrotizing enterocolitis as Bell stage 2 or higher13; severe retinopathy of prematurity as stage 3 or higher in either eye; and late-onset sepsis as growth of a pathogenic organism in a blood or cerebrospinal fluid culture drawn after the third day of life in a symptomatic neonate.

Sociodemographic variables were obtained from the CNFUN database and included maternal education, caregiver status, ethnicity, maternal country of origin, and maternal employment or student status.7

To assess the association between CA at Bayley-III assessment and developmental delay, we compared outcomes between 2 groups of children: those assessed at 18–20 months’ CA (ie, 17 months, 16 days, to 20 months, 15 days) and those assessed at 21–24 months’ CA (ie, 20 months, 16 days, to 24 months, 15 days). The cohort was divided into 2 groups pragmatically so that each spanned a comparable time frame; children assessed at 21 months’ CA, in the middle of the range, were placed in the later 21- to 24-month group because this group was smaller.
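A minimal sketch of this grouping, assuming CA is recorded as completed months plus 0–30 additional days: because the Bayley-III norm bands run from day 16 of one month to day 15 of the next, the 2 study groups correspond to norm bands 18–20 and 21–24, and any age outside bands 18–24 falls outside the study window and is excluded. The function names are hypothetical.

```python
from typing import Optional


def norm_band(months: int, days: int) -> int:
    """Bayley-III age band: 17 mo 16 d to 18 mo 15 d -> 18, 18 mo 16 d to 19 mo 15 d -> 19, etc.

    Assumes CA is given as completed months plus 0-30 additional days.
    """
    return months + 1 if days >= 16 else months


def assessment_group(months: int, days: int) -> Optional[str]:
    """Study exposure group for a corrected age, or None if outside the 18-24 mo window."""
    band = norm_band(months, days)
    if 18 <= band <= 20:
        return "18-20 mo"   # 17 mo 16 d to 20 mo 15 d
    if 21 <= band <= 24:
        return "21-24 mo"   # 20 mo 16 d to 24 mo 15 d
    return None             # before 17 mo 16 d or after 24 mo 15 d: excluded


for ca in [(17, 15), (17, 16), (20, 15), (20, 16), (24, 15), (24, 16)]:
    print(ca, assessment_group(*ca))
```

Working in norm bands keeps the exposure definition aligned with how the Bayley-III itself groups children for scoring.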

Group characteristics were compared using the Pearson χ2 test for categorical variables and the Student’s t test or the Wilcoxon rank-sum test, as appropriate, for continuous variables. Odds ratios (ORs) and mean differences (MDs) with 95% confidence intervals (CIs) were calculated for differences between the 2 groups using a generalized estimating equations approach with a symmetric covariance structure to account for clustering within each site.14 Analyses were adjusted for patient characteristics that might affect neurodevelopment (maternal education, GA, SGA, sex, multiple gestation, and outborn status) and for neonatal morbidities that were unbalanced between the groups (P < .1: BPD and late-onset sepsis).
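To make the OR scale concrete, the sketch below computes a crude (unadjusted) OR with a Woolf log-based 95% CI from 2 × 2 counts, using the significant developmental delay counts reported in Table 2. This is a simplification: it ignores clustering within sites and covariate adjustment, so it is not the generalized estimating equations model used in the study. It also shows that switching the reference group inverts the OR and swaps the CI bounds. The function name is hypothetical.

```python
import math
from typing import Tuple


def odds_ratio_woolf(a: int, n1: int, c: int, n2: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Crude OR of the outcome in group 1 vs group 2, with a Woolf (log) 95% CI.

    a/n1: events/total in group 1; c/n2: events/total in group 2.
    Ignores clustering and covariates (unlike the study's GEE models).
    """
    b, d = n1 - a, n2 - c                        # non-events in each group
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi


# Significant developmental delay (Table 2): 492/3944 at 18-20 mo vs 176/881 at 21-24 mo.
or_, lo, hi = odds_ratio_woolf(492, 3944, 176, 881)
print(f"18-20 vs 21-24 mo: OR {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
# Reversing the reference group inverts the OR and swaps the CI bounds.
print(f"21-24 vs 18-20 mo: OR {1 / or_:.2f} (95% CI {1 / hi:.2f}-{1 / lo:.2f})")
```

Run as written, the crude estimates are OR 0.57 (95% CI, 0.47–0.69) with the 21- to 24-month group as reference, and OR 1.75 (95% CI, 1.45–2.12) after reversing it; adjusted estimates need not match these crude values.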

Sensitivity analyses for the primary outcomes were conducted to assess potential biases. First, the analysis was stratified by site to assess whether the results were consistent. Second, the analysis was stratified by the proportion of infants at each site assessed at 18–20 months (≥80% vs <80%) to assess whether results were similar in sites with a consistent CA at assessment and in sites with more intrasite variability. Third, a propensity score was estimated using a multivariable logistic regression model that included maternal education, GA, SGA, singleton, sex, outborn, BPD, and late-onset sepsis. Matching was performed using the SAS macro match.sas with a caliper width of 0.1 times the SD of the logit-transformed propensity scores. The association of the exposure with the outcomes in the matched samples was examined with logistic regression using generalized estimating equations with an unstructured correlation structure. Finally, the analysis was stratified by site follow-up rate, using thresholds of >70% and >80%, to assess for any association with site attrition. All analyses were conducted using SAS 9.4 (SAS Institute Inc, Cary, NC) with a 2-sided significance level of P < .05.

The study population comprised 4825 infants with a complete Bayley-III evaluation at 18–24 months’ CA (Fig 1). Before excluding sites with a follow-up rate <50%, a comparison of antenatal and neonatal characteristics between infants with complete 18- to 24-month Bayley-III assessment (n = 5598) and the infants without (n = 5789) showed that infants with complete assessments were more often SGA, less often outborn, and had higher rates of late-onset sepsis (Supplemental Table 3). Infants included in this study, compared with the 773 infants subsequently excluded for being followed at sites with a follow-up rate <50%, were less often outborn and had lower rates of retinopathy of prematurity and severe brain injury (Supplemental Table 4).

FIGURE 1

Flow diagram of excluded, lost to follow-up, dead, and assessed infants.


The follow-up rate among the 15 included sites was 77.6%, and site-specific follow-up rates ranged from 63.8% to 91.3%. The median CA at Bayley-III assessment was 18.7 months (interquartile range [IQR], 18.2–19.8) and varied between sites (Supplemental Table 5). Differences in neonatal characteristics and morbidities between the 3944 infants assessed at 18–20 months’ CA and the 881 infants assessed at 21–24 months’ CA are reported in Table 1. Infants assessed at 21–24 months were more likely to be SGA, to be outborn, and to have BPD, and less likely to have late-onset sepsis or a mother with a college education or higher.

TABLE 1

Child Characteristics Stratified by Corrected Age at Bayley-III Assessment

Characteristic | 18–20 mo Corrected Age at Bayley-III Assessment (n = 3944) | 21–24 mo Corrected Age at Bayley-III Assessment (n = 881) | P
Antenatal
 Maternal age, y | 31.7 (5.6) | 31.0 (5.9) | <.01
 Any antenatal steroids | 3523 (91.2) | 791 (90.7) | .68
 Any magnesium sulfate | 1764 (51.0) | 357 (44.4) | <.01
 Multiples | 1043 (26.5) | 249 (28.3) | .27
Perinatal
 Gestational age, wk | 26.3 (1.4) | 26.3 (1.5) | .98
 Birth weight, g | 933 (233) | 924 (247) | .31
 Male sex | 2120 (53.9) | 472 (53.6) | .87
 Outborn | 449 (11.4) | 123 (13.9) | .03
 Cesarean delivery | 2370 (60.3) | 509 (57.9) | .19
 Small for gestational age | 314 (8.0) | 92 (10.4) | .02
 Apgar score at 5 min | 7 (6, 8) | 7 (6, 8) | .01
 SNAP-II score >20 | 1006 (25.7) | 226 (25.7) | .97
Postnatal factors
 Length of stay, days | 81.1 (40.2) | 81.7 (42.4) | .69
 Bronchopulmonary dysplasia | 1760 (44.6) | 472 (53.6) | <.01
 Late-onset sepsis | 986 (25.0) | 191 (21.7) | .04
 Necrotizing enterocolitis | 226 (5.8) | 47 (5.4) | .63
 Severe retinopathy of prematurity | 410 (13.0) | 95 (12.8) | .88
 Severe brain injury | 317 (8.3) | 80 (9.3) | .34
Environmental
 College and higher education | 2213 (59.1) | 449 (55.0) | .03
 Single caregiver | 273 (6.9) | 70 (8.0) | .28
 Ethnicity, white | 1908 (55.8) | 423 (52.6) | .11
 Mother born in Canada | 2088 (68.4) | 491 (65.4) | .11
 Mother employed or student | 2560 (67.7) | 515 (61.7) | <.01

Categorical data are presented as n (%) and continuous data as mean (SD) or median (interquartile range). 18–20 mo corrected age refers to infants with a corrected age of 17 mo, 16 d, to 20 mo, 15 d, and 21–24 mo refers to 20 mo, 16 d, to 24 mo, 15 d. Group characteristics were compared using the Pearson χ2 test (categorical variables) and the Student’s t test or the Wilcoxon rank-sum test, as appropriate (continuous variables).

Rates of significant developmental delay for preterm infants seen at 18–20 months’ CA versus 21–24 months’ CA were 12.5% and 20.0%, respectively, and rates of developmental delay were 41.7% and 48.9%, respectively (Table 2). For infants assessed at 21–24 months’ CA, this translated into 1.75-fold higher adjusted odds of scoring <70 in any domain and 1.33-fold higher adjusted odds of scoring <85 in any domain compared with those seen at an earlier age. Rates of delay in each of the individual language, cognitive, and motor domains were also higher in infants assessed at 21–24 months versus 18–20 months’ CA. Moreover, composite scores were on average 3 to 4 points lower in infants evaluated at 21–24 months’ CA. In contrast, rates of cerebral palsy were similar between the groups.

TABLE 2

Neurodevelopmental Outcomes Stratified by Corrected Age at Bayley-III and Association Between Corrected Age at Bayley-III and Neurodevelopmental Outcomes

Outcome | 18–20 mo Corrected Age at Bayley-III Assessment (n = 3944) | 21–24 mo Corrected Age at Bayley-III Assessment (n = 881) | Unadjusted MD/OR (95% CI) | Adjusted MD/OR (95% CI)a
Bayley-III
 Language composite score | 90.0 (16.0) | 86.1 (16.3) | 3.90 (2.73–5.07) | 3.49 (2.07–4.92)
 Cognitive composite score | 96.7 (13.6) | 93.0 (13.7) | 3.72 (2.73–4.72) | 3.47 (1.94–5.01)
 Motor composite score | 93.3 (13.2) | 89.7 (13.1) | 3.67 (2.71–4.63) | 3.45 (1.91–5.00)
 Language composite score <70 | 386 (9.8) | 153 (17.4) | 0.52 (0.42–0.63) | 0.52 (0.41–0.66)
 Cognitive composite score <70 | 113 (2.9) | 42 (4.8) | 0.59 (0.41–0.85) | 0.61 (0.47–0.81)
 Motor composite score <70 | 200 (5.1) | 63 (7.2) | 0.69 (0.52–0.93) | 0.72 (0.50–1.03)
 Language composite score <85 | 1383 (35.1) | 358 (40.6) | 0.79 (0.68–0.92) | 0.81 (0.68–0.96)
 Cognitive composite score <85 | 523 (13.3) | 174 (19.8) | 0.62 (0.51–0.75) | 0.62 (0.50–0.76)
 Motor composite score <85 | 737 (18.7) | 235 (26.7) | 0.63 (0.53–0.75) | 0.62 (0.50–0.78)
Cerebral palsy
 Any cerebral palsy | 183 (4.7) | 40 (4.6) | 1.01 (0.71–1.44) | 1.02 (0.73–1.42)
 Cerebral palsy with GMFCS III–V | 32 (0.8) | 7 (0.8) | 1.01 (0.45–2.30) | 1.12 (0.49–2.53)
Composite diagnoses
 Developmental delay | 1645 (41.7) | 431 (48.9) | 0.75 (0.65–0.87) | 0.77 (0.64–0.92)
 Significant developmental delay | 492 (12.5) | 176 (20.0) | 0.57 (0.47–0.69) | 0.58 (0.46–0.73)
 Neurodevelopmental impairment | 1758 (44.6) | 451 (51.2) | 0.77 (0.66–0.89) | 0.79 (0.62–1.00)
 Significant neurodevelopmental impairment | 545 (13.8) | 181 (20.5) | 0.62 (0.51–0.75) | 0.63 (0.49–0.80)

Categorical data are presented as n (%) and continuous data as mean (SD). 18–20 mo corrected age refers to infants with a corrected age of 17 mo, 16 d, to 20 mo, 15 d, and 21–24 mo refers to 20 mo, 16 d, to 24 mo, 15 d. GMFCS, Gross Motor Function Classification System; MD, mean difference (18–20 mo minus 21–24 mo); OR, odds ratio.

a Adjusted for maternal education, gestational age, small for gestational age, singleton, sex, outborn, bronchopulmonary dysplasia, and late-onset sepsis, and for clustering within each site using a generalized estimating equations approach. The reference group is the 21- to 24-mo group.

The timing of evaluation (in months) was also associated with the mean Bayley-III score in each domain: mean scores decreased progressively as the CA at evaluation increased (Fig 2; unadjusted P for trend <.01 for each domain).

FIGURE 2

Mean (SD) Bayley-III composite scores for the motor, cognitive, and language domains according to corrected age at the time of evaluation. Y-axes are truncated at 65 for data visualization. P values for the trend association between timing of evaluation and score, obtained via unadjusted linear regression, were <.01 for motor, cognitive, and language scores.


There was site variation in the proportion of infants assessed at 18–20 months’ CA (range, 36%–96%; Supplemental Table 5). Analyses stratified by site showed similar effect directions (evaluation at 18–20 months’ CA was associated with lower odds of developmental delay and significant developmental delay than evaluation at 21–24 months). Analyses stratified by the site proportion of assessments at 18–20 months’ CA (Supplemental Table 6) and propensity score analyses (Supplemental Table 7) showed results similar to those of the primary analysis, as did sensitivity analyses using sites with >70% and >80% follow-up rates (Supplemental Table 8).

In this large, multicenter cohort study of 4825 preterm children born at <29 weeks’ gestation, the timing of the 18- to 24-month CA Bayley-III assessment was independently associated with the diagnosis of significant developmental delay, developmental delay, and delay in each of the 3 individual Bayley-III domains. Later evaluation (21–24 months’ CA) was associated with higher odds of identifying significant developmental delay and developmental delay than earlier evaluation (18–20 months’ CA), but not with higher odds of a cerebral palsy diagnosis.

The main goal of developmental surveillance in the extremely preterm population is the early and accurate identification of developmental delays, which allows earlier intervention and better outcomes.5 Currently, the Bayley is the most widely used tool for assessing early childhood development.15 Although our study lacked a school-age outcome against which to compare the 18- to 20- and 21- to 24-month CA outcomes, our results suggest that the timing of the 18- to 24-month CA assessment may affect the likelihood of detecting a developmental delay. This is consistent with other reports that have used different methods. For example, a study of the stability of developmental delay at 7 time points (3, 4, 6, 9, 12, 18, and 24 months’ CA) in a mixed sample of 54 preterm and term children found that earlier Bayley-III assessments were not sensitive in identifying children with developmental delay at 24 months.16 Our results suggest that Bayley-III assessment at 21–24 months’ CA may be more sensitive for developmental delay than assessment at 18–20 months’ CA. To reduce underidentification of developmental delay, evaluation with the Bayley-III at 21–24 months may be desirable at the individual patient level.

The higher odds of developmental delay in extremely preterm children evaluated at 21–24 months could also be interpreted as overestimation of developmental delay by the Bayley-III at 21–24 months. Indeed, a proportion of children with an early diagnosis of developmental delay have normal or only mildly impaired school-age function.17 For example, in a prospective study of very preterm children, ∼40% of children with a Bayley-III cognitive or language score more than 1 SD below the mean at 24 months’ CA had a normal IQ at 4 years.18 However, all children scoring more than 2 SD below the mean at 24 months had an IQ more than 1 SD below the mean at 4 years. As such, more severe deficits detected on the Bayley-III are unlikely to be false-positive results. In addition, the extent to which the Bayley-III underreports developmental delay may vary across age bands.19 Our study suggests that 18- to 20-month Bayley-III assessments are more apt to underreport developmental delay than assessments at 21–24 months.

The Bayley-III often serves as the end point in studies of preterm children evaluating the neurodevelopmental impact of neonatal exposures and interventions. The NICHD Neonatal Research Network has used 18- to 22-month CA data and, more recently, 22- to 26-month data.2 The Neonatal Research Network has evaluated the associations between neonatal exposures, such as surgery and bronchopulmonary dysplasia, and 18- to 22-month neurodevelopmental outcomes using the Bayley-III.20,21 However, the CA at Bayley-III assessment was not included in their multivariable models. Because we found that CA at Bayley-III assessment is associated with both neonatal characteristics and Bayley-III scores and diagnoses, it could act as a confounder in relationships between neonatal exposures and Bayley-III outcomes. Future observational studies could therefore consider including CA at Bayley-III assessment in their adjusted analyses.

In addition, the association between CA at Bayley-III assessment and significant developmental delay has implications for benchmarking between units. To compare the rates of developmental delay and other neurodevelopmental diagnoses between level III NICUs, neonatal networks typically adjust for neonatal and sociodemographic characteristics such as GA, rates of BPD, and maternal education. Timing of examination may contribute to site variations in outcomes as well.

The main limitation of our observational study relates to possible nonrandom selection of infants for later Bayley-III assessment. Indeed, infants assessed at 21–24 months’ CA were more often SGA and less often had mothers with a college education or higher. We adjusted for differences in patient characteristics in our analyses, but we could not account for unknown confounders. We also performed propensity score-matched analyses, matching on environmental and clinical variables to reduce imbalances between the groups. Nonetheless, we were not able to assess why some infants were evaluated at 21–24 months rather than at 18–20 months, or why some sites had a greater proportion of infants evaluated at 21–24 months. In particular, sites with greater variation in the timing of Bayley-III assessment may nonrandomly select infants to be evaluated later. To evaluate whether sites with a higher proportion of assessments at 21–24 months biased our results, we performed sensitivity analyses stratified by site based on the proportion of infants assessed at 18–20 months’ CA.

Our study has other limitations. First, children were not evaluated longitudinally with repeated Bayley evaluations; instead, we compared 2 groups of children (those evaluated at 18–20 months versus at 21–24 months). Second, we were not able to compare the longer-term predictive accuracy of the Bayley-III across 18–24 months’ CA because the CNFUN does not track school-age outcomes. Third, we did not adjust for multiple comparisons when assessing the association between outcomes and CA at Bayley-III assessment. However, all the associations tested showed similar effect directions (later evaluation associated with lower Bayley-III scores), making it unlikely that the associations were detected by chance alone. Fourth, approximately half of eligible extremely preterm children born in Canadian NICUs were not included. However, by excluding sites with <50% follow-up rates, the follow-up rate in our study of 15 sites was a more acceptable 77.6%. In addition, children included were broadly similar clinically to those not included. Fifth, the Bayley-4 has recently replaced the Bayley-III in Canada and elsewhere, with important differences aimed at increasing the test’s sensitivity.22 As such, our hypothesis should be retested in the Bayley-4 era once a sufficient number of evaluations have been completed within the CNFUN. The main strengths of our study are the number of participants, which allowed us to build multivariable models to test our hypothesis, and the inclusion of 15 level III NICUs across the country.

Bayley-III assessments performed at 21–24 months’ CA were independently associated with a higher likelihood of detecting significant developmental delay than 18- to 20-month CA assessments in extremely preterm children. We speculate that assessment at a later age unmasks developmental challenges as the complexity of the tasks increases. Our findings are relevant to neonatal follow-up clinics that rely on the Bayley to identify children with significant developmental delay and trigger referrals for early intervention, to observational studies exploring the association between neonatal exposures and 18- to 24-month Bayley-III outcomes, and to network benchmarking. Clinicians using this standardized assessment tool to identify significant developmental delay in extremely preterm children between 18 and 24 months’ CA may consider evaluating closer to the 24-month mark than to the 18-month mark to enhance sensitivity, although our findings are hypothesis-generating, and further research will be required in the Bayley-4 era. Prospective studies relating the timing of the 18- to 24-month CA assessment to school-age outcomes among infants born at <29 weeks’ gestation are required to clarify the clinical impact of the timing of standardized assessments.

The authors thank all site investigators and data abstractors of the Canadian Neonatal Network (CNN) and the Canadian Neonatal Follow-Up Network (CNFUN). A list of CNN site investigators and their affiliations is presented here. We thank the staff at the Maternal-Infant Care Research Centre (MiCare) at Mount Sinai Hospital in Toronto, Ontario, Canada, for organizational support of CNN and CNFUN.

Canadian Neonatal Network Site Investigators

Prakesh S. Shah, MD, MSc (Director, Canadian Neonatal Network and Site Investigator), Mount Sinai Hospital, Toronto, Ontario; Marc Beltempo, MD (Associate Director, Canadian Neonatal Network and Site Investigator), Montreal Children’s Hospital at McGill University Health Centre, Montréal, Québec; Jaideep Kanungo, MD, Victoria General Hospital, Victoria, British Columbia; Jonathan Wong, MD, British Columbia Women’s Hospital, Vancouver, British Columbia; Miroslav Stavel, MD, Royal Columbian Hospital, New Westminster, British Columbia; Rebecca Sherlock, MD, Surrey Memorial Hospital, Surrey, British Columbia; Ayman Abou Mehrem, MD, Foothills Medical Centre, Calgary, Alberta; Jennifer Toye, MD, and Joseph Ting, MD, Royal Alexandra Hospital and University of Alberta Hospital, Edmonton, Alberta; Carlos Fajardo, MD, Alberta Children’s Hospital, Calgary, Alberta; Andrei Harabor, MD, Regina General Hospital, Regina, Saskatchewan; Lannae Strueby, MD, Jim Pattison Children’s Hospital, Saskatoon, Saskatchewan; Mary Seshia, MBChB, and Deepak Louis, MD, Winnipeg Health Sciences Centre, Winnipeg, Manitoba; Ruben Alvaro, MD, and Ann Yi, MD, St. Boniface General Hospital, Winnipeg, Manitoba; Amit Mukerji, MD, Hamilton Health Sciences Centre, Hamilton, Ontario; Orlando Da Silva, MD, MSc, London Health Sciences Centre, London, Ontario; Sajit Augustine, MD, Windsor Regional Hospital, Windsor, Ontario; Kyong-Soon Lee, MD, MSc, Hospital for Sick Children, Toronto, Ontario; Eugene Ng, MD, Sunnybrook Health Sciences Centre, Toronto, Ontario; Brigitte Lemyre, MD, The Ottawa Hospital, Ottawa, Ontario; Thierry Daboval, MD, Children’s Hospital of Eastern Ontario, Ottawa, Ontario; Faiza Khurshid, MD, Kingston General Hospital, Kingston, Ontario; Victoria Bizgu, MD, Jewish General Hospital, Montréal, Québec; Keith Barrington, MBChB, Anie Lapointe, MD, and Guillaume Ethier, NNP, Hôpital Sainte-Justine, Montréal, Québec; Christine Drolet, MD, Centre Hospitalier Universitaire de Québec, Sainte Foy, Québec; Martine Claveau, MSc, LLM, NNP, Montreal Children’s Hospital at McGill University Health Centre, Montréal, Québec; Marie St-Hilaire, MD, Hôpital Maisonneuve-Rosemont, Montréal, Québec; Valerie Bertelle, MD, and Edith Masse, MD, Centre Hospitalier Universitaire de Sherbrooke, Sherbrooke, Québec; Caio Barbosa de Oliveira, MD, Moncton Hospital, Moncton, New Brunswick; Hala Makary, MD, Dr. Everett Chalmers Hospital, Fredericton, New Brunswick; Cecil Ojah, MBBS, and Alana Newman, MD, Saint John Regional Hospital, Saint John, New Brunswick; Jo-Anna Hudson, MD, Janeway Children’s Health and Rehabilitation Centre, St. John’s, Newfoundland; Jehier Afifi, MB BCh, MSc, IWK Health Centre, Halifax, Nova Scotia; Andrzej Kajetanowicz, MD, Cape Breton Regional Hospital, Sydney, Nova Scotia; Bruno Piedboeuf, MD (Chairman, Canadian Neonatal Network), Centre Hospitalier Universitaire de Québec, Sainte Foy, Québec.

CNFUN Site Investigators and Steering Committee

Thevanisha Pillay, MD, Victoria General Hospital, Victoria, British Columbia; Anne Synnes, MDCM, MHSc (past Director), Lindsay Colby, RN, BScN, MSN (Steering Committee), and Jill Zwicker, PhD, OT (Steering Committee), British Columbia Children's Hospital, Vancouver, British Columbia; Rebecca Sherlock, MD, Surrey Memorial Hospital, Surrey, British Columbia; Miroslav Stavel, MD, and Anitha Moodley, MD, Royal Columbian Hospital, New Westminster, British Columbia; Leonora Hendson, MD, Alberta Children’s Hospital/Foothills Medical Centre, Calgary, Alberta; Amber Reichert, MD, and Matthew Hicks, MD, PhD (Steering Committee), Glenrose Rehabilitation Hospital, Edmonton, Alberta; Diane Moddemann, MD, MEd, Cecilia de Cabo, MD, and M. Florencia Ricci, MD, PhD (Steering Committee), Winnipeg Health Sciences Centre, St Boniface General Hospital, Winnipeg, Manitoba; Sajit Augustine, MD, Windsor Regional Hospital, Windsor, Ontario; Sarah McKnight, MD, Kingston General Hospital, Kingston, Ontario; Kevin Coughlin, MD, Children’s Hospital London Health Sciences Centre, London, Ontario; Linh Ly, MD, Hospital for Sick Children, Toronto, Ontario; Edmond Kelly, MD, Mount Sinai Hospital, Toronto, Ontario; Karen Thomas, MD (Steering Committee), Hamilton Health Sciences Centre, Hamilton, Ontario; Paige Church, MD, and Rudaina Banihani, MD (Steering Committee), Sunnybrook Health Sciences Centre, Toronto, Ontario; Kim-Anh Nguyen, MD, and Ruth Mandel, MD, Jewish General Hospital, Montréal, Québec; May Khairy, MD, Jarred Garfinkle, MD, and Marc Beltempo, MD, Montréal Children's Hospital, Montréal, Québec; Thuy Mai Luu, MD, MSc (Director), Centre Hospitalier Universitaire Sainte-Justine, Montréal, Québec; Alyssa Morin, MD, and Sylvie Bélanger, MD, Centre Hospitalier Universitaire de Québec, Québec City, Québec; and Jehier Afifi, MB, BCh, MSc (Co-Director), IWK Health Centre, Halifax, Nova Scotia.

A complete list of the Canadian Neonatal Network and Canadian Neonatal Follow-Up Network investigators can be found in the Acknowledgments.

Dr Garfinkle conceptualized and designed the study, interpreted the data, drafted the initial manuscript, and critically reviewed and revised the manuscript; Drs Khairy, Luu, Beltempo, Simard, Wong, and Shah conceptualized and designed the study, interpreted the data, and critically reviewed and revised the manuscript; and all authors approved the final manuscript as submitted and agree to be accountable for all aspects of the work.

COMPANION PAPER: A companion to this article can be found online at www.pediatrics.org/cgi/doi/10.1542/peds.2023-063801.

FUNDING: Although no specific funding was received for this study, organizational support for the Canadian Neonatal Network and the Canadian Neonatal Follow-Up Network was provided by the Maternal-Infant Care Research Centre (MiCare) at Mount Sinai Hospital, Toronto. MiCare is supported by the Canadian Institutes of Health Research (CTP 87518) and Mount Sinai Hospital. The funding bodies had no role in the design or conduct of the study; the collection, management, analysis, or interpretation of the data; the preparation, review, or approval of the manuscript; or the decision to submit the manuscript for publication.

CONFLICT OF INTEREST DISCLOSURES: The authors have indicated they have no potential conflicts of interest to disclose.

Abbreviations

Bayley: Bayley Scales of Infant and Toddler Development
Bayley-III: Bayley Scales of Infant and Toddler Development, 3rd edition
BPD: bronchopulmonary dysplasia
CA: corrected age
CI: confidence interval
CNFUN: Canadian Neonatal Follow-Up Network
CNN: Canadian Neonatal Network
GA: gestational age
IQR: interquartile range
OR: odds ratio
SGA: small for gestational age

1. Wolke D, Johnson S, Mendonça M. The life course consequences of very preterm birth. Annu Rev Dev Psychol. 2019;1:69–92.
2. Kilbride HW, Vohr BR, McGowan EM, et al. Early neurodevelopmental follow-up in the NICHD neonatal research network: advancing neonatal care and outcomes, opportunities for the future. Semin Perinatol. 2022;46(7):151642.
3. Bayley N. The Bayley Scales of Infant and Toddler Development. 3rd ed. Harcourt Assessment; 2006.
4. Del Rosario C, Slevin M, Molloy EJ, Quigley J, Nixon E. How to use the Bayley scales of infant and toddler development. Arch Dis Child Educ Pract Ed. 2021;106(2):108–112.
5. Spittle A, Treyvaud K. The role of early developmental intervention to influence neurobehavioral outcomes of children born preterm. Semin Perinatol. 2016;40(8):542–548.
6. Canadian Neonatal Network. Abstractor’s manual. Available at: www.canadianneonatalnetwork.org/portal/CNNHome/Publications.aspx. Accessed October 15, 2018.
7. Synnes A, Luu TM, Moddemann D, et al; Canadian Neonatal Network and the Canadian Neonatal Follow-Up Network. Determinants of developmental outcomes in a very preterm Canadian cohort. Arch Dis Child Fetal Neonatal Ed. 2017;102(3):F235–F243.
8. Johnson S, Moore T, Marlow N. Using the Bayley-III to assess neurodevelopmental delay: which cut-off should be used? Pediatr Res. 2014;75(5):670–674.
9. Rosenbaum P, Paneth N, Leviton A, et al. A report: the definition and classification of cerebral palsy April 2006. Dev Med Child Neurol Suppl. 2007;109:8–14.
10. Palisano R, Rosenbaum P, Walter S, Russell D, Wood E, Galuppi B. Development and reliability of a system to classify gross motor function in children with cerebral palsy. Dev Med Child Neurol. 1997;39(4):214–223.
11. Kramer MS, Platt RW, Wen SW, et al; Fetal/Infant Health Study Group of the Canadian Perinatal Surveillance System. A new and improved population-based Canadian reference for birth weight for gestational age. Pediatrics. 2001;108(2):E35.
12. Shennan AT, Dunn MS, Ohlsson A, Lennox K, Hoskins EM. Abnormal pulmonary outcomes in premature infants: prediction from oxygen requirement in the neonatal period. Pediatrics. 1988;82(4):527–532.
13. Bell MJ, Ternberg JL, Feigin RD, et al. Neonatal necrotizing enterocolitis. Therapeutic decisions based upon clinical staging. Ann Surg. 1978;187(1):1–7.
14. Hanley JA, Negassa A, Edwardes MD, Forrester JE. Statistical analysis of correlated data using generalized estimating equations: an orientation. Am J Epidemiol. 2003;157(4):364–375.
15. Anderson PJ, Burnett A. Assessing developmental delay in early childhood: concerns with the Bayley-III scales. Clin Neuropsychol. 2017;31(2):371–381.
16. Lobo MA, Paul DA, Mackley A, Maher J, Galloway JC. Instability of delay classification and determination of early intervention eligibility in the first two years of life. Res Dev Disabil. 2014;35(1):117–126.
17. Taylor GL, Joseph RM, Kuban KCK, et al. Changes in neurodevelopmental outcomes from age 2 to 10 years for children born extremely preterm. Pediatrics. 2021;147(5):e2020001040.
18. Bode MM, D’Eugenio DB, Mettelman BB, Gross SJ. Predictive validity of the Bayley, third edition at 2 years for intelligence quotient at 4 years in preterm infants. J Dev Behav Pediatr. 2014;35(9):570–575.
19. Aylward GP. Continuing issues with the Bayley-III: where to go from here. J Dev Behav Pediatr. 2013;34(9):697–701.
20. Morriss FH Jr, Saha S, Bell EF, et al; Eunice Kennedy Shriver National Institute of Child Health and Human Development Neonatal Research Network. Surgery and neurodevelopmental outcome of very low-birth-weight infants. JAMA Pediatr. 2014;168(8):746–754.
21. Natarajan G, Pappas A, Shankaran S, et al. Outcomes of extremely low birth weight infants with bronchopulmonary dysplasia: impact of the physiologic definition. Early Hum Dev. 2012;88(7):509–515.
22. Aylward GP. The new test. In: Aylward GP, ed. Bayley 4 Clinical Use and Interpretation. Academic Press; 2020:21–33.
