CONTEXT

Discovering new interventions to improve neurodevelopmental outcomes is a priority; however, clinical trials are challenging and methodological issues may impact the interpretation of intervention efficacy.

OBJECTIVES

Characterize the proportion of infant neurodevelopment trials reporting a null finding and identify features that may contribute to a null result.

DATA SOURCES

The Cochrane library, Medline, Embase, and CINAHL databases.

STUDY SELECTION

Randomized controlled trials recruiting infants aged <6 months comparing any “infant-directed” intervention against standard care, placebo, or another intervention. Neurodevelopment assessed as the primary outcome between 12 months and 10 years of age using a defined list of tools.

DATA EXTRACTION

Two reviewers independently extracted data and assessed quality of included studies.

RESULTS

Of n = 1283 records screened, 21 studies (from 20 reports) were included. Of 18 superiority studies, >70% reported a null finding. Features were identified that may have contributed to the high proportion of null findings, including selection and timing of the primary outcome measure, anticipated effect size, sample size and power, and statistical analysis methodology and rigor.

LIMITATIONS

Publication bias against null studies means the proportion of null findings is likely underestimated. Studies assessing neurodevelopment as a secondary or within a composite outcome were excluded.

CONCLUSIONS

This review identified a high proportion of infant neurodevelopmental trials that produced a null finding and detected several methodological and design considerations which may have contributed. We make several recommendations for future trials, including more sophisticated approaches to trial design, outcome assessment, and analysis.

Over the last few decades, implementation of antenatal interventions to reduce the risk of brain injury (eg, administration of corticosteroids) and neuroprotective strategies for newborns (eg, magnesium sulfate and therapeutic hypothermia), as well as advancements in the management of high-risk neonates and infants, have resulted in a significant decline in mortality, particularly in high-income countries.1  Increased survival has resulted in a shift in focus toward reduction in morbidity, which remains a priority for families.2  Neurodevelopmental impairment has long-term implications, such as lower educational attainment, wealth, and job-related income; increased risk for psychiatric and mood disorders and social isolation; and decreased likelihood of achieving independent living, marriage, and employment.3–5  Thus, there is a need to identify and trial new interventions targeted at improving neurodevelopment and other outcomes which affect quality of life (eg, chronic pain, blindness, and deafness) for at-risk infants.

Despite a strong, collective desire to improve long-term neurodevelopmental outcomes, literature has highlighted some of the challenges associated with capturing robust and meaningful outcome data in clinical trials focused on at-risk infants.6–9  Some of these challenges are relevant to clinical trials in general, such as ethics, consent, recruitment, follow-up, cost, infrastructure, and personnel. Others, however, are more specific to the assessment of long-term neurodevelopmental outcome, including the natural variability in human development, especially during infancy, which is a period of rapid development, and the nature of neurodevelopment as a continuum.10,11  Variation in parenting practices and socioeconomic affordances also affect outcomes.12,13  In addition, issues related to the validity, reliability (including sensitivity and specificity to predict prognosis, and responsivity to detect change), and application of available assessment tools provide additional challenges.14  Most conventional tools were designed psychometrically for discriminative purposes (ie, discriminating normal from abnormal) and not for detecting change from intervention. Yet another factor complicating reliable neurodevelopmental assessment in at-risk infants is that many assessment tools rely on fine motor manipulation and/or verbal output to demonstrate ability across domains.14  As such, children with a motor and/or speech delay may be “untestable” or unable to accurately demonstrate ability across other domains, such as cognition. This could create significant inaccuracies if cognition was the primary outcome for an intervention study, for example.14 

Although some interventions will fail to produce a between-group difference in superiority studies because they are truly ineffective, it is also possible that some clinical trials assessing neurodevelopmental outcomes will generate a null finding because of the inherent challenges of conducting neurodevelopmental clinical trials, and/or because of their choice of outcome measure. Because clinical trials are costly to run in terms of time, money, and staffing resources, careful design, including selection of appropriate outcome measure/s, is critical to reduce research wastage and ensure that effective interventions are translated into clinical care.

To better understand the challenges related to conducting infant clinical trials aimed at improving neurodevelopment, we conducted a systematic review with the following aims:

  1. to identify the most commonly used outcome assessment tools in infant neurodevelopmental clinical trials;

  2. to determine the proportion of infant neurodevelopmental clinical trials reporting a null finding versus a finding on the primary outcome; and

  3. to identify methodological features of infant neurodevelopmental clinical trials that could contribute to a null finding result.

Ultimately, the objective of this systematic review was to make recommendations for the field to maximize the likelihood of achieving successful results by avoiding fatal trial design flaws.

The protocol for this review was prospectively registered on PROSPERO (CRD42019129004, registration date August 27, 2019). We conducted a systematic review using the standard methods of the Cochrane Neonatal Review Group.

Before conducting this systematic review, we consulted a panel of 14 international experts (7 clinical neuropsychologists and/or psychologists, 4 pediatric allied health professionals, and 3 specialist neurology/pediatric physicians, from 5 countries) to generate a list of the most common neurodevelopmental assessment tools for infants and children. Sixteen tools were identified, and for each tool, we derived a list of keywords to incorporate into our search terms (Supplemental Table 4).

We included randomized, quasirandomized, and cluster-randomized controlled clinical trials that compared the neurodevelopmental outcome/s of any “infant-directed” intervention versus standard care or placebo or another intervention, administered to infants recruited before 6 months of age. The neurodevelopmental outcome must have been assessed between 12 months (corrected) and 10 years of age using any edition of at least 1 of the 16 tools listed in Supplemental Table 4.

Studies were excluded from this review if they met the following criteria:

  1. intervention was exclusively “parental education” because this was deemed not infant-directed;

  2. absence of an explicit primary outcome (ie, multiple potential primary outcomes with no explicit nomination);

  3. reported neurodevelopmental outcome as a secondary outcome/follow-up analysis only; or

  4. used a composite outcome of multiple measures (excluding composite scores within a single neurodevelopmental tool).

We searched the Cochrane Central Register of Controlled Trials (CENTRAL) (the Cochrane Library, latest issue), PubMed (Medline), and Embase using OVID, in addition to the CINAHL database. The search strategy is described in the supplemental material (Supplemental Table 5). Searches were limited to English language articles and publication date 2009 onward because of the significant improvement in neonatal care practices (eg, magnesium sulfate) and guidelines documented since then.15  The search was initially conducted on May 6, 2021, and was rerun on December 13, 2021.

Deduplicated results from OVID and CINAHL were combined and exported into EndNote (version X9). Additional deduplication was conducted before results were imported into Covidence Systematic Review Software (http://www.covidence.org).

Titles and/or abstracts of studies retrieved using the search strategy were screened independently by two review authors (M.F.E. and M.P.). Full texts of studies were then retrieved and independently assessed for eligibility by two authors (M.P., M.F.E., I.H.), with any disagreements resolved by the third author.

Data extraction was performed by at least two review authors (M.P., M.F.E., I.H.), with any discrepancies identified and resolved through discussion with the third author. Extracted data included details of participants, intervention, comparator, and primary outcome. Details of the statistical analysis and trial results, including P value, were also captured. Where possible, the unadjusted (or least adjusted) P value, derived from intention-to-treat (ITT) analysis, was extracted. When recording the outcome of a study (finding versus null finding), a study was determined to be a null finding if the size of the clinical effect of the primary outcome did not meet their anticipated effect size, even if the authors found a statistically significant between-group difference.
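The classification rule described above can be sketched as a small function. This is a minimal illustration of a literal reading of the rule, not the review's actual procedure: the names `PrimaryResult` and `classify` are hypothetical, the 0.05 significance threshold is an assumption, and individual classifications in the review may rest on additional judgment. The example values are those reported for Carlo 2013 in Table 2.

```python
from dataclasses import dataclass


@dataclass
class PrimaryResult:
    anticipated_effect: float  # effect size the trial was designed to detect
    detected_effect: float     # observed between-group difference
    p_value: float             # reported P value for the primary outcome


def classify(result: PrimaryResult, alpha: float = 0.05) -> str:
    """Label a superiority result as 'finding' or 'null finding'.

    Assumed reading of the rule: a result counts as a finding only when it is
    statistically significant AND the detected effect reaches the anticipated
    effect size. The alpha of 0.05 is an assumption for this sketch.
    """
    meets_effect = abs(result.detected_effect) >= result.anticipated_effect
    significant = result.p_value < alpha
    return "finding" if meets_effect and significant else "null finding"


# Carlo 2013 (Table 2): anticipated 10 points, detected 4.6 points, P = .0202
carlo_2013 = PrimaryResult(anticipated_effect=10.0, detected_effect=4.6, p_value=0.0202)
print(classify(carlo_2013))  # significant, but below the anticipated effect
```

Under this reading, a statistically significant result such as Carlo 2013 is still labeled a null finding because the detected difference is smaller than the difference the trial was designed to detect.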

For risk of bias, two authors (split between I.N., M.P., M.F.E., and A.T.) independently analyzed each study as per the updated Cochrane risk-of-bias tool 2.16  Bias in studies was graded as low, some concerns, or high for each of the 5 domains before an overall assessment was made.

As a descriptive analysis of results, we reported the overall proportion of studies with/without a null finding; then, within each group (finding versus null finding), we reported on various trial aspects, including: (1) selection of primary outcome measure, (2) timing of primary endpoint, (3) anticipated clinical effect size, (4) study sample size and power, (5) statistical analysis methodology and rigor, and (6) methodological quality.
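To illustrate how anticipated effect size, power, and sample size interact (aspects 3 and 4 above), the following sketch uses the standard normal-approximation formula for comparing two means, n per group = 2(z_{1−α/2} + z_{1−β})²σ²/δ². It is not taken from any included trial: the function name `n_per_group`, the two-sided α of 0.05, equal group sizes, and the standard deviation of 15 points are all assumptions made for illustration, and trials may have used different methods or inputs.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta: float, sd: float, power: float = 0.80, alpha: float = 0.05) -> int:
    """Participants needed per group to detect a mean difference `delta`.

    Uses the normal approximation for a two-sided, two-sample comparison of
    means with equal group sizes and a common standard deviation `sd`.
    """
    z = NormalDist()                    # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for two-sided alpha
    z_beta = z.inv_cdf(power)           # quantile matching the target power
    return ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


# Assumed SD of 15 points; halving the anticipated effect roughly quadruples n.
for delta in (10.0, 5.0):
    print(delta, n_per_group(delta, sd=15.0, power=0.90))
```

The practical point is that the required sample size scales with the inverse square of the anticipated effect, so an optimistic anticipated effect size yields a small trial that is underpowered for more modest, but still plausible, effects.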

After the literature search and deduplication, 1283 records were identified. After title and abstract screening, 121 full-text reports were reviewed and 20 met eligibility criteria.17–36  These 20 reports included 21 studies because Kimberlin 2011 (reference 24) reported 2 parallel trials investigating 2 distinct participant subpopulations:

  1. infants with herpes simplex virus (HSV) disease with central nervous system involvement; and

  2. infants with HSV with skin, eye, and mouth involvement only.

The Preferred Reporting Items for Systematic Reviews and Meta-Analyses37  flowchart of the search process is presented in Fig 1.

FIGURE 1

Preferred Reporting Items for Systematic Reviews and Meta-Analyses flow diagram of the study selection process.

Summaries of the included studies are presented in Table 1, including details of the participants, intervention, and comparator, and Table 2, detailing primary outcome, statistical analysis, and results. Of the 21 included studies, the majority (n = 18) were superiority trials, with 3 designed to determine either equivalence or noninferiority.
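As a quick arithmetic check on the design split described above, the study-level outcomes listed in the Outcome column of Table 2 can be tallied as follows. This is a sketch only: the dictionary below is a hand transcription of Table 2 at the study level, the short keys (including the "CNS" and "SEM" labels for the 2 Kimberlin trials, assigned by row order) are hypothetical shorthand, and the 3 equivalence or noninferiority trials are left out of the denominator.

```python
from collections import Counter

# Study-level outcome for each of the 18 superiority trials, transcribed by
# hand from the Outcome column of Table 2 (keys are shorthand labels).
superiority_outcomes = {
    "Andrew 2018": "Null", "Balakrishnan 2018": "Null", "Carlo 2013": "Null",
    "da Cunha 2016": "Null", "Field 2013": "Null", "Hulzebos 2014": "Null",
    "Khan 2018": "Finding", "Kimberlin 2011 (CNS)": "Finding",
    "Kimberlin 2011 (SEM)": "Null", "Li 2019": "Finding",
    "Nair 2009 (ref 28)": "Null", "Nair 2009 (ref 29)": "Finding",
    "Natalucci 2016": "Null", "O'Connor 2016": "Null", "Shi 2020": "Finding",
    "Spittle 2010": "Null", "Williams 2017": "Null", "Xia 2021": "Null",
}

counts = Counter(superiority_outcomes.values())
null_share = counts["Null"] / len(superiority_outcomes)
print(counts["Null"], counts["Finding"], f"{null_share:.1%}")
```

With this transcription, 13 of the 18 superiority trials are labeled null, which is consistent with the proportion of more than 70% reported in the abstract.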

TABLE 1

Summary of Extracted Participant, Intervention, Comparator and Outcome Data From Included Studies

| Source | Participants | Intervention Type | Intervention Details | Comparator Details | Age at Onset | Duration | Primary Outcome/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Andrew 2018 [17] | Infants with neurologic impairment risk factors (eg, preterm infants with low birth weight or a brain injury) | Nutritional supplement | Treatment supplement with DHA, eicosapentaenoic acid, arachidonic acid, choline, UMP, cytidine monophosphate, vitamin B12, zinc, and iodine, given daily mixed into formula, breast milk, or food | Control supplement, given daily mixed into formula, breast milk, or food (placebo) | NS (once infants were on full milk feeds) | 2 y from trial entry | BSID-III: cognitive composite score |
| Balakrishnan 2018 [18] | Infants with very low birth weight (<1250 g) | Nutritional supplement | High-dose parenteral amino acids in HAL solution (increasing quickly by 3–4 g/kg per d) | Standard-dose parenteral amino acids in HAL solution (increasing slowly by 0.5 g/kg per d) | Within 19 h of birth | As clinically indicated (HAL administered until on full feeds) | BSID-III: cognitive composite score, language composite score, and motor composite score |
| Carlo 2013 [19] | Infants with birth asphyxia who received bag and mask ventilation^a | Behavioral | Home-based, early developmental intervention, implemented daily by parents with fortnightly trainer visits | Fortnightly health and safety counseling (standard care) | Within first mo of life | Until 36 mo of age | BSID-II: mental development index |
| da Cunha 2016 [20] | Preterm infants with very low birth weight (<1500 g) | Nutritional supplement | Breast milk supplemented with a multinutrient given twice daily | Breast milk alone | 7–10 d after NICU discharge | Until 4–6 mo’ corrected age | BSID-III: motor composite score, cognitive composite score, and language composite score |
| Field 2013 [21] | Infants with acute respiratory failure requiring ECMO | Procedural | Cooling (34°C for the first 48–72 h) while administering ECMO | Normothermic (37°C) ECMO (standard care) | As clinically indicated | 48–72 h from ECMO initiation, standard ECMO course as required | BSID-III: cognitive composite score |
| Hulzebos 2014 [22] | Preterm infants (≤31 + 6 wk’ GA) | Procedural | Treatment decisions based on TSB/albumin ratio, together with TSB for evaluation of hyperbilirubinemia | Treatment decisions based on TSB level only (standard care) | NS (as required) | First 10 d of monitoring/treatment | BSID-III: motor composite score |
| Khan 2018 [23] | Healthy infants visiting private GP clinic, >2500 g at birth, living in poor, urban location | Behavioral | Clinic-based, flip-book program including parent training for age-appropriate activities for early childhood development, improved nutrition, and management of mother’s depression; mother implemented with quarterly counseling by clinic assistants | Routine care in control (nonchild development) clinics (standard care) | <40 d old | From 3–9 mo of age | ASQ-3, Urdu: communication, gross motor, fine motor, problem-solving, and personal–social |
| Kimberlin 2011 [24] | Infants with HSV disease with CNS involvement | Drug | Oral acyclovir, 300 mg per square meter of body-surface area, 3 times daily (after initial parenteral administration) | NS (placebo) | Within 28 d of life | 6 mo from onset | BSID-II: mental development index |
| Kimberlin 2011 [24] | Infants with HSV disease with skin, eye, and mouth disease only | Drug | Oral acyclovir, 300 mg per square meter of body-surface area, 3 times daily (after initial parenteral administration) | NS (placebo) | Within 28 d of life | 6 mo from onset | BSID-II: mental development index |
| Kulkarni 2017 [25] | Infants with postinfectious hydrocephalus | Procedural | Endoscopic third ventriculostomy with choroid plexus cauterization | Ventriculoperitoneal shunting (standard care) | As required (within 180 d of birth) | Once-off surgical procedure | BSID-III: cognitive scaled score |
| Li 2019 [26] | Healthy, term infants | Nutritional supplement | Formula supplemented with bovine milk fat globule membrane and lactoferrin given exclusively | Control formula given exclusively (placebo) | D 10–14 of life | Until 12 mo of age | BSID-III: cognitive composite score |
| McCann 2019 [27] | Infants (born >26 wk’ GA) scheduled for inguinal herniorrhaphy | Procedural | General anesthesia | Awake–regional anesthesia | ≤60 wk’ postmenstrual age | Once-off during surgery | WPPSI-III: full-scale IQ score |
| Nair 2009 [28] | Term infants with postasphyxial encephalopathy | Drug | Pyritinol, increasing dose from 20 mg per d to 100 mg per d by 6 mo | NS (placebo) | D 8 of life | Until 6 mo of age | BSID-II, Baroda, India norms: mental development index and psychomotor development index |
| Nair 2009 [29] | At-risk infants (infants admitted to level II neonatal nursery) | Behavioral | Home-based program, including visual, auditory, tactile, and vestibular-kinaesthetic stimulations, parent-administered with monthly follow-up visits | Routine postnatal checkup (standard care) | NS | Until 12 mo of age | BSID-II, Baroda, India norms: mental development index and psychomotor development index |
| Natalucci 2016 [30] | Very preterm infants (26 + 0–31 + 6 wk’ GA) | Drug | High-dose recombinant human erythropoietin, 3000 U/kg intravenously | Isotonic saline (placebo) | Within 3 h of birth | 3 doses at 3, 12–18 and 36–42 h after birth | BSID-II: mental development index |
| O’Connor 2016 [31] | Very low birth weight infants (<1500 g) | Nutritional supplement | Nutrient-fortified, pasteurized donor breast milk, to supplement mother’s milk | Preterm formula, to supplement mother’s milk | Within 96 h of birth | 90 d from onset or until discharge from the hospital | BSID-III: cognitive composite score |
| Shi 2020 [32] | Healthy, term infants, >2500 g at birth, living in urban, developing communities | Behavioral | Clinic-based program, including parent training for age-appropriate games and activities; parenting training sessions for child development, feeding, parent–child communication and early stimulation skills; telephone intervention for children at risk for developmental delay; parent implemented with 2 training sessions from child development experts | Routine primary health care services (standard care) | 1–2 mo of age | Until 14 mo of age | ASQ-3, Chinese: total score |
| Spittle 2010 [33] | Preterm infants (<30 wk’ GA) | Behavioral | Home-based program to support infant development, parent mental health, and the parent–infant relationship, delivered by a psychologist and a physiotherapist | Routine follow-up care (standard care) | Term-equivalent age | 9 visits during first 12 mo of age | BSID-III: cognitive composite score, motor composite score, and language composite score |
| van Kempen 2020 [34] | Otherwise healthy newborns (born >2000 g, ≥35 wk’ GA), found to have asymptomatic moderate hypoglycemia, falling into 4 risk subgroups | Procedural | Treatment decision based on lower (36 mg/dL) glucose concentration threshold for neonatal hypoglycemia | Treatment decision based on traditional (47 mg/dL) glucose concentration (standard care) | 3–24 h after birth | As clinically indicated | BSID-III, Dutch: cognitive composite score and motor composite score |
| Williams 2017 [35] | Extremely preterm infants (<31 wk’ GA) | Nutritional supplement | Sodium iodide solution, 30 µg/kg per d, given daily | Sodium chloride solution, 30 µg/kg per d, given daily (placebo) | Within 42 h of birth | Until 34 wk’ GA equivalent | BSID-III: cognitive composite score, motor composite score, and language composite score |
| Xia 2021 [36] | Healthy, term infants | Nutritional supplement | Formula supplemented with milk fat globule membrane (17.9 mg gangliosides/100 g for 0–6 mo, then 16.9 mg/100 g for 6–12 mo), given exclusively | Control formula, given exclusively | NS | Until 12 mo of age | BSID-III: cognitive composite score, language composite score, motor composite score, social–emotional composite score, and general adaptive behavior composite score |

CNS, central nervous system; DHA, docosahexaenoic acid; ECMO, extracorporeal membrane oxygenation; GA, gestational age; GP, general practitioner; HAL, hyperalimentation; NS, not specified; TSB, total serum bilirubin; UMP, uridine-5-monophosphate; WPPSI, Wechsler Preschool and Primary Scale of Intelligence.

^a This study also included infants who did not require resuscitation who were randomized to 2 groups (intervention and control). These results are not presented here because the study objective was to determine if the intervention improves neurodevelopment in resuscitated children.

TABLE 2

Summary of Extracted Primary Outcome, Statistical Analysis and Results Data From Included Studies

| Source | Test | Subdomain/Outcome Score | Age Assessed | Anticipated Effect Size | Target Power Level | Recruitment Target (n) | Analysis Target (n) | Analyzed n (Intervention, Control) | Sample Size Met?^a | Stated ITT Analysis? | Missing Data Strategy Provided? | Analysis Method | Detected Effect Size | Outcome | P^b |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Andrew 2018 [17] | BSID-III | Cognitive composite score | 24 mo | 12.5 points | 80% | 60 (30 per group) | 48 (24 per group) | 43 (24, 19) | No | Yes | Yes | Parametric complex | 9.0 points | Null | .13 |
| Balakrishnan 2018 [18] | BSID-III | Cognitive composite score | 18–24 mo CA | 8.5 points | 80% | 168 (84 per group) | 100 (50 per group) | 112 (54, 58) | Yes | No | No | Parametric simple | 0.4 points | Null | .86 |
| | | Language composite score | | | | | | | | | | | 2.3 points | Null | .41 |
| | | Motor composite score | | | | | | | | | | | 0.1 points | Null | .99 |
| Carlo 2013 [19] | BSID-II | Mental development index | 36 mo | 10 points | 90% | 120 (60 per group)^c | 80 (40 per group) | 123 (59, 64) | Yes | Yes | No | Parametric simple | 4.6 points | Null | .0202 |
| da Cunha 2016 [20] | BSID-III | Motor composite score | 12 mo CA | 10 points | 80% | 46 (23 per group) | 46 (23 per group) | 53 (26, 27) | Yes | No | No | Parametric simple | 4.6 points | Null | .174 |
| | | Cognitive composite score | | | | | | | | | | | 2.6 points | Null | .443 |
| | | Language composite score | | | | | | | | | | | 3.4 points | Null | .344 |
| Field 2013 [21] | BSID-III | Cognitive composite score | 24 mo CA | 10 points | 90% | 118 (59 per group) | 94 (47 per group) | 93 (45, 48) | Yes | Yes | Yes | Parametric simple | 2.6 points | Null | NS |
| Hulzebos 2014 [22] | BSID-III | Motor composite score | 18–24 mo CA | 7 points | 80% | 614 (307 per group) | 434 (217 per group) | 480 (237, 243) | Yes | Yes | No | Parametric simple | 1 point | Null | .49 |
| Khan 2018 [23] | ASQ-3, Urdu | Communication, gross motor, fine motor, problem-solving, and personal–social^d | 12 mo | 20% absolute difference in risk of delay | 80% | 2112 (1056 per group) | 1900 (950 per group) | 1957 (1037, 920) | Yes | Yes | No | Parametric simple | 18% difference in risk | Finding | <.001 |
| Kimberlin 2011 [24] | BSID-II | Mental development index | 12 mo | 20% absolute difference in risk of no or mild impairment | 80% | 66 (33 per group)^e | 58 (29 per group)^e | 28 (16, 12) | No | No | No | Parametric complex | 20.1 points^f | Finding | .046 |
| Kimberlin 2011 [24] | BSID-II | Mental development index | 12 mo | 20% absolute difference in risk of no or mild impairment | 80% | 66 (33 per group)^e | 58 (29 per group)^e | 15 (8, 7) | No | No | No | Parametric simple | NS | Null | NS |
| Kulkarni 2017 [25] | BSID-III | Cognitive scaled score | 12 mo postsurgery | 3 points | 90% | 100 (50 per group) | 75 (37 per group) | 94 (47, 47) | Yes | Yes | Yes | Nonparametric simple | 2 points | Equivalence confirmed | .35 |
| Li 2019 [26] | BSID-III | Cognitive composite score | 12 mo | 5 points | 80% | 450 (225 per group) | 286 (143 per group) | 291 (143, 148) | Yes | No | No | Parametric simple | 8.7 points | Finding | <.001 |
| McCann 2019 [27] | WPPSI-III | Full-scale IQ score | 5 y ± 4 mo | <5 points | 90% | 720 (360 per group) | 598 (299 per group) | 719 (358, 361) | Yes | No^g | Yes | Parametric complex | 0.2 points | Equivalence confirmed | NS |
| Nair 2009 [28] | BSID-II, Baroda, India norms | Mental development index | 12 mo | 16 points | 90% | 108 (54 per group) | 102 (51 per group) | 100 (51, 49) | Yes | Yes | No | Parametric simple | 1.2 points | Null | .75 |
| | | Psychomotor development index | | | | | | | | | | | 4.3 points | Null | .31 |
| Nair 2009 [29] | BSID-II, Baroda, India norms | Mental development index | 12 mo; 24 mo | 4 points | 80% | 800 (400 per group) | 672 (336 per group) | 12 mo: 665 (324, 341); 24 mo: 735 (358, 377) | Yes | No | No | Parametric simple | 5.1 points; 7.2 points | Finding; Finding | <.001; <.005 |
| | | Psychomotor development index | 12 mo; 24 mo | | | | | | | | | | 2.8 points; 4.1 points | Finding; Finding | <.001; <.005 |
| Natalucci 2016 [30] | BSID-II | Mental development index | 24 mo CA | 16 points | 90% | 422 (211 per group) | 352 (176 per group) | 365 (191, 174) | Yes | Yes | Yes | Parametric simple | 1 point | Null | .56 |
| O’Connor 2016 [31] | BSID-III | Cognitive composite score | 18 mo CA | 5 points | 80% | 352 (176 per group) | 282 (141 per group) | 299 (151, 148) | Yes | Yes | Yes | Parametric complex | 1.6 points | Null | .41 |
| Shi 2020 [32] | ASQ-3, Chinese | Total score | 14 mo | 10 points | 80% | 166 (83 per group) | 132 (66 per group) | 140 (71, 69) | Yes | No | No | Parametric complex | z score = 0.25 | Finding | <.01 |
| Spittle 2010 [33] | BSID-III | Cognitive composite score | 24 mo CA | 6 points | 80% | 120 (60 per group) | 120 (60 per group)^h | 115 (58, 57) | No | No | No | Parametric simple | 3.4 points | Null | .20 |
| | | Motor composite score | | | | | | | | | | | 1.4 points | Null | .66 |
| | | Language composite score | | | | | | | | | | | 1.3 points | Null | .67 |
| van Kempen 2020 [34] | BSID-III, Dutch | Cognitive composite score | 18 mo CA | 7.5 points | 90% | 800 (400 per group; 200 per risk subgroup) | 680 (340 per group; 170 per risk subgroup) | 582 (287, 295) | No | Yes | Yes | Parametric simple | 0.7 points | Noninferiority confirmed | NS |
| | | Motor composite score | | | | | | | | | | | 0.3 points | Noninferiority confirmed | NS |
| Williams 2017 [35] | BSID-III | Cognitive composite score | 24 ± 1 mo CA | 6 points | 90% | 1400 (700 per group) | 1400 (700 per group) | 1259 (631, 628) | No | Yes | Yes | Parametric simple | 0.3 points | Null | .77 |
| | | Motor composite score | | | | | | | | | | | 0.2 points | Null | .87 |
| | | Language composite score | | | | | | | | | | | 0.0 points | Null | .97 |
| Xia 2021 [36] | BSID-III | Cognitive composite score | 12 mo | 4 points | 90% | 240 (120 per group) | 176 (88 per group) | 175 (92, 83) | No | Yes | No | Parametric simple | 2.6 points | Null | .77 |
| | | Language composite score | | | | | | | | | | | 0.4 points | Null | .27 |
| | | Motor composite score | | | | | | | | | | | 0.9 points | Null | .80 |
| | | Social–emotional composite score | | | | | | | | | | | 3.5 points | Null^i | .82 |
| | | General adaptive behavior composite score | | | | | | | | | | | 5.6 points | Null^j | .06 |

CA, corrected age; NS, not specified; WPPSI, Wechsler Preschool and Primary Scale of Intelligence.

a Sample size determined to be met if within 5%.
b Unadjusted/least-adjusted P value reported.
c Target sample size for total study = 240: 4 groups of 60 per group.
d Primary outcome was binary indicator of whether a child had reduced delay in 2 or more child development domains of the ASQ. These domains are provided.
e The original sample size was revised down in a protocol amendment.
f Protocol was amended to change the primary outcome from a percentage change to a point change.
g A per-protocol analysis was explicitly selected for methodological reasons.
h Sample size calculation estimated that a sample size of 200 (100 per group) was required; however, sample was reduced because of funding constraints.
i Xia 2021 reported a statistically significant adjusted P value of .048 for this outcome; however, the detected effect size was below their anticipated 4-point change.
j Xia 2021 reported a finding on this outcome according to their adjusted P value of .004.

Participants, Interventions, Comparators, and Outcomes

Most studies (n = 17) recruited infants from high-risk populations, including preterm and/or low birth weight infants; infants with birth asphyxia; infants diagnosed with a range of conditions such as hydrocephalus, respiratory failure, HSV disease, or hypoglycemia; or those scheduled for inguinal herniorrhaphy. In contrast, 4 studies recruited healthy infants (Table 1). The 21 studies aimed to recruit 9048 participants in total, with individual study sample sizes ranging from 46 to 2112 participants (Table 2).

A variety of interventions was studied, broadly categorized into nutritional supplements (n = 7), behavioral interventions (n = 5), procedural interventions (n = 5), and drug interventions (n = 4) (Table 1). Because of the diversity of study interventions investigated, the intervention details, comparator, age at onset, and intervention duration varied widely.

Outcome assessment tools used were the Bayley Scales of Infant and Toddler Development (BSID, n = 18), Ages and Stages Questionnaire (ASQ, n = 2), and Wechsler Preschool and Primary Scale of Intelligence (n = 1) (Table 2). Among studies that administered the BSID, 6 used the BSID-II and 12 used the BSID-III. There was substantial variability among studies as to the number and range of neurodevelopmental domains used for the primary outcome. Ten of the 21 studies reported a single domain/composite score as the primary outcome. The remainder of studies elected the primary outcome to encompass >1 developmental domain/composite score (eg, cognitive, motor, and language composite scores of the BSID-III, or both the mental and psychomotor developmental indexes of the BSID-II) (Table 2). This resulted in a total of 38 trial outcomes/results reported across all studies. Study primary endpoints ranged from 12 months to 5 years and 4 months of age, with the majority following up at either 12 (n = 7 of 21) or 24 (n = 5 of 21) months of age (Table 2). One study29  specified 2 primary outcome time points (12 and 24 months).

Methodologic Quality

Overall, 8 of the 21 studies (38%) were rated as high risk of bias in at least 1 domain, resulting in an overall high risk of bias assessment (Fig 2). Missing outcome data were the most common source of bias identified in high-risk studies, followed by deviations from the intended interventions. A further 7 studies (33%) were rated to have some concerns, often related to risk arising from deviations from the intended interventions. Six studies (29%) had low risk of bias across all domains.

FIGURE 2

Summary of risk of bias for included studies.


Of the 18 superiority trials, 5 (28%) reported a finding (ie, a statistically significant between-group difference of at least the anticipated effect size, favoring the intervention) on the primary outcome/s of the trial,23,24,26,29,32  and 13 studies (72%) reported a null finding on the primary outcome/s17–22,24,28,30,31,33,35,36  (Table 2). Of note, Kimberlin 2011 reported a finding for infants with central nervous system involvement, but not for those with skin, eye, and mouth disease only.24  The 3 equivalence/noninferiority studies all confirmed equivalence/noninferiority across their primary outcomes.25,27,34  These studies are excluded from our analysis of finding versus null finding because, in this type of study, a small or no between-group difference is the hypothesis under investigation.

Selection of Primary Outcome Measure

Of the trials that reported a finding, 3 of 5 (60%) used the BSID-II or -III and 2 of 5 (40%) used the ASQ-3. Of the 13 null finding studies, 100% used the BSID (n = 4 used BSID-II and n = 9 used BSID-III).

Timing of Primary Endpoint

Accounting for the double endpoint (12 and 24 months) used by Nair 200929  by dividing this study in half, 90% (n = 4.5 of 5) of studies that reported a finding had a primary endpoint at <18 months of age. In contrast, only 31% (n = 4 of 13) of null finding studies captured the primary outcome before 18 months.
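The half-study weighting described above can be reproduced with a short calculation. The following is a minimal sketch, not the authors' analysis code: endpoint ages are transcribed from Table 2, corrected and chronological ages are treated alike, and endpoints reported as a range (18–24 months) are recorded by their lower bound, which is an assumption of this example. The function name `early_endpoint_share` is hypothetical.

```python
from fractions import Fraction

# (study, primary endpoint age(s) in months, reported a finding?)
# Ages transcribed from Table 2 for the 18 superiority trials.
STUDIES = [
    ("Khan 2018", (12,), True),
    ("Kimberlin 2011, CNS disease", (12,), True),
    ("Li 2019", (12,), True),
    ("Nair 2009 (ref 29)", (12, 24), True),   # two primary time points
    ("Shi 2020", (14,), True),
    ("Andrew 2018", (24,), False),
    ("Balakrishnan 2018", (18,), False),      # reported as 18-24 mo (assumed lower bound)
    ("Carlo 2013", (36,), False),
    ("da Cunha 2016", (12,), False),
    ("Field 2013", (24,), False),
    ("Hulzebos 2014", (18,), False),          # reported as 18-24 mo (assumed lower bound)
    ("Kimberlin 2011, SEM disease", (12,), False),
    ("Nair 2009 (ref 28)", (12,), False),
    ("Natalucci 2016", (24,), False),
    ("O'Connor 2016", (18,), False),
    ("Spittle 2010", (24,), False),
    ("Williams 2017", (24,), False),
    ("Xia 2021", (12,), False),
]


def early_endpoint_share(studies, finding, cutoff_months=18):
    """Return (weighted count of studies with an endpoint before the cutoff, total studies).

    A study with several primary time points contributes the fraction of
    those time points that fall before the cutoff (eg, 1/2 for 12 and 24 mo).
    """
    early, total = Fraction(0), 0
    for _name, ages, reported_finding in studies:
        if reported_finding != finding:
            continue
        total += 1
        early += Fraction(sum(age < cutoff_months for age in ages), len(ages))
    return early, total


for label, finding in (("finding", True), ("null", False)):
    early, total = early_endpoint_share(STUDIES, finding)
    print(f"{label}: {float(early)} of {total} = {float(100 * early / total):.0f}%")
```

Under these assumptions the weighted counts are 4.5 of 5 for studies reporting a finding and 4 of 13 for null finding studies, matching the 90% and 31% quoted in the text.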

Anticipated Clinical Effect Size

Of the 14 studies that targeted a minimum points-change using the BSID, the mean anticipated effect size across null finding studies was 9.3 composite score points, compared with only 4.5 for the findings studies. The anticipated effect size for the remaining studies could not be directly compared because the 2 studies that used the ASQ-3 for their primary outcome employed different analysis methods (categorical23  versus continuous32 ), and the anticipated clinical effect size for Kimberlin 201124  could not be determined from the study protocol.

Study Sample Size and Power

Three of the 5 findings studies (60%) and 8 of the 13 studies that reported a null finding (62%) achieved their target sample size during recruitment. Of studies that reported a finding, the median target sample size required for primary outcome analysis was 286 (range 58–1900) and the median analyzed was 291 participants (range 28–1957). In contrast, for studies reporting a null finding, the median target sample size was 102 (range 46–1400), with 115 participants analyzed (range 15–1259).

All superiority trials used either 80% or 90% as their level of statistical power when calculating their target sample size. Interestingly, all 5 (100%) trials that reported a finding targeted the lower 80% power, whereas, of those studies that did not report a finding, 7 (54%) targeted 80% power and 6 (46%) targeted the higher 90% level of power.

Statistical Analysis Methodology and Rigor

Of the 14 superiority studies reporting an unadjusted treatment effect as their primary outcome, 11 reported a null finding and 3 reported a finding. Nine of these 14 studies also undertook secondary analyses where adjustment for confounders was considered; however, this resulted in a different study outcome in only 1 case.36  Of the remaining 4 studies, 2 reported an adjusted treatment effect from a multiple linear regression/analysis of covariance (1 finding and 1 null finding), and 2 used adjusted mixed effects models (1 finding and 1 null finding).

Seventeen of the superiority trials used a (frequentist) parametric analysis method requiring validation of parametric assumptions. Notably, no studies that reported a finding commented on the parametric assumptions of their chosen analysis method, whereas just over half (n = 7 of 13) of the studies that reported a null finding referenced these assumptions, either explicitly stating they were checked and satisfied or outlining strategies to check the validity of these assumptions in the protocol. The remaining study used a Bayesian model and reported a finding, but similarly did not include details of model-checking appropriate for Bayesian methods.

Only 1 (20%) of the studies that reported a finding stated that an ITT analysis was performed, whereas, in the remaining 4 studies (80%), there was no comment made or it was unclear. Of the 13 studies that did not report a finding, 9 (69%) stated that an ITT analysis had been conducted, whereas it was unclear for the remaining 4. Only 5 of the 18 included superiority studies, all of which reported a null finding, made specific mention of strategies for dealing with missing data.

Methodological Quality

All studies that reported a finding were assessed to be either high risk of bias (3 of 5, 60%) or to have some concerns (2 of 5, 40%). Of the studies that reported a null finding, 5 of 13 (38%) were assessed to have a low overall risk of bias, 4 of 13 (31%) were deemed high risk, and 4 of 13 (31%) had some concerns.

With the discovery of life-preserving treatments for at-risk neonate and infant populations, the need to improve neurodevelopmental outcome has gained focus. Effective neurodevelopmental interventions for these infants are a high priority, yet there remains a paucity of evidence-based treatments available. We conducted a systematic review to better understand the challenges and issues related to conducting infant clinical trials aimed at improving neurodevelopmental outcomes, with the aim to make recommendations, where appropriate, for future trials (Table 3). Results demonstrated that the majority of the included studies reported a null finding. This was a surprising result given the known problem of preferential publication of studies with positive findings,38  and supports the importance of publishing all trial results, regardless of outcome.

TABLE 3

Recommendations for Future Infant Neurodevelopmental Clinical Trials

Recommendations From This Systematic Review
• Develop, validate, and renorm assessment tools to ensure measures with sensitive-outcome measurement properties, such as responsiveness to change, are available. 
• Consider more sophisticated primary outcome designs/analyses that allow for powered assessment of multiple outcomes of interest. 
• Follow up with trial participants beyond 1–2 years of life, ideally to school age. 
• Use endpoint measures that have responsivity psychometric properties. 
• Do not overinflate the anticipated effect size to improve trial feasibility. 
• Carefully and transparently report decisions and assumptions underpinning the statistical analysis process. 
• Conduct statistical analyses that adjust for attrition. 
• When available, adopt novel early outcome measures that allow diagnosis or prediction of neurodevelopmental outcome/s at earlier time points. 

Examination of the primary outcome measures used revealed that the BSID (versions II and III) is the most commonly used neurodevelopmental outcome measure in infant trials since 2009. This was expected because the BSID is considered to be the “gold standard” developmental assessment tool for infants aged 0 to 42 months.39  Despite its widespread use, the BSID, and the BSID-III in particular, has been criticized for markedly underestimating developmental delay in high-risk groups and for cultural differences in norms between countries.40–43  In addition to the problems with norms, the tools in the BSID suite are discriminative tools, not designed for detection of change from intervention. As such, the common use of the BSID-II and -III as a primary outcome measure may have contributed to the high proportion of null findings reported. It will be interesting to observe whether the recently released BSID-4, with updated normative data and scoring metric, performs better as it is adopted. Recommendation: Specific tools with sensitive-outcome measurement properties should continue to be developed, validated, and renormed (where discriminative) to ensure tests with the best possible psychometric properties are available.

Whether selection of a different outcome measure might have changed the result of any of the null studies is an interesting question. Variable psychometric properties of neurodevelopmental outcome measures in infancy are not limited to the BSID.44  As discussed, the natural variability in development in infancy is a challenge for assessment, as are environmental factors such as mood, responsiveness to assessor, behavior, and understanding preverbal child intentions.44  For this reason, it is common in clinical practice to administer two measures of domains of interest or low performance, because convergent results give increased confidence in the reliable interpretation of results. However, because of feasibility considerations, such as power and sample size, this is often not possible in clinical trials. Moreover, even when multiple measures of a domain are included, only one is typically designated as the primary outcome, with others included as secondary outcomes, which are often underpowered. In addition, multiple statistical analyses of data measuring the same domain can introduce other flaws. One way that studies have addressed this is by combining multiple endpoints into a composite outcome. However, the conventional composite outcome fails to account for the relative importance of, and possible correlation among, its components, and may prove challenging for the interpretation of results. Specifically, neonatal trials frequently combine mortality and disability outcomes into a single composite, which can be problematic because they often have discordant patient importance, effect sizes, and event rates.45  Furthermore, children may improve on some domains and not others, and statistically, their gains may be washed out by children who make gains in the opposite domains. Recommendation: Researchers should consider more sophisticated primary outcome design/analyses. For example, the Global Statistical Test is one method that can enable analysis of multiple important outcomes of interest with appropriate weighting, and should allow for true effects to be detected with high sensitivity and specificity with smaller sample sizes than current trial methodologies.46 
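To illustrate the general idea of a global test across several outcomes, the sketch below implements one common, unweighted variant: the rank-sum procedure usually attributed to O'Brien, in which each outcome is ranked across all children, the ranks are summed per child, and the two groups are compared on those sums with a two-sample t statistic. This is an assumption for illustration only; the review does not specify which form of the Global Statistical Test is intended, the function names are hypothetical, and the scores in the usage example are invented.

```python
from math import sqrt
from statistics import mean, variance


def average_ranks(values):
    """Rank values from 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def rank_sum_gst(intervention, control):
    """Return (t statistic, degrees of freedom) for the rank-sum global test.

    Each argument is a list of per-child tuples holding one score per
    outcome, all oriented so that a higher score is better.
    """
    pooled = list(intervention) + list(control)
    rank_sums = [0.0] * len(pooled)
    for outcome in range(len(pooled[0])):
        ranks = average_ranks([child[outcome] for child in pooled])
        for idx, rank in enumerate(ranks):
            rank_sums[idx] += rank
    a = rank_sums[:len(intervention)]
    b = rank_sums[len(intervention):]
    na, nb = len(a), len(b)
    # Pooled-variance two-sample t statistic on the per-child rank sums.
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t_stat = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t_stat, na + nb - 2


# Invented (cognitive, language, motor) composite scores for 3 children per arm.
intervention = [(100, 95, 90), (110, 105, 100), (105, 100, 110)]
control = [(90, 85, 80), (95, 90, 85), (85, 80, 95)]
t_stat, df = rank_sum_gst(intervention, control)
print(f"t = {t_stat:.2f} on {df} degrees of freedom")
```

The t statistic would then be referred to a t distribution with the stated degrees of freedom. Weighted variants, which reflect the relative importance of each outcome, would replace the simple sum of ranks with a weighted sum.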

Timing of the primary endpoint is another important consideration for trial design. This review found that nearly all studies with a finding (90%) measured the primary outcome at <18 months, compared with only 31% of null finding studies. Because of the known variability in development, neurodevelopmental assessment tools are generally more reliable and predictive of future ability when administered at later ages.8  Moreover, intervention effects that are maintained and/or increase over time postintervention reduce the likelihood of cascading developmental impairment.47,48  It is therefore an interesting and somewhat disheartening result that the majority of clinical trials with slightly later primary endpoints were not able to detect a significant improvement from their interventions. However, these trials with longer duration follow-up points are confounded by the lack of control for socioeconomic opportunity and parental input, which also affect outcomes.12,13  Recommendation: Longer-term follow-up (ie, school age assessment) may be warranted to determine if positive effects of interventions are sustained beyond the first 1 to 2 years of life, and when more subtle differences can be more accurately detected.8 

Looking at other design aspects, a notable difference in anticipated clinical effect size, sample size, and power was observed among studies reporting a null finding versus a finding. Specifically, in studies that reported a null finding, the anticipated effect size was approximately double and the median sample size was smaller (approximately 35%–40% of that in studies reporting a finding), despite these studies being more likely to target the higher (90%) power. These elements go hand in hand because anticipated effect size and desired statistical power contribute directly to the target sample size. Crucially, a larger anticipated effect size and a lower target power both result in a smaller required sample size, because it is easier to observe differences among groups if the groups are farther apart. Researchers must often balance these aspects of trial design with the availability of trial funding, which can ultimately restrict sample size and may contribute to the high proportion of promising interventions resulting in null findings. In addition, the psychometric properties of the tool will change the effect size and the sample size estimate. For example, discriminative measures redistribute data to a normal curve, and consequently, much larger gains are needed by infants with disabilities in the lower second standard deviation to show any intervention effect than on instruments psychometrically designed to measure change after intervention. Recommendation: Trial endpoint measures should ideally have responsivity psychometric properties. Researchers should also exercise caution to not overinflate the anticipated effect size in the interest of improving trial feasibility by lowering the required sample size, because this could prove to be a fatal study design flaw. Rather, consideration should be given to the adoption of innovative trial designs, for example, adaptive trials, to overcome some of these challenges.
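The dependence of sample size on anticipated effect size and power can be made concrete with the standard normal-approximation formula for comparing two means, n per group = 2σ²(z₁₋α/₂ + z₁₋β)² / δ². The sketch below is illustrative only and is not the calculation used by any included trial: it assumes a two-sided α of 0.05 and a standard deviation of 15 points (the scale on which BSID composite scores are normed), ignores attrition, and uses the hypothetical function name `n_per_group`. The two effect sizes are the mean anticipated effects reported above for null finding (9.3 points) and findings (4.5 points) studies.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, power, sd=15.0, alpha=0.05):
    """Participants per group needed to detect a mean difference `delta`.

    Normal approximation for a two-sided, two-sample comparison of means
    with equal group sizes and a common standard deviation `sd` (assumed).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for two-sided alpha
    z_beta = z.inv_cdf(power)           # quantile for the target power
    return ceil(2 * (sd * (z_alpha + z_beta) / delta) ** 2)


for delta in (9.3, 4.5):
    for power in (0.80, 0.90):
        print(f"delta = {delta} points, power = {power:.0%}: "
              f"{n_per_group(delta, power)} per group")
```

Under these assumptions, anticipating a 9.3-point rather than a 4.5-point difference cuts the required sample to roughly a quarter, because the sample size scales with 1/δ². A trial designed around the larger effect is therefore poorly placed to detect a true effect of the smaller size.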

Adaptive trials allow results from interim data analyses to inform adjustments to the trial protocol, including sample size recalculation and early trial termination for success or lack of efficacy.49  Although adaptive trials generally require greater statistical input at earlier stages of trial design, sophisticated trial design methods have the potential to identify effective interventions more efficiently by detecting true change in shorter time frames.50  Furthermore, researchers should determine a priori whether it is reasonable to aggregate the results of children with normal and abnormal outcomes into the same analyses. When the research question is, “Does the intervention prevent the occurrence of a specified outcome?” then a discriminative measure that determines normal from abnormal is appropriate. However, if the aim of the intervention is to reduce the severity of a specified outcome, then a discriminative measure is likely to underestimate the intervention effect. Thus, the trial would be better designed using an outcome measure with good responsivity to change properties, as well as adequately powered for subgroup analysis of those with disability.

From this review, we see that simple, unadjusted analysis methods for estimating treatment effect were most commonly applied. At first glance, this observation may suggest that failure to adjust for potential confounders of treatment effects may have led to the large proportion of null findings. However, even though many studies with a null finding conducted secondary analyses with adjustment for confounders, these analyses rarely produced different results. Thus, although it is possible that, in some studies, potential confounding factors were poorly chosen or not correctly identified, in general, this does not appear to be a primary cause of null findings.

Although the included studies clearly indicate an awareness of the ITT principle,51  the absence of an explicit strategy for dealing with missing data in the majority of the included superiority trials suggests that adherence may be lacking in practice.52,53  Similarly of concern, from a statistical perspective, is the small number of studies that gave explicit consideration to the parametric assumptions underlying their chosen analysis method.54  Recommendation: Although it is not possible to know from this review whether these factors may have contributed to the high proportion of null finding studies, as best practice, researchers should report more carefully and transparently on the decisions and assumptions underpinning their statistical analysis process. Moreover, as editors and peer reviewers for the field, we must hold each other accountable to these good reporting practices and allow scope for this within publication word counts.

Notably, all studies that reported a finding had either a high risk of bias or some concerns. This potentially raises doubts over the validity of these findings, although a high risk of bias or some concerns were observed in >70% of all included studies. Risk of bias because of missing data was identified as a common area of concern. This is not overly surprising because loss of participants to dropout is a known challenge for studies with longer follow-up times, and some trials noted an expected mortality rate because of the target participant population. Recommendation: Given that loss to follow-up of varying degrees will be almost universal (and unavoidable) in such trials, statistical analysis that adjusts for attrition should be uniformly reported to help researchers, clinicians, and policymakers arrive at a meaningful conclusion. Furthermore, in the future, there may be novel early outcome measures that allow us to better diagnose/predict poor neurodevelopmental outcome at earlier time points, enabling shorter follow-up times and minimizing loss to follow-up.

A major limitation of this review is that bias against publication of null studies likely resulted in an underestimation of null findings studies, and important information and insights from those studies could not be incorporated. Although much thought went into the search strategy to balance finding relevant studies with a manageable number of studies to screen, by not including additional terms such as developmental delay, intellectual disability, speech delay, fine motor delay, or gross motor delay, it is possible that some studies may have been missed. Further, the decision to exclude studies investigating interventions that were exclusively parental education may have excluded studies that produced a finding, because these interventions may positively alter the infant’s environment. However, studies were eligible for inclusion if the intervention was infant-directed, even if it was administered by a parent. This is important because parent-administered interventions may be the best route to achieve adequate intervention dose. Another limitation is that studies reporting neurodevelopmental outcome as a secondary outcome, likely at longer time points, were excluded from the systematic review. The purpose of this review was to examine findings of neurodevelopmental studies and it was therefore deemed critical that included studies were established for this specific purpose and were adequately powered to detect change. In the future, it may be of interest to conduct a review of trials where neurodevelopment was assessed as a secondary outcome to determine whether a similar high proportion of null findings exists. Moreover, studies were excluded where the authors devised a single composite score from multiple outcomes. Often, these encompassed outcome/s not included in the 16 identified assessment tools included in our search, and it was not possible within the scope of this review to evaluate the analysis and interpretation employed by researchers to arrive at the composite variable. The authors acknowledge that composite outcomes are frequently used in neonatal medicine, and thus this decision may have resulted in the exclusion of relevant studies.

Identification and investigation of new interventions targeted at improving neurodevelopmental outcomes is critically important to ensure that infants born at risk have access to the best available evidence-based interventions. The purpose of randomized controlled trials is to determine whether an intervention is efficacious or not. Although there is an assumption that the decision to test an intervention in a randomized controlled trial is based on promising data from preclinical studies and early phase trials, there are many reasons why an intervention may be ineffective in a specific population. This systematic review identified a high proportion of infant neurodevelopmental trials that produced a null finding, and detected several methodological and design considerations common to these studies. Although it is not possible to determine from this review to what extent these methodological and design flaws contributed to the high proportion of null findings detected in the literature, it is the responsibility of clinical trialists working in the field to address these issues so that truly effective interventions can be identified and translated into clinical care. As such, we provide a number of recommendations for the field, including the use of more sophisticated approaches to trial design, outcome assessment, and analysis. Moreover, we caution against compromising on trial quality to increase perceived trial feasibility because trial design decisions may influence our ability to confidently interpret intervention efficacy, and this may ultimately impact the availability of effective interventions for infants in need.

We thank Charlotte Frazer for assistance with reference checking and formatting.

Drs Finch-Edmondson and Paton conceptualized and designed the study, ran the searches, screened studies and extracted data, drafted the initial manuscript, and reviewed and revised the manuscript; Dr Honan screened studies and extracted data, drafted the initial manuscript, and reviewed and revised the manuscript; Ms Galea conceptualized the study, screened studies, and critically reviewed the manuscript for important intellectual content; Ms Webb extracted data, conducted data analysis, drafted the initial manuscript, and reviewed and revised the manuscript; Dr Novak extracted data, conducted data analysis, and critically reviewed the manuscript for important intellectual content; Dr Badawi conceptualized the study and critically reviewed the manuscript for important intellectual content; Dr Trivedi conceptualized the study, extracted data, conducted data analysis, drafted the initial manuscript, and reviewed and revised the manuscript; and all authors approved the final manuscript as submitted and agree to be accountable for all aspects of the work.

FUNDING: No external funding.

CONFLICT OF INTEREST DISCLAIMER: The authors have indicated they have no conflicts of interest relevant to this article to disclose.

ASQ: Ages and Stages Questionnaire
BSID: Bayley Scales of Infant and Toddler Development
HSV: herpes simplex virus
ITT: intention-to-treat

