Discovering new interventions to improve neurodevelopmental outcomes is a priority; however, clinical trials are challenging and methodological issues may impact the interpretation of intervention efficacy.
Characterize the proportion of infant neurodevelopment trials reporting a null finding and identify features that may contribute to a null result.
The Cochrane library, Medline, Embase, and CINAHL databases.
Randomized controlled trials recruiting infants aged <6 months comparing any “infant-directed” intervention against standard care, placebo, or another intervention. Neurodevelopment assessed as the primary outcome between 12 months and 10 years of age using a defined list of tools.
Two reviewers independently extracted data and assessed quality of included studies.
Of 1283 records screened, 21 studies (from 20 reports) were included. Of 18 superiority studies, >70% reported a null finding. Features were identified that may have contributed to the high proportion of null findings, including selection and timing of the primary outcome measure, anticipated effect size, sample size and power, and statistical analysis methodology and rigor.
Publication bias against null studies means the proportion of null findings is likely underestimated. Studies assessing neurodevelopment as a secondary outcome or within a composite outcome were excluded.
This review identified a high proportion of infant neurodevelopmental trials that produced a null finding and detected several methodological and design considerations which may have contributed. We make several recommendations for future trials, including more sophisticated approaches to trial design, outcome assessment, and analysis.
Over the last few decades, implementation of antenatal interventions to reduce the risk of brain injury (eg, administration of corticosteroids) and neuroprotective strategies for newborns (eg, magnesium sulfate and therapeutic hypothermia), as well as advancements in the management of high-risk neonates and infants have resulted in a significant decline in mortality, particularly in high-income countries.1 Increased survival has resulted in a shift in focus toward reduction in morbidity, which remains a priority for families.2 Neurodevelopmental impairment has long-term implications, such as lower educational attainment, wealth, and job-related income; increased risk for psychiatric and mood disorders and social isolation; and decreased likelihood of achieving independent living, marriage, and employment.3–5 Thus, there is a need to identify and trial new interventions targeted at improving neurodevelopment and other outcomes which affect quality of life (eg, chronic pain, blindness, and deafness) for at-risk infants.
Despite a strong, collective desire to improve long-term neurodevelopmental outcomes, literature has highlighted some of the challenges associated with capturing robust and meaningful outcome data in clinical trials focused on at-risk infants.6–9 Some of these challenges are relevant to clinical trials in general, such as ethics, consent, recruitment, follow-up, cost, infrastructure, and personnel. Others, however, are more specific to the assessment of long-term neurodevelopmental outcome, including the natural variability in human development, especially during infancy, which is a period of rapid development, and the nature of neurodevelopment as a continuum.10,11 Variation in parenting practices and socioeconomic affordances also affect outcomes.12,13 In addition, issues related to the validity, reliability (including sensitivity and specificity to predict prognosis, and responsivity to detect change), and application of available assessment tools provide additional challenges.14 Most conventional tools were designed psychometrically for discriminative purposes (ie, discriminating normal from abnormal) and not for detecting change from intervention. Yet another factor complicating reliable neurodevelopmental assessment in at-risk infants is that many assessment tools rely on fine motor manipulation and/or verbal output to demonstrate ability across domains.14 As such, children with a motor and/or speech delay may be “untestable” or unable to accurately demonstrate ability across other domains, such as cognition. This could create significant inaccuracies if cognition was the primary outcome for an intervention study, for example.14
Although some interventions will fail to produce a between-group difference in superiority studies because they are truly ineffective, it is also possible that some clinical trials assessing neurodevelopmental outcomes will generate a null finding because of the inherent challenges of conducting neurodevelopmental clinical trials, and/or because of their choice of outcome measure. Because clinical trials are costly to run in terms of time, money, and staffing resources, careful design, including selection of appropriate outcome measure/s, is critical to reduce research wastage and ensure that effective interventions are translated into clinical care.
To better understand the challenges related to conducting infant clinical trials aimed at improving neurodevelopment, we conducted a systematic review with the following aims:
to identify the most commonly used outcome assessment tools in infant neurodevelopmental clinical trials;
to determine the proportion of infant neurodevelopmental clinical trials reporting a null finding versus a finding on the primary outcome; and
to identify methodological features of infant neurodevelopmental clinical trials that could contribute to a null finding result.
Ultimately, the objective of this systematic review was to make recommendations for the field to maximize the likelihood of achieving successful results by avoiding fatal trial design flaws.
Methods
The protocol for this review was prospectively registered on PROSPERO (CRD42019129004, registration date August 27, 2019). We conducted a systematic review using the standard methods of the Cochrane Neonatal Review Group.
Defining the Neurodevelopmental Assessment Tools
Before conducting this systematic review, we consulted a panel of 14 international experts (7 clinical neuropsychologists and/or psychologists, 4 pediatric allied health professionals, and 3 specialist neurology/pediatric physicians, from 5 countries) to generate a list of the most common neurodevelopmental assessment tools for infants and children. Sixteen tools were identified, and for each tool, we derived a list of keywords to incorporate into our search terms (Supplemental Table 4).
Inclusion and Exclusion Criteria
We included randomized, quasirandomized, and cluster-randomized controlled clinical trials that compared the neurodevelopmental outcome/s of any “infant-directed” intervention versus standard care or placebo or another intervention, administered to infants recruited before 6 months of age. The neurodevelopmental outcome must have been assessed between 12 months (corrected) and 10 years of age using any edition of at least 1 of the 16 tools listed in Supplemental Table 4.
Studies were excluded from this review if they met the following criteria:
intervention was exclusively “parental education” because this was deemed not infant-directed;
absence of an explicit primary outcome (ie, multiple potential primary outcomes with no explicit nomination);
reported neurodevelopmental outcome as a secondary outcome/follow-up analysis only; or
used a composite outcome of multiple measures (excluding composite scores within a single neurodevelopmental tool).
Data Sources and Search Strategy
We searched the Cochrane Central Register of Controlled Trials (CENTRAL) (the Cochrane Library, latest issue), PubMed (Medline), and Embase using OVID, in addition to the CINAHL database. The search strategy is described in the supplemental material (Supplemental Table 5). Searches were limited to English language articles and publication date 2009 onward because of the significant improvement in neonatal care practices (eg, magnesium sulfate) and guidelines documented since then.15 The search was initially conducted on May 6, 2021, and was rerun on December 13, 2021.
Deduplicated results from OVID and CINAHL were combined and exported into EndNote (version X9). Additional deduplication was conducted before results were imported into Covidence Systematic Review Software (http://www.covidence.org).
Data Extraction
Titles and/or abstracts of studies retrieved using the search strategy were screened independently by two review authors (M.F.E. and M.P.). Full texts of studies were then retrieved and independently assessed for eligibility by two of three authors (M.P., M.F.E., I.H.), with any disagreements resolved by the third author.
Data extraction was performed by at least two review authors (M.P., M.F.E., I.H.), with any discrepancies identified and resolved through discussion with the third author. Extracted data included details of participants, intervention, comparator, and primary outcome. Details of the statistical analysis and trial results, including P value, were also captured. Where possible, the unadjusted (or least adjusted) P value, derived from intention-to-treat (ITT) analysis, was extracted. When recording the outcome of a study (finding versus null finding), a study was classified as a null finding if the observed clinical effect size for the primary outcome did not meet the authors' anticipated effect size, even if the authors found a statistically significant between-group difference.
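The finding versus null finding rule can be read as a simple decision. The following is a minimal sketch of a literal reading of that rule, not the reviewers' actual procedure: the function name, the 0.05 significance threshold, and the requirement that both effect sizes be expressed in the same units are assumptions made for illustration. The example values are taken from Table 2.

```python
def classify_primary_outcome(detected_effect, anticipated_effect, p_value, alpha=0.05):
    """Label a superiority trial's primary outcome as 'finding' or 'null'.

    Assumption: both effect sizes are in the same units (eg, BSID points).
    Assumption: alpha = 0.05 is the significance threshold.
    """
    meets_anticipated_effect = abs(detected_effect) >= anticipated_effect
    statistically_significant = p_value < alpha
    # A significant difference that is smaller than the anticipated
    # effect size is still recorded as a null finding.
    if meets_anticipated_effect and statistically_significant:
        return "finding"
    return "null"


# Carlo 2013: anticipated 10 points, detected 4.6 points, P = .0202
print(classify_primary_outcome(4.6, 10, 0.0202))
# Li 2019: anticipated 5 points, detected 8.7 points, P < .001
# (0.001 is used here as an upper bound for the reported P value)
print(classify_primary_outcome(8.7, 5, 0.001))
```

Under this reading, Carlo 2013 is a null finding despite its significant P value, because the detected difference fell short of the anticipated 10 points. The sketch cannot be applied where the anticipated and detected effects are reported in different units.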
Quality Assessment
For risk of bias, two authors (split between I.N., M.P., M.F.E., and A.T.) independently analyzed each study as per the updated Cochrane risk-of-bias tool 2.16 Bias in studies was graded as low, some concerns, or high for each of the 5 domains before an overall assessment was made.
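As an illustration of how 5 domain-level judgments can be combined, the sketch below encodes the simplest mapping described in the RoB 2 guidance: any high-risk domain gives an overall rating of high, otherwise any domain with some concerns gives some concerns, otherwise low. The function name is hypothetical, and the guidance also allows reviewers to rate a study as high when several domains raise some concerns, a judgment call this sketch does not encode. It is not a description of how the overall assessment was reached in this review.

```python
VALID_JUDGMENTS = {"low", "some concerns", "high"}


def overall_risk_of_bias(domain_judgments):
    """Combine 5 RoB 2 domain judgments into an overall rating.

    Assumption: only the mechanical part of the RoB 2 mapping is
    implemented; reviewer discretion is deliberately left out.
    """
    if len(domain_judgments) != 5:
        raise ValueError("RoB 2 uses exactly 5 domains")
    unknown = set(domain_judgments) - VALID_JUDGMENTS
    if unknown:
        raise ValueError(f"unexpected judgment(s): {sorted(unknown)}")
    if "high" in domain_judgments:
        return "high"
    if "some concerns" in domain_judgments:
        return "some concerns"
    return "low"


print(overall_risk_of_bias(["low", "low", "some concerns", "low", "low"]))
```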
Analysis
As a descriptive analysis of results, we reported the overall proportion of studies with/without a null finding; then, within each group (finding versus null finding), we reported on various trial aspects, including: (1) selection of primary outcome measure, (2) timing of primary endpoint, (3) anticipated clinical effect size, (4) study sample size and power, (5) statistical analysis methodology and rigor, and (6) methodological quality.
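Items (3) and (4) are linked: for a 2-arm superiority trial comparing mean scores, the anticipated effect size and the target power jointly determine the required sample size. A minimal sketch using the standard normal-approximation formula, n = 2(z_{1-α/2} + z_{1-β})²σ²/δ² per group, is shown below. It assumes a 2-sided α of 0.05, equal allocation, and an SD of 15 points (the normative SD of Bayley composite scores); the included trials may have used different assumptions, so this is not a reconstruction of any trial's calculation.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sd=15.0, alpha=0.05, power=0.80):
    """Participants needed per group to detect a mean difference `delta`.

    Assumptions: 2-sided test, equal group sizes, common SD `sd`,
    normal approximation (no allowance for attrition).
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 * (sd / delta) ** 2)


for delta, power in [(10, 0.90), (5, 0.80)]:
    print(f"delta = {delta} points, power = {power:.0%}: "
          f"{n_per_group(delta, power=power)} per group")
```

Because the required sample size scales with 1/δ², halving the anticipated difference roughly quadruples the number of participants needed. A trial powered for a large anticipated effect is therefore poorly placed to detect a smaller effect that may still be clinically meaningful.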
Results
Search Results
After the literature search and deduplication, 1283 records were identified. After title and abstract screening, 121 full-text reports were reviewed and 20 met eligibility criteria.17–36 These 20 reports included 21 studies because Kimberlin 201124 reported 2 parallel trials investigating 2 distinct participant subpopulations:
infants with herpes simplex virus (HSV) disease with central nervous system involvement; and
infants with HSV with skin, eye, and mouth involvement only.
The Preferred Reporting Items for Systematic Reviews and Meta-Analyses37 flowchart of the search process is presented in Fig 1.
Preferred Reporting Items for Systematic Reviews and Meta-Analyses flow diagram of the study selection process.
Study Characteristics
Summaries of the included studies are presented in Table 1, including details of the participants, intervention, and comparator, and Table 2, detailing primary outcome, statistical analysis, and results. Of the 21 included studies, the majority (n = 18) were superiority trials, with 3 designed to determine either equivalence or noninferiority.
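The proportion of null findings among the superiority trials can be tallied directly from the Outcome column of Table 2. The sketch below does this for the 18 superiority studies. The labels are transcribed from that table, the dictionary keys are informal identifiers chosen for illustration, and the 3 equivalence or noninferiority studies (Kulkarni 2017, McCann 2019, and van Kempen 2020) are left out.

```python
from collections import Counter

# Outcome per superiority study, as listed in Table 2.
# Keys are informal labels, not official study identifiers.
superiority_outcomes = {
    "Andrew 2018": "null",
    "Balakrishnan 2018": "null",
    "Carlo 2013": "null",
    "da Cunha 2016": "null",
    "Field 2013": "null",
    "Hulzebos 2014": "null",
    "Khan 2018": "finding",
    "Kimberlin 2011 (first listed trial)": "finding",
    "Kimberlin 2011 (second listed trial)": "null",
    "Li 2019": "finding",
    "Nair 2009 (ref 28)": "null",
    "Nair 2009 (ref 29)": "finding",
    "Natalucci 2016": "null",
    "O'Connor 2016": "null",
    "Shi 2020": "finding",
    "Spittle 2010": "null",
    "Williams 2017": "null",
    "Xia 2021": "null",
}

counts = Counter(superiority_outcomes.values())
total = len(superiority_outcomes)
null_proportion = counts["null"] / total
print(f"{counts['null']} of {total} superiority studies were null "
      f"({null_proportion:.1%})")
```

The tally gives 13 null findings out of 18 superiority studies, in line with the proportion of more than 70% reported in the abstract.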
Summary of Extracted Participant, Intervention, Comparator and Outcome Data From Included Studies
Source | Participants | Intervention Type | Intervention Details | Comparator Details | Age at Onset | Duration | Primary Outcome/s |
---|---|---|---|---|---|---|---|
Andrew 201817 | Infants with neurologic impairment risk factors (eg, preterm infants with low birth weight or a brain injury) | Nutritional supplement | Treatment supplement with DHA, eicosapentaenoic acid, arachidonic acid, choline, UMP, cytidine monophosphate, vitamin B12, zinc, and iodine, given daily mixed into formula, breast milk, or food | Control supplement, given daily mixed into formula, breast milk, or food (placebo) | NS (once infants were on full milk feeds) | 2 y from trial entry | BSID-III: cognitive composite score |
Balakrishnan 201818 | Infants with very low birth weight (<1250 g) | Nutritional supplement | High-dose parenteral amino acids in HAL solution (increasing quickly by 3–4 g/kg per d) | Standard-dose parenteral amino acids in HAL solution (increasing slowly by 0.5 g/kg per d) | Within 19 h of birth | As clinically indicated (HAL administered until on full feeds) | BSID-III: cognitive composite score, language composite score, and motor composite score |
Carlo 201319 | Infants with birth asphyxia who received bag and mask ventilationa | Behavioral | Home-based, early developmental intervention, implemented daily by parents with fortnightly trainer visits | Fortnightly health and safety counseling (standard care) | Within first mo of life | Until 36 mo of age | BSID-II: mental development index |
da Cunha 201620 | Preterm infants with very low birth weight (<1500 g) | Nutritional supplement | Breast milk supplemented with a multinutrient given twice daily | Breast milk alone | 7–10 d after NICU discharge | Until 4–6 mo’ corrected age | BSID-III: motor composite score, cognitive composite score, and language composite score |
Field 201321 | Infants with acute respiratory failure requiring ECMO | Procedural | Cooling (34°C for the first 48–72 h) while administering ECMO | Normothermic (37°C) ECMO (standard care) | As clinically indicated | 48–72 h from ECMO initiation, standard ECMO course as required | BSID-III: cognitive composite score |
Hulzebos 201422 | Preterm infants (≤31 + 6 wk’ GA) | Procedural | Treatment decisions based on TSB/albumin ratio, together with TSB for evaluation of hyperbilirubinemia | Treatment decisions based on TSB level only (standard care) | NS (as required) | First 10 d of monitoring/treatment | BSID-III: motor composite score |
Khan 201823 | Healthy infants visiting private GP clinic, >2500 g at birth, living in poor, urban location | Behavioral | Clinic-based, flip-book program including parent training for age-appropriate activities for early childhood development, improved nutrition, and management of mother’s depression; mother implemented with quarterly counseling by clinic assistants | Routine care in control (nonchild development) clinics (standard care) | <40 d old | From 3–9 mo of age | ASQ-3, Urdu: communication, gross motor, fine motor, problem-solving, and personal–social |
Kimberlin 201124 | Infants with HSV disease with CNS involvement | Drug | Oral acyclovir, 300 mg per square meter of body-surface area, 3 times daily (after initial parenteral administration) | NS (placebo) | Within 28 d of life | 6 mo from onset | BSID-II: mental development index |
Kimberlin 201124 | Infants with HSV disease with skin, eye, and mouth disease only | Drug | Oral acyclovir, 300 mg per square meter of body-surface area, 3 times daily (after initial parenteral administration) | NS (placebo) | Within 28 d of life | 6 mo from onset | BSID-II: mental development index |
Kulkarni 201725 | Infants with postinfectious hydrocephalus | Procedural | Endoscopic third ventriculostomy with choroid plexus cauterization | Ventriculoperitoneal shunting (standard care) | As required (within 180 d of birth) | Once-off surgical procedure | BSID-III: cognitive scaled score |
Li 201926 | Healthy, term infants | Nutritional supplement | Formula supplemented with bovine milk fat globule membrane and lactoferrin given exclusively | Control formula given exclusively (placebo) | D 10–14 of life | Until 12 mo of age | BSID-III: cognitive composite score |
McCann 201927 | Infants (born >26 wk’ GA) scheduled for inguinal herniorrhaphy | Procedural | General anesthesia | Awake–regional anesthesia | ≤60 wk’ postmenstrual age | Once-off during surgery | WPPSI-III: full-scale IQ score |
Nair 200928 | Term infants with postasphyxial encephalopathy | Drug | Pyritinol, increasing dose from 20 mg per d to 100 mg per d by 6 mo | NS (placebo) | D 8 of life | Until 6 mo of age | BSID-II, Baroda, India norms: mental development index and psychomotor development index |
Nair 200929 | At-risk infants (infants admitted to level II neonatal nursery) | Behavioral | Home-based program, including visual, auditory, tactile, and vestibular-kinaesthetic stimulations, parent-administered with monthly follow-up visits | Routine postnatal checkup (standard care) | NS | Until 12 mo of age | BSID-II, Baroda, India norms: mental development index and psychomotor development index |
Natalucci 201630 | Very preterm infants (26 + 0–31 + 6 wk’ GA) | Drug | High-dose recombinant human erythropoietin, 3000 U/kg intravenously | Isotonic saline (placebo) | Within 3 h of birth | 3 doses at 3, 12–18 and 36–42 h after birth | BSID-II: mental development index |
O’Connor 201631 | Very low birth weight infants (<1500 g) | Nutritional supplement | Nutrient-fortified, pasteurized donor breast milk, to supplement mother’s milk | Preterm formula, to supplement mother’s milk | Within 96 h of birth | 90 d from onset or until discharge from the hospital | BSID-III: cognitive composite score |
Shi 202032 | Healthy, term infants, >2500 g at birth, living in urban, developing communities | Behavioral | Clinic-based program, including parent training for age-appropriate games and activities; parenting training sessions for child development, feeding, parent–child communication and early stimulation skills; telephone intervention for children at risk for developmental delay; parent implemented with 2 training sessions from child development experts | Routine primary health care services (standard care) | 1–2 mo of age | Until 14 mo of age | ASQ-3, Chinese: total score |
Spittle 201033 | Preterm infants (<30 wk’ GA) | Behavioral | Home-based program to support infant development, parent mental health, and the parent–infant relationship, delivered by a psychologist and a physiotherapist | Routine follow-up care (standard care) | Term-equivalent age | 9 visits during first 12 mo of age | BSID-III: cognitive composite score, motor composite score, and language composite score |
van Kempen 202034 | Otherwise healthy newborns (born >2000 g, ≥35 wk’ GA), found to have asymptomatic moderate hypoglycemia, falling into 4 risk subgroups | Procedural | Treatment decision based on lower (36 mg/dL) glucose concentration threshold for neonatal hypoglycemia | Treatment decision based on traditional (47 mg/dL) glucose concentration (standard care) | 3–24 h after birth | As clinically indicated | BSID-III, Dutch: cognitive composite score and motor composite score |
Williams 201735 | Extremely preterm infants (<31 wk’ GA) | Nutritional supplement | Sodium iodide solution, 30 µg/kg per d, given daily | Sodium chloride solution, 30 µg/kg per d, given daily (placebo) | Within 42 h of birth | Until 34 wk’ GA equivalent | BSID-III: cognitive composite score, motor composite score, and language composite score |
Xia 202136 | Healthy, term infants | Nutritional supplement | Formula supplemented with milk fat globule membrane (17.9 mg gangliosides/100 g for 0–6 mo, then 16.9 mg/100 g for 6–12 mo), given exclusively | Control formula, given exclusively | NS | Until 12 mo of age | BSID-III: cognitive composite score, language composite score, motor composite score, social–emotional composite score, and general adaptive behavior composite score |
CNS, central nervous system; DHA, docosahexaenoic acid; ECMO, extracorporeal membrane oxygenation; GA, gestational age; GP, general practitioner; HAL, hyperalimentation; NS, not specified; TSB, total serum bilirubin; UMP, uridine-5-monophosphate; WPPSI, Wechsler Preschool and Primary Scale of Intelligence.
a This study also included infants who did not require resuscitation who were randomized to 2 groups (intervention and control). These results are not presented here because the study objective was to determine if the intervention improves neurodevelopment in resuscitated children.
Summary of Extracted Primary Outcome, Statistical Analysis and Results Data From Included Studies
Source . | Primary Outcome, Statistical Analysis, and Results . | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
. | Test . | Subdomain/ Outcome Score . | Age Assessed . | Anticipated Effect Size . | Target Power Level . | Recruitment Target (n) . | Analysis Target (n) . | Analyzed n (Intervention, Control) . | Sample Size Met?a . | Stated ITT Analysis? . | Missing Data Strategy Provided? . | Analysis Method . | Detected Effect Size . | Outcome . | Pb . |
Andrew 201817 | BSID- III | Cognitive composite score | 24 mo | 12.5 points | 80% | 60 (30 per group) | 48 (24 per group) | 43 (24, 19) | No | Yes | Yes | Parametric complex | 9.0 points | Null | .13 |
Balakrishnan 201818 | BSID-III | Cognitive composite score | 18–24 mo CA | 8.5 points | 80% | 168 (84 per group) | 100 (50 per group) | 112 (54, 58) | Yes | No | No | Parametric simple | 0.4 points | Null | .86 |
Language composite score | 2.3 points | Null | .41 | ||||||||||||
Motor composite score | 0.1 points | Null | .99 | ||||||||||||
Carlo 201319 | BSID-II | Mental development index | 36 mo | 10 points | 90% | 120 (60 per group)c | 80 (40 per group) | 123 (59,64) | Yes | Yes | No | Parametric Simple | 4.6 points | Null | .0202 |
da Cunha 201620 | BSID-III | Motor composite score | 12 mo CA | 10 points | 80% | 46 (23 per group) | 46 (23 per group) | 53 (26, 27) | Yes | No | No | Parametric simple | 4.6 points | Null | .174 |
Cognitive composite score | 2.6 points | Null | .443 | ||||||||||||
Language composite score | 3.4 points | Null | .344 | ||||||||||||
Field 201321 | BSID-III | Cognitive composite score | 24 mo CA | 10 points | 90% | 118 (59 per group) | 94 (47 per group) | 93 (45, 48) | Yes | Yes | Yes | Parametric simple | 2.6 points | Null | NS |
Hulzebos 201422 | BSID-III | Motor composite score | 18–24 mo CA | 7 points | 80% | 614 (307 per group) | 434 (217 per group) | 480 (237, 243) | Yes | Yes | No | Parametric simple | 1 point | Null | .49 |
Khan 201823 | ASQ-3, Urdu | Communication, gross motor, fine motor, problem-solving, and personal– sociald | 12 mo | 20% absolute difference in risk of delay | 80% | 2112 (1056 per group) | 1900 (950 per group) | 1957 (1037, 920) | Yes | Yes | No | Parametric simple | 18% difference in risk | Finding | <.001 |
Kimberlin 201124 | BSID-II | Mental development index | 12 mo | 20% absolute difference in risk of no or mild impairment | 80% | 66 (33 per group)e | 58 (29 per group)e | 28 (16, 12) | No | No | No | Parametric complex | 20.1 pointsf | Finding | .046 |
Kimberlin 201124 | BSID-II | Mental development index | 12 mo | 20% absolute difference in risk of no or mild impairment | 80% | 66 (33 per group)e | 58 (29 per group)e | 15 (8, 7) | No | No | No | Parametric simple | NS | Null | NS |
Kulkarni 201725 | BSID-III | Cognitive scaled score | 12 mo postsurgery | 3 points | 90% | 100 (50 per group) | 75 (37 per group) | 94 (47, 47) | Yes | Yes | Yes | Nonparametric simple | 2 points | Equivalence confirmed | .35 |
Li 201926 | BSID-III | Cognitive composite score | 12 mo | 5 points | 80% | 450 (225 per group) | 286 (143 per group) | 291 (143, 148) | Yes | No | No | Parametric simple | 8.7 points | Finding | <.001 |
McCann 201927 | WPPSI-III | Full-scale IQ score | 5 y ± 4 mo | <5 points | 90% | 720 (360 per group) | 598 (299 per group) | 719 (358, 361) | Yes | Nog | Yes | Parametric complex | 0.2 points | Equivalence confirmed | NS |
Nair 200928 | BSID-II, Baroda, India norms | Mental development index Psychomotor development index | 12 mo | 16 points | 90% | 108 (54 per group) | 102 (51 per group) | 100 (51, 49) | Yes | Yes | No | Parametric simple | 1.2 points 4.3 points | Null Null | .75 .31 |
Nair 200929 | BSID-II, Baroda, India norms | Mental development index | 12 mo 24 mo | 4 points | 80% | 800 (400 per group) | 672 (336 per group) | 12 mo: 665 (324, 341) 24 mo: 735 (358, 377) | Yes | No | No | Parametric simple | 5.1 points 7.2 points | Finding Finding | <.001 <.005 |
Psychomotor development index | 12 mo 24 mo | 2.8 points 4.1 points | Finding Finding | <.001 <.005 | |||||||||||
Natalucci 201630 | BSID-II | Mental development index | 24 mo CA | 16 points | 90% | 422 (211 per group) | 352 (176 per group) | 365 (191, 174) | Yes | Yes | Yes | Parametric simple | 1 point | Null | .56 |
O’Connor 201631 | BSID-III | Cognitive composite score | 18 mo CA | 5 points | 80% | 352 (176 per group) | 282 (141 per group) | 299 (151, 148) | Yes | Yes | Yes | Parametric complex | 1.6 points | Null | .41 |
Shi 202032 | ASQ-3, Chinese | Total score | 14 mo | 10 points | 80% | 166 (83 per group) | 132 (66 per group) | 140 (71, 69) | Yes | No | No | Parametric complex | z score = 0.25 | Finding | <.01 |
Spittle 201033 | BSID-III | Cognitive composite score | 24 mo CA | 6 points | 80% | 120 (60 per group) | 120 (60 per group)h | 115 (58, 57) | No | No | No | Parametric simple | 3.4 points | Null | .20 |
| | | Motor composite score | | | | | | | | | | | 1.4 points | Null | .66 |
| | | Language composite score | | | | | | | | | | | 1.3 points | Null | .67 |
van Kempen 202034 | BSID-III, Dutch | Cognitive composite score | 18 mo CA | 7.5 points | 90% | 800 (400 per group; 200 per risk subgroup) | 680 (340 per group; 170 per risk subgroup) | 582 (287, 295) | No | Yes | Yes | Parametric simple | 0.7 points | Noninferiority confirmed | NS |
| | | Motor composite score | | | | | | | | | | | 0.3 points | Noninferiority confirmed | NS |
Williams 201735 | BSID-III | Cognitive composite score | 24 ± 1 mo CA | 6 points | 90% | 1400 (700 per group) | 1400 (700 per group) | 1259 (631, 628) | No | Yes | Yes | Parametric simple | 0.3 points | Null | .77 |
| | | Motor composite score | | | | | | | | | | | 0.2 points | Null | .87 |
| | | Language composite score | | | | | | | | | | | 0.0 points | Null | .97 |
Xia 202136 | BSID-III | Cognitive composite score | 12 mo | 4 points | 90% | 240 (120 per group) | 176 (88 per group) | 175 (92, 83) | No | Yes | No | Parametric simple | 2.6 points | Null | .77 |
| | | Language composite score | | | | | | | | | | | 0.4 points | Null | .27 |
| | | Motor composite score | | | | | | | | | | | 0.9 points | Null | .80 |
| | | Social–emotional composite score | | | | | | | | | | | 3.5 points | Nulli | .82 |
| | | General adaptive behavior composite score | | | | | | | | | | | 5.6 points | Nullj | .06 |
CA, corrected age; NS, not specified; WPPSI, Wechsler Preschool and Primary Scale of Intelligence.
a Sample size determined to be met if within 5%.
b Unadjusted/least-adjusted P value reported.
c Target sample size for total study = 240: 4 groups of 60 per group.
d Primary outcome was binary indicator of whether a child had reduced delay in 2 or more child development domains of the ASQ. These domains are provided.
e The original sample size was revised down in a protocol amendment.
f Protocol was amended to change the primary outcome from a percentage change to a point change.
g A per-protocol analysis was explicitly selected for methodological reasons.
h Sample size calculation estimated that a sample size of 200 (100 per group) was required; however, sample was reduced because of funding constraints.
i Xia 2021 reported a statistically significant adjusted P value of .048 for this outcome; however, the detected effect size was below their anticipated 4-point change.
j Xia 2021 reported a finding on this outcome according to their adjusted P value of .004.
Participants, Interventions, Comparators, and Outcomes
Most studies (n = 17) recruited infants from high-risk populations, including preterm and/or low birth weight infants; infants with birth asphyxia; infants diagnosed with a range of conditions such as hydrocephalus, respiratory failure, HSV disease, or hypoglycemia; or those scheduled for inguinal herniorrhaphy. In contrast, 4 studies recruited healthy infants (Table 1). The 21 studies aimed to recruit 9048 participants in total, with individual study sample sizes ranging from 46 to 2112 participants (Table 2).
A variety of interventions was studied, broadly categorized into nutritional supplements (n = 7), behavioral interventions (n = 5), procedural interventions (n = 5), and drug interventions (n = 4) (Table 1). Because of the diversity of study interventions investigated, the intervention details, comparator, age at onset, and intervention duration varied widely.
Outcome assessment tools used were the Bayley Scales of Infant and Toddler Development (BSID, n = 18), Ages and Stages Questionnaire (ASQ, n = 2), and Wechsler Preschool and Primary Scale of Intelligence (n = 1) (Table 2). Among studies that administered the BSID, 6 used the BSID-II and 12 used the BSID-III. There was substantial variability among studies as to the number and range of neurodevelopmental domains used for the primary outcome. Ten of the 21 studies reported a single domain/composite score as the primary outcome. The remainder of studies elected the primary outcome to encompass >1 developmental domain/composite score (eg, cognitive, motor, and language composite scores of the BSID-III, or both the mental and psychomotor developmental indexes of the BSID-II) (Table 2). This resulted in a total of 38 trial outcomes/results reported across all studies. Study primary endpoints ranged from 12 months to 5 years and 4 months of age, with the majority following up at either 12 (n = 7 of 21) or 24 (n = 5 of 21) months of age (Table 2). One study29 specified 2 primary outcome time points (12 and 24 months).
Methodologic Quality
Overall, 8 of the 21 studies (38%) were rated as high risk of bias in at least 1 domain, resulting in an overall high risk of bias assessment (Fig 2). Missing outcome data were the most common source of bias identified in high-risk studies, followed by deviations from the intended interventions. A further 7 studies (33%) were rated to have some concerns, often related to risk arising from deviations from the intended interventions. Six studies (29%) had low risk of bias across all domains.
Proportion of Studies Reporting a Finding versus Null Finding
Of the 18 superiority trials, 5 (28%) reported a finding (ie, a statistically significant between-group difference of at least the anticipated effect size, favoring the intervention) on the primary outcome/s of the trial,23,24,26,29,32 and 13 studies (72%) reported a null finding on the primary outcome/s17–22,24,28,30,31,33,35,36 (Table 2). Of note, Kimberlin 2011 reported a finding for infants with central nervous system involvement, but not for those with skin, eye, and mouth disease only.24 The 3 equivalence/noninferiority studies all confirmed equivalence/noninferiority across their primary outcomes.25,27,34 These studies are excluded from our analysis of finding versus null finding because, in this type of study, a small or no between-group difference is the hypothesis under investigation.
Features of Studies Potentially Contributing to a Null Finding Result
Selection of Primary Outcome Measure
Of the trials that reported a finding, 3 of 5 (60%) used the BSID-II or -III and 2 of 5 (40%) used the ASQ-3. Of the 13 null finding studies, 100% used the BSID (n = 4 used BSID-II and n = 9 used BSID-III).
Timing of Primary Endpoint
Accounting for the double endpoint (12 and 24 months) used by Nair 200929 by dividing this study in half, 90% (n = 4.5 of 5) of studies that reported a finding had a primary endpoint at <18 months of age. In contrast, only 31% (n = 4 of 13) of null finding studies captured the primary outcome before 18 months.
Anticipated Clinical Effect Size
Of the 14 studies that targeted a minimum points-change using the BSID, the mean anticipated effect size across null finding studies was 9.3 composite score points, compared with only 4.5 for the findings studies. The anticipated effect size for the remaining studies could not be directly compared because the 2 studies that used the ASQ-3 for their primary outcome employed different analysis methods (categorical23 versus continuous32 ), and the anticipated clinical effect size for Kimberlin 201124 could not be determined from the study protocol.
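The two means quoted above can be reproduced from the anticipated effect sizes listed in Table 2. The following Python sketch is illustrative only: the assignment of studies to the two groups is an assumption based on a reading of Table 2 (BSID superiority trials with a points-based target), and the variable names are hypothetical.

```python
from statistics import mean

# Anticipated effect sizes (BSID points) transcribed from Table 2.
# Assumption: these are the 14 BSID superiority trials with a points-based target.
null_studies = {
    "Andrew 2018": 12.5, "Balakrishnan 2018": 8.5, "Carlo 2013": 10.0,
    "da Cunha 2016": 10.0, "Field 2013": 10.0, "Hulzebos 2014": 7.0,
    "Nair 2009 (ref 28)": 16.0, "Natalucci 2016": 16.0, "O'Connor 2016": 5.0,
    "Spittle 2010": 6.0, "Williams 2017": 6.0, "Xia 2021": 4.0,
}
finding_studies = {"Li 2019": 5.0, "Nair 2009 (ref 29)": 4.0}

mean_null = mean(list(null_studies.values()))
mean_finding = mean(list(finding_studies.values()))
print(f"null: {mean_null:.2f} points, finding: {mean_finding:.2f} points")
```

Under this grouping, the null finding mean is 9.25 points (reported as 9.3) and the findings mean is 4.5 points.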
Study Sample Size and Power
Three of the 5 findings studies (60%) and 8 of the 13 studies that reported a null finding (62%) achieved their target sample size during recruitment. Of studies that reported a finding, the median target sample size required for primary outcome analysis was 286 (range 58–1900) and the median analyzed was 291 participants (range 28–1957). In contrast, for studies reporting a null finding, the median target sample size was 102 (range 46–1400), with 115 participants analyzed (range 15–1259).
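As a check on the medians above, the values can be recomputed from the analysis targets and analyzed sample sizes in Table 2. This is a minimal sketch under stated assumptions: the lists are transcribed by hand from Table 2, Nair 2009 (ref 29) is represented by its 12-month analyzed n, and the ratio of medians is a derived quantity that appears to correspond to the 35% to 40% figure quoted in the Discussion.

```python
from statistics import median

# Analysis targets and analyzed n transcribed from Table 2 (superiority trials only).
# Assumption: Nair 2009 (ref 29) uses its 12-month analyzed n (665).
finding_target = [1900, 58, 286, 672, 132]
finding_analyzed = [1957, 28, 291, 665, 140]
null_target = [48, 100, 80, 46, 94, 434, 58, 102, 352, 282, 120, 1400, 176]
null_analyzed = [43, 112, 123, 53, 93, 480, 15, 100, 365, 299, 115, 1259, 175]

medians = {
    "finding_target": median(finding_target),
    "finding_analyzed": median(finding_analyzed),
    "null_target": median(null_target),
    "null_analyzed": median(null_analyzed),
}

# Size of the typical null finding study relative to the typical findings study.
target_ratio = medians["null_target"] / medians["finding_target"]
analyzed_ratio = medians["null_analyzed"] / medians["finding_analyzed"]
print(medians, round(target_ratio, 2), round(analyzed_ratio, 2))
```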
All superiority trials used either 80% or 90% as their level of statistical power when calculating their target sample size. Interestingly, all 5 (100%) trials that reported a finding targeted the lower 80% power, whereas, of those studies that did not report a finding, 7 (54%) targeted 80% power and 6 (46%) targeted the higher 90% level of power.
Statistical Analysis Methodology and Rigor
Of the 14 superiority studies reporting an unadjusted treatment effect as their primary outcome, 11 reported a null finding and 3 reported a finding. Nine of these 14 studies also undertook secondary analyses where adjustment for confounders was considered; however, this resulted in a different study outcome in only 1 case.36 Of the remaining 4 studies, 2 reported an adjusted treatment effect from a multiple linear regression/analysis of covariance (1 finding and 1 null finding), and 2 used adjusted mixed effects models (1 finding and 1 null finding).
Seventeen of the superiority trials used a (frequentist) parametric analysis method requiring validation of parametric assumptions. Notably, no studies that reported a finding commented on the parametric assumptions of their chosen analysis method, whereas just over half (n = 7 of 13) of the studies that reported a null finding referenced these assumptions, either explicitly stating they were checked and satisfied or outlining strategies to check the validity of these assumptions in the protocol. The remaining study used a Bayesian model and reported a finding, but similarly did not include details of model-checking appropriate for Bayesian methods.
Only 1 (20%) of the studies that reported a finding stated that an ITT analysis was performed, whereas, in the remaining 4 studies (80%), there was no comment made or it was unclear. Of the 13 studies that did not report a finding, 9 (69%) stated that an ITT analysis had been conducted, whereas it was unclear for the remaining 4. Only 5 of the 18 included superiority studies, all of which reported a null finding, made specific mention of strategies for dealing with missing data.
Methodological Quality
All studies that reported a finding were assessed as either at high risk of bias (3 of 5, 60%) or as having some concerns (2 of 5, 40%). Of the studies that reported a null finding, 5 of 13 (38%) were assessed as having a low overall risk of bias, 4 of 13 (31%) as high risk, and 4 of 13 (31%) as having some concerns.
Discussion
With the discovery of life-preserving treatments for at-risk neonate and infant populations, the need to improve neurodevelopmental outcome has gained focus. Effective neurodevelopmental interventions for these infants are a high priority, yet there remains a paucity of evidence-based treatments available. We conducted a systematic review to better understand the challenges and issues related to conducting infant clinical trials aimed at improving neurodevelopmental outcomes, with the aim to make recommendations, where appropriate, for future trials (Table 3). Results demonstrated that the majority of the included studies reported a null finding. This was a surprising result given the known problem of preferential publication of studies with positive findings,38 and supports the importance of publishing all trial results, regardless of outcome.
Recommendations for Future Infant Neurodevelopmental Clinical Trials
Recommendations From This Systematic Review . |
---|
• Develop, validate, and renorm assessment tools to ensure measures with sensitive-outcome measurement properties, such as responsiveness to change, are available. |
• Consider more sophisticated primary outcome designs/analyses that allow for powered assessment of multiple outcomes of interest. |
• Follow up with trial participants beyond 1–2 years of life, ideally to school age. |
• Use endpoint measures that have responsivity psychometric properties. |
• Do not overinflate the anticipated effect size to improve trial feasibility. |
• Carefully and transparently report decisions and assumptions underpinning the statistical analysis process. |
• Conduct statistical analyses that adjust for attrition. |
• When available, adopt novel early outcome measures that allow diagnosis or prediction of neurodevelopmental outcome/s at earlier time points. |
Examination of the primary outcome measures used revealed that the BSID (versions II and III) has been the most commonly used neurodevelopmental outcome measure in infant trials since 2009. This was expected because the BSID is considered to be the “gold standard” developmental assessment tool for infants aged 0 to 42 months.39 Despite its widespread use, the BSID, and the BSID-III in particular, has been criticized for markedly underestimating developmental delay in high-risk groups, and its norms differ culturally between countries.40–43 In addition to the problems with norms, the BSID suite comprises discriminative tools, which are not designed to detect change from intervention. As such, the common use of the BSID-II and -III as a primary outcome measure may have contributed to the high proportion of null findings reported. It will be interesting to observe whether the recently released BSID-4, with updated normative data and scoring metric, performs better as it is adopted. Recommendation: Specific tools with sensitive-outcome measurement properties should continue to be developed, validated, and renormed (where discriminative) to ensure tests with the best possible psychometric properties are available.
Whether selection of a different outcome measure might have changed the result of any of the null studies is an interesting question. Variable psychometric properties of neurodevelopmental outcome measures in infancy are not limited to the BSID.44 As discussed, the natural variability in development in infancy is a challenge for assessment, as are environmental factors such as mood, responsiveness to assessor, behavior, and understanding preverbal child intentions.44 For this reason, it is common in clinical practice to administer two measures of domains of interest or low performance, because convergent results give increased confidence in the reliable interpretation of results. However, because of feasibility considerations, such as power and sample size, this is often not possible in clinical trials. Moreover, even when multiple measures of a domain are included, only one is typically designated as the primary outcome, with others included as secondary outcomes, which are often underpowered. In addition, multiple statistical tests of data measuring the same domain can introduce other flaws, such as inflation of the type I error rate. One way that studies have addressed this is by combining multiple endpoints into a composite outcome. However, the conventional composite outcome fails to account for the relative importance and possible correlation among its components, and may prove challenging for the interpretation of results. Specifically, neonatal trials frequently combine mortality and disability outcomes into a single composite, which can be problematic because they often have discordant patient importance, effect sizes, and event rates.45 In addition, children may improve on some domains and not others, and statistically, their gains may be washed out by children who make gains in other domains. Recommendation: Researchers should consider more sophisticated primary outcome design/analyses. For example, the Global Statistical Test is one method that can enable analysis of multiple important outcomes of interest with appropriate weighting, and should allow for true effects to be detected with high sensitivity and specificity with smaller sample sizes than current trial methodologies.46
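To illustrate the general idea, the sketch below implements an unweighted, O'Brien-type rank-sum procedure, which is one member of the Global Statistical Test family: each outcome is ranked across all participants, the ranks are summed per child, and the summed ranks are compared between groups. This is an assumption-laden illustration rather than the weighted method referred to above. The scores are invented, the function names are hypothetical, and a real analysis would use a validated statistical package and report a P value.

```python
from math import sqrt
from statistics import mean, variance

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum_scores(outcomes):
    """Sum each child's ranks across domains (one tuple of domain scores per child)."""
    n_domains = len(outcomes[0])
    per_domain = [average_ranks([row[d] for row in outcomes]) for d in range(n_domains)]
    return [sum(per_domain[d][i] for d in range(n_domains)) for i in range(len(outcomes))]

def welch_t(a, b):
    """Welch t statistic comparing the mean summed rank of two groups."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))

# Hypothetical (cognitive, language, motor) composite scores, for illustration only.
intervention = [(102, 98, 101), (95, 104, 99), (110, 100, 97), (99, 107, 105)]
control = [(97, 96, 100), (93, 99, 94), (101, 95, 98), (96, 102, 92)]

scores = rank_sum_scores(intervention + control)  # ranks are computed on the pooled sample
t_stat = welch_t(scores[:len(intervention)], scores[len(intervention):])
print(scores, round(t_stat, 2))
```

Because every domain contributes to a single per-child score, a child who gains in one domain but not another still contributes information, and only one between-group test is performed.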
Timing of the primary endpoint is another important consideration for trial design. This review found that nearly all studies with a finding (90%) measured the primary outcome at <18 months, compared with only 31% of null finding studies. Because of the known variability in development, neurodevelopmental assessment tools are generally more reliable and predictive of future ability when administered at later ages.8 Moreover, intervention effects that are maintained and/or increase over time postintervention reduce the likelihood of cascading developmental impairment.47,48 It is therefore an interesting and somewhat disheartening result that the majority of clinical trials with slightly later primary endpoints were not able to detect a significant improvement from their interventions. However, these trials with longer duration follow-up points are confounded by the lack of control for socioeconomic opportunity and parental input, which also affect outcomes.12,13 Recommendation: Longer-term follow-up (ie, school age assessment) may be warranted to determine if positive effects of interventions are sustained beyond the first 1 to 2 years of life, and when more subtle differences can be more accurately detected.8
Looking at other design aspects, a notable difference in anticipated clinical effect size, sample size, and power was observed among studies reporting a null finding versus a finding. Specifically, in studies that reported a null finding, the mean anticipated effect size was approximately double and the median sample size was only about 35% to 40% of that in studies reporting a finding, despite null finding studies being more likely to target the higher (90%) power. These elements go hand in hand because anticipated effect size and desired statistical power contribute directly to the target sample size. Crucially, a larger anticipated effect size results in a smaller required sample size because it is easier to observe differences among groups if the groups are farther apart; a lower target power likewise reduces the required sample size. Researchers must often balance these aspects of trial design with the availability of trial funding, which can ultimately restrict sample size and may contribute to the high proportion of promising interventions resulting in null findings. In addition, the psychometric properties of the tool will change the effect size and the sample size estimate. For example, discriminative measures redistribute data to a normal curve, and consequently, much larger gains are needed by infants with disabilities in the lower second standard deviation to show any intervention effect than on instruments psychometrically designed to measure change after intervention. Recommendation: Trial endpoint measures should ideally have responsivity psychometric properties. Researchers should also exercise caution to not overinflate the anticipated effect size in the interest of improving trial feasibility by lowering the required sample size, because this could prove to be a fatal study design flaw. Rather, consideration should be given to the adoption of innovative trial designs, for example, adaptive trials, to overcome some of these challenges.
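The dependence of sample size on anticipated effect size and power can be made concrete with the standard normal-approximation formula for comparing two means, n per group = 2(z₁₋α/₂ + z_power)²σ²/δ². The Python sketch below applies it under assumptions that are ours rather than those of the included trials: a two-sided α of .05, equal allocation, no allowance for attrition, and an SD of 15 points (the normative SD of BSID composite scores). The function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(delta, power, sd=15.0, alpha=0.05):
    """Approximate participants per arm for a two-sided, two-sample comparison of means.

    delta: anticipated between-group difference in points (assumed, not observed)
    sd:    assumed common standard deviation of the outcome
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)

# Mean anticipated effect sizes from this review, paired with illustrative power levels.
scenarios = [(9.3, 0.90), (4.5, 0.80), (4.5, 0.90)]
results = {(delta, power): n_per_group(delta, power) for delta, power in scenarios}
for (delta, power), n in results.items():
    print(f"delta={delta} points, power={power:.0%}: {n} per group")
```

Because the required n scales with 1/δ², halving the anticipated effect size roughly quadruples the sample size, which is why an overoptimistic effect size makes a trial look far more feasible than it is.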
Adaptive trials allow results from interim data analyses to inform adjustments to the trial protocol, including sample size recalculation and early trial termination for success or lack of efficacy.49 Although adaptive trials generally require greater statistical input at earlier stages of trial design, sophisticated trial design methods have the potential to identify effective interventions more efficiently by detecting true change in shorter time frames.50 Furthermore, researchers should determine a priori whether it is reasonable to aggregate the results of children with normal and abnormal outcomes into the same analyses. When the research question is, “Does the intervention prevent the occurrence of a specified outcome?” then a discriminative measure that determines normal from abnormal is appropriate. However, if the aim of the intervention is to reduce the severity of a specified outcome, then a discriminative measure is likely to underestimate the intervention effect. Thus, the trial would be better designed using an outcome measure with good responsivity to change properties, as well as adequately powered for subgroup analysis of those with disability.
From this review, we see that simple, unadjusted analysis methods for estimating treatment effect were most commonly applied. At first glance, this observation may suggest that failure to adjust for potential confounders of treatment effects may have led to the large proportion of null findings. However, even though many studies with a null finding conducted secondary analyses with adjustment for confounders, these analyses rarely produced different results. Thus, although it is possible that, in some studies, potential confounding factors were poorly chosen or not correctly identified, in general, this does not appear to be a primary cause of null findings.
Although the included studies clearly indicate an awareness of the ITT principle,51 the absence of an explicit strategy for dealing with missing data in the majority of the included superiority trials suggests that adherence may be lacking in practice.52,53 Similarly of concern, from a statistical perspective, is the small number of studies that gave explicit consideration to the parametric assumptions underlying their chosen analysis method.54 Recommendation: Although it is not possible to know from this review whether these factors may have contributed to the high proportion of null finding studies, as best practice, researchers should report more carefully and transparently on the decisions and assumptions underpinning their statistical analysis process. Moreover, as editors and peer reviewers for the field, we must hold each other accountable to these good reporting practices and allow scope for this within publication word counts.
Notably, all studies that reported a finding had either a high risk of bias or some concerns. This potentially raises doubts over the validity of these findings, although low methodological quality was observed in >70% of all included studies. Risk of bias because of missing data was identified as a common area of concern. This is not overly surprising because participant dropout is a known challenge for studies with longer follow-up times, and some trials noted an expected mortality rate because of the target participant population. Recommendation: Given that loss to follow-up of varying degrees will be almost universal (and unavoidable) in such trials, statistical analysis that adjusts for attrition should be uniformly reported to help researchers, clinicians, and policymakers arrive at a meaningful conclusion. Furthermore, in the future, there may be novel early outcome measures that allow us to better diagnose/predict poor neurodevelopmental outcome at earlier time points, enabling shorter follow-up times and minimizing loss to follow-up.
Limitations
A major limitation of this review is that bias against publication of null studies likely resulted in an underestimation of null findings studies, and important information and insights from those studies could not be incorporated. Although much thought went into the search strategy to balance finding relevant studies with a manageable number of studies to screen, by not including additional terms such as developmental delay, intellectual disability, speech delay, fine motor delay, or gross motor delay, it is possible that some studies may have been missed. Further, the decision to exclude studies investigating interventions that were exclusively parental education may have excluded studies that produced a finding, because these interventions may positively alter the infant’s environment. However, studies were eligible for inclusion if the intervention was infant-directed, even if they were administered by a parent. This is important because parent-administered interventions may be the best route to achieve adequate intervention dose. Another limitation is that studies reporting neurodevelopmental outcome as a secondary outcome, likely at longer time points, were excluded from the systematic review. The purpose of this review was to examine findings of neurodevelopmental studies and it was therefore deemed critical that included studies were established for this specific purpose and were adequately powered to detect change. In the future, it may be of interest to conduct a review of trials where neurodevelopment was assessed as a secondary outcome to determine whether a similar high proportion of null findings exists. Moreover, studies were excluded where the authors devised a single composite score from multiple outcomes. Often, these encompassed outcome/s not included in the 16 identified assessment tools included in our search, and it was not possible within the scope of this review to evaluate the analysis and interpretation employed by researchers to arrive at the composite variable. The authors acknowledge that composite outcomes are frequently used in neonatal medicine, and thus this decision may have resulted in the exclusion of relevant studies.
Conclusions
Identification and investigation of new interventions targeted at improving neurodevelopmental outcomes is critically important to ensure that infants born at risk have access to the best available evidence-based interventions. The purpose of randomized controlled trials is to determine whether an intervention is efficacious or not. Although there is an assumption that the decision to test an intervention in a randomized controlled trial is based on promising data from preclinical studies and early phase trials, there are many reasons why an intervention may be ineffective in a specific population. This systematic review identified a high proportion of infant neurodevelopmental trials that produced a null finding, and detected several methodological and design considerations common to these studies. Although it is not possible to determine from this review to what extent these methodological and design flaws contributed to the high proportion of null findings detected in the literature, it is the responsibility of clinical trialists working in the field to address these issues so that truly effective interventions can be identified and translated into clinical care. As such, we provide a number of recommendations for the field, including the use of more sophisticated approaches to trial design, outcome assessment, and analysis. Moreover, we caution against compromising on trial quality to increase perceived trial feasibility because trial design decisions may influence our ability to confidently interpret intervention efficacy, and this may ultimately impact the availability of effective interventions for infants in need.
Acknowledgments
We thank Charlotte Frazer for assistance with reference checking and formatting.
Drs Finch-Edmondson and Paton conceptualized and designed the study, ran the searches, screened studies and extracted data, drafted the initial manuscript, and reviewed and revised the manuscript; Dr Honan screened studies and extracted data, drafted the initial manuscript, and reviewed and revised the manuscript; Ms Galea conceptualized the study, screened studies, and critically reviewed the manuscript for important intellectual content; Ms Webb extracted data, conducted data analysis, drafted the initial manuscript, and reviewed and revised the manuscript; Dr Novak extracted data, conducted data analysis, and critically reviewed the manuscript for important intellectual content; Dr Badawi conceptualized the study and critically reviewed the manuscript for important intellectual content; Dr Trivedi conceptualized the study, extracted data, conducted data analysis, drafted the initial manuscript, and reviewed and revised the manuscript; and all authors approved the final manuscript as submitted and agree to be accountable for all aspects of the work.
FUNDING: No external funding.
CONFLICT OF INTEREST DISCLAIMER: The authors have indicated they have no conflicts of interest relevant to this article to disclose.