There is variability in the selection and reporting of outcomes in neonatal trials with key information frequently omitted. This can impact applicability of trial findings to clinicians, families, and caregivers, and impair evidence synthesis. The Neonatal Core Outcomes Set describes outcomes agreed as clinically important that should be assessed in all neonatal trials, and Consolidated Standards of Reporting Trials (CONSORT)-Outcomes 2022 is a new, harmonized, evidence-based reporting guideline for trial outcomes. We reviewed published trials using CONSORT-Outcomes 2022 guidance to identify exemplars of neonatal core outcome reporting to strengthen description of outcomes in future trial publications.
Neonatal trials including >100 participants per arm published between 2015 and 2020 with a primary outcome included in the Neonatal Core Outcome Set were identified. Primary outcome reporting was reviewed using CONSORT 2010 and CONSORT-Outcomes 2022 guidelines by assessors recruited from Cochrane Neonatal. Examples of clear and complete outcome reporting were identified with verbatim text extracted from trial reports.
Thirty-six trials were reviewed by 39 assessors. Examples of good reporting for CONSORT 2010 and CONSORT-Outcomes 2022 criteria were identified and subdivided into 3 outcome categories: “survival,” “short-term neonatal complications,” and “long-term developmental outcomes” depending on the core outcomes to which they relate. These examples are presented to strengthen future research reporting.
We have identified examples of good trial outcome reporting. These illustrate how important neonatal outcomes should be reported to meet the CONSORT 2010 and CONSORT-Outcomes 2022 guidelines. Emulating these examples will improve the transmission of information relating to outcomes and reduce associated research waste.
Inconsistent outcome selection and reporting are issues in neonatal trials. Reporting outcomes according to the Consolidated Standards of Reporting Trials and Consolidated Standards of Reporting Trials-Outcomes extension could improve the usability of trial reports and the impact of research findings.
A list of clear neonatal trial outcome reporting examples may help to improve the transmission of trial findings to authors, peer reviewers, journal editors, and evidence end-users, and may help to reduce research waste.
For clinical research to translate into improved clinical care, key findings must be recognized, interpreted, and implemented. Unfortunately, there is evidence that the majority of research never contributes to improvements in clinical care.3 There is confusion in how outcomes are selected, defined, measured, collected, analyzed, and reported.4 These issues impair meaningful evidence synthesis and reduce the relevance of research to both the experiences of patients and the uncertainties that clinicians face. Problems related to research outcomes are common in pediatric clinical trials5–7 and limit progress toward determining optimal treatments.8 Improvements in selection and reporting of outcomes are needed.
Selecting trial outcomes that matter to patients, their families, and healthcare professionals will ensure that outcome data are meaningful and improve clinical practice. Core outcome sets are one way to ensure that the most important outcomes are recorded in a uniform manner.9 A recent review identified 77 core outcome sets across pediatrics,10 with improvements noted in both the quantity and quality of core outcome sets developed. One example is the Core Outcomes In Neonatology (COIN) project, which identified 12 outcomes that form a core outcome set for neonatal research.11 These should be reported by all research conducted in this field. Widespread adoption of this core outcome set has the potential to increase the volume of research that contributes to meta-analyses, as has been shown in fields such as rheumatology and pediatric dermatology.12,13 However, if outcomes information is limited because authors do not communicate it clearly, or if readers are unable to interpret it, the translation of results into practice will be impaired.
Consolidated Standards of Reporting Trials (CONSORT) 2010 trial reporting standards,14,15 which were informed by evidence and established by expert consensus, have been shown to improve reporting quality.16,17 An extension of this standard relating to critically important details of outcome reporting has recently been developed: CONSORT-Outcomes 2022.18 This new tool describes a minimal set of reporting items for trial outcomes. To help researchers meet these standards, we reviewed outcome reporting in neonatal trials with over 100 participants in each arm to identify examples of “best reporting practice.”
Methods
This work is part of an ongoing collaboration between The Hospital for Sick Children, Imperial College London, and Cochrane Neonatal.
Systematic Review and Outcome Reporting Assessment
We used the results of the systematic review and outcome reporting assessment described in full in the linked publication.1 This project included randomized or cluster-randomized controlled trials that had over 100 participants in each arm and were published in English between 2015 and 2020. Trials were identified using the results of a previously published systematic review and an updated search to capture trials published up to 2020, described elsewhere.1 Trials were included if they related to neonates requiring care during NICU admission and reported a primary outcome that was listed in the COIN core outcome set. The 12 COIN outcomes are listed in Table 1.11
No. | Outcomes |
---|---|
1. | Survivala |
2. | Sepsis |
3. | Necrotizing enterocolitis |
4. | Brain injury on imaging |
5. | Retinopathy of prematurity (preterm only) |
6. | General gross motor ability |
7. | General cognitive ability |
8. | Quality of life |
9. | Adverse events |
10. | Visual impairment or blindness |
11. | Hearing impairment or deafness |
12. | Chronic lung disease or bronchopulmonary dysplasia (preterm only) |
a The COIN outcome survival has been expanded to include death and mortality to reflect the outcome measured in the trial.
Reporting of the primary outcome for each trial was assessed using the criteria in the CONSORT 2010 and CONSORT-Outcomes 2022 trial reporting guidelines in an online survey created on the REDCap platform.19 For the purposes of our study, we included 12 items relevant to outcome reporting from CONSORT 2010 and all 17 items from CONSORT-Outcomes. Certain items from both CONSORT and CONSORT-Outcomes were split into subparts to capture good reporting for each component that the item addresses, and some CONSORT items were made specific to the primary outcome (eg, CONSORT 4b: setting(s) and location(s) where the data were collected, was split into CONSORT 4b(i): setting(s) where the [primary outcome] data were collected and CONSORT 4b(ii): location(s) where the [primary outcome] data were collected). After splitting, this gave 13 CONSORT 2010 items and 25 CONSORT-Outcomes 2022 items (38 reporting items in total). Raters from Cochrane Neonatal were assigned trial reports for assessment; each trial was analyzed by at least 3 raters. Raters also extracted verbatim text to justify their rating. We took agreement among all raters that an item had been reported as evidence that the outcome information had been conveyed clearly from the manuscript to readers. Members of the project steering group (J.W., A.B., M.O.) reviewed verbatim text for items that raters agreed were well reported to identify examples of gold-standard outcome reporting that would meet CONSORT 2010 and CONSORT-Outcomes 2022 standards. An example was sought for each CONSORT and CONSORT-Outcomes item for each COIN outcome, except in cases where an item would not be relevant for that particular outcome.
These exceptions related predominantly to survival: concepts such as the validity or reliability of study instruments are not appropriate for this outcome.18 In the small number of cases where the item was applicable to the COIN outcome, but either a gold-standard outcome reporting example could not be found among the extracted text, or reporting was optimal but there were concerns about the methodology used, the project steering group (J.W., C.G., R.F.S., A.B., N.J.B., M.O.) collectively agreed upon an example of good outcome reporting for the item, with additional statistical input (M.K.) where relevant.
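The agreement rule described above (each trial assessed by at least 3 raters, with an item treated as clearly reported only when every rater agreed) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's actual REDCap workflow: the function name `unanimous_items`, the tuple layout, and the sample ratings are all hypothetical.

```python
from collections import defaultdict


def unanimous_items(ratings, min_raters=3):
    """Return (trial_id, item_id) pairs that every rater marked as reported.

    `ratings` is an iterable of (trial_id, item_id, rater_id, reported) tuples.
    This layout is a hypothetical stand-in for the survey export.
    """
    votes_by_key = defaultdict(dict)
    for trial_id, item_id, rater_id, reported in ratings:
        # Keep one vote per rater for each trial/item combination.
        votes_by_key[(trial_id, item_id)][rater_id] = reported

    selected = []
    for key, votes in votes_by_key.items():
        # Require the minimum number of raters and full agreement.
        if len(votes) >= min_raters and all(votes.values()):
            selected.append(key)
    return sorted(selected)


# Hypothetical ratings, for illustration only.
ratings = [
    ("trial_A", "6a", "r1", True),
    ("trial_A", "6a", "r2", True),
    ("trial_A", "6a", "r3", True),
    ("trial_A", "11a", "r1", True),
    ("trial_A", "11a", "r2", False),  # one rater disagreed
    ("trial_A", "11a", "r3", True),
    ("trial_B", "6a", "r1", True),
    ("trial_B", "6a", "r2", True),    # only 2 raters
]
candidates = unanimous_items(ratings)
```

In this sketch, only trial/item pairs with unanimous agreement would be passed on for steering group review of the extracted verbatim text.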
Data Analysis and Presentation
Because of the number of COIN outcomes and reporting criteria (12 outcomes and 38 reporting items), the COIN outcomes were reviewed by the project steering group and those that would require similar reporting were combined. Verbatim text extracts served as examples to illustrate gold-standard reporting for each of the COIN outcomes assessed in the trial reports. These text extracts were collated into tables relating to each of the 38 reporting items of the trial reporting guidelines. For reference, the CONSORT-Outcomes Glossary and 5 core elements of a defined outcome can be found in Appendices 4 and 5, respectively.
Results
The results describing the searches and screening are detailed fully in the systematic review in the linked publication.1
Good Reporting Examples
The updated searches identified 2031 papers; after screening, 36 trials were included, as described elsewhere.1 These were reviewed by 39 raters, who identified whether the CONSORT 2010 and CONSORT-Outcomes 2022 reporting criteria had been met. Examples of good reporting, where all raters agreed that reporting met the guideline criteria, were identified and reviewed by the project steering group. These examples were grouped according to the COIN outcome to which they related, and the COIN outcomes were combined into 3 categories: survival; short-term neonatal complications (reported during the initial neonatal unit admission, including sepsis, necrotizing enterocolitis, bronchopulmonary dysplasia, retinopathy of prematurity, and brain injury on imaging); and long-term neurodevelopmental and sensory outcomes (reported after discharge from the neonatal unit, including general cognitive ability, general gross motor ability, hearing impairment, and visual impairment).
For survival, examples could be found for all CONSORT 2010 items (Table 2A) and all CONSORT-Outcomes 2022 items (Table 2B), except for items that were not applicable to this outcome. These items included 6a.3, 6a.4, 6a.5, 6a.8(i), 6a.8(ii), 6a.8(iii), 6a.8(iv), 6a.9(ii), and 12a.3(i), which pertained to minimally important change; methods of aggregation for continuous variables; the use of multiple outcome assessment time points; description of the study instrument, including its reliability, validity, and responsiveness; qualifications or trial-specific training to administer the study instrument to assess the outcome; and efforts to assess patterns of missingness, respectively.
Section or Topic | CONSORT Item (n = 13) | Description of Reporting Item | Outcome Domain, Outcome Type (Single [S], Composite [C], Multiple [M]), Verbatim Text From Trials |
---|---|---|---|
Methods: participants | 4b(i)a | Setting(s) where the primary outcome data were collected | Outcome domain: BPD, survivalb (C). Between April 1, 2009 and March 1, 2013, all infants with respiratory distress shortly after birth were assessed for eligibility for the study in 3 tertiary centers: John H. Stroger, Jr Hospital of Cook County (JSH, Chicago, IL), National Taiwan University Hospital (NTUH, Taipei, Taiwan), and China Medical University Hospital (CMUH, Taichung, Taiwan). (Yeh et al 2016)25 |
Methods: participants | 4b(ii)a | Location(s) where the primary outcome data were collected | Outcome domain: sepsis (S). We conducted this investigator-led, phase II, open-label, randomized, parallel group study at 2 stand-alone university maternity hospitals with tertiary level NICUs in Dublin, Ireland (National Maternity Hospital [NMH] and Coombe Women and Infants University Hospital [CWIUH]). Each of the hospitals has approximately 9500 deliveries per year. (Kieran et al 2018)26 |
Methods: outcomes | 6a | Completely defined prespecified primary outcome measures, including how and when they were assessed | Outcome domain: survivalb, GCA (C). All surviving infants were regularly evaluated every 3 mo through a standardized nervous system examination, and a final examination at 18 mo of corrected age was performed to evaluate motor functions and MDI using the Bayley Scales of Infant Development (2nd edition). Hearing status was determined from parental reports and supplemented with auditory brainstem response measurements. Deafness was defined as a hearing disability that required amplification. Blindness was defined as a corrected visual acuity of <20 of 200. Moderate or severe disabilities were defined as survival with at least 1 of the following complications: cerebral palsy, MDI <70, deafness, or blindness. (Song et al 2016)27 |
Methods: outcomes | 6b | Any changes to trial outcomes after the trial commenced, with reasons | Outcome domain: survivalb, sepsis, NEC, brain injury on imaging, ROP (C). …in March 2014, the trial management committee, which was monitoring pooled event rates with blinding to results according to treatment group, decided to remove chronic lung disease from the primary outcome. This followed recognition in November 2013 that the pooled primary outcome rate was 64%, much higher than the expected pooled rate of 26%, because more infants than expected met the trial definition of chronic lung disease,18,19 owing to a higher-than-expected rate of the use of continuous positive airway pressure by means of nasal cannula until 36 wk of postmenstrual age, without supplemental oxygen.20,21 This decision was communicated to the independent data and safety monitoring committee, which supported this recommendation. The protocol was amended in July 2016 to reflect the updated primary outcome of death, severe brain injury, severe retinopathy of prematurity, necrotizing enterocolitis, or late-onset sepsis. (Tarnow-Mordi et al 2017)28 |
Methods: blinding | 11a | If done, who was blinded after assignment to interventions (for example, participants, care providers, those assessing outcomes) and how | Outcome domain: sepsis (S). Parents, clinicians, investigators, and outcome assessors were masked to group assignment. The opaque containers used to store the products did not allow them to be seen unless the sealed stopper was removed intentionally. As lactoferrin was more likely than sucrose to retain a light pink tinge, we supplied all sites with a laminated picture of a range of possible colors for the lactoferrin mixture in syringes and stressed that this applied to both lactoferrin and sucrose. (ELFIN et al 2019)29 |
Methods: statistical methods | 12a | Statistical methods used to compare groups for primary outcomes | Outcome domain: BPD, survivalb (M). For both binary outcomes, GEE models with a log link function were used to estimate the relative risk and corresponding 95% confidence interval between the treatment and control groups. In fitting the GEE, we adjusted for gestational age at birth and intracenter correlations. We explored interactions and higher-order terms for included covariates; the final model was selected based on the deviance, Akaike information criterion, and overall model parsimony. To account for possibly endogenous variables, we fitted the GEEs with an independent working correlation matrix. Given that interim analyses were conducted, to account for Type I and Type II error rates, we applied the group sequential analysis framework with Pocock-type α- and β-spending functions, with the previously stated power of 0.8 and significance level of 0.05. (Fabricated example) |
Methods: statistical methods | 12b | Methods for additional analyses, such as subgroup analyses and adjusted analyses | Outcome domain: sepsis (S). We did 4 prespecified sensitivity analyses of the primary outcome as follows: time to serious bloodstream infection, defined as treatment with antimicrobial agents for 72 h or longer or death during treatment; time from PICC insertion to first bloodstream infection; time to first bloodstream infection, excluding samples obtained via arterial cannulas or CVCs; and time to first bloodstream infection, including clearly pathogenic organisms and excluding skin organisms (eg, coagulase-negative staphylococci). For comparability with published studies, we also reported bloodstream infection rates per 1000 d with PICC between randomization and PICC removal. (Gilbert et al 2019)30 |
Results: participant flow | 13a | For each group, the number of participants who were randomly assigned, received intended treatment, and were analyzed for the primary outcome | Outcome domain: BPD, survivalb (C). Figure 1: a total 211 of 558 eligible infants (37.8%) were recruited. Of these, 200 infants (35.8%) were excluded because parental consent was not obtained within the short time available for patient enrolment, (prenatally to 120 min after birth) owing to organizational reasons. Two infants (0.9%) were excluded because the randomization envelope was opened before all inclusion criteria were fulfilled. A total of 104 infants were assigned to the control group and 107 were assigned to the intervention group (Fig 1). All infants had complete follow ups performed; last follow-up was June 21, 2012. Recruitment rates differed markedly between the various study centers, ranging from 9% to 70% of eligible infants. (Consort diagram, ITT analysis all analyzed) (Kribs et al 2015)31 |
Results: participant flow | 13b | For each group, losses and exclusions after randomization, together with reasons | Outcome domain: BPD, survivalb (C). A total of 863 infants at 40 study centers in 9 countries underwent randomization from April 1, 2010, to August 3, 2013 (Fig 1). The study population included 10 infants who were part of a multiple birth and who were not the second in birth order (in 9 cases, the second multiple had died prenatally, was not considered viable at birth, or died before randomization; in 1 case, both infants in a set of twins underwent randomization by mistake). The outcome for 7 infants was unknown owing to withdrawal of consent or of the right to use the data, leaving 856 in the analysis population. (Details summarized in Fig 1) (Bassler et al 2015)32 |
Results: numbers analyzed | 16 | For each group, number of participants (denominator) included in each analysis and whether the analysis was by original assigned groups | Outcome domain: sepsis (S). A total of 310 infants were randomized at the National Maternity Hospital and Coombe Women and Infants University Hospital between November 2011 and September 2014 (151 CHX-IA and 159 PI) (Fig 1). Six infants were withdrawn post randomization as they met the protocol-specified exclusion criteria (5 with congenital anomalies and 1 infant born at 31 wk). We enrolled 304 infants (CHX-IA 148 vs PI 156) in whom 815 CVCs–200 UVCs and 615 PICCs-(CHX-IA 384 vs PI 431) were inserted and remained in situ for 3078 (CHX-IA 1465 vs PI 1613) days. Data were analyzed for all 304 infants. There were 2 protocol violations where infants randomized to one agent received the other when a second CVC insertion was attempted. All analyses presented were performed using a modified ‘intention-to-treat’ principle (ie, not including the 6 infants who met the exclusion criteria). We have not performed a separate per-protocol analysis. (Results, Fig 1 and Table 2) (Kieran et al 2018)26 |
Results: outcomes and estimation | 17a | For each primary outcome, results for each group, and the estimated effect size and its precision (such as 95% confidence interval) | Outcome domain: survivalb (S). There was no significant difference in mortality between the 2 groups with 83 deaths (20.5%) in the wrap group and 79 deaths (20%) in the no-wrap group. (OR 1.0, 95% CI 0.7 to 1.5) (Reilly et al 2015)33 |
Results: outcomes and estimation | 17b | For binary outcomes, presentation of both absolute and relative effect sizes is recommended | Outcome domain: NEC, sepsis, survivalb (C). The cumulative incidence of the composite outcome was 44.7% (95% CI, 37.6% to 51.9%) and 42.1% (95% CI, 34.9% to 49.3%) in the preterm formula and donor milk groups, respectively (Fig 2), with a mean difference of 2.6% (95% CI, −12.7% to 7.4%). The adjusted hazard ratio was 0.87 (95% CI, 0.63 to 1.19; P = .37) (Table 2) (Corpeleijn et al 2016)34 |
Results: ancillary analyses | 18 | Results of any other analyses performed, including subgroup analyses and adjusted analyses, distinguishing prespecified from exploratory | Outcome domain: sepsis (S). After seeing the results of the primary analysis, we specified an additional posthoc analysis of the primary outcome to investigate whether the treatment effect varied by gestational age at birth (before 28 wk of gestation or at 28 wk or more of gestation) using a Cox proportional hazards model, including an interaction between treatment and gestational age. We found no evidence of a difference in treatment effect for babies with a gestational age of less than 28 wk compared with 28 wk or more (P = .28) (Gilbert et al 2019)30 |
BPD, bronchopulmonary dysplasia; CI, confidence interval; COIN, Core Outcomes in Neonatology; CONSORT, Consolidated Standards of Reporting Trials; CVC, central venous catheter; GCA, general cognitive ability; GEE, generalized estimating equation; GGMA, general gross motor ability; ITT, intention-to-treat; NEC, necrotizing enterocolitis; OR, odds ratio; PICC, Peripherally Inserted Central Catheter; ROP, retinopathy of prematurity.
a To capture good reporting for each component the item addresses, these items were split into subparts and were specified for the primary outcome.
b The COIN outcome survival has been expanded to include death and mortality to reflect the outcome measured in the trial.
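Items 17a and 17b above ask for an effect estimate with its precision and, for binary outcomes, both absolute and relative measures. As a minimal sketch of how these quantities relate, the Python snippet below computes a risk difference and a risk ratio with Wald-type 95% confidence intervals from a 2 × 2 table. The function name `binary_effects` and the counts (30/100 vs 20/100) are hypothetical illustrations, not data from any reviewed trial, and trials may well use other interval methods or adjusted models.

```python
import math
from statistics import NormalDist


def binary_effects(events_a, n_a, events_b, n_b, level=0.95):
    """Absolute and relative effect of arm A vs arm B with Wald-type CIs.

    Hypothetical helper for illustration; assumes nonzero event counts.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    p_a, p_b = events_a / n_a, events_b / n_b

    # Absolute effect: risk difference with a normal-approximation interval.
    rd = p_a - p_b
    se_rd = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)

    # Relative effect: risk ratio, interval built on the log scale.
    rr = p_a / p_b
    se_log_rr = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)

    return {
        "risk_difference": (rd, rd - z * se_rd, rd + z * se_rd),
        "risk_ratio": (
            rr,
            math.exp(math.log(rr) - z * se_log_rr),
            math.exp(math.log(rr) + z * se_log_rr),
        ),
    }


# Hypothetical counts: 30/100 events in arm A, 20/100 in arm B.
effects = binary_effects(30, 100, 20, 100)
for name, (estimate, lower, upper) in effects.items():
    print(f"{name}: {estimate:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

Reporting both measures lets readers weigh the absolute difference, which bears on clinical importance, alongside the proportional change.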
Section or Topic | CONSORT-Outcomes Item (n = 25) | Description of Reporting Item | Outcome Domain, Outcome Type (Single [S], Composite [C], Multiple [M]), Verbatim Text From Trials |
---|---|---|---|
Methods: outcomes | 6a.1 | Provide a rationale for the selection of the domain for the trial’s primary outcome | Outcome domain: survivalb, GGMA, GCA, hearing impairment or deafness (C). Observational data had suggested that targeting an oxygen saturation below 90% was associated with a lower risk of severe retinopathy, with no difference in the rate of cerebral palsy or survival, and that the long-accepted “physiologic” targets of oxygen saturation may be too high. The trials therefore aimed to evaluate the hypothesis that targeting an oxygen saturation of 85% to 89% versus 91% to 95% would reduce the incidence of severe retinopathy with no effects on mortality or disability. (Tarnow-Mordi et al 2016)35 |
Methods: outcomes | 6a.2(i)a | Describe the specific measurement variable (eg, systolic blood pressure) | Outcome domain: survivalb (S). The primary outcome was all-cause mortality occurring before discharge from the hospital or 6 mo corrected age, whichever came first. We attempted to collect primary outcome data on infants transferred to other institutions. Trial infants who remained hospitalized at 6 mo corrected age were coded as being alive. (Reilly et al 2015)33 |
Methods: outcomes | 6a.2(ii)a | Describe the specific analysis metric (eg, change from baseline, final value, time to event) | Outcome domain: sepsis (S). The primary outcome was the time from random allocation to first bloodstream or CSF infection, defined as a microbiological culture of a bacteria or fungus from the blood or CSF sampled for clinical reasons. (Gilbert et al 2019)30 |
Methods: outcomes | 6a.2(iii)a | Describe the specific method of aggregation (eg, mean, proportion) | Outcome domain: survivalb (S). Descriptive statistics were calculated for all variables of interest. Continuous measures were summarized via mean and SD, whereas categorical measures were summarized by the use of the count and percentage. The primary outcome of mortality was assessed between groups using a logistic model adjusting for the correlation among observations taken at the same site. Results were reported with OR and their associated 95% CI. (Reilly et al 2015)33 |
Methods: outcomes | 6a.2(iv)a | Describe the specific time point | Outcome domain: survivalb (S). The primary outcome was death and major disability at 2 y corrected age (gestational age plus chronological age minus 40 wk). (Oei et al 2017)36 |
Methods: outcomes | 6a.3 | If the analysis metric for the primary outcome represents within-subject change, define and justify the minimal important change in individuals | Outcome domain: NEC, sepsis, survivalb (M). Based on data collected for local service appraisals around the year 2000, it was thought that the event rate of each of the primary outcomes might be as high as 15%. At a 2-sided significance level of 5%, a trial of 1300 infants would have 90% power to detect a 40% relative risk reduction from 15% to 9.1% for each of the primary outcomes. If the outcomes were less frequent, the trial would have 90% power to detect a 44% relative risk reduction from 12% to 6.7% or from 10% to 5.6%. These reductions were all deemed to be of clinical importance by the investigators. (Costeloe et al 2016)37 |
Methods: outcomes | 6a.4 | If the outcome data were continuous but were analyzed as categorical (method of aggregation), specify the cutoff values used | Outcome domain: survivalb, GGMA, GCA, visual impairment or blindness, hearing impairment or deafness (C). General intelligence was estimated with the full-scale IQ from the 4-subtest version of the Wechsler Abbreviated Scale of Intelligence-II. The scale also generates a verbal comprehension index (a measure of verbal acquired knowledge and verbal reasoning abilities) and a perceptual reasoning index (a measure of visual perception organization and reasoning skills). The indices are age standardized (mean = 100; SD = 15), with higher scores reflecting higher intelligence. Cognitive impairment was defined as a full-scale IQ <85 (<1 SD relative to the normative mean). Children who could not be assessed because of severe intellectual impairment or severe autism were coded as having a severe cognitive impairment. Impairment in visuomotor integration, visual perception, fine motor coordination, working memory, attention, and executive function was defined as a performance <1 SD relative to the normative mean of the respective test. Behavioral impairment was defined as a score >1 SD compared with the mean of the normative sample. (Murner-Lavanchy et al 2018)38 |
Methods: outcomes | 6a.5 | If outcome assessments were performed at several time points after randomization, state the time points used for analysis | Outcome domain: death, GCA (C). All surviving infants were regularly evaluated every 3 mo through a standardized nervous system examination, and a final examination at 18 mo of corrected age was performed to evaluate motor functions and MDI; this exam was used for the main analysis. (Song et al 201627 ; modified) |
Methods: outcomes | 6a.6 | If a composite outcome, define all individual components of the composite outcome | Outcome domain: sepsis, NEC, survivala (C). The primary end point was the composite incidence of NEC, serious infection (sepsis or meningitis), or all-cause mortality between 72 h and 60 d of life. Sepsis was defined as 1 positive blood culture result with noncoagulase-negative staphylococci or 1 positive blood culture with a coagulase-negative staphylococci pathogen and C-reactive protein level greater than 10 mg/L (to convert to nanomoles per liter, multiply by 9.524) within 2 d of blood culture or 2 positive blood cultures results with coagulase-negative staphylococci drawn within 2 d. Meningitis was defined by positive cerebrospinal fluid culture result. Necrotizing enterocolitis was defined as a Bell stage of II or higher. (Corpeleijn et al 2016)34 |
Methods: outcomes | 6a.7 | Identify any outcomes that were not prespecified in a trial registry or protocol | Outcome domain: survivala, sepsis, NEC, brain injury on imaging, ROP, BPD (C). However, in March 2014, the trial management committee, which was monitoring pooled event rates with blinding to results according to treatment group, decided to remove chronic lung disease from the primary outcome. This decision was communicated to the independent data and safety monitoring committee, which supported this recommendation. The protocol was amended in July 2016 to reflect the updated primary outcome of death, severe brain injury, severe retinopathy of prematurity, necrotizing enterocolitis, or late-onset sepsis. (Tarnow-Mordi et al 2017)28 |
Methods: outcomes | 6a.8(i)a | Provide a description of study instruments used to assess the outcome (eg, questionnaires, laboratory tests) | Outcome domain: survivalb, GGMA, GCA, hearing impairment or deafness, visual impairment or blindness (C). Severe disability will be defined by any of the following: a Bayley III Cognitive score <70, GMF level of III-V, blindness or profound hearing loss (inability to understand commands despite amplification). Moderate disability will be defined as a Bayley Cognitive score 70 to 84 and either a GMF level of II, a currently active seizure disorder, or a hearing deficit requiring amplification to understand commands. Infants without the primary outcome will be categorized as normal or mildly impaired. Normal will be defined by a cognitive score ≥85 and absence of any neurosensory deficits. Mild impairment will be defined by a cognitive score 70 to 84, or a cognitive score ≥85 and any of the following: presence of a GMF level I to II, seizure disorder or hearing loss not requiring amplification. (Shankaran et al 2017)39 |
Methods: outcomes | 6a.8(ii)a | Provide a description of the study instrument’s reliability in a population similar to the study sample | Outcome domain: GCA (S). This cognitive test has been extensively used in former preterm patients: as we have reported previously this test has acceptable reliability in this age group (with a weighted κ of 0.85). (Fabricated example) |
Methods: outcomes | 6a.8(iii)a | Provide a description of the study instrument’s validity in a population similar to the study sample | Outcome domain: NEC (S). The primary outcome was NEC, defined according to the gestational age-specific NEC definition proposed by Battersby et al. This definition has been validated using population-level data from neonates admitted to neonatal units in the UK. (Fabricated example) |
Methods: outcomes | 6a.8(iv)a | Provide a description of the study instrument’s responsiveness in a population similar to the study sample | Outcome domain: visual impairment or blindness (S). Visual acuity was assessed at trial enrolment, at the end of the 6-week intervention period and at 6 mo corrected age using VEP. Normative data for VEP has been defined in this population (McCulloch et al 1999) and our pilot results suggested that our intervention would produce detectable changes in VEP that would persist until at least 6 months of age (see supplemental data). (Fabricated example) |
Methods: outcomes | 6a.9(i)a | Describe who assessed the outcome (eg, nurse, parent) | Outcome domain: sepsis (S). The primary outcome for all infants was determined from blood and CVC tip culture results by 1 consultant microbiologist (S.J.K.) who was masked to the infant’s group assignment. (Kieran et al 2018)26 |
Methods: outcomes | 6a.9(ii)a | Describe any qualifications or trial-specific training necessary to administer the study instruments to assess the outcome | Outcome domain: survivalb, brain injury on imaging (M). A single assessor reviewed the cranial ultrasound scan reports for intraventricular hemorrhage, blind to the allocated group. Then 8 trained clinicians (neonatologists or radiologists) independently adjudicated each scan, blind to allocation. If the adjudication disagreed with the scan report review, a second independent adjudicator assessed the scan images. Remaining discrepancies were resolved by discussion. (Duley et al 2018)40 |
Methods: outcomes | 6a.10 | Describe any processes used to promote outcome data quality during data collection (eg, duplicate measurements) and after data collection (eg, range checks of outcome data values), or state where details can be found | Outcome domain: GCA (S). Quality control was maintained by a national coordinating psychologist. All study data were sent to the Murdoch Children’s Research Institute in Melbourne, Australia. All data forms were checked by a research assistant not involved in primary data collection or entry. Data on test forms that were not completed according to test manual instructions were rejected. An independent data safety monitoring committee met around every 6 mo during recruitment. Site visits were done by the national coordinating teams for each country annually or biennially, and site visits at the national coordinating sites were done by principal investigators from other nations to check the validity of data. Summary data by allocation were presented to this committee. (McCann et al 2019)41 |
Methods: sample size | 7a.1 | Define and justify the target difference between treatment groups (eg, the minimal important difference) | Outcome domain: GCA (S). A sample size of 176 infants in each treatment group was estimated to be sufficient to detect a 5-point difference in the Bayley-III cognitive composite score with 80% power (α = 0.05) and a SD of 15. This assumed a 30% rate of exclusive mother’s milk feeding, 10% loss to follow-up during hospitalization, and 10% loss to follow-up after discharge. An effect size of 5 points was chosen because the literature suggests that this difference could translate into a reduction in the number of children born preterm requiring special education services (with associated costs) and an improvement in longer-term academic achievement. A meta-analysis completed before study initiation reported a difference of 5.18 in cognitive scores between infants born weighing less than 2500 g who were fed mother’s milk versus formula, suggesting that this effect size was achievable. (O’Connor et al 2016)42 |
Methods: statistical methods | 12a.1 | Describe any methods used to account for multiplicity in the analysis or interpretation of the primary and secondary outcomes (eg, coprimary outcomes, same outcome assessed at multiple time points, or subgroup analyses of an outcome) | Outcome domain: survivalb, BPD (C). Given the multiplicity of the planned primary outcome analyses, we applied a Holm-Bonferroni correction to adjust for the increased probability of a Type I error resulting from multiple testing. Subsequently, we reported the Holm-adjusted 95% confidence intervals for the estimated relative risk corresponding to the primary outcomes. Furthermore, the finding from the secondary analyses should be regarded as exploratory since there was no adjustment for multiple comparisons among these outcomes. (Fabricated example) |
Methods: statistical methods | 12a.2 | State and justify any criteria for excluding any outcome data from the analysis and reporting, or report that no outcome data were excluded | Outcome domain: survivalb, sepsis, NEC (C). In total, 377 infants were randomized. Four infants’ informed consent procedure did not comply with requirements, so data of 373 infants were analyzed in the intent-to-treat analysis (Fig 1). Exclusion criteria (congenital infection or anomaly) became clear in 18 infants, only after starting the intervention. In those cases, the intervention was stopped immediately. Modified intent-to-treat analysis was performed without them. Figure 1 lists the reasons for excluding 76 infants from the per-protocol analysis. (Corpeleijn et al 2016)34 |
Methods: statistical methods | 12a.3(i)a | Describe methods to assess patterns of missingness (eg, missing not at random) | Outcome domain: survivalb, sepsis, BPD, NEC, GCA (S). We explored the missing data mechanism by investigating the relationship between missingness (as an indicator variable) and the baseline covariates. To assess whether the missing-completely-at-random (as opposed to missing-at-random) assumption was reasonable, for continuous covariates, we compared missingness using the Kruskal-Wallis test. Similarly, for categorical covariates, we compared missingness using the χ2 test. (Fabricated example) |
Methods: statistical methods | 12a.3(ii)a | Describe methods to handle missing outcome items or entire assessments (eg, multiple imputation) | Outcome domain: survivalb, BPD (C). We used multiple imputation under a multivariable normal distribution to impute missing outcome data in the primary analysis of all outcomes. Multiple imputation was done using the mice package in R. Furthermore, we conducted a sensitivity analysis in which we compared treatment effect estimates resulting from the multiply-imputed data set to the complete-case-analysis. (Fabricated example) |
Methods: statistical methods | 12a.4 | Provide definition of outcome analysis population relating to protocol nonadherence (eg, as a randomized analysis) | Outcome domain: GCA (S). The primary investigation was an ITT analysis, the infants being compared according to the treatment they were assigned at study entry. (Natalucci et al 2016)43 |
Results: outcomes and estimation | 17a.1 | Include results for all prespecified outcome analyses or state where results can be found if not in this report | Outcome domain: survivalb (S). Mortality data were available for 799 of the 801 infants. There was no significant difference in mortality between the 2 groups with 83 deaths (20.5%) in the wrap group and 79 deaths (20%) in the no-wrap group (OR 1.0, 95% CI 0.7–1.5). After adjustment for variables that could impact on the risk of death (GA, sex, method of delivery, birth wt, race, antenatal steroids), logistic regression analysis revealed that the difference between groups remained nonsignificant (OR 0.9, 95% CI 0.6–1.3). In infants 24 0/7 to 25 6/7 wk gestation (stratum 1), 26.1% of the wrap group died compared with 33.2% of the no-wrap group (OR 0.7, 95% CI 0.5–1.1; Table 2). In infants 26 0/7 to 27 6/7 wk gestation (stratum 2), 15.7% of the wrap group died compared with 9.2% of the no-wrap group (OR 1.8, 95% CI 1.0–3.3). After adjustment for variables that could impact the risk of death, logistic regression analysis revealed that the difference between groups was no longer statistically significant (stratum 1: OR 0.66, 95% CI 0.40–1.1 and stratum 2: OR 1.6, 95% CI 0.8–3.0). [Tables 2 and 3] (Reilly et al 2015)33 |
Results: ancillary analyses | 18.1 | If there were any analyses that were not prespecified, explain why they were performed | Outcome domain: survivalb, brain injury on imaging (M). As the original protocol was for a pilot trial, outcomes were measures of feasibility and analysis by allocated group was not planned. The SAP for the extended study was agreed before data were unblinded. For the planned main trial, main outcomes were death before discharge and intraventricular hemorrhage (all grades), hence these were the main outcomes in this SAP. Data presented here are for outcomes at discharge. (Duley et al 2018)40 |
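The sample size statement quoted for item 7a.1 can be checked with the standard normal-approximation formula for comparing two means. The sketch below is a reconstruction, not the authors' stated method: it assumes a two-sided test and that the two 10% loss-to-follow-up allowances are applied multiplicatively, and it does not model the assumed 30% rate of exclusive mother's milk feeding. The function name `n_per_group` is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Participants per arm to detect a mean difference `delta` (two-sided test).

    Normal-approximation formula:
    n = 2 * ((z_(1 - alpha/2) + z_power) * sd / delta) ** 2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) * sd / delta) ** 2)


# Values quoted in the 7a.1 example: 5-point difference, SD 15,
# 80% power, alpha 0.05.
n_complete = n_per_group(delta=5, sd=15)

# Assumption: inflate for 10% loss during hospitalization and 10% loss
# after discharge, applied multiplicatively (retention 0.9 * 0.9).
n_enrolled = math.ceil(n_complete / (0.9 * 0.9))

print(n_complete, n_enrolled)
```

Under these assumptions the calculation gives 142 infants per group with complete data and 176 per group to enroll, which is consistent with the figure quoted in the table.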
Results: ancillary analyses | 18.1 | If there were any analyses that were not prespecified, explain why they were performed | Outcome domain: survivalb, brain injury on imaging (M). As the original protocol was for a pilot trial, outcomes were measures of feasibility and analysis by allocated group was not planned. The SAP for the extended study was agreed before data were unblinded. For the planned main trial, main outcomes were death before discharge and intraventricular hemorrhage (all grades), hence these were the main outcomes in this SAP. Data presented here are for outcomes at discharge. (Duley et al 2018)40 |
BPD, bronchopulmonary dysplasia; CI, confidence interval; COIN, Core Outcomes in Neonatology; CONSORT, Consolidated Standards of Reporting Trials; CSF, cerebrospinal fluid; CVC, central venous catheter; GCA, general cognitive ability; GGMA, general gross motor ability; GMF, Gross Motor Functional; ITT, intention-to-treat; MDI, mental developmental index; NEC, necrotizing enterocolitis; OR, odds ratio; ROP, retinopathy of prematurity; SAP, statistical analysis plan; VEP, visual evoked potential.
a To capture good reporting for each component the item addresses, these items were split into subparts, and were specified for the primary outcome.
b The COIN outcome survival has been expanded to include death and mortality to reflect the outcome measured in the trial.
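The sample size example for item 7a.1 states a target difference (5 points), an SD (15), power (80%), α (0.05), and two 10% loss-to-follow-up allowances. The sketch below shows one conventional way such inputs can be combined: a normal-approximation formula for a two-sided comparison of two means, followed by inflation for attrition. It is an assumption that this mirrors the trial's calculation: the function names are hypothetical, and the 30% exclusive mother's milk feeding assumption quoted in the example is not modelled here.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(target_difference, sd, alpha=0.05, power=0.80):
    """Participants per arm for a two-sided comparison of two means.

    Normal approximation, equal allocation, common SD (all assumptions).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 * (sd / target_difference) ** 2)


def inflate_for_attrition(n, loss_fractions):
    """Inflate n so that n participants remain after successive losses."""
    retained = 1.0
    for loss in loss_fractions:
        retained *= 1 - loss
    return ceil(n / retained)


# Inputs quoted in the item 7a.1 example: 5-point difference, SD 15,
# 80% power, alpha 0.05, then 10% loss in hospital and 10% after discharge.
base = n_per_group(target_difference=5, sd=15)
target = inflate_for_attrition(base, [0.10, 0.10])
print(base, target)
```

Under these assumptions the calculation gives 142 per arm before attrition and 176 after, which matches the figure quoted in the example. Rounding order matters (inflating before rounding gives 175), so the steps behind a reported sample size are worth stating explicitly.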
Examples were not found for all reporting items for outcomes related to short-term neonatal complications. Though relevant, no examples could be found for CONSORT-Outcomes 2022 items 6a.5, 6a.8(iii), and 12a.3(i) that relate to the justification of analysis timepoints for outcomes measured repeatedly, validity of study instruments, and efforts to assess patterns of missingness, respectively. For short-term neonatal complications, CONSORT-Outcomes 2022 6a.4 and 6a.8(iv) were not found to be applicable, as these items related to cut-off values for continuous outcomes analyzed as categorical, and the study instrument’s responsiveness in a population similar to the study sample.
For long-term neurodevelopmental outcomes, no examples of good reporting could be found for CONSORT-Outcomes 2022 items 6a.3, 6a.8(ii), 6a.8(iii), 6a.8(iv), and 12a.3(i): these items relate to definitions of minimally important change, the reliability of the study instrument, the validity of the study instrument, the responsiveness of study instruments, and efforts to assess patterns of missingness respectively.
For each reporting item, 1 gold-standard example is provided from 1 of the 3 outcomes categories where reporting was likely to be similar. These are presented in Table 2A for CONSORT 2010 items and Table 2B for CONSORT-Outcomes 2022 items. For a more comprehensive set of examples, a gold-standard example is provided for each outcome category for every item in the Supplemental Information (Supplemental Tables 3A and 3B), so long as the item is applicable to the outcome category. Examples were generated for the relevant items where examples of optimal reporting were not found; these are included in Tables 2A and 2B and Supplemental Tables 3A and 3B.
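The generated example for item 12a.1 names a Holm-Bonferroni correction for multiple primary outcome analyses. As a minimal sketch of that step-down procedure, the function below adjusts a list of P values. The function name and the P values are hypothetical, and the sketch covers adjusted P values only, not the Holm-adjusted confidence intervals mentioned in the example.

```python
def holm_adjust(p_values):
    """Return Holm-Bonferroni adjusted P values in the original order."""
    m = len(p_values)
    # Indices of the P values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest P value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted value never falls below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical P values for three coprimary outcome analyses.
raw = [0.01, 0.04, 0.03]
adjusted = holm_adjust(raw)
print(adjusted)
```

An outcome is declared significant at level α when its adjusted P value is at most α. With the hypothetical inputs above, only the first outcome would remain significant at α = 0.05.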
Discussion
Using evidence-informed and consensus-based clinical trial reporting standards, we identified examples of good reporting for important outcomes in neonatal trials. These examples should be considered for any future pediatric clinical trial reporting. By emulating the examples, researchers will meet the requirements of the CONSORT 2010 and CONSORT-Outcomes 2022 guidelines, clearly conveying their results to readers of their publication, thus ensuring that robust trial data can be critically appraised, synthesized, and translated into practice where appropriate.
We were able to identify examples of best reporting practice relating to most of the COIN outcomes and harmonized reporting items, but for some items no examples existed. In most cases this was because the reporting item was not applicable to the outcome. For other applicable items, however, no examples of good reporting could be found in the reviewed literature; we highlighted these here as they should be considered in future research. For example, although concepts such as “study instrument reliability” may not seem relevant to outcomes such as sepsis, bronchopulmonary dysplasia, or necrotizing enterocolitis, it should be recognized that different definitions of these outcomes may well have different measurement properties.20 The validity and reliability of pediatric outcomes have been assessed previously in relation to specific conditions such as upper extremity impairment,21 but although these properties may not have been well explored for many outcomes, they should be considered when outcome domains are selected during trial development and should be included in the final research report to allow readers to interpret the results correctly. The lack of examples of good reporting related to some concepts in the current study may reflect opportunities to improve the current neonatal evidence base. Although concepts such as “minimally important change” and “minimally important difference” are critical, they are rarely reported in neonatal research,1 in part because of the challenges in achieving a consensus threshold among different stakeholders from different settings.
However, because it is feasible to involve parents and children in the identification of minimal clinically important differences,22 and it has been shown that reporting whether an intervention reached such agreed thresholds of a minimal clinically important difference produced trial results that were more meaningful than results assessing statistical significance alone,23,24 we need to begin the process of identifying these thresholds. Identifying these thresholds in a robust manner will not be easy, especially as they may vary depending on factors such as the importance of the outcome and the cost-benefit of any specific intervention. However, we believe that reporting these concepts will increase the potential for research to have impact in the neonatal unit or pediatric ward as well as on the pages of a journal and should be a standard for trial development, particularly for outcomes such as length of stay and quality of life.
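To illustrate how judging a result against an agreed minimal important difference (MID) differs from judging statistical significance alone, the sketch below classifies a confidence interval for a between-group difference against both zero and a prespecified MID. The function name, the three labels, the decision rule, and all numbers are illustrative assumptions, not taken from the cited studies. Positive differences are assumed to indicate benefit.

```python
def interpret_difference(ci_lower, ci_upper, mid):
    """Classify a CI for a between-group difference against 0 and an MID.

    Returns (statistically_significant, importance_label).
    Assumes positive values favour the intervention and mid > 0.
    """
    if mid <= 0:
        raise ValueError("mid must be positive")
    if ci_lower > ci_upper:
        raise ValueError("ci_lower must not exceed ci_upper")

    # Significant when the interval excludes no difference (zero).
    significant = ci_lower > 0 or ci_upper < 0

    if ci_lower >= mid:
        importance = "important"        # whole interval at or beyond the MID
    elif ci_upper < mid:
        importance = "not important"    # whole interval below the MID
    else:
        importance = "inconclusive"     # interval spans the MID
    return significant, importance


# Hypothetical results, each judged against a hypothetical MID of 5 points.
print(interpret_difference(0.5, 3.0, mid=5))   # (True, 'not important')
print(interpret_difference(6.0, 10.0, mid=5))  # (True, 'important')
print(interpret_difference(-1.0, 8.0, mid=5))  # (False, 'inconclusive')
```

The first hypothetical result is statistically significant yet falls wholly below the threshold, which is the situation that reporting against an agreed MID is meant to expose.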
Strengths and Limitations
Strengths of this review include the focus on reporting of primary trial outcomes, which is what researchers and authors are likely to have reported most thoroughly, allowing us to identify examples of optimal reporting. We considered only primary outcomes from the COIN neonatal core outcomes set, which were identified as the most important neonatal research outcomes by over 400 former patients, parents, healthcare professionals, and researchers.11 The standards we used to assess outcome reporting have been established by expert consensus and are recognized globally.15,18 This review also has limitations: there is no standardized methodology to identify examples of good reporting practice. However, because each example has been independently identified as good reporting by 3 expert reviewers, with further screening by the project steering group, we feel that these examples are of educational value and merit dissemination. Another limitation is that we only considered the primary outcomes of neonatal randomized controlled trials, which may have limited the type of outcomes we could include, because COIN outcomes such as quality of life are rarely the primary outcome for large trials. Although all trials were published after CONSORT 2010, another limitation is that the trials included in this review predate the CONSORT-Outcomes 2022 extension. Although outcome reporting should always be high quality and the guidelines were created from what was widely accepted to be good practice, we felt it was unreasonable to criticize trials for failing to meet this standard. Instead, in this review we sought to identify examples of excellent reporting and identify areas in which future research reporting can improve.
Transparency in trial reporting is important and relevant to everyone. We expect that our articles will be read by those who are seeking leadership and who want to improve their own practice. Trialists will want to improve their randomized clinical trials’ primary outcome reporting. Clinicians will want to be able to work out for themselves what is potentially clinically useful or not from reports of randomized clinical trials. Readers will want to know what outcomes have been studied and how they have been reported. This will be facilitated if trial reporting meets CONSORT 2010 and CONSORT-Outcomes 2022 recommendations. We recognize that meeting these standards is easier in the age of online publishing: including all the information stipulated in the guidance can be difficult within traditional word limits of printed journals. Some information may be most appropriately included in supplemental material or online-only supplements, but it is important that it should still be available alongside the research report. Key outcome information should not be presented in “other” or “related” papers or protocols as these may not be available to all readers of trial reports and their absence may impair full and correct interpretation of trial results. Making all information available in 1 place will ensure that the reader is able to fully comprehend what has (and has not) been established in a clinical trial, which facilitates discussion between researchers and readers to optimally inform clinical practice.
Conclusions
Any research manuscript is a conversation between the researchers and their target audience. If clinical research results are to influence clinical practice, sufficient information relating to trial outcomes must be communicated in a transparent fashion from researcher to readers. Failure to achieve this is a form of research waste. We have identified examples of good outcome reporting that illustrate how important outcomes should be reported, as defined by the CONSORT 2010 and CONSORT-Outcomes 2022 guidelines, to meet the needs of evidence users. Emulating these examples when reporting future pediatric research will improve the transmission of information relating to outcomes and ultimately improve clinically important and patient-relevant outcomes.
Acknowledgments
We gratefully acknowledge the contributions made by the Core Outcome Reporting in Neonatal Trials (CORINT) Study Group, as detailed elsewhere.1
Dr Webbe was responsible for conceptualization, methodology, validation, visualization, writing the original draft, and writing, review, and editing of the completed article; Ms Baba was responsible for project administration, methodology, validation, formal analysis, investigation, visualization, data curation, writing, review, and editing; Dr Rodrigues and Ms Stallwood were responsible for investigation, writing, review, and editing; Ms Goren was responsible for investigation, data curation, writing, review, and editing; Ms Monsour was responsible for project administration, methodology, validation, investigation, writing, review, and editing; Drs Chang, Trivedi, Manley, Bogossian, Namba, Schmölzer, Popat, An Nguyen, Doyle, Jardine, Rysavy, Meyer, Muhd Helmi, Ming Lai, Hay, Onland, and Mun Choo, Ms McCall, and Ms Konstantinidis contributed to the acquisition of data and critically reviewed and revised the manuscript; Drs Gale and Soll were responsible for conceptualization, methodology, validation, writing, review, editing, and supervision; Drs Butcher and Offringa were responsible for conceptualization, methodology, validation, and writing, review, editing, and supervision; and all authors approved the final manuscript as submitted and agree to be accountable for all aspects of the work.
COMPANION PAPER: A companion to this article can be found online at www.pediatrics.org/cgi/doi/10.1542/peds.2022-060751.
FUNDING: No external funding.
CONFLICT OF INTEREST DISCLOSURES: Dr Butcher declares consulting fees from Nobias Therapeutics, Inc, unrelated to this work. The other authors have indicated they have no conflicts of interest relevant to this article to disclose.