Clear outcome reporting in clinical trials facilitates accurate interpretation and application of findings and improves evidence-informed decision-making. Standardized core outcomes for reporting neonatal trials have been developed, but little is known about how primary outcomes are reported in neonatal trials. Our aim was to identify strengths and weaknesses of primary outcome reporting in recent neonatal trials.
Neonatal trials including ≥100 participants/arm published between 2015 and 2020 with at least 1 primary outcome from a neonatal core outcome set were eligible. Raters recruited from Cochrane Neonatal were trained to evaluate the trials’ primary outcome reporting completeness using relevant items from Consolidated Standards of Reporting Trials 2010 and Consolidated Standards of Reporting Trials-Outcomes 2022 pertaining to the reporting of the definition, selection, measurement, analysis, and interpretation of primary trial outcomes. All trial reports were assessed by 3 raters. Assessments and discrepancies between raters were analyzed.
Outcome-reporting evaluations were completed for 36 included neonatal trials by 39 raters. Levels of outcome reporting completeness were highly variable. All trials fully reported the primary outcome measurement domain, statistical methods used to compare treatment groups, and participant flow. Yet, only 28% of trials fully reported on minimal important difference, 24% on outcome data missingness, 66% on blinding of the outcome assessor, and 42% on handling of outcome multiplicity.
Primary outcome reporting in neonatal trials often lacks key information needed for interpretability of results, knowledge synthesis, and evidence-informed decision-making in neonatology. Use of existing outcome-reporting guidelines by trialists, journals, and peer reviewers will enhance transparent reporting of neonatal trials.
Inconsistent outcome selection and reporting is an issue in neonatal trials. Recommendations for standardization methods are available, in the form of a neonatal core outcome set and trial reporting guidelines such as the Consolidated Standards of Reporting Trials-Outcomes extension.
Assessment of recent neonatal trials revealed highly variable outcome reporting. Application of trial-reporting guidance identified several areas where reporting could be improved, including minimal important difference, outcome data missingness, outcome assessor blinding, and outcome multiplicity.
Appropriate outcome selection and reporting in clinical trials empower patients and caregivers to make evidence-informed treatment decisions and allow researchers to properly evaluate, synthesize, and build upon published research findings. However, large variability in outcome selection and reporting is a prevalent problem in pediatrics; variability in outcome selection and definition has been found in research related to opioid withdrawal in neonates, appendicitis treatments in children, longitudinal studies in children with feeding tubes and neurologic impairment, and in rare diseases including medium-chain acyl-CoA dehydrogenase deficiency and phenylketonuria.1–4 Heterogeneous outcome reporting was found in trials investigating treatment of adolescent depression; for example, >60% (24 of 42 trials) did not clearly define their primary outcome, and specific details, such as the timing and/or frequency of outcome assessment, and details on the outcome assessor, were reported in only 6% and 17% of articles, respectively.5 Similar results were seen in trials of neurodevelopmental outcomes in preterm births, which fully reported only 26% to 46% of the 55 outcome-reporting items assessed.6
Inconsistent outcome selection and reporting have been observed in neonatal trials,7 and likely contribute to 50% of Cochrane Neonatal reviews having inconclusive findings.8 Limited evidence on patient-important outcomes has led to variation in clinical practices and subsequent health outcomes.9,10 Even when trial outcomes are reported appropriately, selective outcome reporting occurs, leading to uncertainty about which outcomes were measured in the trial and which ones were ultimately reported. Deficient reporting affects the certainty of evidence and strength of recommendations in practice guidelines.9,11,12 Standardized selection, measurement, and reporting of health outcomes in neonatology have been recommended.13
To harmonize outcome selection in neonatology research, the Core Outcomes in Neonatology project developed a neonatal core outcome set (COS) using a consensus process that drew on perspectives from methodologists, clinician scientists, health care professionals, patients, and families.10 The resulting neonatal COS, published in 2020, consists of 12 outcomes that are recommended to be reported in all neonatal clinical trials10; use of this COS in new trials will ensure important outcomes are investigated and reported, enabling comparison of results between trials, data syntheses, and identification of research gaps.14 To improve outcome reporting in trials, the Consolidated Standards of Reporting Trials (CONSORT) 201015 and the new CONSORT-Outcomes 2022 extension,16 which focuses on the selection, measurement, analysis, and reporting of outcomes, have been developed. These standards comprise a minimal set of items that should be reported in any trial report to ensure findings are valid and interpretable.
To better understand existing trial outcome-reporting strengths and weaknesses, we evaluated recently published neonatal trials that included at least 1 primary outcome from the neonatal COS. Using a 38-item checklist drawn from CONSORT 2010 and CONSORT-Outcomes 2022, we describe how these trials reported the definition, selection, measurement, analyses, and interpretation of primary trial outcomes. We hypothesized that moderate to large heterogeneity exists in neonatal trials included in our review.
Methods
The study’s steering group, composed of 6 members from The Hospital for Sick Children, Imperial College London, and Cochrane Neonatal (Supplemental Information), provided project oversight.
Eligibility Criteria
Eligible studies were full-text neonatal clinical trials published in English between 2015 and 2020 that:
investigated neonates requiring care during NICU admission;
had a sample size of ≥100 in each arm of the trial;
were either any type of randomized controlled trial or a cluster-randomized trial; and
reported either single, composite, or multiple primary outcome(s) with at least 1 primary outcome included from the neonatal COS (Table 1).
Study Sample Size (Median [IQR]) | 456 [308–862] |
---|---|
Publication year by articles, N = 36 | N (%) |
2015 | 7 (19) |
2016 | 10 (28) |
2017 | 8 (22) |
2018 | 5 (14) |
2019 | 6 (17) |
Continentᵃ | |
Europe | 11 (30) |
Asia | 5 (14) |
North America | 3 (8) |
South America | 1 (3) |
Australasia | 1 (3) |
Otherᵇ | 15 (42) |
Collaboration type | |
Multicenter, national | 17 (47) |
Multicenter, international | 15 (42) |
Single center | 4 (11) |
Outcome type | |
Single | 13 (36) |
Composite | 20 (56) |
Multiple | 3 (8) |
COIN primary outcomesᶜ | |
Survival (including death and mortality)ᵈ | 23 (64) |
General cognitive ability | 12 (33) |
Chronic lung disease/bronchopulmonary dysplasiaᵉ | 10 (28) |
Sepsis | 8 (22) |
General gross motor ability | 6 (17) |
Necrotizing enterocolitis | 5 (14) |
Hearing impairment/deafness | 4 (11) |
Visual impairment/blindness | 3 (8) |
Brain injury on imaging | 2 (6) |
Retinopathy of prematurityᵉ | 2 (6) |
Quality of life | 0 (0) |
Adverse events | 0 (0) |
Availability of supplemental information | |
Supplemental information available | 20 (56) |
Supplemental information not available | 16 (44) |
COIN, Core Outcomes in Neonatology; IQR, interquartile range.
ᵃ North America (Canada, United States), Europe (Germany, United Kingdom, Netherlands, France, Ireland, Switzerland), Asia (India, Pakistan, China, Japan), South America (Colombia), Australasia (Australia).
ᵇ Breakdown of other (n = 15, 42%) is as follows: United States/Taiwan (n = 1), United States/Canada (n = 2), Belgium/Czech Republic/Finland/France/Germany/Israel/Italy/Netherlands (n = 1), Australia/United Kingdom (n = 1), Australia/New Zealand/Canada/France/Northern Ireland/Pakistan/United States (n = 1), Canada/Australia/United Kingdom (n = 1), Canada/Australia/United Kingdom/Sweden (n = 1), Australia/Malaysia/Qatar (n = 2), United Kingdom/Ireland/Netherlands/England (n = 1), Netherlands/Belgium (n = 1), United States/Australia/Netherlands/Canada/Germany/Italy/Austria/South Korea/Singapore (n = 1), Netherlands/Switzerland/Canada/Czech Republic (n = 1), Australia/Italy/United States/United Kingdom/Canada/Netherlands/New Zealand (n = 1).
ᶜ Some articles had >1 Core Outcomes in Neonatology primary outcome. Specific categorizations are shown in Supplemental Table 5.
ᵈ The Core Outcomes in Neonatology outcome of survival has been expanded to include death and mortality to reflect the outcome measured in the trial.
ᵉ Preterm only.
For composite outcomes, at least 1 component of the composite needed to be from the neonatal COS.
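The eligibility criteria above can be read as a screening predicate. The following is a minimal sketch under stated assumptions, not the screening tool used in the study: the `Trial` record, its field names, and the lowercase outcome labels (adapted from Table 1) are hypothetical, and the NICU and trial-design criteria are reduced to booleans that a human reviewer would judge.

```python
from dataclasses import dataclass
from typing import List

# Labels for the 12 neonatal COS outcomes as listed in Table 1
# (hypothetical spelling chosen for this sketch).
NEONATAL_COS = {
    "survival",
    "general cognitive ability",
    "chronic lung disease/bronchopulmonary dysplasia",
    "sepsis",
    "general gross motor ability",
    "necrotizing enterocolitis",
    "hearing impairment/deafness",
    "visual impairment/blindness",
    "brain injury on imaging",
    "retinopathy of prematurity",
    "quality of life",
    "adverse events",
}


@dataclass
class Trial:
    year: int
    language: str
    nicu_population: bool               # neonates requiring care during NICU admission
    arm_sizes: List[int]                # participants per arm
    randomized: bool                    # any RCT design, including cluster-randomized
    primary_outcomes: List[List[str]]   # each primary outcome as a list of components


def is_eligible(trial: Trial) -> bool:
    # A single outcome is a one-component list; for a composite outcome,
    # one COS component is enough.
    has_cos_outcome = any(
        component in NEONATAL_COS
        for outcome in trial.primary_outcomes
        for component in outcome
    )
    return (
        2015 <= trial.year <= 2020
        and trial.language == "English"
        and trial.nicu_population
        and all(n >= 100 for n in trial.arm_sizes)
        and trial.randomized
        and has_cos_outcome
    )


example = Trial(2017, "English", True, [230, 226], True,
                [["survival", "oxygen dependence"]])
print(is_eligible(example))  # True
```

Representing every primary outcome as a list of components lets the composite-outcome rule and the single-outcome rule share one check.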
Study Identification and Search Strategy
Trials were identified in a 2-step process; first, through a previous systematic review (search dates: July 1, 2012–July 1, 2017).7 Second, an updated search was conducted on the Cochrane Central Register of Controlled Trials to capture trials published between January 1, 2017, and January 1, 2020 (search date: June 2020). Trials identified through the updated search were limited to those published in the following journals: Pediatrics, New England Journal of Medicine, Journal of Pediatrics, Archives of Disease in Childhood: Fetal and Neonatal Edition, PLoS One, Journal of the American Medical Association, The Lancet, JAMA Pediatrics, and The Lancet Child and Adolescent Health. The updated search was limited to these journals because the first 7 published 3 or more trials in the original systematic review, and between them accounted for 59% of all neonatal trials in the original sample; the last 2 were searched to account for 2 newer, highly relevant journals in neonatology. For trials identified in the updated search, we used the Cochrane Screen4Me workflow17 to exclude studies that were not randomized controlled trials; all documents were then uploaded to Covidence18 and duplicates were removed. The updated search strategy (Supplemental Table 5) is reported per the Preferred Reporting Items for Systematic Reviews and Meta-Analyses checklist (Supplemental Table 6).19 Targeted searching in Google Scholar was conducted to identify follow-up studies of the identified trials within the target date range. Follow-up studies that already had their original trial included in the study sample were excluded. No citation screening was conducted.
Trial Selection and Verification of Primary Outcomes
All 76 trials, identified in the previous systematic review,7 were screened for inclusion independently and in duplicate by 2 reviewers (E.S., C.R.). Discrepancies were resolved through discussions between reviewers, and if needed, a third reviewer (A.M.) was consulted. For records identified in the updated search, a prescreening was done to exclude those published outside of the specified time frame and/or not in target journals of interest (A.M.). Independent and duplicate screening of all remaining studies was completed by 4 reviewers (M.O., J.W., C.G., R.F.S.) who also undertook data extraction to identify and verify the trial primary outcome as part of the neonatal COS; conflicts were resolved by a senior reviewer (M.O.). Multiple reports from the same study were not included.
Data Collection and Analysis
Outcome Reporting Assessment
A standard data extraction form was created on the Research Electronic Data Capture platform.20 Outcome reporting was assessed through items selected from the CONSORT 2010 statement15 and the newly developed CONSORT-Outcomes 2022 extension16 (Supplemental Table 7). CONSORT-Outcomes 2022 and selected items relevant to outcome reporting from CONSORT 2010 were combined, resulting in a 38-item data extraction tool. Items that address >1 aspect of reporting were split into subparts to capture reporting comprehensiveness (eg, CONSORT 4b, which addresses the setting[s] and location[s] where the data were collected, was split into CONSORT 4b[i] for setting[s] and CONSORT 4b[ii] for location[s]). Item rating options are detailed elsewhere (Supplemental Table 8). In addition to the ratings, raters were asked to extract verbatim text from the trial or applicable supplemental information to justify their rating. The data extraction form was piloted by members of the steering group (M.O., N.J.B., J.W., C.G., R.F.S.) and refined before use as described elsewhere (Supplemental Information).
Rater Identification, Piloting, and Outcome-Reporting Assessment
Comprehensiveness of outcome reporting in included trials was assessed by raters recruited by e-mail from the Cochrane Neonatal authors database. Details on rater identification and rater piloting are outlined in Supplemental Information; the rater registration form is included in Supplemental Figure 3. Each included trial was assessed by at least 3 raters.
Data Cleaning and Analysis
After raters completed their assessments, the steering group reviewed and adjudicated all scores, particularly the reporting items with obvious discrepancies among the raters and the items rated as “unclear if reported.” Descriptive quantitative methods (counts and frequencies) were used to analyze the final data in Microsoft Excel (A.B.). In the overall assessment calculations, “not applicable” items were excluded.
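As an illustration of the review step described above, the sketch below flags trial-item pairs for steering group attention when the raters disagree or when any rater chose “unclear if reported.” It is a hedged sketch, not the procedure used in the study: the rating labels, the function names, and the input layout are assumptions, and the actual rating options are those in Supplemental Table 8.

```python
from typing import Dict, List, Tuple

# Assumed rating labels for this sketch; the study's options are in Supplemental Table 8.
UNCLEAR = "unclear if reported"


def needs_adjudication(ratings: List[str]) -> bool:
    """True when raters disagree or any rater marked the item as unclear."""
    return len(set(ratings)) > 1 or UNCLEAR in ratings


def flag_items(assessments: Dict[Tuple[str, str], List[str]]) -> List[Tuple[str, str]]:
    """Return the (trial_id, item) pairs that need steering group review, sorted."""
    return sorted(key for key, ratings in assessments.items()
                  if needs_adjudication(ratings))


# Hypothetical ratings from 3 raters per trial-item pair.
assessments = {
    ("trial-01", "6a"): ["fully reported", "fully reported", "fully reported"],
    ("trial-01", "7a.1"): ["fully reported", "not reported", "partially reported"],
    ("trial-02", "11a"): [UNCLEAR, UNCLEAR, UNCLEAR],
}
print(flag_items(assessments))
# [('trial-01', '7a.1'), ('trial-02', '11a')]
```

Unanimous “unclear if reported” ratings are flagged as well, because agreement among raters does not settle whether the item was actually reported.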
Results
Search and Screening Results
Figure 1 details the number of records identified in the search, reasons for exclusions, and the final number of trials included. A total of 37 neonatal trials were included. The trial used in the rater pilot exercise21 was removed from the final study sample, leaving 36 trials for the outcome-reporting assessment.
Study Characteristics
Table 1 details the characteristics of the 36 trials; additional details are provided (Supplemental Table 9). Trials had a median sample size of 456 (interquartile range 308–862), and were most commonly national multicenter trials (47%) conducted in Europe (30%). Composite outcomes were the most common trial primary outcome (56%). Ten of 12 neonatal COS primary outcomes were represented as primary outcomes, with survival/death/mortality being the most common (64%); none of the included trials used quality of life or adverse events as their primary outcome.
Rater Identification, Piloting, Final Group
Seventy-two systematic review authors from the Cochrane Neonatal review group registered through the online form to participate as a rater in the study. The pilot exercise was completed by 46 raters. One rater’s pilot score did not meet the criteria of agreement with other raters and was therefore excluded. A total of 45 raters were sent materials to complete the outcome-reporting assessment for 3 or 4 trial reports each. Of these, 39 raters (Table 2; Supplemental Table 10) completed outcome-reporting assessments.
Characteristic | Raters, N (%) |
---|---|
Total number of raters | 39 (100) |
Roleᵃ | |
Neonatologist | 28 (72) |
Systematic review/meta-analysis author | 27 (69) |
Trial report author | 14 (48) |
Trial protocol author | 11 (28) |
Journal editor | 7 (18) |
Research ethics committee member that reviews trial protocols | 6 (15) |
Epidemiologist | 4 (10) |
Biostatistician | 2 (5) |
Reporting guideline developer | 2 (5) |
Other | 1 (3) |
Country of workplace | |
North America | 13 (33) |
Australasia | 10 (26) |
Europe | 7 (18) |
Asia | 8 (21) |
Middle East | 1 (3) |
Career stage | |
Early career researcher | 9 (23) |
Mid-career researcher | 15 (38) |
Senior career researcher | 7 (18) |
Other (eg, project manager; PhD student) | 7 (18) |
Level of educationᵃ | |
MD | 13 (33) |
MD/PhD | 11 (28) |
Master’s degree | 11 (28) |
PhD | 8 (21) |
Bachelor’s degree | 7 (18) |
MD, doctor of medicine; PhD, doctor of philosophy.
ᵃ Raters were allowed to indicate >1 role, and as such, percentages do not sum to 100%.
Outcome-Reporting Assessment
Discrepancies in rating assessment were observed for all 38 reporting items. The 4 items accounting for the most discrepancies were those related to minimal important difference (7a.1), reporting of absolute and relative effect sizes (17b), reliability of study instruments (6a.8[ii]), and description of study instrument(s) used to assess the outcome (6a.8[i]). Through steering group discussions, all apparent discrepancies were resolved, and a final reporting assessment was adjudicated to each reporting item for each trial.
Figure 2 portrays the variability of outcome-reporting comprehensiveness across the trials, calculated as the percentage of items that were fully reported, partially reported, and unreported. The median percentage of fully reported items across trials was 75% (range 39%–94%). Of the 38 reporting items, 22 (58%) were fully reported in >80% of included trials.
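The per-trial summary can be sketched as follows. This is an illustrative calculation only: the study reports that the analysis was done in Microsoft Excel, so the function names, rating labels, and toy data below are assumptions, and “not applicable” items are dropped from each trial's denominator as described in Data Cleaning and Analysis.

```python
from statistics import median
from typing import Dict, List, Tuple


def percent_fully_reported(item_ratings: List[str]) -> float:
    """Percentage of applicable items rated 'fully reported' for one trial."""
    applicable = [r for r in item_ratings if r != "not applicable"]
    if not applicable:
        raise ValueError("trial has no applicable items")
    fully = sum(r == "fully reported" for r in applicable)
    return 100 * fully / len(applicable)


def summarize(trials: Dict[str, List[str]]) -> Tuple[float, float, float]:
    """Median, minimum, and maximum percentage of fully reported items across trials."""
    percentages = [percent_fully_reported(ratings) for ratings in trials.values()]
    return median(percentages), min(percentages), max(percentages)


# Toy data: three hypothetical trials, each rated on a handful of items.
toy_trials = {
    "trial-A": ["fully reported", "fully reported", "fully reported", "not reported"],
    "trial-B": ["fully reported", "not applicable", "partially reported"],
    "trial-C": ["fully reported", "fully reported", "fully reported", "fully reported"],
}
print(summarize(toy_trials))  # (75.0, 50.0, 100.0)
```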
CONSORT 2010
Table 3 details the reporting frequency and percentage for the 13 CONSORT 2010 items relevant to trial outcomes. Not all items are applicable to all primary outcomes (eg, composite outcome [6a.6] is not applicable to single primary outcomes) or intervention (eg, blinding [11a] was not done in a trial). Reporting frequencies for each item were calculated after excluding trial reports where the item was not applicable; the denominator for each item is reported in Table 3.
Item # | Description of Reporting Item | Fully Reported, N (%) | Unreported, N (%) | Partially Reported, N (%) | N/Aᵈ, N [Dᵉ] |
---|---|---|---|---|---|
Trial setting | |||||
4b(i)ᵃ | Setting(s) where the [primary outcome] data were collected | 34 (94) | 2 (6) | 0 (0) | 0 [36] |
4b(ii)ᵃ | Location(s) where the [primary outcome] data were collected | 33 (91) | 3 (9) | 0 (0) | 0 [36] |
Trial primary outcomes | |||||
6a | Completely defined prespecified primary outcome measures, including how and when they were assessed | 35 (97) | 1 (3) | 0 (0) | 0 [36] |
6bᵇ | Any changes to trial outcomes after the trial commenced, with reasonsᶜ | 11 (31) | 25 (69) | 0 (0) | 0 [36] |
Blinding | |||||
11aᵇ | If done, who was blinded after assignment to interventions (eg, participants, care providers, those assessing outcomes) and how | 19 (66) | 2 (7) | 8 (27) | 7 [29] |
Statistical methods | |||||
12a | Statistical methods used to compare groups for primary outcomes | 36 (100) | 0 (0) | 0 (0) | 0 [36] |
12b | Methods for additional analyses, such as subgroup analyses and adjusted analyses | 32 (94) | 2 (6) | 0 (0) | 2 [34] |
Trial results | |||||
13a | For each group, the number of participants who were randomly assigned, received intended treatment, and were analyzed for the primary outcome | 36 (100) | 0 (0) | 0 (0) | 0 [36] |
13b | For each group, losses and exclusions after randomization, together with reasons | 33 (97) | 1 (3) | 0 (0) | 2 [34] |
16ᵇ | For each group, number of participants (denominator) included in each analysis and whether the analysis was by original assigned groups | 35 (97) | 0 (0) | 1 (3) | 0 [36] |
17aᵇ | For each primary outcome, results for each group, and the estimated effect size and its precision (such as 95% confidence interval) | 34 (94) | 0 (0) | 2 (6) | 0 [36] |
17bᵇ | For binary outcomes, presentation of both absolute and relative effect sizes is recommended. | 25 (71) | 2 (6) | 8 (23) | 1 [35] |
18ᵇ | Results of any other analyses performed, including subgroup analyses and adjusted analyses, distinguishing prespecified from exploratory | 29 (83) | 4 (11) | 2 (6) | 1 [35] |
D, denominator; N/A, not applicable.
ᵃ To capture good reporting for each component the item addresses, these items were split into subparts, and were specified for the primary outcome.
ᵇ Items were considered partially reported when 1 of the item’s components was reported in the trial (eg, 6b [changes and/or reasons reported], 11a [who was blinded and/or how blinding was performed], 16 [number of participants and/or whether analysis was by original assigned groups reported], 17a [effect size and/or precision], 17b [absolute and/or relative effect sizes], 18 [results of other analyses and/or distinguishing prespecified from exploratory reported]).
ᶜ In situations where authors did not explicitly state that no changes to trial outcomes after trial commencement were made, we marked it as “not reported” because there were no changes to the trial outcomes to report.
ᵈ N/A rating was applied when the reporting item was not applicable to the primary outcome or the trial (eg, blinding was not performed in the trial [11a]).
ᵉ Several items do not sum to a total denominator of 36, because there were items not applicable to some trial reports. The number of trial reports that the item was not applicable for are excluded from the overall outcome-reporting assessment calculations because they are not relevant to the item in consideration. The denominator for each reporting item is reflected in the brackets.
CONSORT 2010 items were generally well reported, with 10 of 13 (77%) items fully reported in >80% of trials. Two of these 10 items were completely reported in all trials:
statistical methods used to compare groups for the primary outcome (item 12a); and
number of participants who were randomly assigned, received intended treatment, and were analyzed for the primary outcome (13a).
The 3 items reported in <80% of trials were:
changes to trial outcomes after trial commencement (6b; 31%);
blinding (11a; 66%); and
absolute and relative effect sizes (17b; 71%).
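The split between items above and below the 80% threshold can be checked directly from the counts in Table 3. The sketch below is an illustration rather than the authors' analysis (which was done in Microsoft Excel); the variable and function names are hypothetical, and each tuple holds the item, the number of trials fully reporting it, and the denominator after “not applicable” trials are excluded.

```python
from typing import List, Tuple

# (item, trials fully reporting, denominator excluding N/A), copied from Table 3.
TABLE_3: List[Tuple[str, int, int]] = [
    ("4b(i)", 34, 36), ("4b(ii)", 33, 36), ("6a", 35, 36), ("6b", 11, 36),
    ("11a", 19, 29), ("12a", 36, 36), ("12b", 32, 34), ("13a", 36, 36),
    ("13b", 33, 34), ("16", 35, 36), ("17a", 34, 36), ("17b", 25, 35),
    ("18", 29, 35),
]


def split_by_threshold(rows, threshold=0.80):
    """Separate items fully reported in more than `threshold` of applicable trials."""
    above = [item for item, full, denom in rows if full / denom > threshold]
    below = [item for item, full, denom in rows if full / denom <= threshold]
    return above, below


above, below = split_by_threshold(TABLE_3)
print(len(above), "of", len(TABLE_3))  # 10 of 13
print(below)                           # ['6b', '11a', '17b']

# Percentages for the 3 items under the threshold, rounded to whole numbers.
for item, full, denom in TABLE_3:
    if item in below:
        print(item, round(100 * full / denom))  # 6b 31, 11a 66, 17b 71
```

Using the item-specific denominator matters for 11a and 17b: dividing by all 36 trials instead of the 29 and 35 applicable ones would understate how often these items were fully reported.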
CONSORT-Outcomes 2022
Table 4 details the reporting frequency and percentage of the 25 CONSORT-Outcomes 2022 items. Not all items were applicable to all 36 trials (eg, study instrument reliability [6a.8(ii)] is not applicable to mortality). As before, the number of trial reports that the item was not applicable to was excluded from the outcome reporting frequency and percentage calculation; the denominator for each item is noted in Table 4. Twelve (48%) items were fully reported in >80% of trials, of which 3 items were completely reported in all applicable trials: (1) specific measurement variable (6a.2[i]), (2) time point(s) used for analysis (6a.5), and (3) defined individual components for a composite outcome (6a.6). Although 9 trials (25%) reported outcomes that were not originally specified in a trial registry or protocol, 27 (75%) trials either did not identify or report additional outcomes that were not prespecified (6a.7).
Item # | Description of Reporting Item | Fully Reported, N (%) | Unreported, N (%) | Partially Reported, N (%) | N/Aᵈ, N [Dᵉ] |
---|---|---|---|---|---|
6a.1 | Provide a rationale for the selection of the domain for the trial’s primary outcome. | 32 (89) | 4 (11) | 0 (0) | 0 [36] |
6a.2(i)ᵃ | Describe the specific measurement variable (eg, systolic blood pressure). | 36 (100) | 0 (0) | 0 (0) | 0 [36] |
6a.2(ii)ᵃ | Describe the specific analysis metric (eg, change from baseline, final value, time to event). | 34 (97) | 1 (3) | 0 (0) | 1 [35] |
6a.2(iii)ᵃ | Describe the specific method of aggregation (eg, mean, proportion). | 35 (97) | 1 (3) | 0 (0) | 0 [36] |
6a.2(iv)ᵃ | Describe the specific time point for each outcome. | 35 (97) | 1 (3) | 0 (0) | 0 [36] |
6a.3 | If the analysis metric for the primary outcome represents within-subject change, define and justify the minimal important change in individuals. | 0 (0) | 2 (67) | 1 (33) | 33 [3] |
6a.4 | If the outcome data were continuous but were analyzed as categorical (method of aggregation), specify the cutoff values used. | 8 (89) | 1 (11) | 0 (0) | 27 [9] |
6a.5 | If outcome assessments were performed at several time points after randomization, state the time points used for the analysis. | 1 (100) | 0 (0) | 0 (0) | 35 [1] |
6a.6 | If a composite outcome, define all individual components of the composite outcome. | 20 (100) | 0 (0) | 0 (0) | 16 [20] |
6a.7 | Identify any outcomes that were not prespecified in a trial registry or protocol.ᶜ | 9 (25) | 27 (75) | 0 (0) | 0 [36] |
6a.8(i)ᵃ | Provide a description of study instruments used to assess the outcome (eg, questionnaires, laboratory tests). | 30 (97) | 1 (3) | 0 (0) | 5 [31] |
6a.8(ii)ᵃ,ᵇ | Provide a description of the study instrument’s reliability in a population similar to the study sample. | 5 (17) | 23 (77) | 2 (6) | 6 [30] |
6a.8(iii)ᵃ,ᵇ | Provide a description of the study instrument’s validity in a population similar to the study sample. | 3 (10) | 25 (83) | 2 (6) | 6 [30] |
6a.8(iv)ᵃ,ᵇ | Provide a description of the study instrument’s responsiveness in a population similar to the study sample. | 0 (0) | 0 (0) | 0 (0) | 36 [0] |
6a.9(i)ᵃ | Describe who assessed the outcome (eg, nurse, parent). | 20 (56) | 15 (41) | 1 (3) | 0 [36] |
6a.9(ii)ᵃ | Describe any qualifications or trial-specific training necessary to administer the study instruments to assess the outcome. | 14 (40) | 21 (60) | 0 (0) | 1 [35] |
6a.10ᵇ | Describe any processes used to promote outcome data quality during data collection (eg, duplicate measurements). | 13 (37) | 19 (54) | 3 (9) | 1 [35] |
Sample size | |||||
7a.1ᵇ | Define and justify the target difference between treatment groups (eg, the minimal important difference). | 10 (28) | 18 (50) | 8 (22) | 0 [36] |
Statistical methods | |||||
12a.1 | Describe any methods used to account for multiplicity in the analysis or interpretation of the primary and secondary outcomes (eg, coprimary outcomes, same outcome assessed at multiple time points, or subgroup analyses of 1 outcome). | 11 (42) | 15 (58) | 0 (0) | 10 [26] |
12a.2ᵇ | State and justify any criteria for excluding any outcome data from the analysis and reporting, or report that no outcome data were excluded. | 35 (97) | 1 (3) | 0 (0) | 0 [36] |
12a.3(i)ᵃ | Describe methods to assess patterns of missingness (eg, missing not at random). | 7 (24) | 22 (76) | 0 (0) | 7 [29] |
12a.3(ii)ᵃ | Describe methods on handling missing outcome items or entire assessments (eg, multiple imputation). | 19 (66) | 10 (34) | 0 (0) | 7 [29] |
12a.4 | Provide definition of outcome analysis population relating to protocol nonadherence (eg, as a randomized analysis). | 31 (86) | 5 (14) | 0 (0) | 0 [36] |
Results and justification | |||||
17a.1 | Include results for all prespecified outcome analyses or state where results can be found if not in this report. | 34 (94) | 2 (6) | 0 (0) | 0 [36] |
18.1 | If there were any analyses that were not prespecified, explain why they were performed. | 15 (79) | 4 (21) | 0 (0) | 17 [19] |
D, denominator; N/A, not applicable.
a. To capture good reporting for each component the item addresses, these items were split into subparts and were specified for the primary outcome.
b. Items were considered partially reported when 1 of the item’s components was reported in the trial (eg, 6a.5 [defined and/or justified minimal important change], 6a.7[ii]/6a.7[iii]/6a.7[iv] [described the reliability/validity/responsiveness of study instruments and/or in a population representative of the study population], 6a.9 [described both processes during and/or after data collection], 7a.1 [defined and/or justified the minimal important difference], 12a.2 [stated and/or justified criteria to exclude data, or demonstrated that no outcome data were excluded]).
c. In situations where authors did not explicitly state that no changes to trial outcomes after trial commencement were made, we marked it as not reported because there were no changes to the trial outcomes to report.
d. A not applicable rating was applied when the reporting item was not applicable to the primary outcome or the trial (eg, study instrument reliability [6a.7(ii)] is not applicable to the primary outcome of mortality; the outcome assessed may not be continuous [6a.2(i)] or may be measured only once [6a.2(ii), 6a.7(iv)]), and also when the item was not relevant to the trial (eg, no missing data [12a.3(i), 12a.3(ii)] or no nonprespecified analyses reported [18.1]).
e. Several items do not sum to a total denominator of 36, because some items were not applicable to some reports. Reports for which an item was not applicable are excluded from the overall outcome-reporting assessment calculations because they are not relevant to the item under consideration. The denominator for each reporting item is shown in brackets.
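The denominator rule described in the table footnotes can be illustrated with a short calculation. The sketch below is not the authors' analysis code; the function names and the half-up rounding are assumptions made for illustration. It takes the counts for one reporting item, subtracts the not applicable reports from the 36 included trials, and expresses each count as a percentage of the remaining denominator.

```python
TOTAL_TRIALS = 36  # number of included trials, as stated in the article


def round_half_up(value):
    """Round a non-negative number to the nearest integer, halves upward (assumed convention)."""
    return int(value + 0.5)


def summarize_item(fully, unreported, partially, not_applicable, total=TOTAL_TRIALS):
    """Return (denominator, percentages) for one reporting item.

    Reports rated not applicable are removed from the denominator.
    percentages is None when the item applied to no trial.
    """
    denominator = total - not_applicable
    if fully + unreported + partially != denominator:
        raise ValueError("counts do not add up to the denominator")
    if denominator == 0:
        return denominator, None
    percentages = tuple(
        round_half_up(100 * count / denominator)
        for count in (fully, unreported, partially)
    )
    return denominator, percentages


# Counts taken from the table rows for items 12a.3(i), 18.1, and 6a.8(iv).
print(summarize_item(7, 22, 0, 7))    # (29, (24, 76, 0))
print(summarize_item(15, 4, 0, 17))   # (19, (79, 21, 0))
print(summarize_item(0, 0, 0, 36))    # (0, None)
```

For item 12a.3(i), for example, 7 of the 36 reports were rated not applicable, so the 7 fully reported trials are 24% of the remaining 29.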
Discussion
This detailed analysis of primary trial outcome reporting in 36 recent neonatal trials identified important reporting gaps and elicited contributing factors to inadequate reporting. Although many key trial items were well reported, descriptions of blinding of the outcome assessor, minimal important difference between treatment groups, outcome data missingness, and how outcome multiplicity was dealt with in the analysis were insufficiently reported.
Overall, 10 (77%) CONSORT 2010 items were fully reported in >80% of included trials, whereas 12 (48%) of the new CONSORT-Outcomes 2022 items were fully reported in >80% of trials. Key items were often insufficiently reported; <20% of trials reported on the instrument’s reliability and validity, and only 40% reported qualifications or trial-specific training needed to administer the study instrument. Because the included trials were published from 2015 onwards, it is likely that trialists had access to CONSORT 2010 when drafting the trial report. Though CONSORT-Outcomes 2022 was still in development during the publication of all included trials and the conduct of the current study, it is encouraging that close to half of CONSORT-Outcomes 2022 items are addressed in >80% of trial reports; however, there is still room for improvement. Future studies may use our results to examine whether reporting improves over time.
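The proportion of CONSORT-Outcomes 2022 items that were fully reported in >80% of trials can be checked against the counts in the table. The following is a minimal sketch, not the authors' analysis code: the variable and function names are hypothetical, the (fully reported, denominator) pairs are transcribed from the table, and treating an item with a denominator of 0 as not meeting the threshold is an assumption.

```python
# (item, fully reported N, denominator D) transcribed from the table of
# CONSORT-Outcomes 2022 items; D already excludes not applicable reports.
OUTCOME_ITEMS = [
    ("6a.1", 32, 36), ("6a.2(i)", 36, 36), ("6a.2(ii)", 34, 35),
    ("6a.2(iii)", 35, 36), ("6a.2(iv)", 35, 36), ("6a.3", 0, 3),
    ("6a.4", 8, 9), ("6a.5", 1, 1), ("6a.6", 20, 20), ("6a.7", 9, 36),
    ("6a.8(i)", 30, 31), ("6a.8(ii)", 5, 30), ("6a.8(iii)", 3, 30),
    ("6a.8(iv)", 0, 0), ("6a.9(i)", 20, 36), ("6a.9(ii)", 14, 35),
    ("6a.10", 13, 35), ("7a.1", 10, 36), ("12a.1", 11, 26),
    ("12a.2", 35, 36), ("12a.3(i)", 7, 29), ("12a.3(ii)", 19, 29),
    ("12a.4", 31, 36), ("17a.1", 34, 36), ("18.1", 15, 19),
]


def items_above_threshold(rows, threshold_percent=80):
    """Return the items fully reported in more than threshold_percent of applicable trials.

    Integer arithmetic avoids floating-point edge cases at the threshold.
    Items with a denominator of 0 are treated as not meeting it (assumption).
    """
    return [
        item
        for item, fully, denominator in rows
        if denominator > 0 and 100 * fully > threshold_percent * denominator
    ]


well_reported = items_above_threshold(OUTCOME_ITEMS)
share = round(100 * len(well_reported) / len(OUTCOME_ITEMS))
print(len(well_reported), len(OUTCOME_ITEMS), share)  # 12 25 48
```

Under these assumptions the tally reproduces the 12 of 25 items (48%) reported above; item 18.1, at 15 of 19 trials (79%), falls just below the threshold.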
Awareness and understanding of key concepts related to trial outcomes are fundamental for trialists to be able to comprehensively report their findings, and for research users to assess reporting quality. Our review of the extracted verbatim texts and discrepancies among the 39 raters’ assessments revealed that a small number of items, including blinding, minimal important difference, missingness, and management of outcome multiplicity, were confusing to raters. Raters often confused minimal important difference with sample size; although related, these are distinct concepts in the design and reporting of trials. Along with the CONSORT-Outcomes glossary (Supplemental Table 11), we provide the 5 core elements of a defined outcome (Supplemental Table 12) as resources to understand key concepts related to outcome reporting.
Raters’ scores also showed how frequently information in trial reports is simply missed; their assessments may resemble those of peer reviewers for academic journals. This points to a need for training and a structured reporting format for trials that everyone, including expert peer reviewers, could use during the peer review process.22 Evidence shows that, although journals support authors and peer reviewers by encouraging the use of reporting guidelines and checklists to guide peer review,23–25 “instructions to authors” currently vary between journals, and guidance for peer reviewers is heterogeneous. Clear standards for reporting outcomes, provided to both authors and peer reviewers, are needed to improve outcome reporting.
Insufficient reporting of key trial outcomes highlights weaknesses in current publication practices among academic journals. Because of various constraints, such as figure and word count limitations, trialists may leave out critical information to meet submission requirements. One reason for apparent limited outcome reporting in published trials is that information could be published in protocols or supplemental information. In the context of limited journal word counts, supplemental information is useful for obtaining further context on the results and methods that may not be found in the main text.26 When available, we provided the supplemental information along with the trial report to our raters for their assessment. Referrals to appendices to the trial report and a rating, “They refer to another document,” were both considered adequate reporting. In our sample, trials without any available supplemental information (n = 16; 44%) generally had more unreported items; the 5 trials with the most unreported items had no accompanying supplemental information. We conclude that supplemental information is an underused tool to attain full trial reporting.
Avenues for Improved Outcome Reporting
The identified underreported items reemphasize the need for reporting guidelines like CONSORT and its extensions. Uptake of these guidelines has been shown to improve reporting27,28; CONSORT 2010 items are increasingly well reported, suggesting that they are effective.29,30 Journals could endorse use of reporting guidelines as a standard for submission, and identify the main reporting guidelines for trials they send out for review.31–34 Journals could also mandate trial registration,34,35 implement guidance for what to include in supplemental information while considering its usefulness to different stakeholders,26 and make statistical analysis plans and changes to information included in a registry readily available for their readership. Such incentives in the publication process that encourage comprehensive reporting have been used before and showed a positive impact.27 Uptake of the neonatal COS, combined with use of reporting guidelines, could facilitate meta-analyses and translation of results into clinical practice, a positive trend observed in other fields.36
Beyond trialists, journal editors, and peer reviewers, knowledge users and consumers of research need to be aware of the pervasive, suboptimal reporting in research.34 There is a need for transparency in reporting to reduce bias and research waste, because clinical decisions around fitting treatment options that impact patients and families are based on published trial results. Though sometimes not aware of a trial’s quality, readers deserve to know that there are ways trialists and journals can enhance reporting, such as through understanding and endorsement of the “minimal reporting items” in reporting guidelines. Readers should look carefully at what outcomes have been reported and how they have been reported. We urge readers to consult other accompanying, accessible documents in tandem with a trial report, such as the supplemental information, trial protocol, statistical analysis plan, and trial registration, to gain a comprehensive overview of what they are reading, and to be able to interpret the results presented. General readers’ awareness of these issues may propel transparency in research and increase buy-in from all stakeholders within the research enterprise.
Strengths and Limitations
Strengths of our study include the involvement of an international group of clinicians, systematic reviewers, and trial experts identified through Cochrane Neonatal who assessed outcome reporting in recent neonatal trials, which allowed us to develop a better understanding of what is considered well reported or inadequately reported. To rate outcome reporting, we used items from reporting guidelines that were developed empirically with robust consensus methodology. Although we used CONSORT-Outcomes 2022 before its publication in December 2022, it provided us insight into current neonatal trial reporting because it codifies best reporting practices. Focusing on Core Outcomes in Neonatology primary outcomes in recent neonatal trials ensured that we examined the reporting of outcomes identified as important to key stakeholders in neonatology.
Limitations first include our restriction to trial primary outcomes; we cannot guarantee our findings apply to reporting of secondary outcomes. In adjudicating whether an item was reported or not, we took a liberal approach, meaning if any element of the reporting item was addressed, we scored it as reported; however, we found very few examples of optimal reporting. Examples of optimal reporting of each item to guide prospective writers of neonatology trials are needed. Second, we were unable to assess reporting of quality of life and adverse events, because these were not used in our sample. We note that these are rarely used as primary trial outcomes, yet their reporting deserves dedicated attention. Third, we recognize the potential of language bias because we only included studies published in English. Fourth, the journals that the included trials were published in endorse the use of reporting guidelines, which is likely a contributing factor as to why CONSORT 2010 items were generally well reported. With this in consideration, our findings may overestimate the true completeness of outcome reporting in neonatal clinical trials, because many neonatal studies are not published in the journals to which we restricted our search. Outcome-reporting comprehensiveness may differ in journals that do not endorse reporting guidelines.
Conclusions
Reporting of primary outcomes in neonatal trials often lacks key information needed for interpreting results, knowledge synthesis, and evidence-informed decision-making in neonatology. Use of existing outcome-reporting guidelines such as CONSORT 2010 and CONSORT-Outcomes 2022 by trialists, journals, and peer reviewers will increase usability of neonatal research, provide more opportunities for evidence synthesis to inform decision-making at the bedside, reduce research waste, and improve child health outcomes.
Acknowledgments
We thank the Core Outcome Reporting in Neonatal Trials Study Group (Supplemental Information and Supplemental Table 10) for their contributions. We also thank Colleen Ovelman (managing editor, Cochrane Neonatal) for her assistance with the updated search strategy.
Ms Baba was responsible for project administration, methodology, validation, formal analysis, investigation, visualization, data curation, and writing, including the original draft, review, and editing; Dr Webbe was responsible for conceptualization, methodology, validation, visualization, and writing, and review and editing; Dr Rodrigues and Ms Stallwood were responsible for investigation and writing, and review and editing; Ms Goren was responsible for investigation, data curation, and writing, and review and editing; Ms Monsour was responsible for project administration, methodology, validation, investigation, and writing, and review and editing; Drs Chang, Trivedi, Manley, Bogossian, Namba, Schmölzer, Harding, Nguyen, Doyle, Jardine, Rysavy, Meyer, Helmi, Lai, Hay, Onland, and Choo, and Ms McCall and Mr Konstantinidis contributed to the acquisition of data and critically reviewed and revised the manuscript; Drs Gale and Soll were responsible for conceptualization, methodology, validation, and writing, and review and editing; Drs Butcher and Offringa were responsible for conceptualization, methodology, supervision, validation, and writing, and review and editing; and all authors approved the final manuscript as submitted and agree to be accountable for all aspects of the work.
COMPANION PAPER: A companion to this article can be found online at www.pediatrics.org/cgi/doi/10.1542/peds.2022-060765.
FUNDING: No external funding.
CONFLICT OF INTEREST DISCLOSURES: Dr Butcher declares consulting fees from Nobias Therapeutics, Inc, unrelated to this work. All other authors have indicated they have no conflicts of interest relevant to this article to disclose.