Retinopathy of prematurity eye examinations conducted in the neonatal intensive care unit.
To combine randomized trials of pain-relieving interventions for retinopathy of prematurity examinations using network meta-analysis.
Systematic review and network meta-analysis of Medline, Embase, Cochrane Central Register of Controlled Trials, Web of Science, and the World Health Organization International Clinical Trials Registry Platform. All databases were searched from inception to February 2017.
Abstract and title screen and full-text screening were conducted independently by 2 reviewers.
Data were extracted by 2 reviewers and pooled with random effect models if the number of trials within a comparison was sufficient. The primary outcome was pain during the examination period; secondary outcomes were pain after the examination, physiologic response, and adverse events.
Twenty-nine studies (N = 1487) were included. Topical anesthetic (TA) combined with sweet taste and an adjunct intervention (eg, nonnutritive sucking) had the highest probability of being the optimal treatment (mean difference [95% credible interval] versus TA alone = −3.67 [−5.86 to −1.47]; surface under the cumulative ranking curve = 0.86). Secondary outcomes were sparsely reported (2–4 studies, N = 90–248) but supported sweet-tasting solutions with or without adjunct interventions as optimal.
Limitations included moderate heterogeneity in pain assessment reactivity phase and severe heterogeneity in the regulation phase.
Multisensory interventions including sweet taste are likely the optimal treatment for reducing pain resulting from eye examinations in preterm infants. No interventions were effective in absolute terms.
Retinopathy of prematurity (RoP) is a potentially serious disease that arises from the immature vasculature of the preterm retina,1 which, if left untreated, can result in blindness. Current guidelines recommend that infants born <30 weeks’ gestational age receive serial eye examinations (sometimes as often as weekly) until their retinas reach maturity.1 This procedure is widely recognized as being painful, with neonates showing both immediate pain behaviors and prolonged physiologic arousal.2 RoP examinations are 1 of many medically indicated painful procedures that preterm neonates endure, with an average exposure of up to 12 procedures per day during hospitalization in the NICU.3 This high pain exposure has been associated with numerous short- and long-term sequelae including altered cortical development and changes in response to later pain.4,5 Thus, determining optimal ways to reduce pain associated with painful medical procedures is of utmost importance, with the aim to ensure optimal outcomes for these vulnerable newborns.
Methods to reduce the pain associated with RoP eye examination include pharmacological, nonpharmacological, and procedural modification interventions.2 The plurality of approaches makes a direct comparison of all interventions unfeasible without a large multicenter trial. As a result, despite the topic being the subject of at least 3 recent reviews,2,6,7 it has not been possible to provide a statistically derived estimate of the most effective treatment. Our purpose with this systematic review is to combine all existing randomized trials of pain-relieving interventions for RoP examinations using network meta-analysis (NMA) to allow for comparison of direct and indirect evidence.
We conducted a systematic review with Bayesian NMA. A prespecified protocol was followed (PROSPERO 2017: CRD42017058231) (Supplemental Information).
Search Strategy and Selection Criteria
A database search was conducted in February 2017. The search strategy was developed in partnership with a library professional and included searches of the Cochrane Library Central Registry of Controlled Trials (1966 to present), Medline (1946 to present), Embase (1974 to present), and Web of Science (1900 to present) (see Supplemental Information for Medline strategy). Eligible trial designs included randomized clinical trials in which at least 2 pain-relieving strategies for RoP eye examinations conducted in preterm neonates were compared. Preterm infants were defined as those delivered <37 weeks’ gestational age.
Study Selection and Data Extraction
Parallel-group and crossover designs were included. Eligible interventions included those that were intended to provide pain relief and could include pharmacological (eg, sucrose), nonpharmacological (eg, nonnutritive sucking [NNS]), combined interventions, or procedural modifications.
Abstract and title screen, full-text screening, and data extraction were conducted independently by 2 reviewers using Covidence.8 All conflicts were resolved by reviewers and, if necessary, consultation with a third reviewer. Data were extracted by using standardized forms.
The primary outcome was pain, as measured by validated pain assessment tools, at the first time point measured during the procedure. All tools were converted to a common scale (the premature infant pain profile [PIPP]).9,10 The PIPP was selected because it is the most frequently used tool to measure pain related to RoP eye examination. Following the approach outlined in Pillai Riddell et al’s11 Cochrane review of nonpharmacologic pain-relieving interventions in neonates, we selected the first time point measured during the procedure (pain reactivity) and the first time point after completion of the procedure (pain regulation).
Secondary outcomes included pain assessment scales during the regulation phase, physiologic response (eg, heart rate), and adverse events during reactivity and recovery and cry time during the reactivity phase. When multiple adverse events were reported, the most serious were used for meta-analysis.
Quality Assessment: Risk of Bias
Critical appraisal was conducted by using the Cochrane risk of bias tool for randomized controlled trials.12 Two reviewers assessed each study, with conflicts resolved through discussion or, if required, consultation with a third reviewer. We intended to use funnel plots to investigate signs of publication bias, although no comparisons had sufficient studies.12
Relevant clinical and study design characteristics were compared between eligible trials to assess suitability for synthesis. These included infant postmenstrual age at the time of the procedure, birth weight, use of a speculum and scleral depression during the procedure, and infant positioning (eg, swaddled or contained). Network structure was explored through the use of network diagrams. Pairwise meta-analyses and NMA were conducted by using the gemtc13 package in R.14 When at least 1 comparison contained 3 trials, a random effect model was used. Models properly account for correlation in multiarm trials, use a single heterogeneity parameter for the entire network, and place vague priors on all parameters.15 Model fit was assessed through comparison of residual deviance to the number of unconstrained data points, and through deviance information criteria. Fit for metaregressions was assessed through these characteristics in addition to whether the 95% credible interval (CrI) of the regression coefficient excluded 0.16 All analyses were run on 4 chains with 20 000 iterations a chain including a burn-in period of 5000 runs. Convergence was monitored by using the Brooks-Gelman-Rubin diagnostic, with values <1.05 considered acceptable if consistent with visual inspection of convergence and time series plots.15,17 SEs for crossover trials were adjusted by converting paired t tests to SE.18 When medians were reported, the mean and SD were imputed by using methods outlined by the Agency for Healthcare Research and Quality guidelines for pooling continuous measures.19 Results of continuous outcomes were expressed as mean differences (MDs) accompanied by their 95% CrIs. Adverse events were expressed as odds ratios (ORs).
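Two of the calculations named above can be illustrated with a short sketch. It is written in Python for illustration only (the analysis itself was run with gemtc in R), and the function names se_from_paired_t and gelman_rubin are hypothetical. The first function recovers an SE from a paired t statistic by using t = MD / SE. The second computes the basic textbook form of the potential scale reduction factor for 1 parameter; the diagnostic reported by R packages may apply further corrections, so values could differ slightly.

```python
from statistics import mean, variance


def se_from_paired_t(mean_difference: float, t_statistic: float) -> float:
    """Recover the SE of a paired mean difference from its t statistic.

    For a paired t test, t = mean_difference / SE, so SE = |mean_difference / t|.
    """
    if t_statistic == 0:
        raise ValueError("t statistic must be non-zero")
    return abs(mean_difference / t_statistic)


def gelman_rubin(chains: list[list[float]]) -> float:
    """Basic potential scale reduction factor for a single parameter.

    `chains` holds post-burn-in draws, one equal-length list per chain.
    This is the uncorrected textbook formula (an assumption of this sketch).
    """
    m = len(chains)
    if m < 2:
        raise ValueError("need at least 2 chains")
    n = len(chains[0])
    if n < 2 or any(len(chain) != n for chain in chains):
        raise ValueError("chains must have equal length of at least 2 draws")

    chain_means = [mean(chain) for chain in chains]
    within = mean([variance(chain) for chain in chains])  # W: mean within-chain variance
    between = n * variance(chain_means)                   # B: between-chain variance
    if within == 0:
        raise ValueError("within-chain variance is zero")

    pooled = (n - 1) / n * within + between / n           # pooled variance estimate
    return (pooled / within) ** 0.5


if __name__ == "__main__":
    # Hypothetical numbers, not taken from any included trial.
    print(se_from_paired_t(mean_difference=-2.0, t_statistic=-4.0))
    print(gelman_rubin([[0.1, 0.3, 0.2, 0.4], [0.2, 0.1, 0.4, 0.3]]))
```

Under the threshold stated above, a parameter would be flagged for closer inspection when gelman_rubin returns a value of 1.05 or more.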
The surface under the cumulative ranking curve (SUCRA) was used to express the probability that a treatment is optimal.20 Results of the largest trial were used to estimate the absolute PIPP reactivity score, and this value was used to convert MDs to absolute scores for the top 3 treatments.15 Mean absolute scores were used to calculate the number of infants expected, with scores indicating low, moderate, and severe pain, assuming pain scores are normally distributed.
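As a minimal sketch of these 2 steps, the Python code below computes SUCRA for 1 treatment from its rank probabilities and then the expected share of infants in each pain category under a normal distribution. The function names, the example inputs, and the cutoffs of 6 and 12 on the PIPP are assumptions made for illustration; they are not values or code from the analysis.

```python
from statistics import NormalDist


def sucra(rank_probabilities: list[float]) -> float:
    """SUCRA for one treatment.

    `rank_probabilities[k]` is the probability of holding rank k + 1, where
    rank 1 is best. SUCRA is the mean of the cumulative rank probabilities
    over ranks 1 to a - 1, with a the number of treatments.
    """
    a = len(rank_probabilities)
    if a < 2:
        raise ValueError("need at least 2 treatments")
    if abs(sum(rank_probabilities) - 1.0) > 1e-9:
        raise ValueError("rank probabilities must sum to 1")

    cumulative = 0.0
    total = 0.0
    for p in rank_probabilities[:-1]:  # ranks 1 to a - 1
        cumulative += p
        total += cumulative
    return total / (a - 1)


def pain_category_shares(
    mean_score: float,
    sd: float,
    low_cutoff: float = 6.0,      # assumed upper bound of "low" pain
    severe_cutoff: float = 12.0,  # assumed lower bound of "severe" pain
) -> dict[str, float]:
    """Expected share of infants per category, assuming normally distributed scores."""
    dist = NormalDist(mean_score, sd)
    low = dist.cdf(low_cutoff)
    severe = 1.0 - dist.cdf(severe_cutoff)
    return {"low": low, "moderate": 1.0 - low - severe, "severe": severe}


if __name__ == "__main__":
    # Hypothetical inputs, not results from the review.
    print(sucra([0.6, 0.3, 0.1]))
    print(pain_category_shares(mean_score=11.0, sd=3.0))
```

A treatment that is certain to rank first has a SUCRA of 1, and a treatment that is certain to rank last has a SUCRA of 0.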
Heterogeneity was assessed through the SD of the random effect distribution. Assessment of inconsistency within the network (ie, agreement between direct and indirect evidence) was conducted through the use of a node-splitting model.21 Meta-regressions were conducted if potential effect modifiers (eg, postmenstrual age at the time of the procedure or the risk of bias) revealed evidence of variability between studies in addition to revealing evidence of interaction with treatment effect. Sensitivity analyses were conducted to test key assumptions related to synthesis feasibility and included use of imputed means, exclusion of published posters, and exclusion of studies that appear to contribute to inconsistency.
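The idea of comparing direct with indirect evidence can be shown with a simpler calculation than the Bayesian node-splitting model used here. The Python sketch below applies an adjusted indirect comparison through a common comparator and a z test on the difference between the direct and indirect estimates. This is a frequentist stand-in chosen for illustration, not the method of this review, and all function names and numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def indirect_estimate(
    d_ab: float, se_ab: float, d_ac: float, se_ac: float
) -> tuple[float, float]:
    """Indirect effect of C versus B through the common comparator A.

    `d_ab` is the effect of B relative to A and `d_ac` the effect of C
    relative to A, so the indirect C-versus-B effect is d_ac - d_ab.
    The two sources are independent, so their variances add.
    """
    return d_ac - d_ab, sqrt(se_ab ** 2 + se_ac ** 2)


def inconsistency_test(
    d_direct: float, se_direct: float, d_indirect: float, se_indirect: float
) -> tuple[float, float]:
    """Difference between direct and indirect estimates and its 2-sided P value."""
    difference = d_direct - d_indirect
    se_difference = sqrt(se_direct ** 2 + se_indirect ** 2)
    z = difference / se_difference
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return difference, p_value


if __name__ == "__main__":
    # Hypothetical mean differences on a pain scale, not data from the review.
    d_ind, se_ind = indirect_estimate(d_ab=-1.0, se_ab=0.3, d_ac=-3.0, se_ac=0.4)
    print(d_ind, se_ind)
    print(inconsistency_test(d_direct=-1.5, se_direct=0.6,
                             d_indirect=d_ind, se_indirect=se_ind))
```

A small P value would suggest that direct and indirect evidence disagree for that comparison, which is the signal a node-splitting model looks for within the full network.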
The database search returned 831 citations after the removal of duplicates, of which 29 studies met all inclusion criteria (N = 1487) (Supplemental Fig 5).
Twenty-three studies were parallel randomized controlled trials,22–43 and 6 studies44–49 were randomized crossover trials. Based on consultation with clinicians, interventions were grouped on the basis of the hypothesized underlying mechanism of action (Supplemental Table 1). Interventions that combined strategies targeted at multiple sensory systems (eg, sweet taste in addition to NNS, sweet taste in addition to familiar odor) were categorized as multisensory. Studies were similar in infant and procedure characteristics (eg, use of speculum) (Supplemental Table 2).
Risk of Bias Within Studies
Studies assessing interventions that were easily blinded (eg, sweet taste, oral acetaminophen) were considered to be at overall low risk of bias (Supplemental Fig 6). Details of sequence generation and allocation concealment were unclear in most studies.
We identified several trial registries indicating trials that are or should realistically be complete without an identifiable publication of results in abstract or manuscript form.50–55 One of these was a trial in which the efficacy of acetaminophen was assessed, which was stopped early because the intervention showed no effect.52 None of the authors responded to e-mails.
In 20 studies (n = 1228), the authors reported results of a validated pain assessment scale during the pain reactivity phase (Fig 1). Two studies were excluded from primary analysis (Supplemental Table 3). Signs of inconsistency were detected in the NNS node, which appeared to arise from a single trial22 that was considered to have a high risk of bias (Supplemental Table 4). With this trial excluded, signs of inconsistency were resolved and the model fit was improved; thus, remaining regressions and sensitivity analyses were conducted with this trial excluded (Supplemental Table 5, Supplemental Figs 6 and 7). The removal of studies with imputed means resulted in the best model fit, although treatment rankings were similar across all sensitivity analyses (Supplemental Figs 8 and 9). Relative (Fig 2) and absolute (Fig 3) scores based on the best-fitting model suggest small differences between the top treatments (probability of at least a 2-point difference = 12.8%), with no interventions lowering mean absolute scores to ranges on the PIPP associated with low or no pain (probability that the absolute score is <6: <1%).
Of the included secondary outcomes, only the analysis of pain assessment scales during the regulation phase had sufficient multitrial comparisons to allow for a random effect model to be fit. When studies for remaining outcomes were combined, fixed effect models were used.
Pain Assessment Scales During Regulation Phase
Twelve studies in which 11 interventions (n = 693) were assessed were included in the analysis. Metaregression and sensitivity analyses were unsuccessful in revealing robust associations between treatment effects and study characteristics (Supplemental Fig 10). In the best fitting model, topical anesthetic (TA) in addition to expressed breast milk (EBM) multisensory revealed a statistically significant improvement over TA alone (−5.54; 95% CrI: −10.18 to −0.95), but remaining comparisons had wide CrIs including 0 (Supplemental Fig 11). The direction and magnitude of effect did not meaningfully change for any model that was a good fit for the data. Combined treatments had a higher probability of being optimal on the basis of SUCRA (Fig 4). Node-splitting models did not reveal inconsistency, and a manual review of plots did not reveal systematic disagreements between direct and indirect evidence (Supplemental Fig 12).
In 5 studies (n = 381), researchers reported heart rate during the reactivity phase, but 3 studies were excluded and the remaining studies did not form a connected network (Supplemental Table 3). Xin et al43 reported that sweet taste combined with TA was superior to TA alone (MD = −23.7 bpm; P < .01), and Şener Taplak and Erdem39 found no statistically significant difference between NNS, sweet taste and NNS, or EBM multisensory with TA, although mean results favored NNS alone. In 3 studies (N = 173), researchers reported heart rate during the regulation phase, but 1 study48 was excluded from meta-analysis (Supplemental Table 3) for missing variance information. Relative effects are wide, 95% CrIs include 0 (Supplemental Fig 13), and sweet taste in addition to TA has the highest SUCRA ranking (Fig 4). There were no closed loops for assessment of inconsistency.
In 9 studies (N = 595), researchers reported oxygen saturation in the reactivity phase, but 6 studies were excluded from analysis or analyzed with adverse events (Supplemental Table 3). Sweet taste combined with TA ranked highest (Fig 4) and revealed evidence of a moderate improvement compared with sweet taste multisensory with TA (−1.71; 95% CrI: −3.03 to −0.38; Supplemental Fig 14). In 5 studies (N = 243), researchers reported oxygen saturation in the recovery phase, but 4 studies were excluded or analyzed with adverse events (Supplemental Table 3). Results from the remaining study46 revealed identical mean oxygen saturation for infants treated with TA alone compared with TA and sweet taste. There were no closed loops for assessment of inconsistency.
In 8 studies (N = 421), researchers assessed crying time as an outcome. Four studies31,34,35,48 were excluded from synthesis (Supplemental Table 3). Strube et al35 found that feeding infants 1 hour before their eye examination reduced cry time compared with feeding 2 hours before (MD = 19%; P = .016). Mehta et al48 found that no infants cried during the procedure when a speculum was not used, compared with 2 infants who cried when a speculum was used and 1 infant who cried when wide-field digital retina imaging was used, but no statistical tests were conducted. Sweet taste multisensory combined with TA was ranked as the best treatment (Fig 4), although CrIs for the MD were wide, with only the comparison against NNS with TA reaching statistical significance (MD = −29.57 seconds; 95% CrI: −37.92 to −21.14) (Supplemental Fig 15).
In 4 studies (N = 268), researchers reported 1 or more adverse events during the reactivity phase (Supplemental Tables 6 through 9). Two studies25,32 could not be included in meta-analysis because treatments did not connect to the network. Dilli et al25 found no difference in the rate of bradycardia between NNS with TA and sweet taste multisensory with TA (Supplemental Table 3). O’Sullivan et al32 found a nonstatistically significant difference favoring sweet taste multisensory in the same comparison. In the NMA results, sweet taste with TA had the highest SUCRA (Fig 4), but no differences reached statistical significance despite large point estimates (eg, sweet taste and TA versus TA alone; OR = 0.32; 95% CrI: 0.06 to 1.41) (Supplemental Fig 16). Longer-term adverse events were assessed in 3 studies28,45,47 (N = 130), with 1 excluded28 in the final analysis because the authors did not report outcomes in a way that could be synthesized (Supplemental Table 3). SUCRA ranking favored no treatment (Fig 4), although relative differences between top treatments were small (Supplemental Fig 17), with wide CrIs.
Although most comparisons failed to reach statistical significance, Bayesian results support the hypothesis that engaging more sensory systems likely results in improved pain relief. Rankings were generally robust to sensitivity analyses, which provides some degree of confidence in these findings, although all results should be interpreted with caution.
Results must be interpreted within the limitations related to moderate heterogeneity in the pain assessment reactivity phase and severe heterogeneity in the regulation phase. Our investigation of potential sources of heterogeneity was limited by incomplete procedure reporting. Although these factors may explain some additional heterogeneity, it should be noted that this appears to be a consistent problem faced in meta-analyses of pain-relieving interventions in neonates. The authors of 2 recently updated Cochrane systematic reviews assessed nonpharmacologic interventions11 and skin-to-skin contact56 for reducing pain associated with commonly performed painful procedures in preterm and term neonates. In both cases, moderate-to-high heterogeneity was a commonly cited reason for downgrading the level of evidence from combined analysis. Neither review succeeded in identifying explanations for this heterogeneity, and thus it is unclear whether it is the result of methodological or clinical heterogeneity.
Absolute scores suggest that no pain treatment is effective in absolute terms (ie, 62% of trial arms had mean scores >12). These scores contrast with those for the same interventions used to reduce pain from vaccination, heel lance, or venepuncture, for which scores in intervention groups routinely lie between 4 and 6 points on the same scale.11,56–58 Future researchers should thus consider whether new trials comparing these interventions are of value when compared with identifying procedure modifications or new treatments that result in more reasonable absolute scores. Of trials included in this review, the lowest absolute scores were observed in a unit that does not use a speculum,31 and the only study in which the authors assessed the effect of speculum on pain found evidence to support avoiding its use.48 With this review, we are not the first to suggest that excluding the routine use of eye speculum should be considered,48,59 and it would appear that the lack of uptake of earlier suggestions may be the result of fear of missed diagnoses.60 Practice change in this direction will thus likely require larger trials establishing its ability to reduce pain when the most effective pain-relieving interventions are implemented, in addition to an assessment of the potential consequences for diagnostic sensitivity and specificity. Others have suggested that persistently high raw pain scores suggest that stronger analgesics should be investigated (eg, opiates).61 Researchers for 1 ongoing clinical trial will investigate the use of morphine for pain reduction during an eye examination and use the PIPP and EEG to assess pain.61
Despite limitations, there are consistent trends suggesting that the addition of multisensory pain-reducing interventions to TA results in an improved reduction in pain response to eye examinations in preterm infants. Given the less-than-optimal efficacy of current treatments, it is imperative that future researchers investigate novel approaches to reduce pain associated with eye examinations in preterm infants.
EBM: expressed breast milk
PIPP: premature infant pain profile
RoP: retinopathy of prematurity
SUCRA: surface under the cumulative ranking curve
Mr Disher conceptualized and designed the review, participated in record screening and data extraction, conducted all statistical analysis, and drafted the initial manuscript; Dr Cameron conceptualized and designed the review and oversaw statistical analysis; Dr Mitra contributed to interpretation of results and reviewed; Ms Cathcart participated in record screening and data extraction and reviewed; Dr Campbell-Yeo conceptualized and designed the review, oversaw record screening and data extraction, contributed to interpretation of results, and reviewed; and all authors approved the final manuscript for submission.
FUNDING: No study-specific funding to report. Mr Disher is supported by a Canadian Graduate Scholarships Vanier Scholarship, a Killam predoctoral scholarship, a Nova Scotia Health Research Foundation Scotia Scholar award, a Nova Scotia Graduate Scholarship, and the Electa MacLennan memorial scholarship. Dr Campbell-Yeo is supported by a Canadian Institutes of Health Research New Investigator Award.
We thank Mr Aaron Situ from Cornerstone Research Group for providing assistance with developing R code for league tables and heat plots and Leah Boulos, Evidence Synthesis Coordinator at the Maritime SPOR SUPPORT Unit, for developing the electronic search strategy for this review.
POTENTIAL CONFLICT OF INTEREST: Dr Cameron is an employee of and shareholder in Cornerstone Research Group Inc. Cornerstone Research Group Inc consults for various pharmaceutical and medical device companies; the other authors have indicated they have no potential conflicts of interest to disclose.
FINANCIAL DISCLOSURE: The authors have indicated they have no financial relationships relevant to this article to disclose.