Summary measures are used to quantify a hospital’s quality of care by combining multiple metrics into a single score. We used Baby-MONITOR, a summary quality measure for NICUs, to evaluate quality by race and ethnicity across and within NICUs in the United States.
Vermont Oxford Network members contributed data from 2015 to 2019 on infants from 25 to 29 weeks’ gestation or of 401 to 1500 g birth weight who were inborn or transferred to the reporting hospital within 28 days of birth. Nine Baby-MONITOR measures were individually risk adjusted, standardized, equally weighted, and averaged to derive scores for African American, Hispanic, Asian American, and American Indian infants, compared with white infants.
This prospective cohort included 169 400 infants at 737 hospitals. Across NICUs, Hispanic and Asian American infants had higher Baby-MONITOR summary scores, compared with those of white infants. African American and American Indian infants scored lower on process measures, and all 4 minority groups scored higher on outcome measures. Within NICUs, the mean summary scores for African American, Hispanic, and Asian American NICU subsets were higher, compared with those of white infants in the same NICU. American Indian summary NICU scores were not different, on average.
With Baby-MONITOR, we identified differences in NICU quality by race and ethnicity. However, the summary score masked within-measure quality gaps that raise unanswered questions about the relationships between race and ethnicity and processes and outcomes of care.
With Baby-MONITOR, a summary quality measure for NICUs, we identified differences in process and outcome scores by race and ethnicity in California.
Baby-MONITOR scores and individual measures across and within NICUs in the United States revealed that each minority racial and ethnic group scored lower on process measures and higher on outcome measures, compared with white infants.
Summary measures are used to quantify hospitals’ quality of care by evaluating process and outcome data as a single score that is often risk adjusted and standardized across measures.1 Combining many items into a single score allows for easier comparison between units. However, 1 number may hide subgroup differences, such as by race and ethnicity.
Baby-MONITOR (Measure Of Neonatal InTensive care Outcomes Research) is a summary measure used in NICUs. The measure was developed by using a modified Delphi process in which a panel of experts considered 28 measures, selecting 9 for inclusion: 5 process measures (any human milk at discharge, no admission hypothermia, antenatal steroid exposure, no health care–associated infection, and timely retinal examination) and 4 outcome measures (survival to hospital discharge, no chronic lung disease, no pneumothorax, and greater than median growth velocity).2 Measures are combined and standardized relative to other units in the sample to produce a single score.3
Baby-MONITOR was used to measure variation in the quality of care in California by NICU level of care4 and disparities in care by race and ethnicity.5 Across California NICUs, Baby-MONITOR scores for African American infants and Asian American infants were not significantly different from those for non-Hispanic white infants, whereas scores for Hispanic infants and infants of other races or ethnicities were significantly lower.5 Compared with white infants, African American and Hispanic infants had lower scores on all process measures and higher scores on most outcome measures.5 A study using Baby-MONITOR in the United States found that African American infants were concentrated at lower quality NICUs and Asian American infants were concentrated at higher quality NICUs, even after adjusting for the region in which the hospital was located.6
We used Baby-MONITOR and its components to describe differences in quality of care by race and ethnicity across NICUs, within US Census Divisions, and within NICUs in a cohort of very low birth weight and very preterm infants. We extended the Baby-MONITOR methodology to all states, including California, and to the American Indian and Alaskan native population.
Methods
Population
Vermont Oxford Network (VON) is a voluntary worldwide community dedicated to improving the quality, safety, and value of perinatal and neonatal care.7 Member NICUs contributed standardized data on all infants who were 22 to 29 weeks’ gestation or had a 401 to 1500 g birth weight who were inborn or transferred to the hospital within 28 days of birth.8 We included 216 425 infants at 793 US hospitals born from 2015 to 2019. We excluded infants born at <25 weeks’ gestation (n = 32 147), infants with serious congenital anomalies (n = 10 458), infants who died in the delivery room or within 12 hours of admission to the NICU (n = 1554), infants transferred more than once (n = 1216), infants with implausible values for birth weight (n = 498), and infants at hospitals without data for all 9 measures for at least 1 white infant and 1 infant who was Hispanic, Black or African American, Asian American, or American Indian (n = 1152), leaving 169 400 infants at 737 hospitals. The University of Vermont Institutional Review Board determined that use of the VON repository for this analysis was not human subjects research.
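The cohort derivation above can be checked with simple arithmetic. The following is a minimal sketch in Python, not part of the original analysis; it assumes the exclusion counts are mutually exclusive and applied in the order listed, and the variable names are illustrative.

```python
# Counts are taken from the Population paragraph; names are illustrative only.
initial_infants = 216_425

# Assumption: exclusions are mutually exclusive and applied in the listed order.
exclusions = {
    "born at <25 weeks' gestation": 32_147,
    "serious congenital anomalies": 10_458,
    "died in delivery room or within 12 hours of NICU admission": 1_554,
    "transferred more than once": 1_216,
    "implausible birth weight": 498,
    "hospital lacked data for all 9 measures in both comparison groups": 1_152,
}

remaining = initial_infants
for reason, n_excluded in exclusions.items():
    remaining -= n_excluded
    print(f"{reason}: -{n_excluded} -> {remaining} remaining")

final_cohort = remaining
```

Under these assumptions, the running total ends at the 169 400 infants reported for the analytic cohort.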
Race and Ethnicity
The VON collected information on race (Black or African American; white; Asian; American Indian or Alaskan native; native Hawaiian or other Pacific Islander; other) and ethnicity (Hispanic; not Hispanic) on the basis of definitions from the 2010 US Census.9,10 Abstractors were instructed to obtain the information by personal interview with the mother, review of the birth certificate, or medical record, in that order.10 The source of the information for individual infants was not recorded. We combined race and ethnicity to yield non-Hispanic white, non-Hispanic Black or African American, non-Hispanic Asian American or Pacific Islander, non-Hispanic American Indian or Alaskan native, and Hispanic (any race), which we refer to as white, African American, Asian American, American Indian, and Hispanic.
Baby-MONITOR Scores
Baby-MONITOR scores were calculated by using infant-level process (any human milk at discharge, no admission hypothermia, antenatal steroid exposure, no health care–associated infection, and timely retinal examination) and outcome (survival to hospital discharge, no chronic lung disease, no pneumothorax, and greater than median growth velocity) measures.3 No health care–associated infection and any human milk at discharge were considered to be markers of the care process or process-intense outcomes. The measures were calculated to attribute events appropriately for infants who were transferred to a NICU from another hospital after birth. For analyses across and within NICUs, each measure was adjusted for region and relevant infant characteristics (gestational age, sex, 5-minute Apgar, whether the mother received prenatal care, and whether the infant was inborn or outborn, small for gestational age, part of a multiple birth, or born by cesarean delivery). For analyses by US Census Division, each measure was adjusted for the relevant infant characteristics and stratified by region. Measures were standardized relative to other units in the data set.11,12 The standardized scores for the 9 measures were equally weighted after placing them on a common scale and averaged to derive the summary Baby-MONITOR score. A higher Baby-MONITOR score indicates a higher quality of care. The methods used in this analysis include some modifications from the original Baby-MONITOR methods developed in 2014. Further details on the Baby-MONITOR methodology used in this analysis are provided in the Supplemental Information.
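The final step, in which standardized measure scores are equally weighted and averaged, can be illustrated with a simplified sketch. The code below is not the Baby-MONITOR implementation (which is described in the Supplemental Information); it assumes plain z-score standardization across units and uses hypothetical function names and toy values for only 2 of the 9 measures.

```python
from statistics import mean, pstdev


def standardize(values):
    """Convert unit-level values to z-scores relative to the units in the sample.

    Assumption: simple z-scores stand in for the standardization actually used.
    Raises ZeroDivisionError if every unit has the same value.
    """
    mu = mean(values)
    sd = pstdev(values)
    return [(v - mu) / sd for v in values]


def summary_scores(measures):
    """Equally weight the standardized measures and average them per unit.

    measures: dict mapping a measure name to a list of risk-adjusted values,
    one per unit, in the same unit order for every measure.
    """
    standardized = [standardize(vals) for vals in measures.values()]
    # zip(*...) regroups the z-scores by unit; mean() gives each measure equal weight.
    return [mean(unit_scores) for unit_scores in zip(*standardized)]


# Toy data for 3 hypothetical units; values are invented for illustration.
toy_measures = {
    "any_human_milk_at_discharge": [0.60, 0.70, 0.80],
    "survival_to_discharge": [0.90, 0.95, 0.85],
}
scores = summary_scores(toy_measures)
```

Because each measure is centered on the sample, the summary scores are relative: a unit's score depends on which other units and years are included, which is why scores may differ from study to study.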
Analysis
We applied the Baby-MONITOR method to each racial and ethnic group in the population as a whole, by US Census Division,13 and within each NICU individually. For each group, we derived scores for each component of Baby-MONITOR: a combined score for the process measures, a combined score for the outcome measures, and an overall summary Baby-MONITOR score. All scores were measured as a difference from white infants. To receive scores, hospitals needed data on at least 1 infant for each measure. Compatibility intervals14 for summary scores were calculated by aggregating the variance estimates of the individual measure scores, assuming a Student’s t distribution with Welch-Satterthwaite approximated degrees of freedom, and applying the Bonferroni correction for multiple comparisons. Point estimates represent the score differences most compatible with the data. Ninety-nine percent compatibility intervals that do not include zero suggest strong evidence of true differences from the reference group; we use 99% for all intervals in our results to promote the replicability of our findings. We performed a sensitivity analysis including early deaths to evaluate the effects of any selection bias introduced by excluding them. R version 3.4.3 was used for all data analyses.15
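The variance aggregation and degrees-of-freedom approximation described above can be sketched as follows. This is an illustrative example, not the authors' R code: it assumes the measure scores are independent and equally weighted, and the input variances, degrees of freedom, and number of comparisons are hypothetical. The Student's t quantile itself would come from a statistics library and is omitted here.

```python
def summary_variance_and_df(variances, dfs):
    """Variance and Welch-Satterthwaite df for an equally weighted mean of scores.

    variances: variance estimate of each measure score.
    dfs: degrees of freedom behind each variance estimate.
    Assumption: the measure scores are independent of one another.
    """
    k = len(variances)
    # Each score enters the mean with weight 1/k, so its variance is scaled by 1/k^2.
    terms = [v / k**2 for v in variances]
    total_variance = sum(terms)
    # Welch-Satterthwaite approximation: (sum of terms)^2 / sum(term^2 / df).
    approx_df = total_variance**2 / sum(t**2 / d for t, d in zip(terms, dfs))
    return total_variance, approx_df


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison alpha after a Bonferroni correction."""
    return alpha / n_comparisons


# Hypothetical inputs: 2 measures with equal variances and 10 df each.
variance, df = summary_variance_and_df([1.0, 1.0], [10, 10])

# Hypothetical: a 99% interval (alpha = 0.01) corrected for 4 comparisons.
adjusted_alpha = bonferroni_alpha(0.01, 4)
```

With equal variances and equal degrees of freedom, the approximated degrees of freedom reduce to the sum of the individual degrees of freedom, which is a convenient sanity check on the formula.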
Results
Of the 737 hospitals in the study, 58 (8%) had restrictions on assisted ventilation, 286 (39%) had no ventilation restrictions and did not perform neonatal surgery, 283 (38%) had no ventilation restrictions and performed neonatal surgery (except for cardiac surgery requiring a bypass), and 109 (15%) had no ventilation restrictions and performed neonatal surgery, including cardiac surgery requiring a bypass. A total of 55 hospitals (7%) contributed from 1 to 3 years of data, 37 hospitals (5%) contributed 4 years of data, and 645 hospitals (88%) contributed data for all 5 years.
The characteristics of the 169 400 eligible infants are shown in Table 1. Of these infants, 75 427 (45%) were white, 52 139 (31%) were African American, 31 481 (19%) were Hispanic, 8933 (5%) were Asian American, and 1420 (0.8%) were American Indian.
Scores Across NICUs
Across NICUs, compared with white infants, African American infants had a summary score of +0.2 standard units (compatibility interval [CI] −1.1 to +1.5), Hispanic infants had a summary score of +1.5 (CI +0.2 to +2.8), Asian American infants had a summary score of +1.3 (CI 0.0 to +2.6), and American Indian infants had a summary score of −0.8 (CI −2.1 to +0.5; Fig 1A; Table 2). All 4 racial and ethnic groups had lower process scores, compared with that of white infants, although the range of credible values for Hispanic and Asian American infants included both positive and negative scores. All 4 racial and ethnic groups had positive outcome scores, compared with that of white infants.
Differences from white infants by race or ethnic group varied by individual measure (Fig 1B; Table 3). Compared with white infants, African American infants scored lower on discharge on human milk, no hypothermia, antenatal steroids, and timely eye examination and higher on survival to discharge, no chronic lung disease, no pneumothorax, and growth velocity. Hispanic infants scored lower on discharge on human milk and higher on timely eye examination, survival to discharge, no chronic lung disease, and no pneumothorax. Asian American infants scored lower on no hypothermia and higher on no chronic lung disease, no pneumothorax, and growth velocity. American Indian infants scored lower on discharge on human milk, no hypothermia, and antenatal steroids and higher on no chronic lung disease, no pneumothorax, and growth velocity. For no health care–associated infection, the ranges of credible values were wide for all racial and ethnic groups, providing little evidence of differences from white infants.
Scores Within Region
The same process and outcome score patterns observed across and within NICUs applied to the US Census Divisions (Fig 2). Summary scores for minority racial and ethnic groups were substantially higher than those for white infants in the South Atlantic (African American, Hispanic, Asian American, and American Indian), West South Central (Hispanic and Asian American), Mountain (African American and Hispanic), and Pacific (African American, Hispanic, and Asian American) divisions. In the East North Central division, infants of all racial and ethnic groups had lower process scores and higher outcome scores than white infants, which resulted in no differences in summary scores. Subcomponent scores for each division by race and ethnicity are in the Supplemental Information.
Scores Within NICUs
There was wide variation at the NICU level in the distribution of the differences in summary, process, and outcome measure scores by racial and ethnic groups, compared with that of white infants. For every measure and for each of the 4 racial and ethnic groups, there were groups with scores at least 2.5 standard units below white infants and groups with scores at least 2.5 standard units above white infants.
The mean summary scores for African American, Hispanic, and Asian American groups were higher, compared with those of white infants in the same NICU, whereas the American Indian summary NICU score was not different from white infants. On average, differences within NICUs followed the same patterns as those across units (Fig 3A; Table 3). African American and American Indian infants scored lower on average on process measures, and all 4 minority groups scored higher on average on outcome measures, compared with white infants within the same NICU.
Compared with white infants in the same NICU, African American infants had lower mean scores on discharge on human milk, no admission hypothermia, and antenatal steroid exposure and higher mean scores on all 4 outcome measures (Fig 3B; Table 3). Hispanic infants had lower mean scores on discharge on human milk and higher mean scores on survival to hospital discharge, no chronic lung disease, and no pneumothorax. Asian American infants had lower mean scores on discharge on human milk and no hypothermia and higher mean scores on all 4 outcome measures. American Indian infants had lower mean scores on discharge on human milk and antenatal steroids and higher scores on no chronic lung disease and no pneumothorax; American Indian infant scores for no hypothermia, survival to discharge, and growth velocity differed little on average from those of white infants. Average scores within NICUs on timely retinal examination and no health care–associated infection had wide compatibility intervals, including plausible positive and negative values, for all racial and ethnic groups, compared with white infants.
The results of a sensitivity analysis including early deaths did not differ from the original analyses (Supplemental Tables 4 and 5).
Discussion
Using data from 737 NICUs in the United States from 2015 to 2019, we identified differences in quality of care by race and ethnicity across and within NICUs. When compared with non-Hispanic white infants, African American and American Indian infants had substantially lower process scores on discharge on human milk, no admission hypothermia, and antenatal steroid exposure, and African American, Hispanic, Asian American, and American Indian infants had higher outcome scores on survival to hospital discharge, no chronic lung disease, and no pneumothorax. Many of these population-wide patterns were replicated within US Census Divisions and NICUs.
A previous study using Baby-MONITOR in California found that summary Baby-MONITOR scores for African American and Asian American infants were not significantly different from those for non-Hispanic white infants, whereas Hispanic infants and infants of other races or ethnicities had significantly lower summary scores.5 In that study, African American infants had lower scores overall on process measures and higher scores on outcome measures, compared with white infants. Hispanic infants had lower scores than white infants on all process measures and higher scores on all outcome measures excluding growth velocity, although some of these differences did not reach the level of statistical significance. In the current study, we found the same pattern of lower process scores and higher outcome scores for African American infants as well as other racial and ethnic groups, compared with white infants across the United States.
An increased risk for preterm birth, receipt of lower quality care, and socioeconomic disadvantages over the life course are all markers of the structural and systemic racism found in the United States.16,17 A study using Baby-MONITOR on VON infants in the United States found that African American infants were concentrated at units with lower quality scores, after adjusting for region of the country.6 In New York City, 40% of the difference in morbidity and mortality between white and African American infants and 30% of the difference between white and Hispanic infants were attributable to white women receiving care in higher quality hospitals.18 In the current study, we found that even within the same hospitals, African American infants received lower average scores on important markers of quality of care, such as the receipt of antenatal steroids and having no hypothermia on admission. We also found that the geographical location of hospitals continues to play a role in determining the quality of care by race and ethnicity. Subcomponent scores by US Census Division may help hospitals and state collaboratives target quality improvement efforts to reduce health disparities.
Boghossian et al,19 using VON data, identified variations in practices and outcomes among white, African American, and Hispanic infants from 2006 to 2017. The use of antenatal steroids increased over time for all groups, and, although the increase was greater for African American and Hispanic than for white mothers, the rate of steroid exposure in 2017 remained the highest for white mothers. African American infants had a faster rate of decline over time for hypothermia after admission than white infants did, whereas Hispanic infants’ rates of mortality and pneumothorax declined faster. Boghossian et al19 evaluated the ways that mortality and morbidities affect different racial groups and appropriately did not adjust for gestational age to elucidate population-level differences. In the current study, we adjusted for case mix, including gestational age, to help explain how hospitals perform given the infants under their care.
Why, then, do African American, Hispanic, Asian American, and American Indian infants have higher outcome scores? Proponents of fetus-at-risk analyses argue that studies in which researchers exclude early deaths provide distorted views of preterm mortality and morbidity.20,21 The VON database is limited to live births, and, in the current study, we included only infants surviving at least 12 hours after NICU admission. It may be that miscarriages, stillbirths, and early deaths affect racial and ethnic groups differentially, leading to selection bias. However, Baby-MONITOR is a measure of NICU quality, not population-level preterm mortality and morbidity. In addition, our sensitivity analysis revealed that including the early deaths of liveborn infants did not change our results.
With this study, we are the first to use Baby-MONITOR to evaluate the quality of care for American Indian infants. American Indian infants had a lower average process score and higher average outcome score, compared with white infants. The overall summary score was slightly lower, with a compatibility interval ranging from slightly better to substantially worse than white infants. American Indian infants had lower scores on important process measures, such as no hypothermia at admission to the NICU and antenatal steroids, with substantially higher scores on no chronic lung disease, no pneumothorax, survival, and growth velocity. American Indian mothers have the second highest rates of preterm birth behind non-Hispanic African American mothers.22 The quality of care that American Indian infants receive deserves further attention, with consideration of the specific perinatal risk factors that American Indian women face.23
Still, not all of those who are American Indian, Asian American, or Hispanic are alike. Researchers have found important differences in preterm births24,25 and infant mortality26 within racial and ethnic subgroups and by region of origin.27–30 Learning more about the effects of perinatal and NICU quality of care on racial and ethnic subgroups is important and necessary.
A strength of our study is the use of a large national data set, including nearly 90% of very low birth weight and very preterm infants born each year in the United States over a 5-year period. A limitation is the classification of race and ethnicity on the basis of abstractors’ identification of maternal race and ethnicity, which cannot exclude the possibility of misclassification. In addition, our data did not have information on subethnicities, which might reveal additional differences. Selection into Baby-MONITOR may introduce biases if the distribution of exclusion criteria differs by race and ethnicity. Small samples of racial and ethnic subgroups within NICUs may lead to low precision in those scores. Baby-MONITOR scores are calculated relative to the infants, years of data, and units in a sample; therefore, scores may differ from study to study. Finally, other summary scores of NICU quality could lead to different results, such as a score that included other measures or weighted process and outcome measures differently. Such measures would need to be developed.
Conclusions
Measuring quality of care by using a summary score evaluates NICU performance across a variety of process and outcome measures. However, summary scores mask important differences in quality that can be seen by transparently evaluating the score’s components.31 Assessing the measures within the Baby-MONITOR summary highlighted many important quality gaps that raise unanswered questions about the relationships between race and ethnicity and processes and outcomes of care. Ultimately, until neonatal and pediatric clinicians and providers accept that they must practice social as well as technical medicine and follow through to address social determinants of health and act against racism,32 we will not achieve comparable high-quality care for all infants.
Acknowledgments
We thank our colleagues who submit data to the VON on behalf of infants and their families. Participating centers are listed in Supplemental Table 6.
Dr Edwards conceptualized and designed the study, interpreted analyses, drafted the initial manuscript, and reviewed and revised the manuscript; Ms Greenberg conceptualized and designed the study, conducted and interpreted data analyses, and reviewed and revised the manuscript; Dr Profit conceptualized the study, interpreted analyses, and critically reviewed the manuscript for important intellectual content; Mr Helkey and Dr Draper contributed foundational statistical input in the development and application of Baby-MONITOR and critically reviewed the manuscript for important intellectual content; Dr Horbar conceptualized and designed the study, interpreted analyses, and critically reviewed the manuscript for important intellectual content; and all authors approved the final manuscript as submitted and agree to be accountable for all aspects of the work.
FUNDING: Funded by the Eunice Kennedy Shriver National Institute of Child Health and Human Development (grant 1R01HD083368-01A1 to Drs Profit, Draper, Edwards, Horbar, and Mr Helkey and grant 1R01HD084667-01A1 to Drs Profit and Horbar). Ms Greenberg received no external funding. The funder and sponsor did not participate in the work. Funded by the National Institutes of Health (NIH).
COMPANION PAPER: A companion to this article can be found online at www.pediatrics.org/cgi/doi/10.1542/peds.2021-051298.
References
Competing Interests
POTENTIAL CONFLICT OF INTEREST: Dr Edwards receives salary support from the Vermont Oxford Network. Ms Greenberg is an employee of the Vermont Oxford Network. Dr Horbar is chief executive officer, president, and chief scientific officer of the Vermont Oxford Network and an unpaid member of the Vermont Oxford Network Board of Trustees. The other authors have indicated they have no potential conflicts of interest to disclose.
FINANCIAL DISCLOSURE: The authors have indicated they have no financial relationships relevant to this article to disclose.