Children born into the same family, patients under the care of the same physician, medical students trained in the same residency program…how does one analyze data that violate the sine qua non of elementary statistical analysis: that all data points be independent?
In this issue of Pediatrics, Flannery et al addressed this statistical issue using generalized estimating equations (GEE) in a study of early-onset sepsis among more than 80,000 very low birthweight neonates cared for at 753 neonatal intensive care units (NICUs) over a two-year period (10.1542/peds.2021-052456). Even if we assume that all of these newborn infants were the singleton offspring of unique parents, the clustering of infants within hospitals still induces correlation that must be addressed, because hospital-level factors could be associated with the main outcome: survival to hospital discharge. Variation in the prevalence of common pathogens (both community- and hospital-acquired), status as a referral hospital for complex cases, and community differences in the extent of prenatal care prior to birth admission are all factors that may induce correlations in survival among NICU patients.
Even though data on these underlying commonalities were not collected, the GEE approach to regression models, including logistic and linear regression models, accommodates the possibility of correlated outcomes. Formulated in 1986 by Liang and Zeger,1 GEE had by 2021 been implemented in many common statistical software packages, including SAS, R, Stata, and SPSS. Especially in large data sets with many clusters, the true correlation structure need not be modeled explicitly: the analyst supplies only a "working" correlation, and the robust standard errors remain valid even if that working choice is wrong. So if your study can more naturally acquire data via a clustered study design, make the most of this cost-saving recruitment strategy by engaging a biostatistician who can properly analyze your study using GEE.
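To make the idea concrete, here is a minimal sketch of the simplest possible GEE: an intercept-only model with an independence working correlation, where the point estimate is just the overall proportion and the robust ("sandwich") standard error is built from residuals summed within each cluster. Everything in it is an illustrative assumption rather than anything from the Flannery et al analysis: the data are simulated, the function names are hypothetical, and the cluster counts and Beta(8, 2) hospital-level probabilities are arbitrary choices. A real study would rely on the established GEE routines in the packages named above.

```python
import math
import random


def simulate_clustered_outcomes(n_clusters=200, cluster_size=30, seed=0):
    """Simulate binary outcomes (1 = event) grouped into clusters.

    Each cluster (think: one hospital) draws its own event probability,
    so outcomes within a cluster are correlated. All settings here are
    arbitrary illustrative assumptions.
    """
    rng = random.Random(seed)
    clusters = []
    for _ in range(n_clusters):
        p = rng.betavariate(8, 2)  # cluster-specific probability, mean 0.8
        clusters.append([1 if rng.random() < p else 0
                         for _ in range(cluster_size)])
    return clusters


def naive_and_robust_se(clusters):
    """Intercept-only GEE with an independence working correlation.

    Returns (estimate, naive_se, robust_se), where the estimate is the
    overall proportion, naive_se assumes all observations are independent,
    and robust_se is the cluster-robust sandwich standard error.
    """
    n = sum(len(c) for c in clusters)
    p_hat = sum(sum(c) for c in clusters) / n

    # Naive binomial standard error: treats every data point as independent.
    naive_se = math.sqrt(p_hat * (1 - p_hat) / n)

    # Sandwich "meat": sum the residuals within each cluster first, then
    # square, so that within-cluster correlation is reflected in the variance.
    meat = sum((sum(c) - len(c) * p_hat) ** 2 for c in clusters)
    robust_se = math.sqrt(meat) / n

    return p_hat, naive_se, robust_se


if __name__ == "__main__":
    estimate, naive, robust = naive_and_robust_se(simulate_clustered_outcomes())
    print(f"estimate={estimate:.3f} naive SE={naive:.4f} robust SE={robust:.4f}")
```

With positively correlated outcomes like these, the robust standard error would be expected to exceed the naive one, which is the practical cost of ignoring clustering: confidence intervals that are too narrow. When every cluster contains a single observation, the two formulas coincide.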
References:
1. Liang K-Y, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika. 1986;73(1):13–22. doi:10.1093/biomet/73.1.13