Respiratory syncytial virus (RSV) causes seasonal outbreaks of respiratory tract infections in children, leading to increased emergency department visits and hospitalizations. Although the risk of severe illness is difficult to predict, a sudden surge in RSV may strain the health care system. Therefore, the objective of this study was to examine the utility of Google Trends search activity on RSV to predict changes in RSV-related hospitalizations in children in the United States in 2019.
A retrospective cross-sectional analysis of pediatric hospitalizations was conducted using the 2019 HCUP-Kids Inpatient Database. Google Trends search activity for “RSV” was abstracted as a monthly relative interest score for 2019. RSV-related hospitalizations were identified using International Classification of Diseases 9/10 codes. We applied finite distributed lag models to estimate the causal effect over time of historical relative search activity and used the long-run propensity to calculate the cumulative effect of changes in relative search activity on admission rate.
Of the total 102 127 RSV-related pediatric hospitalizations, 90% were in those aged ≤2 years. Admissions were common in males (55%), non-Hispanic Whites (50%), and the South region (39%). Across 2 successive months, the cumulative effect of a 1-unit score increase in relative interest was associated with an increase of 140.7 (95% confidence interval, 96.2–185.2; P < .05) RSV-related admissions.
Historic Google Trends search activity for RSV predicts lead-time RSV-related pediatric hospitalization. Further studies are needed to validate these findings using regional health systems.
Respiratory syncytial virus (RSV) is a major respiratory pathogen that causes yearly outbreaks of respiratory tract infections in children. In the United States, RSV infections lead to 2.1 million nonhospitalization or ambulatory visits, at least 58 000 hospitalizations, and 100 to 300 deaths in children younger than age 5 years each year.1 Infections and hospitalizations related to RSV tend to surge over weeks, which may eventually strain US children’s hospitals.2
Prior to the COVID-19 pandemic, RSV showed seasonal variation in the United States, with onset in mid-October, a peak in February, and offset in early May.3 Even within the United States, the onset, peak, and offset of RSV season varied considerably within the US Department of Health and Human Services regions.4 The Centers for Disease Control and Prevention has several research and surveillance platforms available to study seasonal trends and risk factors for RSV-associated illnesses.1 However, these platforms may have certain limitations: (1) they rely on data from participating laboratories; (2) data may not be in real time because of reporting delays5; and (3) data may not be generalizable because they are limited to only 12 states.6
Google Trends is a novel, freely accessible tool that allows users to interact with internet search data and provides deep insights into population behavior and health-related phenomena.7 It has been used in predicting influenza, HIV, and dengue as well as COVID-19.8 A recent study by Crowson et al showed that Google search activity can predict RSV encounters in a major health system with a measurable lead time.9 However, the study lacked generalizability. Therefore, the objective of this study is to examine the utility of Google Trends search activity on RSV to predict changes in RSV-related hospitalizations in children in the United States in 2019.
Study Design and Data Source
We conducted a retrospective cross-sectional analysis of the 2019 Healthcare Cost and Utilization Project’s Kids Inpatient Database (HCUP-KID).10 KID is the largest publicly available all-payer pediatric inpatient care database in the United States, yielding national estimates of hospital inpatient stays by children. KID-2019 includes 10% of normal newborns and 80% of other pediatric discharges from 4200 US community hospitals.11 Discharge sampling weights provided by HCUP were used to generate national estimates. Weighted estimates are reported for this study. Monthly relative interest scores for “RSV” Google Trends search activity data were abstracted from January 1 through December 31, 2019, for each state by each week.12
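As a rough illustration of how discharge sampling weights turn sampled records into national estimates, the following Python sketch sums a weight per record within each month. The record layout and the weight values are hypothetical: weights of 10 and 1.25 simply mirror the inverse of the 10% and 80% sampling fractions described above, whereas the actual HCUP weights are supplied with the database and may differ.

```python
from collections import defaultdict

# Hypothetical sampled discharges: (admission_month, discharge_weight).
# A weight of 10.0 stands in for a record sampled at 10%, and 1.25 for a
# record sampled at 80%; real KID weights are provided by HCUP.
sample = [
    (1, 1.25), (1, 1.25), (1, 10.0),
    (2, 1.25), (2, 10.0),
]

def weighted_monthly_counts(records):
    """Sum discharge weights within each month to estimate national counts."""
    totals = defaultdict(float)
    for month, weight in records:
        totals[month] += weight
    return dict(totals)

national = weighted_monthly_counts(sample)
print(national)  # {1: 12.5, 2: 11.25}
```

Each sampled discharge therefore counts for as many national discharges as its weight indicates, which is why weighted rather than raw counts are reported.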
We first extracted all pediatric hospitalizations (aged <21 years) from the database. To prevent double counting, transfers were excluded. We further identified RSV hospitalizations using International Classification of Diseases, Tenth Revision, Clinical Modification, codes J121 (RSV pneumonia), B974 (RSV), J205 (bronchitis due to RSV), and J210 (bronchiolitis due to RSV) in the primary and secondary diagnosis fields.13
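The selection steps above can be sketched in Python as follows. This is a minimal illustration, not the study code (the analysis was performed in Stata): the field names `age_years`, `transfer`, and `diagnoses` are hypothetical stand-ins for the corresponding KID variables, and the records are invented.

```python
# ICD-10-CM codes listed in the text, written without the decimal point.
RSV_CODES = {"J121", "B974", "J205", "J210"}

def normalize(code):
    """Strip whitespace and the optional dot so 'J21.0' matches 'J210'."""
    return code.strip().replace(".", "").upper()

def is_rsv(diagnoses):
    """True if any primary or secondary diagnosis field holds an RSV code."""
    return any(normalize(c) in RSV_CODES for c in diagnoses if c)

def select_rsv(records):
    """Keep non-transfer pediatric (<21 years) records with an RSV code."""
    return [
        r for r in records
        if r["age_years"] < 21 and not r["transfer"] and is_rsv(r["diagnoses"])
    ]

# Invented example records (hypothetical field names).
records = [
    {"id": 1, "age_years": 0, "transfer": False, "diagnoses": ["J210", "R0902"]},
    {"id": 2, "age_years": 1, "transfer": True, "diagnoses": ["J121"]},
    {"id": 3, "age_years": 5, "transfer": False, "diagnoses": ["J189", "B97.4"]},
    {"id": 4, "age_years": 30, "transfer": False, "diagnoses": ["J205"]},
    {"id": 5, "age_years": 2, "transfer": False, "diagnoses": ["J069", None]},
]
rsv_ids = [r["id"] for r in select_rsv(records)]
print(rsv_ids)  # [1, 3]
```

Record 2 is dropped as a transfer, record 4 falls outside the pediatric age range, and record 5 has no RSV code in any diagnosis field.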
Definition of Variables
Patient-level characteristics such as age, sex, race/ethnicity, and hospital-level characteristics such as hospital region (Northeast, Midwest, South, and West) were studied.14
We first studied the baseline distribution of RSV hospitalizations within the 2019 KID and applied a finite distributed lag model, controlled for race, sex, and hospital region, to estimate the causal effect over time of historical relative search activity on current RSV-related admissions. A 2-lag period (2 months) model was selected based on the Akaike information criterion. We also calculated the long-run propensity, which is reported as the cumulative effect of changes in relative search activity on admission rate. The change in the number of admissions with differences in relative interest is reported as marginal differences in RSV-related admissions. We additionally performed subgroup analysis by geographic region to predict marginal differences in RSV-related admissions within each region based on relative search activity for that region. We performed sensitivity analysis by restricting the sample to pediatric hospitalizations of those aged ≤2 years and found a similar prediction pattern (result not shown). Statistical analyses were performed by using Stata 17. A 2-sided P value <.05 was considered significant.
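To make the modeling step concrete, a finite distributed lag model with 2 lags can be written as y_t = a + b0*x_t + b1*x_(t-1) + b2*x_(t-2) + u_t, where y_t is monthly admissions and x_t is the monthly relative interest score; the long-run propensity is then b0 + b1 + b2. The Python sketch below fits this by ordinary least squares on synthetic data. It is an illustration under stated assumptions only: it omits the race, sex, and region covariates, uses invented series, and does not reproduce the Stata analysis.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_fdl(y, x, lags=2):
    """OLS fit of y_t on a constant and x_t, x_(t-1), ..., x_(t-lags).

    Returns (intercept, [b0, b1, ..., b_lags]). The first `lags`
    observations are dropped because their lagged values are unavailable.
    """
    rows = [[1.0] + [float(x[t - k]) for k in range(lags + 1)]
            for t in range(lags, len(y))]
    target = y[lags:]
    p = lags + 2
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yt for r, yt in zip(rows, target)) for i in range(p)]
    coef = solve(xtx, xty)
    return coef[0], coef[1:]

# Synthetic monthly relative interest scores (invented, 0 to 100 scale).
search = [60, 45, 20, 8, 5, 3, 2, 4, 10, 30, 70, 100]
# Synthetic admissions built without noise from known coefficients 3, 2, 1;
# the first two months are placeholders that the fit drops.
admissions = [0.0, 0.0] + [
    100 + 3 * search[t] + 2 * search[t - 1] + search[t - 2]
    for t in range(2, 12)
]

intercept, betas = fit_fdl(admissions, search, lags=2)
long_run_propensity = sum(betas)  # cumulative effect of a sustained 1-unit rise
print(round(long_run_propensity, 6))
```

Because the synthetic admissions are generated without noise, the fit recovers the lag coefficients 3, 2, and 1 up to floating-point error, so the long-run propensity is their sum. With 12 monthly observations and 2 lags, only 10 observations enter the fit, which illustrates how little data a single-year monthly model has to work with.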
A total of 5 745 793 pediatric hospitalizations were identified after excluding transfers, of which 102 127 were attributable to RSV. After excluding missing observations on key variables, the final analytic sample included 98 624 RSV-related hospitalizations. The majority (90%) of the hospitalizations were of children aged ≤2 years. The majority of hospitalized patients were male (55.3%), non-Hispanic White (49.9%), and hospitalized in the South region (39.1%).
The mean number of monthly RSV-related hospitalizations in 2019 was 3044 (standard deviation, 3220.7). The average relative interest score (scaled from 0 to 100 U by Google Trends) over the same period as the patient encounters was 24.9 (standard deviation, 23.1). Across 2 successive months, the cumulative effect of a 1-unit score increase in relative interest was associated with an increase of 140.7 (95% confidence interval [CI], 96.2–185.2; P < .05) RSV-related admissions in 2019. In subgroup analysis by geographic region, over 2 consecutive months, the cumulative effect of a 1-unit score increase in relative search interest was associated with a nonsignificant increase of 236.8 (95% CI, –95.6 to 569.3; P = .108) RSV-related admissions in the Northeast region, whereas a 1-unit score increase was associated with significant increases of 177.9 (95% CI, 4.8–351.3; P < .05) in the Midwest, 192.3 (95% CI, 139.0–245.5; P < .01) in the South, and 250.6 (95% CI, 41.2–459.9; P < .05) in the West region in RSV-related admissions in 2019 (Supplemental Fig 2). Figure 1 shows the RSV-related admission rate differences with lagged relative search over 2 months. A marginal decline of 1885.6 admissions was predicted in March from the average difference in relative search interest score, which trended downward over the 2 consecutive months (lagging interval) from January to March. Similarly, the average difference in the rising relative interest score from September to November predicted a marginal increase of 3760.2 RSV-related admissions in November. By geographic region, similar patterns were observed, with some variation in the West region, where relative search interest with a 2-month lag predicted increases in RSV-related admissions of 545 in May and 645 in June.
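For readers who want to translate the reported cumulative effect into other score changes, the sketch below scales the national estimate and its confidence limits linearly. This rests on the linearity of the lag model and on the assumption that the score change is sustained over the lag window; it is an illustrative calculation, not the method used to derive the marginal differences reported above.

```python
# Reported national estimate per 1-unit increase in relative interest.
LRP = 140.7
CI_LOW, CI_HIGH = 96.2, 185.2

def projected_change(delta_score):
    """Scale the cumulative effect and its 95% CI by a sustained score change.

    Assumes the linear model holds across the whole change; a negative
    delta_score gives a projected decline.
    """
    point = round(LRP * delta_score, 1)
    bounds = [round(CI_LOW * delta_score, 1), round(CI_HIGH * delta_score, 1)]
    return point, min(bounds), max(bounds)

estimate = projected_change(10)
print(estimate)  # (1407.0, 962.0, 1852.0)
```

Under these assumptions, a sustained 10-unit rise in relative interest would correspond to roughly 1407 additional RSV-related admissions (95% CI, 962 to 1852).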
Our study is the first to report lead-time prediction of RSV-related pediatric hospitalizations using historic Google Trends search activity and the largest nationally representative pediatric inpatient database in the United States. The majority of RSV hospitalizations were attributable to children aged ≤2 years. Using Google Trends search activity, we found that every 1-unit increase in relative search interest across 2 successive months was associated with a corresponding increase of approximately 140 RSV-related hospitalizations in the United States.
Our modeling approach provides preliminary evidence of an association between RSV-related relative search interest and a subsequent increase in RSV hospitalizations. Tse et al and Crowson et al found similar results regarding the ability of Google Trends search data to predict RSV hospitalization at a regional health system.9,15 Possible reasons for these findings include: (1) RSV is most common among children and older adults, and caregivers may search for the term RSV on learning about the condition from their health care provider; and (2) an increase in RSV searches could be attributable to the spread of RSV in the community, with subsequent RSV-related hospitalization.
Hospital systems are facing significant congestion with the current surge in RSV admissions, causing delays in care that may impact clinical outcomes. Our findings may be important to frontline clinicians, administrators, and policymakers in addressing the significant burden of RSV on the health care system. First, incorporating real-time Google Trends search activity with regional health system-level or state-level surveillance data may help predict the RSV surge, allowing for emergency preparedness for these children. This may help in preventing morbidity and mortality among children.16 Second, our modeling approach has the potential to inform the timing of immunoprophylaxis against RSV. With recent advancements in RSV vaccine development,17 robust evidence from a forecasting model may inform clinicians and policymakers on the timing of immunoprophylaxis and may help insurers expedite its approval.
Since the COVID-19 pandemic, the use of nonpharmacological measures may have contributed to modifying subsequent RSV seasons.18 With many variants and subvariants of COVID-19 in circulation, it is difficult to predict how viral interference alters the dynamics of RSV infections. Given the uncertainties, a validated prediction tool for RSV-related activity using public search queries may help overcome the limitations of surveillance networks that heavily rely on laboratory-based diagnoses and surveillance.
Our study has a few limitations. KID is a retrospective cross-sectional data set created from discharge abstracts that hospitals prepare for billing, which are subject to administrative errors. Furthermore, important clinical and laboratory data that may help to ascertain cases are not available. Because our study was done using the 2019 KID, its generalizability over time may be limited. Prediction models rely heavily on the quality of historic Google Trends data, with the assumption that search queries and terms on Google accurately represent the community’s interest in RSV and capture each individual’s intention in searching for RSV. Further studies are needed to validate prospectively the ability of Google Trends search activity to predict RSV hospitalization in regional health care systems.
In conclusion, historic Google Trends search activity for RSV predicts lead-time RSV-related pediatric hospitalization using a nationally representative children’s specific database. These findings have major implications for frontline clinicians, administrators, and policymakers. Further studies are required to validate these findings using regional health systems data that may help in reducing morbidity and mortality among children.
The authors acknowledge the Healthcare Cost and Utilization Project (HCUP) sponsored by the Agency for Healthcare Research and Quality, Rockville, Maryland, and its partner organizations that provide data to the HCUP.
Drs Pemmasani, Shaikh, and Boateng conceptualized and designed the study, drafted the initial manuscript, and reviewed and revised the manuscript; Drs Parmar, Bhatt, and Parekh conceptualized and designed the study, designed the data collection instruments, collected data, carried out the initial analyses, and reviewed and revised the manuscript; Drs Doshi, Donda, and Dapaah-Siakwan conceptualized and designed the study, coordinated and supervised data collection, and critically reviewed the manuscript for important intellectual content; and all authors approved the final manuscript as submitted and agree to be accountable for all aspects of the work.
FUNDING: No external funding.
CONFLICT OF INTEREST DISCLOSURES: The authors have indicated they have no conflicts to disclose.