To provide recommendations for future common data element (CDE) development and collection that increase community partnership, harmonize data interpretation, and continue to reduce barriers of mistrust between researchers and underserved communities.
We conducted a cross-sectional qualitative and quantitative evaluation of mandatory CDE collection among Rapid Acceleration of Diagnostics-Underserved Populations Return to School project teams with various priority populations and geographic locations in the United States to: (1) compare racial and ethnic representativeness of participants completing CDE questions relative to participants enrolled in project-level testing initiatives and (2) identify the amount of missing CDE data by CDE domain. Additionally, we conducted analyses stratified by aim-level variables characterizing CDE collection strategies.
There were 15 study aims reported across the 13 participating Return to School projects, of which 7 (47%) were structured so that CDEs were fully uncoupled from the testing initiative, 4 (27%) were fully coupled, and 4 (27%) were partially coupled. In 9 (60%) study aims, participant incentives were provided in the form of monetary compensation. Most project teams modified CDE questions (8/13; 62%) to fit their population. Across all 13 projects, there was minimal variation in the racial and ethnic distribution of CDE survey participants from those who participated in testing; however, fully uncoupling CDE questions from testing increased the proportion of Black and Hispanic individuals participating in both initiatives.
Collaboration with underrepresented populations from the early study design process may improve interest and participation in CDE collection efforts.
Common data elements (CDEs),1 which are sets of standardized questions and answer responses, can potentially serve as powerful research tools. CDEs enable data harmonization across research projects to propel science forward by providing larger sample sizes that allow more robust analyses2 with potential wider impact.3,4 At the start of the COVID-19 pandemic,5 the National Institutes of Health (NIH) established a comprehensive set of adult-focused CDEs to be collected across COVID-19–related projects to permit rapid data collection, synthesis, and reporting.6 NIH-funded COVID-19–related projects such as those funded through the Rapid Acceleration of Diagnostics-Underserved Populations (RADx-UP) initiative were required to collect the CDEs created and agreed on by the NIH and RADx-UP leadership teams. CDE data were then shared with the NIH and NIH-sponsored coordinating centers to facilitate collaboration and develop a multisite database for analysis.
The RADx-UP program was specifically designed to increase SARS-CoV-2 testing access in historically underserved communities that have been disproportionately affected by the pandemic.7 A specific subset of the RADx-UP Program, the Return to School (RTS) Diagnostic Testing Approaches Initiative, focused on the importance of universal school-based access to SARS-CoV-2 testing to slow the spread of COVID-19 and return preschool- and school-aged students to in-person learning.
Evaluation of CDE collection by investigators involved in the RADx-UP RTS Diagnostic Testing Approaches Initiative8 in preschool through grade 12 (preK-12) schools provides an important case study for understanding the challenges of CDE collection within this population. In this manuscript, we summarize CDE collection processes and provide recommendations for future CDE development and collection that increase community partnership, harmonize data interpretation, and reduce barriers of mistrust between researchers and underserved communities.
Methods
Using a de novo questionnaire with fixed-response and open-ended questions, we conducted a cross-sectional qualitative and quantitative evaluation of mandatory CDE collection among RADx-UP RTS project teams with differing priority populations and from varying geographic locations within the United States. Projects awarded under RADx-UP RTS were also required to collect the 153 CDE questions spanning 13 domains: (1) consent; (2) location; (3) sociodemographic information; (4) housing, employment, and insurance; (5) work personal protective equipment and distancing; (6) medical history; (7) health status; (8) vaccine acceptance; (9) testing (to be referred to as “previous testing experiences”); (10) COVID-19 test (to be referred to as “administered COVID-19 test”); (11) symptoms; (12) alcohol and tobacco; and (13) identity (Table 1 contains an overview of each of the CDE domains and Supplemental Table 4 contains the questions asked within each domain). Of these 13 domains, 1 domain (“consent”) was related to both testing and nontesting activities, 2 domains (“administered COVID-19 test” and “symptoms”) were related to the testing initiative for which each project was funded, whereas the other 10 required domains were not related to current testing administration.
Table 1. Overview of CDE Domains
Domain | Information Collected | Response Type | Number of Questions |
---|---|---|---|
Consent | Participant consent for data sharing, extent of data willing to share (e.g., identifiable, nonidentifiable) | Radio | 7 |
Location | County, ZIP code | Text | 2 |
Sociodemographics | Age, race, ethnicity | Checkbox, text, radio | 14 |
Housing, employment, and insurance | Type of housing, household languages, employment status and type, household income, insurance coverage | Radio, text, checkbox, dropdown | 22 |
Work personal protective equipment and distancing | Workplace COVID-19 safety | Radio | 4 |
Medical history | Significant medical history | Radio | 17 |
Health status | Anthropometric data, current functional ability related to activities of daily living | Radio, text | 11 |
Vaccine acceptance | Perceptions around COVID-19 vaccine including reasons for vaccine approval and disapproval | Radio, checkbox | 7 |
Previous testing experiences | COVID-19 testing history: when tested previously, previous positive results, and availability of tests in the community | Radio, dropdown | 13 |
Administered COVID-19 test | Test administration logistics including test manufacturer, specimen type, testing location, test administrator, date and time of test, test result | Checkbox, radio, text | 23 |
Symptoms | Current COVID-19 symptoms | Radio | 13 |
Alcohol and tobacco | Current substance use, frequency of substance use | Radio, text | 6 |
Identity | Name, date of birth, address, contact information | Text, dropdown, radio | 14 |
CDE, common data element; COVID-19, coronavirus disease 2019.
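As a quick consistency check on Table 1, the per-domain question counts can be summed and split into the groups described in the Methods. The sketch below is illustrative only: the counts are copied from Table 1, while the variable names are assumptions introduced here and are not part of the study's tooling.

```python
# Number of questions per CDE domain, copied from Table 1.
CDE_DOMAIN_QUESTIONS = {
    "Consent": 7,
    "Location": 2,
    "Sociodemographics": 14,
    "Housing, employment, and insurance": 22,
    "Work personal protective equipment and distancing": 4,
    "Medical history": 17,
    "Health status": 11,
    "Vaccine acceptance": 7,
    "Previous testing experiences": 13,
    "Administered COVID-19 test": 23,
    "Symptoms": 13,
    "Alcohol and tobacco": 6,
    "Identity": 14,
}

# Groupings as described in the Methods text (names are hypothetical).
TESTING_DOMAINS = {"Administered COVID-19 test", "Symptoms"}
SHARED_DOMAINS = {"Consent"}  # related to both testing and nontesting activities

total_questions = sum(CDE_DOMAIN_QUESTIONS.values())

# Domains that are neither testing-related nor shared.
nontesting_domains = [
    domain
    for domain in CDE_DOMAIN_QUESTIONS
    if domain not in TESTING_DOMAINS | SHARED_DOMAINS
]
nontesting_questions = sum(CDE_DOMAIN_QUESTIONS[d] for d in nontesting_domains)

print(total_questions, len(nontesting_domains), nontesting_questions)
```

The totals agree with the 153 questions and 10 nontesting domains stated in the Methods.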
Data Collection
We created a questionnaire to gather information on how nontesting CDE collection was conducted by RADx-UP RTS project teams. Information collected included projects’ target demographics, whether project teams applied for and obtained CDE exemptions from the NIH, community member involvement in CDE survey construction (if applicable), and the number of project study aims that included CDE collection. Furthermore, we asked whether CDE collection was fully coupled, fully uncoupled, or partially coupled from the RTS project’s SARS-CoV-2 testing initiative based on whether participants were required to complete CDE questions only at the time of enrollment and accessing SARS-CoV-2 testing (fully coupled), at a time separate from and/or independent from testing enrollment (fully uncoupled), or through a combination of both approaches (partially coupled).
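The three coupling categories defined above can be expressed as a small decision rule. This is a minimal sketch, assuming each study aim can be described by two yes/no flags; the flag names and the `Coupling` enum are hypothetical and were not part of the questionnaire itself.

```python
from enum import Enum


class Coupling(Enum):
    FULLY_COUPLED = "fully coupled"
    PARTIALLY_COUPLED = "partially coupled"
    FULLY_UNCOUPLED = "fully uncoupled"


def classify_coupling(asked_at_testing_enrollment: bool,
                      asked_separately_from_testing: bool) -> Coupling:
    """Map how CDE questions were administered to a coupling category."""
    if asked_at_testing_enrollment and asked_separately_from_testing:
        # Combination of both approaches.
        return Coupling.PARTIALLY_COUPLED
    if asked_at_testing_enrollment:
        # CDE questions required only when enrolling in and accessing testing.
        return Coupling.FULLY_COUPLED
    if asked_separately_from_testing:
        # CDE questions completed at a time separate from testing enrollment.
        return Coupling.FULLY_UNCOUPLED
    raise ValueError("study aim did not collect CDE questions")


print(classify_coupling(True, True).value)
```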
Additionally, we collected data on whether incentives were offered to participants for completion of nontesting CDEs, the proportion of CDE questions completed by projects’ participants, the racial and ethnic distribution of those who completed CDE questions, and the percent of missing data by CDE domain. Participant race and Hispanic ethnicity were evaluated as separate and independent variables. Self-reported race for each participant was categorized as American Indian or Alaska Native, Asian, Black, Native Hawaiian or Other Pacific Islander, White, Other, Multiple, or Prefer Not to Answer. Hispanic ethnicity was categorized as participants who reported identifying as Hispanic, Latino, or Spanish origin (binary: “yes” or “no”). RTS project teams calculated the percent of missing data for each CDE domain as the number of unanswered domain questions across all project participants divided by the total number of domain-related questions asked to all project participants. Missing data reporting by RTS project teams was optional to accommodate project teams that had not yet started analysis and were therefore unable to report percent missingness. Last, we used open-ended questions to collect qualitative information on project-level experiences with CDE collection including challenges, lessons learned, and recommendations for future CDE collection.
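The percent-missing definition above (unanswered domain questions divided by domain questions asked, pooled across all participants) can be sketched as follows. The function name and the example counts are hypothetical; only the formula comes from the text.

```python
def percent_missing(unanswered: int, asked: int) -> float:
    """Percent of missing data for one CDE domain in one project.

    unanswered: unanswered domain questions, summed over all participants.
    asked: domain questions asked, summed over all participants.
    """
    if asked <= 0:
        raise ValueError("at least one question must have been asked")
    if not 0 <= unanswered <= asked:
        raise ValueError("unanswered must be between 0 and asked")
    return 100.0 * unanswered / asked


# Hypothetical example: the 2-question location domain asked of 50
# participants gives 100 questions asked; suppose 21 went unanswered.
print(percent_missing(unanswered=21, asked=2 * 50))
```

Because the numerator and denominator are pooled across participants, a few participants who skip an entire domain count the same as many participants who each skip a single question.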
Study teams for each of the 16 RADx-UP–funded RTS projects were invited to complete the questionnaire. Responses were recorded using Research Electronic Data Capture (REDCap).9,10 No individual participant-level data from research participants within each project were shared across institutions, and only aggregate (counts, proportions), deidentified data were collected via REDCap.
This study was approved by the Duke University Health System institutional review board under Pro00108129.
Statistical Analysis
Descriptive statistics were used to analyze quantitative results. Counts, proportions, and trends were used to summarize categorical data, whereas 5-number summaries (median, quartiles, range) were used to summarize discrete, numerical data. To assess the racial and ethnic representativeness of participants completing CDE questions relative to participants enrolled in project-level testing initiatives, we compared the proportion of participants in each group stratified by self-reported race and ethnicity. To assess the distribution of missing data among the CDE domains, we determined the median percentage of missing data for each domain across RTS projects.
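A 5-number summary of the kind described above can be computed with Python's standard library. This is a sketch under stated assumptions: the text does not say which quartile convention was used, so the `"inclusive"` method here is an assumption, and the input values are hypothetical.

```python
from statistics import median, quantiles


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    if len(values) < 2:
        raise ValueError("need at least two values to compute quartiles")
    ordered = sorted(values)
    # Quartile convention is an assumption; other methods can give
    # slightly different Q1 and Q3 for small samples.
    q1, _, q3 = quantiles(ordered, n=4, method="inclusive")
    return (ordered[0], q1, median(ordered), q3, ordered[-1])


# Hypothetical percent-missing values for one domain across five projects.
print(five_number_summary([21, 2, 40, 3, 8]))
```

With only a handful of projects per stratum, the choice of quartile method can visibly shift Q1 and Q3, which is worth keeping in mind when reading the interquartile ranges in the tables.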
Next, we stratified and compared both participant racial/ethnic representativeness and missingness by whether each project’s study aim collecting CDE questions was fully coupled, partially coupled, or fully uncoupled from the project’s testing initiative. Then, to assess the role of monetary compensation on both participant racial/ethnic representativeness and data missingness, we stratified both analyses by whether an incentive was offered for CDE participation. Last, we examined data missingness stratified by whether study aims modified the standard language of CDE questions before providing them to participants (e.g., modified language to fit specific populations such as the child participant). Statistical significance testing was deferred because of the small sample size and lack of power.
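The stratified comparison described above amounts to grouping aim-level values by a stratum label and summarizing each group. The sketch below illustrates this with a median per stratum; the function name and the records are hypothetical and do not reproduce the study data.

```python
from collections import defaultdict
from statistics import median


def median_by_stratum(records):
    """Group (stratum, value) pairs and return the median value per stratum."""
    groups = defaultdict(list)
    for stratum, value in records:
        groups[stratum].append(value)
    return {stratum: median(values) for stratum, values in groups.items()}


# Hypothetical aim-level percent-missing values for a single CDE domain.
hypothetical_aims = [
    ("fully coupled", 70),
    ("fully coupled", 40),
    ("fully uncoupled", 8),
    ("fully uncoupled", 2),
    ("fully uncoupled", 21),
]

medians = median_by_stratum(hypothetical_aims)
print(medians)
```

The same helper applies unchanged to the incentive and question-modification strata, since only the stratum label differs.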
For qualitative analysis, content analysis was used to describe lessons learned and key challenges, as well as recommendations for future data collection across projects.11 Because of the exploratory nature of the lessons learned, key challenges, and recommendations across projects, thematic categories were not defined a priori; instead, salient themes were identified and grouped, resulting in a posteriori definitions and recommendations. Representative quotes were highlighted to provide context. Qualitative and quantitative data were analyzed separately.
Results
Project-Level Overview
Thirteen of the 16 RTS-awarded project teams (13/16; 81%) participated and provided responses to the study questionnaire. These 13 teams were located in 11 states (Fig 1).12 The CDE collection process varied across projects. All project teams (13/13; 100%) reported use of an electronic platform (REDCap, Qualtrics, or KoBo) for CDE collection, and 6 teams (6/13; 46%) offered additional paper-collection options. Eight (8/13; 62%) project teams offered CDE questions in languages other than English (8 in Spanish and 1 in Haitian Creole). Eleven (11/13; 85%) project teams collected self-reported data, 11 (11/13; 85%) collected caregiver-reported data, 6 (6/13; 46%) collected teacher-reported data, and 1 (1/13; 8%) collected school-reported data. Eleven projects reported the mechanisms used to provide survey reminders: 5 project teams (5/11; 45%) used text message reminders, 5 (5/11; 45%) used phone call reminders, 7 (7/11; 64%) used e-mail reminders, and 3 (3/11; 27%) used in-person reminders at testing visits. Eight (8/13; 62%) project teams collected nontesting CDEs from participants at 1 timepoint, and 5 teams collected CDE data from participants more than once. Among the project teams that collected nontesting CDE data more than once, participants were surveyed 2 to 4 times; 2 project teams asked the same questions each time, 2 asked different questions each time, and 1 did not specify whether the same or different questions were asked. Among the projects that collected CDEs at multiple timepoints, 1 project team specified that it administered a “mandatory” initial survey and deferred more sensitive questions to subsequent “optional” survey(s).
Fig 1. Statewide distribution of RTS project involvement, color-coded by the number of projects per state. RTS, Return to School (initiative).
There were 15 study aims across the 13 participating RTS projects. Of the 15 aims, 7 (47%) were structured such that CDEs were fully uncoupled from the testing initiative, 4 (27%) were fully coupled with the testing initiative, and 4 (27%) had data collections that were partially coupled with testing. In 9 (60%) study aims, participant incentives were provided in the form of monetary compensation for participation in CDE collection. Compensation amounts ranged from $10 to $50 with differing payment approaches (e.g., 1-time payment versus tiered payment approach). Eight (8, 53%) study aims reported modifying CDE questions before use.
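The aim-level percentages above follow directly from the reported counts. The short sketch below recomputes them as a check; the counts come from the text, while the helper and variable names are assumptions made for illustration.

```python
TOTAL_AIMS = 15

# Counts of study aims as reported in the text.
coupling_counts = {
    "fully uncoupled": 7,
    "fully coupled": 4,
    "partially coupled": 4,
}
other_counts = {
    "monetary incentive": 9,
    "modified CDE questions": 8,
}


def pct(count: int, total: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * count / total)


coupling_pct = {name: pct(n, TOTAL_AIMS) for name, n in coupling_counts.items()}
other_pct = {name: pct(n, TOTAL_AIMS) for name, n in other_counts.items()}

print(coupling_pct)
print(other_pct)
```

The three coupling categories account for all 15 aims; their rounded percentages sum to 101% only because of rounding.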
Project teams completed optional “missing data” reporting for 10 study aims (10/15; 67%); of these 10, 2 (20%) aims were fully coupled, 2 (20%) were partially coupled, and 6 (60%) were fully uncoupled. Seven (70%) of the aims reporting missing data provided financial incentives to participants, and 5 (50%) modified the CDE questions before use. For specific information on question exemptions and modifications, please see the Supplemental Results.
Racial and Ethnic Distribution
Across all participating projects, there were no notable differences in the racial or ethnic distribution of those participating in the testing initiatives versus CDE collection (Table 2). Overall, White participants made up a median of 47% of both CDE participants and testing initiative participants; Black participants made up a median of 23% of CDE participants and 25% of testing participants. The remaining racial categories (American Indian or Alaska Native, Asian, Native Hawaiian or Other Pacific Islander, Other, Multiple, or Prefer Not to Answer) comprised 1% to 3% of CDE and testing participants. Hispanic participants accounted for a median of 11% of both CDE participants and testing initiative participants. See the Supplemental Results for specific information on the role of coupling status and compensation on racial and ethnic distribution.
Table 2. Racial and Ethnic Representation Across RTS Projects, Stratified by Coupling Status, and Stratified by Compensation
Representation | AIAN: CDE^a (%) | AIAN: Testing^b (%) | Asian: CDE^a (%) | Asian: Testing^b (%) | Black: CDE^a (%) | Black: Testing^b (%) | NHPI: CDE^a (%) | NHPI: Testing^b (%) | White: CDE^a (%) | White: Testing^b (%) | Other: CDE^a (%) | Other: Testing^b (%) | Multiple: CDE^a (%) | Multiple: Testing^b (%) | Prefer Not to Answer: CDE^a (%) | Prefer Not to Answer: Testing^b (%) | Hispanic ethnicity^c: CDE (%) | Hispanic ethnicity^c: Testing (%) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
RTS projects | ||||||||||||||||||
Median (IQR) | 2 (0, 3) | 1 (1, 2) | 2 (2, 4) | 2 (1, 4) | 23 (3, 40) | 25 (10, 40) | 1 (0, 1) | 0 (0, 1) | 47 (27, 64) | 47 (37, 63) | 3 (0, 5) | 1 (0, 3) | 3 (0, 6) | 3 (0, 6) | 2 (2, 5) | 2 (2, 4) | 11 (4, 32) | 11 (4, 38) |
Min, max | 0, 90 | 0, 4 | 0, 10 | 0, 10 | 0, 68 | 1, 68 | 0, 20 | 0, 20 | 3, 86 | 6, 80 | 0, 38 | 0, 41 | 0, 40 | 0, 40 | 0, 23 | 0, 23 | 2, 78 | 2, 79 |
Stratified by coupling status | ||||||||||||||||||
Fully coupled | ||||||||||||||||||
Median (IQR) | 2 (1, 2) | 2 (1, 2) | 5 (3, 9) | 5 (2, 9) | 11 (2, 30) | 14 (2, 33) | 1 (1, 11) | 1 (0, 11) | 43 (23, 62) | 42 (22, 60) | 4 (2, 22) | 4 (2, 23) | 4 (0, 24) | 4 (0, 24) | 1 (0, 13) | 2 (1, 13) | 8 (3, 45) | 8 (3, 45) |
Min, max | 0, 2 | 0, 2 | 2, 10 | 1, 10 | 1, 40 | 1, 40 | 0, 20 | 0, 20 | 6, 77 | 6, 72 | 0, 38 | 0, 41 | 0, 40 | 0, 40 | 0, 23 | 0, 23 | 2, 78 | 2, 79 |
Partially coupled | ||||||||||||||||||
Median (IQR) | 1 (1, 2) | 1 (1, 1) | 2 (2, 3) | 2 (1, 4) | 20 (14, 26) | 28 (10, 50) | 0 (0, 0) | 0 (0, 0) | 64 (56, 72) | 63 (39, 80) | 3 (0, 12) | 0 (0, 1) | 3 (2, 4) | 3 (3, 5) | 3 (2, 7) | 3 (2, 4) | 4 (4, 55) | 4 (4, 61) |
Min, max | 0, 3 | 1, 1 | 1, 4 | 1, 4 | 10, 28 | 10, 50 | 0, 0 | 0, 0 | 49, 80 | 39, 80 | 0, 19 | 0, 1 | 0, 4 | 3, 5 | 1, 10 | 2, 4 | 4, 55 | 4, 61 |
Fully uncoupled | ||||||||||||||||||
Median (IQR) | 2 (0, 4) | 2 (0, 4) | 2 (1, 3) | 1 (0, 2) | 29 (3, 68) | 47 (25, 68) | 1 (0, 2) | 2 (0, 4) | 27 (14, 54) | 41 (27, 54) | 3 (0, 3) | 2 (0, 3) | 2 (1, 6) | 3 (0, 6) | 4 (2, 5) | 4 (2, 5) | 18 (9, 32) | 36 (34, 38) |
Min, max | 0, 90 | 0, 4 | 0, 5 | 0, 2 | 0, 68 | 25, 68 | 0, 4 | 0, 4 | 3, 86 | 27, 54 | 0, 4 | 0, 3 | 0, 12 | 0, 6 | 2, 11 | 2, 5 | 9, 34 | 34, 38 |
Stratified by compensation | ||||||||||||||||||
No incentive | ||||||||||||||||||
Median (IQR) | 2 (0, 2) | 1 (1, 2) | 3 (2, 4) | 2 (1, 4) | 26 (19, 40) | 34 (25, 50) | 0 (0, 1) | 0 (0, 0) | 55 (39, 64) | 43 (37, 63) | 3 (0, 5) | 1 (0, 5) | 2 (0, 4) | 2 (0, 5) | 2 (0, 2) | 2 (2, 3) | 8 (4, 34) | 8 (4, 34) |
Min, max | 0, 2 | 0, 2 | 0, 10 | 0, 10 | 3, 68 | 3, 68 | 0, 1 | 0, 1 | 27, 77 | 27, 72 | 0, 38 | 0, 41 | 0, 7 | 0, 7 | 0, 5 | 0, 5 | 4, 78 | 4, 79 |
Incentive | ||||||||||||||||||
Median (IQR) | 2 (1, 4) | 1 (0, 4) | 2 (2, 3) | 2 (2, 7) | 17 (3, 29) | 10 (1, 25) | 1 (0, 2) | 4 (0, 20) | 45 (14, 54) | 54 (6, 80) | 3 (0, 3) | 3 (0, 3) | 3 (1, 6) | 6 (3, 40) | 4 (2, 10) | 4 (2, 23) | 15 (9, 32) | 38 (2, 61) |
Min, max | 0, 90 | 0, 4 | 1, 7 | 2, 7 | 0, 68 | 1, 25 | 0, 20 | 0, 20 | 3, 86 | 6, 80 | 0, 19 | 0, 3 | 0, 40 | 3, 40 | 2, 23 | 2, 23 | 2, 55 | 2, 61 |
AIAN, American Indian or Alaska Native; CDE, common data element; IQR, interquartile range; NHPI, Native Hawaiian or Other Pacific Islander; RADx-UP, Rapid Acceleration of Diagnostics-Underserved Populations.
^a Participants completing nontesting CDE questions.
^b Participants in the RADx-UP COVID-19 testing initiative.
^c Data provided for 13 of 15 study aims.
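The comparison of CDE and testing participants across all RTS projects can be reduced to a difference in median percentages per category. The sketch below uses the overall medians from the first "Median (IQR)" row of Table 2; the variable names are hypothetical, and a difference of medians is only a rough descriptive summary, not a formal test.

```python
# Overall medians (CDE %, testing %) from the "RTS projects" row of Table 2.
overall_medians = {
    "AIAN": (2, 1),
    "Asian": (2, 2),
    "Black": (23, 25),
    "NHPI": (1, 0),
    "White": (47, 47),
    "Other": (3, 1),
    "Multiple": (3, 3),
    "Prefer Not to Answer": (2, 2),
    "Hispanic": (11, 11),
}

# Difference in percentage points: CDE participants minus testing participants.
differences = {
    category: cde - testing
    for category, (cde, testing) in overall_medians.items()
}
largest_gap = max(abs(d) for d in differences.values())

print(differences)
print(largest_gap)
```

No category differs by more than 2 percentage points, which is consistent with the statement that the overall distributions were similar.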
Missing Data
Among the study aims for which missing data were reported, the median percent of missing data ranged from 4% to 21% across nontesting CDE domains (Table 3). CDE domains with the greatest median percentages of missing data were: location (21%; interquartile range [IQR], 3% to 40%); identity (18%; IQR, 3% to 80%); housing, employment, and insurance (15%; IQR, 3% to 79%); and health status (15%; IQR, 4% to 49%). Domains with the least missing data were alcohol and tobacco (4%; IQR, 3% to 10%); previous testing experiences (7%; IQR, 2% to 55%); medical history (8%; IQR, 4% to 31%); and work personal protective equipment and distancing (8%; IQR, 1% to 51%). See the Supplemental Results for specific information on the role of coupling status, compensation, and question modification on missing data.
Table 3. Data Missingness by CDE Domain Across (A) RTS Projects; (B) Stratified by Coupling Status; (C) Stratified by Compensation; and (D) Modification of Questions for Child Participants
Domain | (A) Proportion of Aims Reporting Missingness^a | (A) Missing Data Median (Q1, Q3) (%) | (B) Proportion of Aims Reporting Missingness^a by Coupling Status | (B) Missing Data Median (Q1, Q3) (%) | (C) Proportion of Aims Reporting Missingness^a by Incentive Status | (C) Missing Data Median (Q1, Q3) (%) | (D) Proportion of Aims Reporting Missingness^a by Question Modification^b | (D) Missing Data Median (Q1, Q3) (%) |
---|---|---|---|---|---|---|---|---|
Location | 9/13 | 21 (3, 40) | Fully coupled: 2/3 | 70 (40, 100) | Incentive: 7/9 | 8 (2, 26) | Modification: 5/8 | 21 (2, 26) |
| | | | Partially coupled: 2/4 | 48 (3, 92) | No incentive: 2/4 | 96 (92, 100) | No modification: 4/5 | 24 (6, 66) |
| | | | Fully uncoupled: 5/6 | 8 (2, 21) | | | | |
Sociodemographics | 10/15 | 9 (3, 49) | Fully coupled: 2/4 | 74 (49, 99) | Incentive: 7/9 | 4 (0, 12) | Modification: 5/8 | 5 (4, 12) |
| | | | Partially coupled: 2/4 | 47 (0, 93) | No incentive: 3/6 | 93 (33, 99) | No modification: 5/7 | 33 (3, 49) |
| | | | Fully uncoupled: 6/7 | 5 (3, 12) | | | | |
Housing, employment, and insurance | 9/13 | 15 (3, 79) | Fully coupled: 2/3 | 88 (79, 97) | Incentive: 7/9 | 5 (1, 22) | Modification: 5/8 | 15 (5, 22) |
| | | | Partially coupled: 2/4 | 48 (3, 93) | No incentive: 2/4 | 95 (93, 97) | No modification: 4/5 | 41 (2, 86) |
| | | | Fully uncoupled: 5/6 | 5 (1, 15) | | | | |
Work PPE and distancing | 8/11 | 8 (1, 51) | Fully coupled: 1/2 | 80 (80, 80) | Incentive: 7/8 | 3 (0, 22) | Modification: 4/6 | 1 (0, 12) |
| | | | Partially coupled: 2/4 | 45 (3, 86) | No incentive: 1/3 | 86 (86, 86) | No modification: 4/5 | 46 (8, 83) |
| | | | Fully uncoupled: 5/5 | 2 (0, 12) | | | | |
Medical history | 8/12 | 8 (4, 31) | Fully coupled: 2/3 | 70 (40, 100) | Incentive: 7/9 | 7 (3, 22) | Modification: 5/8 | 8 (7, 22) |
| | | | Partially coupled: 1/3 | 3 (3, 3) | No incentive: 1/3 | 100 (100, 100) | No modification: 3/4 | 3 (2, 40) |
| | | | Fully uncoupled: 5/6 | 7 (5, 8) | | | | |
Health status | 8/12 | 15 (4, 49) | Fully coupled: 1/2 | 76 (76, 76) | Incentive: 7/9 | 10 (2, 22) | Modification: 4/7 | 11 (2, 21) |
| | | | Partially coupled: 2/4 | 52 (5, 98) | No incentive: 1/3 | 98 (98, 98) | No modification: 4/5 | 43 (8, 87) |
| | | | Fully uncoupled: 5/6 | 10 (2, 20) | | | | |
Vaccine acceptance | 9/13 | 12 (7, 24) | Fully coupled: 2/3 | 76 (62, 90) | Incentive: 7/9 | 9 (2, 24) | Modification: 5/8 | 22 (2, 24) |
| | | | Partially coupled: 2/4 | 10 (7, 12) | No incentive: 2/4 | 51 (12, 90) | No modification: 4/5 | 11 (8, 37) |
| | | | Fully uncoupled: 5/6 | 9 (2, 22) | | | | |
Previous testing experiences | 8/11 | 7 (2, 55) | Fully coupled: 2/3 | 50 (0, 99) | Incentive: 6/7 | 4 (2, 7) | Modification: 4/6 | 4 (2, 53) |
| | | | Partially coupled: 2/4 | 55 (11, 98) | No incentive: 2/4 | 99 (98, 99) | No modification: 4/5 | 9 (4, 55) |
| | | | Fully uncoupled: 4/4 | 4 (2, 7) | | | | |
Alcohol and tobacco | 5/9 | 4 (3, 10) | Fully coupled: 1/2 | 85 (85, 85) | Incentive: 5/6 | 4 (3, 10) | Modification: 3/5 | 4 (2, 10) |
| | | | Partially coupled: 1/4 | 3 (3, 3) | No incentive: 0/3 | – | No modification: 2/4 | 44 (3, 85) |
| | | | Fully uncoupled: 3/3 | 4 (2, 10) | | | | |
Identity | 8/12 | 18 (3, 80) | Fully coupled: 1/2 | 99 (99, 99) | Incentive: 6/8 | 8 (2, 22) | Modification: 5/8 | 22 (13, 60) |
| | | | Partially coupled: 2/4 | 50 (0, 99) | No incentive: 2/4 | 99 (99, 99) | No modification: 3/4 | 3 (0, 99) |
| | | | Fully uncoupled: 5/6 | 13 (3, 22) | | | | |
CDE, common data element; PPE, personal protective equipment; RTS, Return to School.
^a Calculated as the number of aims reporting missingness/number of aims that included the domain in CDE collection. Because of exemptions, not all CDE question domains were covered by all study aims.
^b Among the 8 aims for which CDE questions were modified, 6 reframed questions for child participants. Missingness percentages were reported for both of the 2 aims modifying questions in ways other than reframing for child participants.
Qualitative Evaluation: Key Takeaways
Project teams identified several lessons learned from their CDE collection experiences. First, teams qualitatively noted that among participants who were offered the CDE questions, CDE completion rates were low, given that many individuals elected not to participate in CDE collection and those who did participate often did not complete their CDE questionnaires. Project teams hypothesized that possible reasons for low CDE completion rates were (1) the amount of time needed to complete the CDE questions and (2) participant skepticism stemming from unclear understanding of how and why nontesting data were being collected and used.
Second, among the project teams who provided incentives for CDE collection, completion was variable, and project teams reported a range of participant engagement in cases in which incentives were offered. Some projects noted that incentives were “essential” or “instrumental in achieving a [given] response rate,” whereas others noted that “not everyone will complete the CDEs despite an associated incentive.”
Finally, project teams that fully uncoupled CDE collection from testing noted that they felt the distinction was a necessary step to prevent CDE questions from being a barrier to increasing access to testing. For the 4 RTS projects that engaged community partners in selecting and editing CDE questions before broad distribution, feedback from community partners reinforced the challenges and concerns about data privacy and invasiveness identified by researchers.
Discussion
By requiring funded projects to collect more than 150 CDE questions per participant as part of project execution, RADx-UP envisioned creating a large data repository representative of perspectives from historically underrepresented research populations as a way to potentially resolve questions that could inform health disparities reduction efforts via community-engaged approaches. Nevertheless, our examination of CDE collection among RTS projects suggests that additional steps are needed to better operationalize this goal. Our results showed that each RTS project facilitated data collection using a variety of strategies to promote community comfort and reduce perceived barriers to testing access. There was minimal variation in the racial and ethnic distribution of CDE survey participants from those who participated in testing, but fully uncoupling CDE questions from testing increased the proportion of Black and Hispanic individuals participating in both initiatives. Furthermore, across all projects, the percentage of missing data was greatest among nontesting CDE domains that sought to collect potentially personal, sensitive, or identifiable information; however, the amount of missing data for these domains was lower when financial incentives were offered.
RADx-UP RTS projects identified highly heterogeneous methods for implementation of CDE collection. Reasons for this heterogeneity may be related to the principles of community-engaged research13 in which RTS projects were responsive to local context and used approaches to CDE collection that were informed by community experiences.14 The tailored data collection strategies essential for community-partnered research preclude a unified, 1-size-fits-all approach to CDE collection but also highlight the importance of uniform questions and answers for collected data elements to ensure meaningful data aggregation across projects.15 A 2018 retrospective analysis16 of a real-world dataset developed as part of a CDE investigation found that of the 1414 CDEs collected across 426 clinical studies with a genomics focus, only 32 “truly common” CDEs were identified, demonstrating limited data harmonization. Although this study focused primarily on identifying CDEs across genetic data repositories, resulting in an abundance of heterogeneity across nearly 25 000 variables, the majority of “truly common” CDEs identified included sociodemographic (age, race, gender) or anthropometric (height, weight) data rather than novel data collected between topic-related projects.
At the project level, there was minimal variation in the racial and ethnic distribution of CDE survey participants from those who participated in testing. Nonetheless, when stratifying project study aims by coupling status, the skew in racial representativeness toward White participants among aims that fully or partially coupled CDE questions with testing highlights the possibility that coupling CDE collection to testing initiatives inadvertently served as a barrier to SARS-CoV-2 testing participation. Possible reasons for this finding may be related to distrust within communities that have been historically underrepresented in or harmed by past research efforts. Distrust is a commonly cited barrier to research engagement among participants from underrepresented backgrounds17,18; linking testing initiative enrollment to mandatory data collection could further perceptions of exposure to unnecessary risk18 and contribute to decisions to forgo involvement altogether.
Alternatively, it is possible that our findings related to racial and ethnic distribution and coupling status reflect project teams’ proactive decisions to uncouple testing and CDE collection because of anticipated community hesitation or recommendations provided by consulting community members. Given that most of the sites were in partnership with community-based schools, the level of detail required through the CDEs could have exceeded the information typically shared in these settings and perhaps made participants feel that research was being conducted “on” them, rather than “with” and “for” them.
There were also concerns about data amount, quality, and utility. Participants enrolled in most of the projects were required to complete 153 questions. One previous study demonstrated that survey completion rates substantially decrease (from 63% to 37%) when survey lengths increase from 13 to 72 questions, suggesting that limiting the number of survey questions is associated with increased response rate and reliability.19 Researchers investigating respondent preferences across more than 700 participants found that the ideal length of time to complete a survey is roughly 10 minutes, whereas 20 minutes is the maximum recommended duration.16 The large quantities of missing CDE data also introduce worries about systematic nonresponse and the potential for biased conclusions, thereby potentially harming underserved populations that often battle stereotypes and systematic discrimination based on false narratives. Furthermore, the large quantities of missing data may limit the ability to meaningfully combine data across projects and raise questions about the utility of such efforts. Finally, one should use caution when interpreting data from adult-oriented CDEs in predominantly pediatric populations. Given that the original CDEs were adult focused, issues around parent-proxy respondents, mismatches between question content and developmental age, and the practicalities of data management with redundant variables for parent and child CDE data when collected as dyads all could have contributed to our findings around data completion, exemption requests, decisions around coupling, and ultimately data quality.
Monetary compensation did not impact participant willingness to complete CDE questions among the majority of RTS participants. We did not identify major differences in the total amount of missing data between study aims that included and did not include incentives, but when stratifying missing data by CDE domain within each aim, the percentage of missing data was substantially lower in study aims where project teams used incentives compared with those that did not. This could indicate that participants felt adequately compensated for their time. In contrast to previous work that demonstrated increased racial diversity among survey participants when compensation was offered,19 we found no major differences in racial representativeness of participants at either level of incentive offering; however, we found increased representation of those identifying as Hispanic when an incentive was offered.
Our study has several limitations. First, the sample size of 13 participating project teams is small; therefore, views and recommendations presented here may not be representative of the range of RTS projects or of other RADx-UP CDE collection efforts outside of the subset of participating RTS projects. Second, given time constraints and ethical considerations, the questionnaire used to survey RTS project teams collected aggregate data that were self-reported by each RTS project team rather than individual-level data across all projects. Furthermore, project teams were only able to comment qualitatively on their achieved CDE completion rates. Data collection strategies varied greatly by teams and within project aims, and, as a result, not all teams were able to specify the total number of individuals who received their CDE questions. Therefore, we lacked the denominator needed to calculate completion rates. Finally, this study describes only a subset of community members willing to participate in the RTS projects and may not be representative of each community as a whole. Despite these limitations, we were able to survey the majority of RTS project teams, summarize CDE collection efforts within K-12 school communities across a wide geographic area, and aggregate data across heterogeneous projects to describe overarching trends.
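The denominator issue noted above can be made concrete with a minimal sketch. It assumes a completion rate defined as the number of participants completing the CDE questions divided by the number offered them; the function name `completion_rate` and the example counts are hypothetical and are not drawn from the study.

```python
from typing import Optional


def completion_rate(completed: int, offered: Optional[int]) -> Optional[float]:
    """Return the CDE completion rate as a percentage.

    Returns None when the number of individuals offered the CDE questions
    is unknown (or zero), because the rate cannot be calculated without
    a denominator.
    """
    if offered is None or offered <= 0:
        return None
    if completed < 0 or completed > offered:
        raise ValueError("completed must be between 0 and offered")
    return 100.0 * completed / offered


# Hypothetical aim with a known denominator: 30 of 120 completed.
print(completion_rate(30, 120))
# Hypothetical aim where the team could not specify how many were offered the CDEs.
print(completion_rate(30, None))
```

Returning `None` rather than a number keeps aims without a known denominator from being mistaken for aims with a 0% completion rate.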
RTS project teams recommended the following for future data collection: (1) provide clear communication about data collection goals; (2) separate nontesting and testing CDE collection to decrease testing access barriers; (3) connect data collection to the project objective in a meaningful way if it is essential that data be collected in conjunction with a community-engaged project; (4) prioritize questions so fewer, higher quality questions are asked; and (5) increase community involvement at all stages of question development.
Conclusions
We must continue to critically examine and improve CDE collection efforts within programs designed to increase representation of underrepresented populations. Failing to improve CDE collection efforts and engage community stakeholders in an intentional and meaningful way may be harmful to individuals from underrepresented groups20,21 and may risk extrapolating findings that incorrectly inform future research efforts. Therefore, collaboration with underrepresented populations from the early study design process may improve interest and participation in CDE collection efforts and may help address ongoing mistrust issues related to health research data collection in marginalized communities.
Acknowledgments
The authors thank the families, school employees, Tribal Nations, and other community members for taking the time to participate in this study. The authors acknowledge the sacrifices, stress, and difficult decisions that our participants and many others have had to make to support the safe growth and development of children in the midst of the global pandemic. The project would like to acknowledge the investigators, staff, students, and trainees who worked tirelessly to complete these projects. The research team is grateful for the time and support of our community partners and research participants. Erin Campbell, MS, provided editorial review and submission of this manuscript but did not receive compensation for her contributions, apart from her employment at the institution where this study was conducted.
Drs Goldman, Schuster, Newland, Johnson, Stump, Coller, Keener Mast, and Haroz conceptualized and designed the study and reviewed and revised the manuscript; Mr Kemp and Mr Anderson designed the data collection instruments, collected data, carried out the initial analyses, and reviewed and revised the manuscript; Drs Dozier, Inkelas, Foxe, Gwynn, Gurnett, McDaniels, D’Agostino, DeMuri, Wu, Pulgaron, Kiene, Oren, Allison-Burbank, Okihiro, Lee, Mss Holden-Wiltse, Potts, Zandi, Benjamin, Spallina, Mr Walsh, Mr Watterson, and Mr Corbett made substantial contributions to the study’s conception and provided review and revision of the manuscript; Ms Uthappa and Drs Mann and Zimmerman conceptualized and designed the study, drafted the initial manuscript, designed the data collection instruments, collected data, carried out the initial analyses, coordinated and supervised data collection, and critically reviewed the manuscript for important intellectual content; and all authors approved the final manuscript as submitted and agree to be accountable for all aspects of the work.
This trial has been registered at www.clinicaltrials.gov (identifiers NCT05178290 and NCT05150860).
FUNDING: Research reported in this publication was supported by the Office of the Director of the National Institutes of Health (NIH) under award number U24MD016258; NIH Agreement Nos. OT2HD107543, OT2HD107553, OT2HD107555, OT2HD107556, OT2HD107557, OT2HD107558, OT2HD107559, OT2HD108103, OT2HD108101, OT2HD108105, OT2HD108111, OT2HD108112, OT2HD108097, and OT2HD108110; the National Center for Advancing Translational Sciences of the NIH under award number U24TR001608; and the National Institute of Child Health and Human Development of the NIH under contract HHSN275201000003I. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
CONFLICT OF INTEREST DISCLOSURES: Dr Zimmerman reports funding from the National Institutes of Health (NIH) and US Food and Drug Administration. Dr Goldman reports funding from the NIH. Dr Schuster reports funding from the NIH. The other authors have indicated they have no potential conflicts of interest to disclose.