The lack of a comprehensive database containing diagnoses, patient and clinical characteristics, diagnostics, treatments, and outcomes limits the comparative effectiveness research (CER) needed to improve care in the PICU. Combined, the Pediatric Health Information System (PHIS) and Virtual Pediatric Systems (VPS) databases contain the data needed for CER, but restrictions on the use of patient identifiers have so far prevented linking these databases with traditional linkage methods. Focusing on patients with bronchiolitis, we aimed to show that probabilistic linkage can accurately link PHIS and VPS data without patient identifiers, creating the database needed for CER.
We used probabilistic linkage to link PHIS and VPS records for patients admitted to a tertiary children's hospital between July 1, 2017, and June 30, 2019. We calculated the percentage of matched records and the rate of false-positive matches, and we compared demographic and clinical characteristics between matched and unmatched subjects with bronchiolitis.
We linked 839 of 920 (91%) records with 4 (0.5%) false-positive matches. We found no differences in age (P = .76), presence of comorbidities (P = .16), admission illness severity (P = .44), intubation rate (P = .41), or PICU stay length (P = .36) between linked and unlinked subjects.
Probabilistic linkage created an accurate and representative combined VPS-PHIS database of patients with bronchiolitis. Our methods can be scaled to join data from the 38 hospitals that contribute to both PHIS and VPS, creating a national database of diagnostic, treatment, outcome, patient, and clinical data to enable CER for bronchiolitis and other conditions cared for in the PICU.
PICU care is complex, resource-intensive, and associated with high morbidity and mortality. As such, PICU care should be a focus of comparative effectiveness research (CER) and guideline development, yet high-quality evidence to guide care is lacking, resulting in variable and potentially low-quality care.1,2 Although randomized controlled trials are considered the gold standard of evidence, conducting randomized controlled trials to guide every aspect of PICU care is impractical. Leveraging existing PICU databases and cohort-based research methods is an ideal strategy for rapid CER.3,4 However, CER studies require a database that includes adequate numbers of patients for sufficient power, accurate diagnoses to identify patients, treatment and diagnostic data to assign exposures, outcome data, and data on patient characteristics such as illness severity and comorbidities for confounding adjustment. Unfortunately, for many PICU conditions, no currently available database meets these criteria.5,6
Two large databases, the Pediatric Health Information System (PHIS) and the Virtual Pediatric Systems (VPS) database, each contain a subset of the elements needed for CER for a variety of PICU conditions. VPS uniquely contains manually abstracted PICU diagnoses to identify subjects of interest and PICU measures of illness severity for confounding adjustment. PHIS uniquely contains medication, laboratory, and radiology data to study the use and effects of different diagnostic and treatment strategies. Table 1 summarizes the common and unique PHIS and VPS variable categories an investigator may use for CER. However, constraints on the use of protected health information (PHI) limit the ability to directly link PHIS and VPS across multiple centers. Probabilistic linkage is a method of linking data sets that does not rely on PHI.7–9
Common and Unique Variable Categories Between the Pediatric Health Information System (PHIS) and Virtual Pediatric Systems (VPS)
| Unique PHIS Variables | Common Variables | Unique VPS Variables |
|---|---|---|
| All ICD-9 and ICD-10 coded diagnoses throughout hospitalization | Demographics (age, gender, race and ethnicity) | PICU-specific diagnosis |
| Resource utilization (billed medications, laboratory studies, radiology studies, standardized cost) | Procedures | Granular description of organ support modalities and duration of support |
| Complex chronic condition coding | Mortality | Severity of illness scores (Pediatric Index of Mortality, PIM; Pediatric Risk of Mortality, PRISM; Pediatric Logistic Organ Dysfunction, PELOD) |
| | Hospital and PICU length of stay | PICU characteristics (size, academic affiliation, level of care) |
| | Patient source | |
| | Disposition | |
ICD, International Classification of Diseases; PELOD, Pediatric Logistic Organ Dysfunction; PIM, Pediatric Index of Mortality; PRISM, Pediatric Risk of Mortality Score.
We aim to use single-site data to assess the accuracy and generalizability of probabilistic linkage for combining the PHIS and VPS databases without PHI. To illustrate the utility of this combined database for studying a specific PICU condition, we focus on patients with bronchiolitis. Bronchiolitis is a leading cause of pediatric hospital and ICU admissions, yet little research guides its PICU care, making it a prototypical condition in need of CER.10–14 Once validated, the methods developed in this study can be scaled to create a multisite combined PHIS-VPS database to study bronchiolitis.
Methods
This is a database linkage study using data contributed by a children's hospital in the Western United States to the PHIS and VPS databases. The local Institutional Review Board approved the project with a waiver of consent (# 00122830).
Data Sources and Patients
PHIS contains administrative data from >40 children's hospitals in the United States, including demographics, diagnoses (International Classification of Diseases, 10th Revision, Clinical Modification), and billing codes.15–17 VPS contains data entered by trained abstractors from >100 PICUs in the United States, including demographics, PICU diagnoses, procedures, and severity of illness scores.6,18 Both databases undergo quality checks.18,19 Both databases limit access to PHI for multisite studies but allow PHI use for a participating hospital's own study.
We obtained data for patients admitted to the PICU between July 1, 2017 and June 30, 2019 from both data sets. We included PHIS records containing a billing code for an ICU room charge.15 To analyze linkage accuracy and generalizability, we examined VPS records for patients with a primary diagnosis of bronchiolitis (VPS diagnostic codes “Bronchiolitis caused by RSV,” “Bronchiolitis, excludes RSV,” or “Bronchitis or Bronchiolitis”) and a PICU stay >1 day.
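The cohort definition above can be expressed as a simple record filter. This is a minimal sketch, not the study's code: the `VpsRecord` layout and its field names are hypothetical (not the actual VPS export schema), and the sketch assumes PICU stay length is counted in whole calendar days, whereas VPS may define length of stay differently (for example, in hours).

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class VpsRecord:
    # Hypothetical flattened VPS record; real field names differ.
    record_id: str
    primary_diagnosis: str
    picu_admit: date
    picu_discharge: date


# Primary PICU diagnoses that define the bronchiolitis cohort (from the text).
BRONCHIOLITIS_DX = {
    "Bronchiolitis caused by RSV",
    "Bronchiolitis, excludes RSV",
    "Bronchitis or Bronchiolitis",
}

STUDY_START = date(2017, 7, 1)
STUDY_END = date(2019, 6, 30)


def in_cohort(rec: VpsRecord) -> bool:
    """True for a bronchiolitis PICU admission in the study window with a PICU stay > 1 day."""
    in_window = STUDY_START <= rec.picu_admit <= STUDY_END
    # Assumption: stay length is the difference in calendar days.
    stay_days = (rec.picu_discharge - rec.picu_admit).days
    return in_window and rec.primary_diagnosis in BRONCHIOLITIS_DX and stay_days > 1
```

With this definition, a 1-day stay is excluded, matching the ">1 day" criterion.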
Database Linkage
Probabilistic linkage assigns a probability that a pair of records is a match based on the pattern of agreements and disagreements on common variables.7–9,20 A probability threshold is identified, and pairs at or above this threshold are considered "matches" and carried forward. To identify the optimal match probability threshold, we calculated the percentage of true and false matches at different thresholds and chose the threshold that minimized false-positives while maintaining a high match percentage.9 The selected threshold was a match probability of ≥0.99 (Figure 1). We retained the highest-probability match for each VPS record. Table 2 lists the PHIS and VPS variables used for linkage, and Supplemental Table 4 summarizes the crosswalk developed for PHIS and VPS variables. We used LinkSolv 9.1 (Strategic Matching Inc, Morrisonville, NY) to perform the linkage.8 Supplemental Appendix 1 contains additional linkage details.
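For readers unfamiliar with the approach, the following is a minimal sketch of the generic Fellegi–Sunter calculation that underlies probabilistic linkage, under the usual assumption that variables agree or disagree independently. It is not LinkSolv's implementation, which is not described here. The m-probabilities (chance of agreement for a true match), u-probabilities (chance of agreement for a non-match), the prior, and the variable subset are all hypothetical values. Each agreeing variable adds log(m/u) to the log odds of a match and each disagreeing variable adds log((1 − m)/(1 − u)). A second helper keeps only the single highest-probability PHIS candidate per VPS record at or above 0.99, mirroring the retention rule described above.

```python
import math

# Hypothetical (m, u) pairs: m = P(agree | true match), u = P(agree | non-match).
# Illustrative only; real parameters are estimated from the data.
PARAMS: dict[str, tuple[float, float]] = {
    "gender": (0.99, 0.50),
    "age_years": (0.95, 0.10),
    "picu_admit_dow": (0.97, 1 / 7),
    "hospital_los": (0.90, 0.15),
    "disposition": (0.98, 0.85),
}


def match_probability(agreements: dict[str, bool], prior: float) -> float:
    """Posterior probability that a record pair is a true match."""
    log_odds = math.log(prior / (1 - prior))
    for var, agrees in agreements.items():
        m, u = PARAMS[var]
        if agrees:
            log_odds += math.log(m / u)  # agreement is evidence for a match
        else:
            log_odds += math.log((1 - m) / (1 - u))  # disagreement is evidence against
    return 1 / (1 + math.exp(-log_odds))


def best_matches(pair_probs, threshold: float = 0.99) -> dict[str, tuple[str, float]]:
    """Keep the highest-probability PHIS candidate per VPS record if it meets the threshold."""
    best: dict[str, tuple[str, float]] = {}
    for vps_id, phis_id, p in pair_probs:
        if p >= threshold and (vps_id not in best or p > best[vps_id][1]):
            best[vps_id] = (phis_id, p)
    return best
```

The prior reflects how rare true matches are among all candidate pairs, so in practice enough variables must agree to overcome it and clear the threshold.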
Common PHIS and VPS Variables Used in Linkage
| Variable | Notes |
|---|---|
| Gender | Categories: female, male |
| Age | In years (age in days divided by 365) |
| Race and ethnicity | Categories: African American or Black, American Indian or Alaska Native, Asian, Hispanic, Native Hawaiian or Pacific Islander, white, unspecified |
| Year of admission | NA |
| Year of discharge | NA |
| Hospital discharge day of week | NA |
| PICU admission day of week | NA |
| Hospital length of stay | In days |
| PICU length of stay | In days |
| Discharge disposition | Categories: home, transfer to another acute care hospital, transfer to psychiatric facility, transfer to skilled nursing facility, transfer to physical rehab center, left against medical advice, expired, unspecified |
NA, not applicable.
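Before records can be compared, each database's fields must be mapped onto the common definitions in Table 2. The actual crosswalk is in Supplemental Table 4. The sketch below only illustrates the kind of transformation involved, and the raw disposition codes in both mapping dictionaries are hypothetical. Days of week use ISO numbering (Monday = 1), and any unmapped disposition falls back to "unspecified".

```python
from datetime import date

# Hypothetical raw disposition codes mapped to the common categories in Table 2.
PHIS_DISPOSITION = {"01": "home", "02": "transfer to another acute care hospital", "20": "expired"}
VPS_DISPOSITION = {"Home": "home", "Acute care hospital": "transfer to another acute care hospital", "Died": "expired"}


def harmonize(hosp_admit: date, hosp_discharge: date, picu_admit: date, picu_discharge: date,
              age_days: int, disposition: str, mapping: dict[str, str]) -> dict:
    """Convert one record's raw fields into the common linkage variables."""
    return {
        "age_years": age_days / 365,                       # age defined as days per 365
        "year_of_admission": hosp_admit.year,
        "year_of_discharge": hosp_discharge.year,
        "discharge_dow": hosp_discharge.isoweekday(),      # 1 = Monday ... 7 = Sunday
        "picu_admit_dow": picu_admit.isoweekday(),
        "hospital_los_days": (hosp_discharge - hosp_admit).days,
        "picu_los_days": (picu_discharge - picu_admit).days,
        "disposition": mapping.get(disposition, "unspecified"),
    }
```

When the same admission is harmonized from both sources, the resulting dictionaries should agree on every variable. Any disagreement that remains then reflects a real difference or a coding error, not a difference in definitions.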
A limitation of any linkage is the difficulty of assessing accuracy without a gold-standard set against which to compare results. Without a gold standard, researchers can estimate error rates only from the calculated probability that 2 records are a match, without knowing whether a given pair is a true match. To overcome this limitation, we created a gold-standard set of matches for patients admitted to our PICU. First, we joined records that matched on the financial identification number (a patient- and hospitalization-specific identifier). Next, we joined all remaining pairs agreeing on medical record number (a patient-specific identifier) and date of hospital admission. Finally, we joined any remaining pairs agreeing on date of birth and date of PICU admission. For VPS records matching multiple PHIS records, we identified the true match by manual chart review.
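The three-pass deterministic join can be sketched as follows. This is a minimal illustration, not the study's code. The field names (`fin`, `mrn`, `hosp_admit_date`, `dob`, `picu_admit_date`, `id`) are hypothetical. Each pass considers only records still unmatched, skips records missing any key field, and sets aside VPS records with more than one PHIS candidate for manual chart review.

```python
from collections import defaultdict

# Hypothetical field names for the identifiers used in each pass, in order.
PASSES = [
    ("fin",),                        # financial identification number
    ("mrn", "hosp_admit_date"),      # medical record number + hospital admission date
    ("dob", "picu_admit_date"),      # date of birth + PICU admission date
]


def gold_standard(vps: list[dict], phis: list[dict]):
    """Return (matches, needs_review): unique VPS->PHIS matches and ambiguous VPS records."""
    matches: dict[str, str] = {}
    needs_review: dict[str, list[str]] = {}
    remaining_phis = list(phis)
    for keys in PASSES:
        # Index the still-unmatched PHIS records on this pass's key.
        index = defaultdict(list)
        for rec in remaining_phis:
            key = tuple(rec.get(k) for k in keys)
            if None not in key:
                index[key].append(rec)
        for v in vps:
            if v["id"] in matches or v["id"] in needs_review:
                continue  # already resolved in an earlier pass
            key = tuple(v.get(k) for k in keys)
            if None in key:
                continue
            candidates = index.get(key, [])
            if len(candidates) == 1:
                matches[v["id"]] = candidates[0]["id"]
            elif len(candidates) > 1:
                needs_review[v["id"]] = [c["id"] for c in candidates]  # manual chart review
        used = set(matches.values())
        remaining_phis = [r for r in remaining_phis if r["id"] not in used]
    return matches, needs_review
```

The passes run from the most specific identifier to the least specific, so a looser key is used only for records that a stricter key could not resolve.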
Linkage Accuracy
We compared the probabilistic linkage with the gold standard to identify false-positive and false-negative matches. To assess the generalizability of the matched records, we compared the demographic and clinical characteristics of matched and unmatched VPS records using SAS 9.4 (SAS Institute, Inc, Cary, NC). Because VPS data are based on manual chart review, we considered the VPS records to represent the true set of PICU admissions.
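Comparing probabilistic links with the gold standard, and repeating that comparison across candidate thresholds as described under Database Linkage, can be sketched as below. This is an illustrative sketch with hypothetical inputs: `scored` maps each VPS record to its best PHIS candidate and match probability, and `gold` holds the gold-standard pairs. The false-positive percentage uses the linked records as the denominator, which is consistent with reporting 4 of 839 as 0.5%.

```python
def evaluate(links: dict[str, str], gold: dict[str, str], n_vps: int) -> dict[str, float]:
    """Compare probabilistic links (VPS id -> PHIS id) with gold-standard links."""
    true_pos = sum(1 for v, p in links.items() if gold.get(v) == p)
    false_pos = len(links) - true_pos
    # Gold-standard pairs the probabilistic linkage did not recover.
    false_neg = sum(1 for v, p in gold.items() if links.get(v) != p)
    return {
        "linked_pct": 100 * len(links) / n_vps,
        "false_pos_pct": 100 * false_pos / len(links) if links else 0.0,
        "false_pos": false_pos,
        "false_neg": false_neg,
    }


def sweep(scored: dict[str, tuple[str, float]], gold: dict[str, str], n_vps: int,
          thresholds: list[float]) -> list[tuple[float, dict[str, float]]]:
    """Evaluate the linkage at each candidate match-probability threshold."""
    rows = []
    for t in thresholds:
        links = {v: p for v, (p, prob) in scored.items() if prob >= t}
        rows.append((t, evaluate(links, gold, n_vps)))
    return rows
```

Lowering the threshold links more records but admits more false-positives. The chosen threshold is the point where that trade-off is acceptable.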
Results
We obtained 4264 VPS and 4778 PHIS records for linkage. Figure 1 summarizes the flow of records for linkage. The VPS data contained 920 patients with a primary PICU diagnosis of bronchiolitis and a PICU stay >1 day, with all 920 matched in the gold standard. Probabilistic linkage joined 839 of 920 (91%) VPS records with 4 (0.5%) false matches when compared with the gold standard. We found no significant differences in demographic or clinical characteristics between the probabilistically linked and unlinked records (Table 3).
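The reported percentages follow directly from these counts. The linked and unlinked proportions use all 920 gold-standard bronchiolitis records as the denominator, whereas the false-positive proportion uses the 839 linked records (dividing by 920 instead would give about 0.43%):

$$
\text{Linked} = \frac{839}{920} \approx 91.2\%, \qquad
\text{Unlinked} = \frac{81}{920} \approx 8.8\%, \qquad
\text{False-positive} = \frac{4}{839} \approx 0.48\%
$$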
Flowchart of records. VPS records were probabilistically and deterministically (“Gold Standard”) matched to a PHIS record. In probabilistic linkage, 81/920 (9%) VPS records of patients with bronchiolitis did not match to a PHIS record with a match probability ≥0.99. In deterministic linkage, all 920 VPS records of patients with bronchiolitis matched to a PHIS record.
Summary and Comparison of Demographics and Clinical Characteristics of Linked and Unlinked Records
| Characteristic | Linked Records (n = 839) | Unlinked Records (n = 81) | P |
|---|---|---|---|
| Female, n (%) | 372 (44%) | 37 (46%) | .82ᵃ |
| Age, median years (IQR) | 0.7 (0.3 to 1.5) | 0.7 (0.2 to 1.5) | .76ᵇ |
| No. of CCCs, median (IQR) | 0 (0 to 1) | 0 (0 to 1) | .16ᵇ |
| PIM 3 score, median (IQR) | −4.6 (−4.7 to −4.4) | −4.6 (−4.7 to −4.4) | .44ᵇ |
| Intubated, n (%) | 233 (28%) | 19 (24%) | .41ᵃ |
| PICU LOS, median days (IQR) | 4 (3 to 6) | 4 (2 to 6) | .36ᵇ |
CCC, complex chronic condition; IQR, interquartile range; LOS, length of stay; PIM 3, Pediatric Index of Mortality – 3.
ᵃ χ² test.
ᵇ Mann–Whitney U test.
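The χ² comparisons in Table 3 can be recomputed from the published counts. The sketch below is a minimal pure-Python version of the Pearson χ² test for a 2 × 2 table. It assumes no Yates continuity correction was applied, which is not stated in the text. Under that assumption, the female and intubation counts give P values of roughly .82 and .41, consistent with the table. The Mann–Whitney comparisons cannot be reproduced this way because they require patient-level data.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]]; returns (statistic, P)."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


# Rows: linked (n = 839), unlinked (n = 81); columns: has characteristic, does not.
female = chi_square_2x2(372, 839 - 372, 37, 81 - 37)
intubated = chi_square_2x2(233, 839 - 233, 19, 81 - 19)
```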
Discussion
We conducted this linkage project to demonstrate that probabilistic linkage can successfully link records between VPS and PHIS without PHI. This linked database removes a significant barrier to PICU research by providing a data set with granular demographic, resource use, comorbidity, severity of illness, and outcome data. Because our group is interested in studying bronchiolitis care, we also used PHI to create a gold-standard linked database, against which we assessed the accuracy and generalizability of our probabilistic linkage for patients with bronchiolitis in the PICU. Our method linked most records (91%) with high accuracy (0.5% false-positives). We found no significant differences in demographic and clinical characteristics between linked and unlinked records, suggesting that probabilistic linkage produced a representative set of linked records.
Our results compare favorably with recent work using probabilistic linkage to combine pediatric administrative and clinical databases. Bennett et al combined PHIS and National Trauma Data Bank data using demographic and procedure data and linked 88% of subjects with severe traumatic brain injury.8 Dziorny et al combined VPS and PEDSnet data using demographic and granular laboratory and vital sign data to match 93% of subjects with sepsis.9 Together, these studies highlight the ability of probabilistic linkage to create novel research databases and overcome the limitations of deterministic linkage. Probabilistic linkage can succeed provided the common variables carry sufficient information to separate true from false record matches.8
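One way to make "sufficient information" concrete is the chance-agreement (u) probability of each variable. This is the probability that 2 unrelated records agree on the variable, estimated here as the sum of squared category frequencies. The sketch below is illustrative only. The fixed m of 0.98 is an assumption, and the example distributions in the usage are hypothetical, not study data.

```python
import math
from collections import Counter


def u_probability(values) -> float:
    """Chance that 2 randomly chosen records agree: sum of squared category frequencies."""
    counts = Counter(values)
    n = sum(counts.values())
    return sum((c / n) ** 2 for c in counts.values())


def agreement_weight(values, m: float = 0.98) -> float:
    """Bits of evidence an agreement contributes, log2(m / u); m is an assumed agreement rate for true matches."""
    return math.log2(m / u_probability(values))


# Hypothetical distributions: evenly spread vs dominated by one category.
day_of_week = list(range(7))                  # u = 1/7
gender = ["F", "M"]                           # u = 0.5
disposition = ["home"] * 95 + ["other"] * 5   # u = 0.905
```

An evenly spread variable such as day of week contributes about 2.8 bits when it agrees. A variable dominated by one value, such as a disposition that is nearly always "home", contributes little. This helps explain why linkage accuracy depends on the distribution of the linkage variables as well as on the number of records.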
Our pilot linkage has demonstrated the accuracy of our linkage method; our next step is to create a national linked PHIS-VPS database. We are in the process of obtaining PHIS and VPS data from the 38 hospitals that contribute to both databases. A potential limitation of scaling our method to multiple sites is an increase in false-negative links when linking larger data sets.21 However, both PHIS and VPS can create common hospital groupings, which allows us to link records within smaller subgroups similar in size to the data set linked here. We are also obtaining specific deidentified variables, such as recoded dates and record numbers, for use in a combined deterministic and probabilistic linkage strategy.
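Linking within hospital groupings is a form of blocking: candidate pairs are formed only between records in the same block, so the number of comparisons (and the opportunity for chance agreement) grows with block size rather than with the full multisite data set. The sketch below is a minimal illustration. The `hospital_group` field is hypothetical, because the specific groupings PHIS and VPS provide are not described here.

```python
from collections import defaultdict
from typing import Iterator


def blocked_pairs(vps: list[dict], phis: list[dict], key: str = "hospital_group") -> Iterator[tuple[dict, dict]]:
    """Yield candidate (VPS, PHIS) pairs only when both records share the same block value."""
    by_block: dict[object, list[dict]] = defaultdict(list)
    for p in phis:
        by_block[p[key]].append(p)
    for v in vps:
        # Records in other blocks are never compared with v.
        for p in by_block.get(v[key], []):
            yield v, p
```

For example, 4 VPS and 6 PHIS records split evenly across 2 groups yield 12 candidate pairs instead of the 24 produced by comparing every record with every other.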
A probabilistically linked PHIS-VPS database has important implications for research. First, a barrier to combining PHIS and VPS data has been the need to obtain PHI from multiple sites to enable deterministic linkage methods. Obtaining site-level PHI requires a time-consuming and logistically burdensome approval process that limits site participation. In fact, a previous attempt to link VPS and PHIS using site-level PHI yielded participation from only a third of the sites contributing data to both databases.22 A probabilistic linkage without PHI allows a single site (with a single IRB approval) to quickly obtain data from PHIS and VPS for all common sites, join them as needed, and perform the desired analysis. Further, a linked PHIS-VPS database contains the information needed for CER: PICU diagnosis (VPS) for cohort identification; resources used (PHIS) and procedures and support modalities (VPS) for exposures; mortality (PHIS and VPS), length of stay (PHIS and VPS), and duration of support modalities (VPS) for outcomes; and severity of illness scores (VPS) and patient demographics (PHIS and VPS) for confounder adjustment. Because PHIS contains data from the entire hospitalization, researchers can also examine how PICU and non-PICU periods of care affect each other. Finally, although we focused on assessing the accuracy of probabilistic linkage to study bronchiolitis, these methods may be applied to other PICU conditions. Because linkage success is associated with the number of records to be linked and the distribution of the linkage variables, we expect conditions with a similar or smaller prevalence and a variable distribution similar to that of bronchiolitis to achieve linkage accuracy similar to ours.8,21
Use of a PHIS-VPS database to study bronchiolitis has limitations. First, variability between sites in the coding of linkage variables could alter linkage accuracy. We mitigated this by choosing linkage variables with standard PHIS and VPS data definitions. Variability in coding PICU status in PHIS may affect patient identification, because patients in the PICU for <1 day may not trigger PICU stay billing codes. To minimize this limitation, we made VPS records, which are manually identified as PICU patients, our primary database for linkage, retained only the best match for each VPS record, and focused on subjects with a PICU stay >1 day. Finally, PHIS sites are children's hospitals, so our findings may not generalize to community-based PICUs.
Conclusions
Probabilistic linkage accurately combines PHIS and VPS data without PHI to enable CER. Although we focused on validating the linkage for patients with bronchiolitis, our method can be applied to create a data set to study PICU care for a large variety of conditions.
Acknowledgments
VPS data were provided by Virtual Pediatric Systems, LLC. No endorsement or editorial restriction of the interpretation of these data or opinions of the authors has been implied or stated.
Dr Flaherty conceptualized and designed the study, led analysis and interpretation, and drafted the initial manuscript; Drs Srivastava, Cook, and Keenan supervised the conceptualization and design of the study, and supervised analysis and interpretation; Ms Smith contributed to the design of the study and conducted analysis and interpretation of data; Dr Dziorny contributed to the analysis and interpretation of data; and all authors critically reviewed and revised the manuscript, and approved the final manuscript as submitted.
FUNDING: Support for the research was provided by Dr Flaherty’s institution in the form of an Intermountain Foundation at Primary Children’s Hospital Early Career Development Research Grant and by the National Center for Advancing Translational Sciences of the National Institutes of Health under Award UM1TR004409. Additionally, Dr Flaherty’s effort in this project was supported in part by a Utah Clinical and Translational Science Institute (CTSI) Partner Scholars Program Award.
CONFLICT OF INTEREST DISCLOSURES: Dr Dziorny reports that he received funding from the American Academy of Pediatrics (AAP) to travel and speak at the 2022 AAP National Conference and Exhibition; Dr Srivastava reports support from the IPASS Patient Safety Institute, is a physician founder of the IPASS Patient Safety Institute and his equity is owned by his employer, Intermountain Healthcare, has active grants from the following federal agencies: PCORI, NIH, AHRQ, CDC (the grant funds are paid to his institution outside the submitted work), and has received monetary awards, honoraria, and travel reimbursement from multiple academic and professional organizations for teaching and consulting on quality of care, spreading evidence-based best practices in health systems and pediatric hospital medicine.