OBJECTIVES

The lack of a comprehensive database containing diagnoses, patient and clinical characteristics, diagnostics, treatments, and outcomes limits the comparative effectiveness research (CER) needed to improve care in the PICU. Combined, the Pediatric Health Information System (PHIS) and Virtual Pediatric Systems (VPS) databases contain the data needed for CER, but limits on the use of patient identifiers have thus far prevented linking these databases with traditional methods. Focusing on the subgroup of patients with bronchiolitis, we aim to show that probabilistic linkage methods accurately link data from PHIS and VPS without patient identifiers to create the database needed for CER.

METHODS

We used probabilistic linkage to link PHIS and VPS records for patients admitted to a tertiary children’s hospital between July 1, 2017 and June 30, 2019. We calculated the percentage of matched records and the rate of false-positive matches, and compared demographics between matched and unmatched subjects with bronchiolitis.

RESULTS

We linked 839 of 920 (91%) records with 4 (0.5%) false-positive matches. We found no differences in age (P = .76), presence of comorbidities (P = .16), admission illness severity (P = .44), intubation rate (P = .41), or PICU stay length (P = .36) between linked and unlinked subjects.

CONCLUSIONS

Probabilistic linkage creates an accurate and representative combined VPS-PHIS database of patients with bronchiolitis. Our methods are scalable to join data from the 38 hospitals that jointly contribute to PHIS and VPS, creating a national database of diagnostics, treatment, outcome, and patient and clinical data to enable CER for bronchiolitis and other conditions cared for in the PICU.

PICU care is complex, resource intensive, and associated with high morbidity and mortality. As such, PICU care should be a focus of comparative effectiveness research (CER) and guideline development, yet high-quality evidence to guide care is lacking, resulting in variable and potentially low-quality care.1,2 Although randomized controlled trials are considered the gold standard of evidence, completing randomized controlled trials to guide all of PICU care is impractical. Leveraging existing PICU databases and cohort-based research methods is an ideal strategy for rapid CER.3,4 However, CER studies require a database that includes adequate numbers of patients for sufficient power, accurate diagnoses to identify patients, treatment and diagnostic data to assign exposures, outcome data, and data on patient characteristics such as illness severity and comorbidities for confounding adjustment. Unfortunately, for many PICU conditions, no currently available database meets these criteria.5,6

Two large databases, the Pediatric Health Information System (PHIS) and the Virtual Pediatric System (VPS), each contain a subset of the elements needed for CER for a variety of PICU conditions. VPS uniquely contains manually abstracted PICU diagnoses to identify subjects of interest and PICU measures of illness severity for confounding adjustment. PHIS uniquely contains medication, laboratory, and radiology data to study the use and effects of different diagnostic and treatment strategies. Table 1 summarizes common and unique PHIS and VPS variable categories an investigator may use for CER. However, constraints on the use of protected health information (PHI) limit the ability to directly link PHIS and VPS across multiple centers. Probabilistic linkage is a method to link data sets that does not rely on PHI.7–9

TABLE 1

Common and Unique Variable Categories Between the Pediatric Health Information System (PHIS) and Virtual Pediatric System (VPS)

| Unique PHIS Variables | Common Variables | Unique VPS Variables |
| --- | --- | --- |
| All ICD 9 and 10 coded diagnoses throughout hospitalization | Demographics (age, gender, race and ethnicity) | PICU specific diagnosis |
| Resource utilization (billed medications, laboratory studies, radiology studies, standardized cost) | Procedures | Granular description of organ support modalities and duration of support |
| Complex chronic condition coding | Mortality | Severity of illness scores (Pediatric Index of Mortality, PIM; Pediatric Risk of Mortality, PRISM; Pediatric Logistic Organ Dysfunction, PELOD) |
| | Hospital and PICU length of stay | PICU characteristics (size, academic affiliation, level of care) |
| | Patient source | |
| | Disposition | |

ICD, International Classification of Diseases; PELOD, Pediatric Logistic Organ Dysfunction; PIM, Pediatric Index of Mortality; PRISM, Pediatric Risk of Mortality Score.

We aim to use single-site data to assess the accuracy and generalizability of probabilistic linkage to combine the PHIS and VPS databases without PHI. To illustrate the utility of this combined database to study a specific PICU condition, we focus on patients with bronchiolitis. Bronchiolitis is a leading cause of pediatric hospital and ICU admissions, yet there is little research to guide PICU care, making it a prototypical condition in need of CER.10–14 Once validated, the methods developed in this study can be scaled to create a multisite combined PHIS-VPS database to study bronchiolitis.

This is a database linkage study using data contributed by a children’s hospital in the Western United States to the PHIS and VPS databases. The local Institutional Review Board approved the project with waiver of consent (#00122830).

PHIS contains administrative data from >40 children’s hospitals in the United States, including demographics, diagnoses (International Classification of Diseases 10th Revision, Clinical Modification), and billing codes.15–17 VPS contains data entered by trained abstractors from >100 PICUs in the United States, including demographics, PICU diagnosis, procedures, and severity of illness scores.6,18 Both databases undergo quality checks.18,19 Both databases limit access to PHI for multisite studies, but allow PHI use for a participating hospital’s own study.

We obtained data for patients admitted to the PICU between July 1, 2017 and June 30, 2019 from both data sets. We included PHIS records containing a billing code for an ICU room charge.15  To analyze linkage accuracy and generalizability, we examined VPS records for patients with a primary diagnosis of bronchiolitis (VPS diagnostic codes “Bronchiolitis caused by RSV,” “Bronchiolitis, excludes RSV,” or “Bronchitis or Bronchiolitis”) and a PICU stay >1 day.

Probabilistic linkage assigns a probability that a pair of records is a match based on the pattern of agreements and disagreements on common variables.7–9,20 A probability threshold is identified, and pairs above this threshold are considered “matches” and carried forward. To identify the optimum match probability threshold, we calculated the percent of true and false matches at different thresholds and chose the threshold that minimized false-positives while maintaining a high match percent.9 We retained the highest probability match for each VPS record. Table 2 lists the PHIS and VPS variables used for linkage and Supplemental Table 4 summarizes the crosswalk developed for PHIS and VPS variables. We used LinkSolv 9.1 (Strategic Matching, Inc, Morrisonville, NY) to perform the linkage.8 Supplemental Appendix 1 contains additional linkage details.

TABLE 2

Common PHIS and VPS Variables Used in Linkage

| Variable | Notes |
| --- | --- |
| Gender | Categories: female, male |
| Age | Defined as days per 365 |
| Race and ethnicity | Categories: African American or Black, American Indian or Alaska Native, Asian, Hispanic, Native Hawaiian or Pacific Islander, white, unspecified |
| Year of admission | NA |
| Year of discharge | NA |
| Hospital discharge day of week | NA |
| PICU admission day of week | NA |
| Hospital length of stay | In days |
| PICU length of stay | In days |
| Discharge disposition | Categories: home, transfer to another acute care hospital, transfer to psychiatric facility, transfer to skilled nursing facility, transfer to physical rehab center, left against medical advice, expired, unspecified |

NA, not applicable.
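As a rough illustration of how agreement patterns on the Table 2 variables can be turned into a match probability, the following is a minimal Fellegi-Sunter-style sketch. It is not LinkSolv's algorithm: the field names, the m and u probabilities, and the uniform prior are all hypothetical placeholders chosen for illustration. Only the 0.99 threshold and the rule of keeping the highest probability match per VPS record come from the text.

```python
import math

# Hypothetical m/u probabilities per linkage variable (not estimated from study data).
# m = P(field agrees | true match); u = P(field agrees | non-match).
M_U = {
    "gender": (0.99, 0.50),
    "age_days": (0.95, 0.002),
    "picu_admit_dow": (0.98, 1 / 7),
    "hospital_los_days": (0.95, 0.05),
}


def match_weight(rec_a, rec_b):
    """Sum log2 likelihood ratios over the agreement/disagreement pattern."""
    total = 0.0
    for field, (m, u) in M_U.items():
        if rec_a[field] == rec_b[field]:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def match_probability(weight, prior):
    """Convert a total weight into a posterior match probability (Bayes' rule)."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * 2 ** weight
    return posterior_odds / (1 + posterior_odds)


def best_matches(vps_records, phis_records, threshold=0.99):
    """Keep the highest-probability PHIS candidate per VPS record if it clears the threshold."""
    prior = 1 / len(phis_records)  # assumption: each PHIS record is equally likely a priori
    links = {}
    for v in vps_records:
        scored = [
            (match_probability(match_weight(v, p), prior), p["id"])
            for p in phis_records
        ]
        prob, phis_id = max(scored)
        if prob >= threshold:
            links[v["id"]] = (phis_id, prob)
    return links
```

Variables that rarely agree by chance (a low u, such as age in days) contribute the largest weights, which is why a handful of non-identifying fields can jointly separate true from false pairs.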

A limitation of any linkage is the difficulty of assessing linkage accuracy in the absence of a gold standard set against which to compare results. Without a gold standard, researchers can only estimate error rates from the calculated probability that 2 records are a match, without knowing whether they are a true match. To overcome this limitation, we created a gold standard set of matches for patients admitted to our PICU by first joining records that matched on the financial identification number (a patient and hospitalization specific identifier). Next, all remaining pairs agreeing on medical record number (a patient specific identifier) and date of hospital admission were joined. Finally, any remaining pairs agreeing on date of birth and date of PICU admission were also joined. For VPS records matching to multiple PHIS records, we identified the true match by manual chart review.
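The tiered deterministic join described above can be sketched as follows. This is an illustrative sketch only: the field names (`fin`, `mrn`, `hospital_admit_date`, `dob`, `picu_admit_date`) are hypothetical, and excluding already linked PHIS records from later tiers is an assumption about how "remaining pairs" is handled. VPS records with more than one candidate are set aside for manual review rather than resolved automatically.

```python
from collections import defaultdict

# Key tiers in the order the text describes; field names are hypothetical.
KEY_TIERS = [
    ("fin",),                        # financial identification number
    ("mrn", "hospital_admit_date"),  # medical record number + hospital admission date
    ("dob", "picu_admit_date"),      # date of birth + PICU admission date
]


def gold_standard_links(vps_records, phis_records):
    """Return (links, needs_review) from a three-tier deterministic join."""
    links = {}         # VPS id -> PHIS id
    needs_review = {}  # VPS id -> list of candidate PHIS ids for manual chart review
    remaining = list(vps_records)
    for keys in KEY_TIERS:
        used = set(links.values())  # assumption: a linked PHIS record is not reused
        index = defaultdict(list)
        for p in phis_records:
            if p["id"] in used or any(p.get(k) is None for k in keys):
                continue
            index[tuple(p[k] for k in keys)].append(p["id"])
        unmatched = []
        for v in remaining:
            if any(v.get(k) is None for k in keys):
                candidates = []
            else:
                candidates = index.get(tuple(v[k] for k in keys), [])
            if len(candidates) == 1:
                links[v["id"]] = candidates[0]
            elif len(candidates) > 1:
                needs_review[v["id"]] = candidates
            else:
                unmatched.append(v)
        remaining = unmatched
    return links, needs_review
```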

We compared probabilistic linkage to the gold standard to identify false-positive and false-negative matches. To assess the generalizability of the matched records, we compared the demographic and clinical characteristics of matched and unmatched VPS records using SAS 9.4 (SAS Institute, Inc, Cary, NC). As VPS data are based on manual chart review, we considered the subjects in the VPS database the “true” number of PICU admissions.
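One way to picture the threshold comparison against the gold standard is the sketch below. The function name and the input layout (one top candidate per VPS record with its match probability) are assumptions made for illustration, and the false-positive percentage is computed out of the links retained at each threshold, which is one plausible denominator rather than a definition taken from the text.

```python
def sweep_thresholds(best_pairs, gold_links, thresholds):
    """Summarize match percent and false-positive percent at each probability threshold.

    best_pairs: {vps_id: (phis_id, probability)}, the top candidate per VPS record.
    gold_links: {vps_id: phis_id} from the deterministic gold standard.
    """
    rows = []
    for t in thresholds:
        kept = {v: p for v, (p, prob) in best_pairs.items() if prob >= t}
        false_pos = sum(1 for v, p in kept.items() if gold_links.get(v) != p)
        rows.append({
            "threshold": t,
            "match_pct": 100 * len(kept) / len(gold_links),
            "false_pos_pct": 100 * false_pos / len(kept) if kept else 0.0,
        })
    return rows
```

Raising the threshold trades matched records for fewer false positives, so a table of these rows makes the trade-off behind the chosen cutoff explicit.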

We obtained 4264 VPS and 4778 PHIS records for linkage. Figure 1 summarizes the flow of records for linkage. The VPS data contained 920 patients with a primary PICU diagnosis of bronchiolitis and a PICU stay >1 day, with all 920 matched in the gold standard. Probabilistic linkage joined 839 of 920 (91%) VPS records with 4 (0.5%) false matches when compared with the gold standard. We found no significant differences in demographic or clinical characteristics between the probabilistically linked and unlinked records (Table 3).

FIGURE 1

Flowchart of records. VPS records were probabilistically and deterministically (“Gold Standard”) matched to a PHIS record. In probabilistic linkage, 81/920 (9%) VPS records of patients with bronchiolitis did not match to a PHIS record with a match probability ≥0.99. In deterministic linkage, all 920 VPS records of patients with bronchiolitis matched to a PHIS record.

TABLE 3

Summary and Comparison of Demographics and Clinical Characteristics of Linked and Unlinked Records

| Characteristic | Linked Records (n = 839) | Unlinked Records (n = 81) | P |
| --- | --- | --- | --- |
| Female, n (%) | 372 (44) | 37 (46) | .82ᵃ |
| Age, median years (IQR) | 0.7 (0.3 to 1.5) | 0.7 (0.2 to 1.5) | .76ᵇ |
| No. of CCCs, median (IQR) | 0 (0 to 1) | 0 (0 to 1) | .16ᵇ |
| PIM 3 score, median (IQR) | −4.6 (−4.7 to −4.4) | −4.6 (−4.7 to −4.4) | .44ᵇ |
| Intubated, n (%) | 233 (28) | 19 (24) | .41ᵃ |
| PICU LOS, median days (IQR) | 4 (3 to 6) | 4 (2 to 6) | .36ᵇ |

CCC, complex chronic condition; IQR, interquartile range; LOS, length of stay; PIM 3, Pediatric Index of Mortality – 3.

ᵃ χ² test.

ᵇ Mann–Whitney U test.
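As an arithmetic check on the categorical rows of Table 3, the sketch below recomputes a Pearson χ² test from the published counts. The study's analysis was run in SAS 9.4; this Python version is only an illustration and assumes no continuity correction. The Mann–Whitney comparisons cannot be rechecked this way because they require patient-level data.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Counts from Table 3: rows are linked and unlinked records; columns are yes and no.
_, p_female = chi_square_2x2(372, 839 - 372, 37, 81 - 37)
_, p_intubated = chi_square_2x2(233, 839 - 233, 19, 81 - 19)
print(round(p_female, 2), round(p_intubated, 2))
```

Under these assumptions the P values round to .82 and .41, in line with the table.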

We conducted this linkage project to demonstrate that probabilistic linkage can successfully link records between VPS and PHIS without PHI. Creating this linked database removes a significant barrier to researching care in the PICU by creating a data set with granular demographic, resource use, comorbidity, severity of illness, and outcome data. As our group is interested in studying bronchiolitis care, we simultaneously used PHI to create a gold standard linked database and assessed the accuracy and generalizability of our probabilistic linkage method to link records of patients with bronchiolitis in the PICU. Our method linked most records (91%) with high accuracy (0.5% false-positives). We found no difference in demographic and clinical characteristics between linked and unlinked records, indicating that the probabilistic linkage technique creates an unbiased set of linked records.

Our results compare favorably to recent work using probabilistic linkage to combine pediatric administrative and clinical databases. Bennett et al combined PHIS and National Trauma Data Bank data using demographic and procedure data and linked 88% of subjects with severe traumatic brain injury.8 Dziorny et al combined VPS and PEDSnet data using demographic and granular laboratory and vital sign data to match 93% of subjects with sepsis.9 Together, these studies highlight the ability of probabilistic linkage to create novel research databases and overcome the limitations of deterministic linkage. Probabilistic linkage can be successful provided there is sufficient information content in the common variables to separate true and false record matches.8

Our pilot linkage has demonstrated the accuracy of our linkage method; thus, our next step is to create a national linked PHIS-VPS database. We are in the process of obtaining PHIS and VPS data from the 38 hospitals that contribute to both databases. A potential limitation of scaling our method to multiple sites is increased false-negative links by attempting to link data sets with more subjects.21  However, both PHIS and VPS can create common hospital groupings, which allows us to link records within the smaller subgroups similar in size to the linkage presented here. We are also in the process of obtaining specific, deidentified variables such as recoded dates and record numbers to use in a combined deterministic and probabilistic linkage strategy.
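Restricting comparisons to records within the same hospital grouping is a form of blocking, which can be sketched as below. The `hospital_group` field name is hypothetical, and how PHIS and VPS would actually define and share a common grouping is not described here; the sketch only shows how blocking keeps each linkage problem close to single-site size.

```python
from collections import defaultdict


def candidate_pairs_by_block(vps_records, phis_records, block_key="hospital_group"):
    """Yield only the VPS-PHIS pairs that share a block (hypothetical hospital grouping)."""
    phis_by_block = defaultdict(list)
    for p in phis_records:
        phis_by_block[p[block_key]].append(p)
    for v in vps_records:
        for p in phis_by_block.get(v[block_key], []):
            yield v, p
```

With B equally sized blocks, the number of candidate pairs falls by roughly a factor of B compared with comparing every VPS record to every PHIS record.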

A probabilistically linked PHIS-VPS database has important implications for research. First, a barrier to combining PHIS and VPS data has been the need to obtain PHI from multiple sites to enable deterministic linkage. Obtaining site-level PHI requires a time- and logistics-intensive approval process that limits site participation. In fact, a previous attempt to link VPS and PHIS using site-level PHI yielded participation from only a third of sites contributing data to both databases.22 A probabilistic linkage without PHI allows a single site (with a single IRB approval) to quickly obtain data from PHIS and VPS for all common sites, join them as needed, and perform the desired analysis. Further, a linked PHIS-VPS database contains the information needed for CER: PICU diagnosis (VPS) for cohort identification; resources used (PHIS) and procedures and support modalities (VPS) for exposures; mortality (PHIS and VPS), length of stay (PHIS and VPS), and duration of support modalities (VPS) for outcomes; and severity of illness scores (VPS) and patient demographics (PHIS and VPS) for confounder adjustment. Because PHIS contains data from an entire hospitalization, researchers can also examine how PICU and non-PICU periods of care affect each other. Finally, although we focused on assessing the accuracy of probabilistic linkage to study bronchiolitis, these methods may be applied to other PICU conditions. As linkage success is associated with the number of records to be linked and the distribution of variables, we expect any condition with a prevalence similar to or lower than that of bronchiolitis and a similar variable distribution to have linkage accuracy similar to ours.8,21

Use of a PHIS-VPS database to study bronchiolitis has limitations. First, potential variability in coding of linkage variables between sites could alter the linkage accuracy. We mitigated this by choosing linkage variables with standard PHIS and VPS data definitions. Second, variability in coding “PICU” status in PHIS may affect patient identification, as patients in the PICU for <1 day may not trigger PICU stay billing codes. To minimize this limitation, we made VPS records, which are manually identified as PICU patients, our primary database for linkage, retained only the best match for each VPS record, and focused on subjects with a PICU stay >1 day. Finally, PHIS sites are children’s hospitals; thus, findings from the linked database may not be generalizable to community-based PICUs.

Probabilistic linkage accurately combines PHIS and VPS data without PHI to enable CER. Although we focused on validating the linkage for patients with bronchiolitis, our method can be applied to create a data set to study PICU care for a large variety of conditions.

VPS data were provided by Virtual Pediatric Systems, LLC. No endorsement or editorial restriction of the interpretation of these data or opinions of the authors has been implied or stated.

Dr Flaherty conceptualized and designed the study, led analysis and interpretation, and drafted the initial manuscript; Drs Srivastava, Cook, and Keenan supervised the conceptualization and design of the study, and supervised analysis and interpretation; Ms Smith contributed to the design of the study and conducted analysis and interpretation of data; Dr Dziorny contributed to the analysis and interpretation of data; and all authors critically reviewed and revised the manuscript, and approved the final manuscript as submitted.

FUNDING: Support for the research was provided by Dr Flaherty’s institution in the form of an Intermountain Foundation at Primary Children’s Hospital Early Career Development Research Grant and by the National Center for Advancing Translational Sciences of the National Institutes of Health under Award UM1TR004409. Additionally, Dr Flaherty’s effort in this project was supported in part by a Utah Clinical and Translational Science Institute (CTSI) Partner Scholars Program Award.

CONFLICT OF INTEREST DISCLOSURES: Dr Dziorny reports that he received funding from the American Academy of Pediatrics (AAP) to travel and speak at the 2022 AAP National Conference and Exhibition; Dr Srivastava reports support from the IPASS Patient Safety Institute, is a physician founder of the IPASS Patient Safety Institute and his equity is owned by his employer, Intermountain Healthcare, has active grants from the following federal agencies: PCORI, NIH, AHRQ, CDC (the grant funds are paid to his institution outside the submitted work), and has received monetary awards, honoraria, and travel reimbursement from multiple academic and professional organizations for teaching and consulting on quality of care, spreading evidence-based best practices in health systems and pediatric hospital medicine.

1. Zimmerman JJ. President’s Message: Research in the ICU: It’s What We (Should) Do. Critical Connections. 2018;4–5
2. Zimmerman JJ. President’s Message: SCCM Tools Promote High-Value ICU Care. Critical Connections. 2018;4
3. Dreyer NA, Tunis SR, Berger M, Ollendorf D, Mattox P, Gliklich R. Why observational studies should be among the tools used in comparative effectiveness research. Health Aff (Millwood). 2010;29(10):1818–1825
4. Institute of Medicine. Initial National Priorities for Comparative Effectiveness Research. The National Academies Press; 2009:252
5. Wetzel RC. First get the data, then do the science! Pediatr Crit Care Med. 2018;19(4):382–383
6. Bennett TD, Spaeder MC, Matos RI, et al; Pediatric Acute Lung Injury and Sepsis Investigators (PALISI). Existing data analysis in pediatric critical care research. Front Pediatr. 2014;2:79
7. Sayers A, Ben-Shlomo Y, Blom AW, Steele F. Probabilistic record linkage. Int J Epidemiol. 2016;45(3):954–964
8. Bennett TD, Dean JM, Keenan HT, McGlincy MH, Thomas AM, Cook LJ. Linked records of children with traumatic brain injury. Probabilistic linkage without use of protected health information. Methods Inf Med. 2015;54(4):328–337
9. Dziorny AC, Lindell RB, Bennett TD, Bailey LC. Joining datasets without identifiers: probabilistic linkage of virtual pediatric systems and PEDSnet. Pediatr Crit Care Med. 2020;21(9):e628–e634
10. Pelletier JH, Au AK, Fuhrman D, Clark RSB, Horvat C. Trends in bronchiolitis ICU admissions and ventilation practices: 2010-2019. Pediatrics. 2021;147(6):e2020039115
11. Fujiogi M, Goto T, Yasunaga H, et al. Trends in bronchiolitis hospitalizations in the United States: 2000-2016. Pediatrics. 2019;144(6):e20192614
12. Chung A, Reeves RM, Nair H, Campbell H; RESCEU investigators. Hospital admission trends for bronchiolitis in Scotland, 2001-2016: a national retrospective observational study. J Infect Dis. 2020;222(Suppl 7):S592–S598
13. Gill PJ, Anwar MR, Thavam T, et al; Pediatric Research in Inpatient Setting (PRIS) Network. Identifying conditions with high prevalence, cost, and variation in cost in US children’s hospitals. JAMA Netw Open. 2021;4(7):e2117816
14. Ralston SL, Lieberthal AS, Meissner HC, et al; American Academy of Pediatrics. Clinical practice guideline: the diagnosis, management, and prevention of bronchiolitis. Pediatrics. 2014;134(5):e1474–e1502
15. Chan T, Rodean J, Richardson T, et al. Pediatric critical care resource use by children with medical complexity. J Pediatr. 2016;177:197–203.e1
16. Feudtner C, Feinstein JA, Zhong W, Hall M, Dai D. Pediatric complex chronic conditions classification system version 2: updated for ICD-10 and complex medical technology dependence and transplantation. BMC Pediatr. 2014;14:199
17. Keren R, Luan X, Localio R, et al; Pediatric Research in Inpatient Settings (PRIS) Network. Prioritization of comparative effectiveness research topics in hospital pediatrics. Arch Pediatr Adolesc Med. 2012;166(12):1155–1164
18. Kirschen MP, Francoeur C, Murphy M, et al. Epidemiology of brain death in pediatric intensive care units in the United States. JAMA Pediatr. 2019;173(5):469–476
19. Mongelluzzo J, Mohamad Z, Ten Have TR, Shah SS. Corticosteroids and mortality in children with bacterial meningitis. JAMA. 2008;299(17):2048–2055
20. Dusetzina SB, Tyree S, Meyer A-M, Meyer A, Green L, Carpenter WR. An Overview of Record Linkage Methods. Agency for Healthcare Research and Quality; 2014
21. Cook LJ, Olson LM, Dean JM. Probabilistic record linkage: relationships between file sizes, identifiers and match weights. Methods Inf Med. 2001;40(3):196–203
22. Gupta P, Richardson T, Hall M, et al. Effect of inhaled nitric oxide on outcomes in children with acute lung injury: propensity matched analysis from a linked database. Crit Care Med. 2016;44(10):1901–1909
23. Singleton MD. Differential protective effects of motorcycle helmets against head injury. Traffic Inj Prev. 2017;18(4):387–392
24. Olsen CS, Thomas AM, Singleton M, et al. Motorcycle helmet effectiveness in reducing head, face and brain injuries by state and helmet law. Inj Epidemiol. 2016;3(1):8
25. Han GM, Newmyer A, Qu M. Seat belt use to save face: impact on drivers’ body region and nature of injury in motor vehicle crashes. Traffic Inj Prev. 2015;16(6):605–610
26. Curry AE, Metzger KB, Pfeiffer MR, Elliott MR, Winston FK, Power TJ. Motor vehicle crash risk among adolescents and young adults with attention-deficit/hyperactivity disorder. JAMA Pediatr. 2017;171(8):756–763
27. Carlson KF, Gilbert TA, Morasco BJ, et al. Linkage of VA and state prescription drug monitoring program data to examine concurrent opioid and sedative-hypnotic prescriptions among veterans. Health Serv Res. 2018;53(Suppl 3):5285–5308
28. Ashraf AJ, Gilbert TA, Holmer HK, Cook LJ, Carlson KF. Receipt of concurrent VA and non-VA opioid and sedative-hypnotic prescriptions among post-9/11 veterans with traumatic brain injury. J Head Trauma Rehabil. 2021;36(5):364–373
