Hospital-associated venous thromboembolism (HA-VTE) is an increasing cause of morbidity in pediatric populations, yet identification of high-risk patients remains challenging. General pediatric models have been derived from case-control studies, but few have been validated. We developed and validated a predictive model for pediatric HA-VTE using a large, retrospective cohort.
The derivation cohort included 111 352 admissions to Monroe Carell Jr. Children’s Hospital at Vanderbilt. Potential variables were identified a priori, and corresponding data were extracted. Logistic regression was used to estimate the association of potential risk factors with development of HA-VTE. Variable inclusion in the model was based on univariate analysis, availability in routine medical records, and clinician expertise. The model was validated by using a separate cohort with 44 138 admissions.
A total of 815 encounters were identified with HA-VTE in the derivation cohort. Variables strongly associated with HA-VTE include history of thrombosis (odds ratio [OR] 8.7; 95% confidence interval [CI] 6.6–11.3; P < .01), presence of a central line (OR 4.9; 95% CI 4.0–5.8; P < .01), and patients with cardiology conditions (OR 4.0; 95% CI 3.3–4.8; P < .01). Eleven variables were included, which yielded excellent discriminatory ability in both the derivation cohort (concordance statistic = 0.908) and the validation cohort (concordance statistic = 0.904).
We created and validated a risk-prediction model that identifies pediatric patients at risk for HA-VTE development. We anticipate early identification of high-risk patients will increase prophylactic interventions and decrease the incidence of pediatric HA-VTE.
Pediatric patients have risk factors for hospital-associated venous thromboembolism (HA-VTE) events that differ from those of adults, and these risks are not well captured by an existing risk-prediction model based on patients’ characteristics available at hospital admission.
We developed and validated a general pediatric HA-VTE risk-prediction model that is automatically calculated from the electronic medical record and available on admission. We anticipate that this will improve detection of high-risk patients before HA-VTE events occur.
Hospital-associated venous thromboembolism (HA-VTE) has been identified as an increasing cause of morbidity and mortality in pediatrics. Although it is overall a rare complication among children, annual rates have been increasing, with a 70% increase reported in pediatric HA-VTE.1,2 Despite the increasing incidence, many treatment recommendations for pediatric HA-VTE have been extrapolated from adult data.3 Children with HA-VTE experience longer hospital stays, increased medical costs,4 and subsequent medical complications.5,6 Therefore, it is important to identify at-risk patients as early as possible. Risk stratification should also be as specific as possible because interventions include pharmacologic anticoagulation that places patients at higher risk for bleeding complications, requires frequent laboratory safety monitoring, and typically involves injections.7 Given these risks, it is imperative that we develop a more personalized, evidence-based approach to identify children at highest risk for HA-VTE so that appropriate interventions can be initiated.
In previous studies, authors have sought to identify risk factors among pediatric patients; however, these authors have evaluated subgroups of pediatric patients, including patients undergoing surgical procedures,8–11 patients in the ICU,12,13 and patients with malignancies.14,15 These have formed a framework of known risk factors on which risk-prediction models have been based. Other risk-prediction models have been developed to evaluate pediatric patients at risk for HA-VTE development16–19; however, they were developed from case-control studies and are limited to pediatric subpopulations as described above.20–24 Given that risk-prediction models have been shown to identify patients at elevated risk for venous thromboembolism (VTE) events better than physician judgement alone,25,26 the goal was to develop and validate a general pediatric predictive model for HA-VTE by using a large, single-center, retrospective cohort.
Methods
Study Population
The study was approved by the Vanderbilt University Medical Center Institutional Review Board 180116. Retrospective data were obtained on patient admission encounters to Monroe Carell Jr. Children’s Hospital at Vanderbilt from January 1, 2010, to October 31, 2017. Data were obtained from the Vanderbilt Research Derivative, a database of clinical data, derived from the electronic data warehouse and restructured for research. Admission encounters with HA-VTE were identified on the basis of International Classification of Diseases, Ninth Revision (ICD-9) and International Classification of Diseases, 10th Revision (ICD-10) codes for acute deep vein thrombosis or acute pulmonary embolism (Supplemental Table 5). The diagnostic codes used to define VTE are similar to those used in previous studies,16,17 but did not include superficial thrombotic events, thrombophlebitis, or chronic thromboses. Encounters without HA-VTE were selected on the basis of the lack of corresponding ICD-9 or ICD-10 codes. A subset of records (10% of encounters with HA-VTE and a matching number of encounters [0.1% overall] without HA-VTE) was manually reviewed by a blinded member of the research team to assess the accuracy of International Classification of Diseases–based identification of VTE diagnosis, including imaging-confirmed VTE during the hospital admission.
The study period was chosen to provide a large sample size with consistent hematology practice and one electronic medical record (EMR). The medical center transitioned to a new EMR system on November 2, 2017, and there was concern that the transition period may affect the quality of data obtained; therefore, the derivation study period ended before this date. All patient admission encounters were initially included, but patients were excluded if they were >21 years of age or if they were receiving anticoagulation before admission. Separate admission encounters for the same patient were included in the analysis.
Derivation Cohort Data Collection
Potential risk factors were initially identified a priori from the literature and expert opinion, and the corresponding data were extracted from the EMR for all patients during the study period. Demographic, diagnostic, and laboratory data were evaluated.
Age was calculated on the basis of patients’ birth date and date of admission. Sex and ethnicity were self- or parent-reported at hospital registration. Personal history of thrombosis was obtained from ICD-9 or ICD-10 codes (Supplemental Table 6). Patients admitted to the ICU were identified on the basis of admission unit. Patients with malignancy or hemoglobinopathies were identified from ICD-9 or ICD-10 codes. Patients evaluated by the cardiology or infectious diseases services during their admission were identified via documentation, with a note from the cardiology or infectious diseases service saved to the medical record. Both patients admitted to these teams as well as patients who received a consult from these services were included. Patients undergoing surgical procedures and patients with central venous catheters (CVCs) were identified by Current Procedural Terminology (CPT) codes (Supplemental Table 7).
All laboratory data were obtained directly and were limited to the day of admission. Laboratory values included serum chemistries (potassium, sodium, creatinine, serum urea nitrogen, glucose), complete blood count components (total white blood count, hemoglobin, platelet count, mean corpuscular volume, mean corpuscular hemoglobin concentration [MCHC], red cell distribution width [RDW], automated neutrophil count), and additional biomarkers (lactate, C-reactive protein, erythrocyte sedimentation rate, and partial thromboplastin time). Missing values were replaced with imputed median data. Laboratory variables that have been reported to be associated with poor outcomes27 in pediatric patients with VTE include factor VIII activity level and d-dimer quantification. Because these studies are routinely obtained at this institution after diagnosis of VTE, they were reviewed but not included in the final model because of concerns about reverse causation.
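The median imputation described above can be illustrated with a minimal Python sketch. The function name, the example lactate values, and the use of `None` to mark a missing result are assumptions made for illustration; they are not taken from the study data or code.

```python
from statistics import median


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = median(observed)
    return [fill if v is None else v for v in values]


# Hypothetical admission-day lactate values (mmol/L); None marks a missing result.
lactate = [1.2, None, 2.8, 0.9, None, 1.6]
imputed = impute_median(lactate)
```

In this sketch the median is taken over all observed values in the list, so every missing entry receives the same fill value and observed entries are left unchanged.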
Temporal Validation Cohort Data Collection
Additional retrospective data from November 2, 2017, to January 31, 2020, were obtained from the EMR (Institutional Review Board 200334) consisting of 44 138 admission encounters to Monroe Carell Jr. Children’s Hospital at Vanderbilt. Admission encounters were identified identically to the derivation cohort. We identified 43 454 admission encounters without HA-VTE and 684 encounters with HA-VTE. Given that the medical center had transitioned to a different EMR, patients with infectious disease and cardiology conditions were identified on the basis of consult and admission orders rather than via documentation. Other variables were identified in the same manner as they were in the derivation cohort.
Statistical Analysis
Patient variables were summarized by using counts and proportions for categorical variables, and medians with 95% confidence intervals (CIs) were used for continuous variables. Potential predictor variables were tested for association with HA-VTE by using an uncorrected χ2 test or Mann–Whitney U test for categorical and continuous variables, respectively. A 2-sided P value of <.05 was used to indicate statistical significance.
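For a binary predictor, the uncorrected χ2 test reduces to a closed-form statistic on a 2 × 2 table with 1 degree of freedom. The sketch below shows that calculation in Python; the counts are hypothetical and the function names are our own, and the study itself performed these tests in R and SPSS.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Uncorrected (no continuity correction) chi-square statistic for [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    return n * (a * d - b * c) ** 2 / denom


def p_value_1df(chi2):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical counts: rows = predictor present/absent, columns = HA-VTE yes/no.
stat = chi_square_2x2(10, 20, 30, 40)
p = p_value_1df(stat)
significant = p < 0.05  # 2-sided threshold used in the text
```

The tail probability uses the identity that a χ2 variable with 1 degree of freedom is the square of a standard normal variable, which is why the complementary error function is sufficient here.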
Risk of HA-VTE was estimated by using a logistic regression model. Variable inclusion in the model was based on expert clinical opinion (eg, CVC placement was included for face validity), availability within the EMR within 24 hours of admission, and univariate association with HA-VTE. Using a conservative limit of 15 HA-VTE events per predictor variable, with 815 events in the derivation cohort, the model could accommodate up to 54 variables without overfitting. Interactions in the model were not prespecified, so no interactions were included. Nonlinearity of predictor variables was assessed by using 5-knot restricted cubic splines. The predictive accuracies of the linear and nonlinear models were compared by using concordance statistics. The predictive accuracy of the final model was evaluated by using a concordance statistic, positive and negative predictive values (at risk thresholds of 5% and 10%), and a bootstrap calibration curve. The model was reviewed to determine if the imputed data caused meaningful changes to the model’s accuracy. Subgroup analyses were performed by calculating the concordance statistic for subpopulations within the derivation cohort, identified as described above. In addition, the model was assessed in a separate temporal validation cohort.
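The events-per-variable arithmetic can be checked directly. The short Python sketch below uses the 815 events and the limit of 15 events per predictor stated above, together with the 11 variables reported for the final model; the function names are illustrative only.

```python
def max_predictors(events, events_per_variable):
    """Largest whole number of predictors allowed under an events-per-variable limit."""
    return events // events_per_variable


def events_per_df(events, degrees_of_freedom):
    """Events available per degree of freedom in a fitted model."""
    return events / degrees_of_freedom


EVENTS = 815  # HA-VTE encounters in the derivation cohort

budget = max_predictors(EVENTS, 15)          # 815 // 15 = 54
ratio = round(events_per_df(EVENTS, 11), 1)  # 815 / 11 = 74.1
```

Both values agree with the figures reported in the text: a ceiling of 54 candidate variables, and 74.1 events per degree of freedom for the 11-variable model.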
Statistical analysis was performed in R version 3.6.1 with the “Hmisc” and “rms” (regression modeling strategies) extension packages. SPSS version 27 (IBM SPSS Statistics, IBM Corporation) was also used for model development and data review. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD) reporting guidelines for prognostic studies were followed.28
Results
Patient Characteristics
The derivation cohort included a total of 111 352 patient admission encounters to Monroe Carell Jr. Children’s Hospital at Vanderbilt. The validation cohort included a total of 44 138 additional patient admission encounters. The characteristics of the encounters for the derivation and validation cohorts are described in Table 1. The median age of the admitted patients was slightly lower in the derivation cohort (1.4 years) than in the validation cohort (2.5 years). The sex distribution was similar in both cohorts, with slightly more admissions of male patients than female patients. Racial distribution across both cohorts showed that the majority of patients identified as white, with the second largest racial group being African American; most patients did not identify as Hispanic. Increases in the number of patients with cardiology and infectious disease conditions and in the number of patients with CVCs were observed in the validation cohort. Laboratory values for RDW, MCHC, and lactate were similar across both cohorts.
In the derivation cohort, the encounters were divided into 110 537 patient encounters without HA-VTE and 815 patient encounters with HA-VTE. These data were used to develop the model as described above (Table 2). Given that cases were defined by ICD-9 or ICD-10 codes, we formally evaluated coding accuracy through an analysis of 160 random records (80 records from the case group and 80 from the control group). In this analysis, no HA-VTE events were identified in the control group and all patients with HA-VTE were confirmed to have a thrombus. The derivation cohort provided data to determine that the thrombosis prevalence at our institution is 0.7%. Seasonal variation was not observed on the basis of a review of HA-VTE events by month. The HA-VTE risk percentage score is derived by using the equation shown in Supplemental Fig 4. The distribution of the risk percentage score across the derivation cohort is shown in Supplemental Fig 5, and the distribution of the risk percentage score divided into patients with and without HA-VTE is in Supplemental Fig 6.
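The published equation is given in Supplemental Fig 4 and is not reproduced here. As a general illustration of how a logistic regression model turns predictor values into a risk percentage, the Python sketch below applies the standard inverse-logit transformation. The intercept, the coefficients, and the variable names are hypothetical placeholders, not the fitted values from this study.

```python
import math


def risk_percentage(intercept, coefficients, features):
    """Convert a logistic-regression linear predictor into a risk percentage."""
    linear_predictor = intercept + sum(
        coefficients[name] * features.get(name, 0.0) for name in coefficients
    )
    return 100.0 / (1.0 + math.exp(-linear_predictor))


# HYPOTHETICAL values for illustration only; not the published coefficients.
INTERCEPT = -6.0
COEFFICIENTS = {
    "central_line": 1.5,
    "history_of_thrombosis": 2.0,
    "cardiology_condition": 1.3,
}

baseline = risk_percentage(INTERCEPT, COEFFICIENTS, {})
with_cvc = risk_percentage(INTERCEPT, COEFFICIENTS, {"central_line": 1})
```

With these placeholder values, a patient with no risk factors has a risk well below 1%, and adding a single risk factor raises the estimate. The output is continuous rather than a point score.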
Model Description
Eleven variables were included in the final logistic regression model (Fig 1). Variables that exhibited a statistically significant association in univariate analysis were not automatically selected to be included in the final model for a variety of reasons, including ease of extractability in the medical record system, timing of availability around admission, and concern for reverse causation. Race and ethnicity were not included in the model to avoid introducing a potential bias. Further details regarding the model building process are provided in Supplemental Table 8. With 815 encounters with HA-VTE in the derivation cohort, the final model was fitted by using 74.1 events per degree of freedom, which demonstrates that the model does not overfit the data. Out of >40 potential variables, the variables that exhibited strong associations with developing HA-VTE, in order of importance, were presence of a CVC (odds ratio [OR] 4.9; 95% CI 4.0–5.8; P < .01), a history of thrombosis (OR 8.7; 95% CI 6.6–11.3; P < .01), and cardiology conditions (OR 4.0; 95% CI 3.3–4.8; P < .01). Additional significant variables included whether a blood gas was performed, whether the patient had an infectious disease condition, patient age, MCHC, RDW, and lactate. The final model yielded excellent discriminatory ability (concordance statistic = 0.908; 95% CI 0.896–0.918) (Fig 2A). A comparison of the predictive accuracies of the models with linear and nonlinear terms showed no gain in predictive accuracy with the nonlinear model (concordance statistic in the linear model = 0.908 vs 0.909 in the nonlinear model). The correlation matrix and the model SEs were reviewed, and there was no evidence of substantial collinearity. Additional evaluation of the model in various pediatric subpopulations (Table 3), including racial and ethnic subgroups, confirmed that the model performed well within these subpopulations.
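For readers less familiar with the concordance statistic, the Python sketch below computes it by direct pairwise comparison: the proportion of event/non-event pairs in which the encounter with the event received the higher predicted risk, with ties counted as one half. The risks and outcomes are hypothetical, and this is a didactic implementation rather than the routine used in the study's R analysis.

```python
def concordance_statistic(risks, outcomes):
    """Share of event/non-event pairs in which the event has the higher predicted risk.

    Pairs with tied predicted risk count as half-concordant.
    """
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    non_events = [r for r, y in zip(risks, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                score += 1.0
            elif e == n:
                score += 0.5
    return score / (len(events) * len(non_events))


# Hypothetical predicted risks and observed outcomes (1 = HA-VTE).
risks = [0.02, 0.10, 0.35, 0.01, 0.10, 0.60]
outcomes = [0, 0, 1, 0, 1, 1]
c_stat = concordance_statistic(risks, outcomes)
```

The pairwise loop is quadratic in cohort size, so a rank-based formulation would be preferable for cohorts as large as those analyzed here. A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation.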
The positive predictive value was 20.1% with a negative predictive value of 99.5% at a risk threshold of 10% (Supplemental Fig 7A); at a risk threshold of 5%, the positive predictive value decreased to 13.6% and the negative predictive value increased to 99.6% (Supplemental Fig 7B).
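The threshold-based predictive values can be computed as in the following Python sketch. We assume here that an encounter is flagged when its predicted risk is at or above the threshold; the text does not state whether the boundary is inclusive. The example risks and outcomes are hypothetical.

```python
def predictive_values(risks, outcomes, threshold):
    """Return (PPV, NPV) when encounters with risk >= threshold are flagged."""
    tp = fp = tn = fn = 0
    for risk, outcome in zip(risks, outcomes):
        flagged = risk >= threshold  # inclusive boundary is an assumption
        if flagged and outcome == 1:
            tp += 1
        elif flagged:
            fp += 1
        elif outcome == 1:
            fn += 1
        else:
            tn += 1
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    return ppv, npv


# Hypothetical predicted risks and observed outcomes (1 = HA-VTE).
risks = [0.01, 0.03, 0.06, 0.08, 0.12, 0.15, 0.02, 0.30]
outcomes = [0, 0, 0, 1, 0, 1, 0, 1]

ppv_10, npv_10 = predictive_values(risks, outcomes, 0.10)
ppv_05, npv_05 = predictive_values(risks, outcomes, 0.05)
```

In this toy example, lowering the threshold from 10% to 5% flags more encounters, which lowers the positive predictive value and raises the negative predictive value. This is the same direction of change as reported above.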
Model Temporal Validation
The validation cohort was divided into 43 454 encounters without HA-VTE and 684 (1.5%) encounters with HA-VTE. The data from the separate validation data set were reviewed to confirm that there were similar characteristics to the derivation cohort (Table 1), and overall, there were no substantial differences between the cohorts. Increases in patients with CVCs, patients with cardiology conditions, and patients with infectious disease conditions were all felt to reflect true increases in clinical practice over time. The original model was applied to the validation data, without reestimating the coefficients, and provided a concordance statistic of 0.904 (95% CI 0.894–0.913) (see Fig 2B). The calibration curve (Fig 3B) indicated adequate calibration at lower values of predicted risk (<20%). However, <1% of patients in either cohort had predicted risk scores that reached levels ≥10%; therefore, we are confident that the model performs well for the majority of patients with average risk of HA-VTE.
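Calibration can be inspected informally by comparing the mean predicted risk with the observed event rate within bins of predicted risk. The Python sketch below shows this simpler binned comparison; it is not the bootstrap calibration curve used in the study, and the bin edges, risks, and outcomes are hypothetical.

```python
def calibration_table(risks, outcomes, edges):
    """Compare mean predicted risk with observed event rate within each risk bin.

    `edges` are ascending boundaries; bin i covers [edges[i], edges[i + 1]).
    Returns rows of (lower, upper, n, mean_predicted, observed_rate).
    """
    rows = []
    for lower, upper in zip(edges[:-1], edges[1:]):
        members = [(r, y) for r, y in zip(risks, outcomes) if lower <= r < upper]
        if not members:
            continue  # skip empty bins rather than divide by zero
        mean_predicted = sum(r for r, _ in members) / len(members)
        observed_rate = sum(y for _, y in members) / len(members)
        rows.append((lower, upper, len(members), mean_predicted, observed_rate))
    return rows


# Hypothetical predicted risks and observed outcomes (1 = HA-VTE).
risks = [0.01, 0.02, 0.03, 0.04, 0.12, 0.15, 0.18, 0.25, 0.40]
outcomes = [0, 0, 0, 0, 0, 1, 0, 1, 0]
table = calibration_table(risks, outcomes, [0.0, 0.10, 0.20, 1.01])
```

A well-calibrated model has mean predicted risk close to the observed rate in every bin. Bins at high predicted risk typically contain few encounters, which makes the observed rate there less stable.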
Discussion
In this large, single-center study, we developed and validated a risk-prediction model to identify pediatric patients at risk for developing HA-VTE, with excellent discrimination and adequate calibration from common and easily retrievable variables in the EMR. The overarching goal was to develop one model that would identify pediatric patients early who are at risk for developing an HA-VTE by using the identified variables to compute a personalized risk probability for each pediatric patient on admission to the medical center.
Compared to other published models (Table 4), this model was designed to identify patients at risk starting at admission in contrast with models that include length of stay (LOS) as a variable.13,16,17,29 The model was also designed to use automatically extracted data from the EMR, which required slight adaptation of other published pediatric HA-VTE risk factors. Certain risk factors can be easily extracted from the EMR and exhibited significant associations with HA-VTE in this analysis and previously published studies: age,8,19,22,24,26 presence of a CVC,13,17–19,22,24,26,30 and personal history of VTE.19,30 Systemic infection or bacteremia is used in multiple published models,13,16,17,29 whereas this model uses whether the patient has an infectious disease condition as an easily extractable surrogate risk factor. Many previously published models were designed for specific subpopulations of patients8,10,13,15,16,22,23; in contrast, we designed this model to be widely applicable across a general pediatric hospitalized population. Moreover, whereas previous models captured risk in separate models for trauma22,24,26 and other patients undergoing surgery,8,10 patients in the ICU,13 and patients with acute lymphoblastic leukemia,23 in this study, variables common in the subpopulations are captured in the general pediatric model. For example, rather than having a separate model for patients undergoing surgical procedures, surgery is included as a predictor in this general model. This strength is further demonstrated by the subgroup analyses performed on the derivation cohort (Table 3), which show this model’s strong discriminatory capability across multiple patient populations.
Other published pediatric HA-VTE risk-prediction models do not include laboratory values. A growing body of evidence in adult patients suggests that increased RDW is a predictive indicator of VTE.31 Similarly, in a study evaluating laboratory values in Behçet disease, with deep vein thrombosis noted as a common complication, the authors report that decreased MCHC levels are associated with more severe disease.32 Elevated lactate has been demonstrated to contribute to a prothrombotic environment in adult patients who are undergoing cardiac surgery requiring cardiopulmonary bypass.33 Each of these laboratory values in this pediatric population exhibits a significant association with HA-VTE in the univariate analysis and in combination in the logistic regression model.
Compared with existing models, this risk-prediction model provides output information in the form of a probability, which is a continuous calculation more specific than traditional point risk scores. This avoids dichotomization into large risk groups and allows for more granular and meaningful specificity. The model is computed within the first 24 hours of admission on the basis of information in the EMR and can be updated as new data are available.
Limitations
This study has limitations that should be considered. Because it is a single-center study, we cannot speak to the generalizability of the results; however, our center is a tertiary care children’s hospital that serves a large population of patients with demographics consistent with other academic institutions. The derivation and temporal validation cohorts arose from the same medical center. Future research is needed to assess the external validity of the VTE risk-prediction model at other institutions. Most of the predictors used in this VTE risk-prediction model are available at other medical centers, and we are planning to continue to evaluate the model at additional sites in the future. Smaller community hospitals may not have the consulting services available, and we hope to address this in future iterations of the model. Similarly, some values (eg, lactate) were not available in all patients because of the clinical needs of the patient. Sensitivity analyses showed no substantial differences between models in which the data were imputed and those in which data were treated as missing (Supplemental Fig 8); moreover, because we envision future use of the risk-prediction model to occur in the background of an electronic health record, we do not anticipate unintentional increases in obtaining laboratory studies unique to this model that would not be clinically indicated otherwise.
We chose to use data that could be routinely extracted in an automated fashion from the EMR. For instance, the Braden score is one measurement of immobility; however, these data could not be readily extracted from the EMR. Future models may benefit from inclusion of variables that objectively measure immobility, a known adult risk factor for VTE.
Case identification was performed with ICD-9 or ICD-10 codes, which may have resulted in misidentification of admission encounters with HA-VTE and encounters without HA-VTE. Diagnosis codes are known to have reduced positive predictive value, particularly when in a secondary position, although recent updates to thromboembolism codes may reduce this occurrence.34 In our chart review, we found the coding was accurate, although we recognize that this is a small subset of total records. It is also important to note that the diagnostic codes corresponding to personal history of thrombosis are often underused. Nevertheless, these codes are readily available and make large database studies feasible during early model development.35 The use of ICD-9 and ICD-10 codes in model development also restricted us to using a binary assessment of VTE rather than a time-to-VTE model.
Finally, the model calibration at higher risk percentages (>20%) is less accurate than at lower risk percentages (Fig 3). Given that the model is designed to bring general pediatric patients at elevated risk of developing HA-VTE to the attention of a hematologist, we believe that loss of precision at high percentages is less clinically meaningful. The model does not seek to exclude VTE as a diagnosis, nor does it attempt to differentiate risk percentages among higher-risk patients (ie, those >10%–20%); rather, it seeks to identify individuals upon admission with characteristics that increase their risk, however modestly, of this uncommon clinical event.
Future Directions
Although pediatric VTE remains an uncommon diagnosis, the national incidence continues to rise.1,2 Developing a clinical tool to better identify pediatric patients who are at an elevated risk for HA-VTE development would allow for potential prophylactic interventions to be initiated before thrombus development. By developing a tool that considers each patient’s specific risk factors, we can provide individuals with an evidence-based, precision medicine approach to reduce HA-VTE occurrence. We are currently conducting a pragmatic randomized trial (clinicaltrials.gov [identifier NCT04574895]) to determine if hematology review of patients with elevated risk derived from this model can increase the number of elevated-risk patients being placed on HA-VTE prophylaxis compared with the current standard of care. To accomplish this, we will evaluate the model’s performance not only upon admission but also on subsequent hospital days, during which the clinical circumstances for the patient may change (eg, placement of a central line). We will also evaluate whether these notifications and subsequent interventions will decrease overall pediatric HA-VTE at our hospital.
Conclusions
We developed and temporally validated a general pediatric HA-VTE risk-prediction model, which is currently being evaluated with a randomized pragmatic trial at Monroe Carell Jr. Children’s Hospital at Vanderbilt. We consider this risk-prediction model to be a valuable clinical decision tool, and we continue to assess how to best use it to identify patients at elevated risk for HA-VTE as early as possible. Despite low event numbers compared with adults, pediatric HA-VTE rates continue to rise, and they can have long-term adverse effects. Future studies into optimal prevention and treatment techniques continue to be necessary, particularly among pediatric populations.
Acknowledgments
This project was supported by the Advanced Vanderbilt Artificial Intelligence Laboratory and Learning Healthcare System groups at Vanderbilt University Medical Center. We also appreciate Ryan Moore, MS, for his assistance with this work.
Dr Wheeler conceptualized and designed the study, researched the literature, identified the data to be exported, and reviewed and revised the manuscript; Mr Byrne, Mr Domenico, and Dr French performed the statistical analysis, conceptualized and designed the study, and drafted, reviewed, and revised the manuscript; Dr Walker analyzed the data for the models, drafted the initial manuscript, and revised the manuscript; Dr Creech mentored Dr Walker, reviewed data, and critically reviewed the manuscript for important intellectual content; and all authors approved the final manuscript as submitted and agree to be accountable for all aspects of the work.
FINANCIAL DISCLOSURE: The authors have indicated they have no financial relationships relevant to this article to disclose.
FUNDING: Research reported in this publication was supported by the National Center for Advancing Translational Sciences of the National Institutes of Health under award UL1 TR000445. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. This project is being funded through the Vanderbilt University Medical Center Pathology, Microbiology, and Immunology Innovation Fund, an endowment established for the purpose of supporting Pathology, Microbiology, and Immunology Chair–approved special projects within the Vanderbilt University Medical Center Department of Pathology, Microbiology, and Immunology. Funded by the National Institutes of Health (NIH).
POTENTIAL CONFLICT OF INTEREST: The authors have indicated they have no potential conflicts of interest to disclose.
- CI
confidence interval
- CPT
Current Procedural Terminology
- CVC
central venous catheter
- EMR
electronic medical record
- HA-VTE
hospital-associated venous thromboembolism
- ICD-9
International Classification of Diseases, Ninth Revision
- ICD-10
International Classification of Diseases, 10th Revision
- LOS
length of stay
- MCHC
mean corpuscular hemoglobin concentration
- OR
odds ratio
- RDW
red cell distribution width
- VTE
venous thromboembolism