Multiple scores exist to characterize organ dysfunction in children.
To review the literature on multiple organ dysfunction (MOD) scoring systems to estimate severity of illness and to characterize the performance characteristics of currently used scoring tools and clinical assessments for organ dysfunction in critically ill children.
Electronic searches of PubMed and Embase were conducted from January 1992 to January 2020.
Studies were included if they evaluated critically ill children with MOD, evaluated the performance characteristics of scoring tools for MOD, and assessed outcomes related to mortality, functional status, organ-specific outcomes, or other patient-centered outcomes.
Data were abstracted into a standard data extraction form by a task force member.
Of 1152 unique abstracts screened, 156 full text studies were assessed including a total of 54 eligible studies. The most commonly reported scores were the Pediatric Logistic Organ Dysfunction Score (PELOD), pediatric Sequential Organ Failure Assessment score (pSOFA), Pediatric Index of Mortality (PIM), PRISM, and counts of organ dysfunction using the International Pediatric Sepsis Definition Consensus Conference. Cut-offs for specific organ dysfunction criteria, diagnostic elements included, and use of counts versus weighting varied substantially.
While scores demonstrated an increase in mortality associated with the severity and number of organ dysfunctions, the performance ranged widely.
The multitude of scores on organ dysfunction to assess severity of illness indicates a need for unified and data-driven organ dysfunction criteria, derived and validated in large, heterogeneous international databases of critically ill children.
Multiple organ dysfunction syndrome (MODS) in critically ill children remains associated with a high morbidity and persistently high mortality.1 A recent study utilizing the Virtual Pediatric Systems database, including nearly 200 000 PICU admissions, revealed a mortality of 10.3% among children with MODS compared with 0.7% in children without MODS.2 In MODS survivors, the risk of survival with poor functional status as assessed by the Pediatric Overall Performance Category/Pediatric Cerebral Performance Category was increased severalfold. Recent research into the pathophysiology of critical illness illustrates that different MODS phenotypes may reflect patient populations more likely to respond to distinct, targeted therapies. Reliable identification of patients with MODS is therefore required to: (1) accurately characterize epidemiology, (2) assist in prognostication, (3) select patient groups where risk/benefit of specific treatments may vary, and (4) efficiently enroll selected patients into targeted trials.
However, to date, diagnostic criteria of MODS remain a matter of debate and there is no agreement on a gold standard for MODS, which organs to include, and thresholds to define dysfunction for individual organ systems. A unified approach to MODS is further hampered by patient heterogeneity of previous studies. Some studies have focused primarily on the prediction of mortality, whereas others report on scores as a description of illness severity. Most scores for MODS have been in use for many years, but a comprehensive review of the performance of different scores is lacking.
As part of the Pediatric Organ Dysfunction Information Update Mandate (PODIUM) project, we aimed to review the literature on MODS scoring systems to characterize the performance characteristics of currently used scoring tools and clinical assessments for organ dysfunction in critically ill children.
Methods
The PODIUM taskforce sought to develop evidence-based criteria for organ dysfunctions in children. As part of this process, a subgroup on MODS (SW, PL, EJ, CC, JLW, LJS) reviewed the literature on MODS scoring systems. The present article reports on the systematic review on organ dysfunction scoring systems performed as part of PODIUM and provides a critical evaluation of the available literature with recommendations for future research. Details on data sources, study selection, data extraction, data synthesis, and risk of bias assessment utilized by the PODIUM collaborative are presented in the PODIUM Executive Summary.3
Results
Overview of Commonly Used Scores
Out of 1152 unique abstracts, 159 full texts were reviewed, of which 54 provided data on scores for the purpose of this review, as shown in the PRISMA flowchart (Fig 1), data tables (Supplemental Information, Supplemental Tables 1, 2, and 3), and risk of bias assessment summary (Supplemental Information, Supplemental Fig 1). Many scores have been developed and reported in critically ill children (Table 1). Scores show substantial differences in their scope (predictive, descriptive, diagnostic, Fig 2), number and type of variables assessed (Table 2 and Fig 3), suitability to measure organ dysfunctions, time frame, and applicability to different clinical settings.
Study flow diagram according to the Preferred Reporting Items for Systematic review and Meta-Analysis Protocols recommendations.
Comparison of variables used to calculate commonly used organ dysfunction scores.
Comparison of the Performance of Different Multiple Organ Dysfunction Scoring Tools
Criteria | PIM-2/3 | PRISM-III | PELOD-2 (dPELOD-2) | qPELOD | SOFA | pSOFA/mSOFA | LODS | MOSF/MODS (count) | PeRF | SICK | PEDIA | BEP | TISS | Arzeno 2015 | Meyer 2005 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Number of studies | 5 | 4 | 8 | 1 | 4 | 2 | 1 | 4 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
Validation | |||||||||||||||
Reference standard | Death | Death | Death | Death | Death | Death | Death | Death | Death | Death | Death | Death | Death | Death | Death |
Case-mix applied to | PICU, Mening | PICU, sepsis, RRT | PICU | Sepsis | PICU, RRT, GI | CICU, PICU | Hosp fever | PICU, ARDS, Sepsis | ARDS | Hosp fever, PICU | Hosp fever | Mening | PICU | PICU brain | PICU oncology |
Validity | |||||||||||||||
Construct (score reflects MODS?) | No | Yes | Yes | No | Yes | Yes | No | Yes | Yes | No | No | No | No | Yes | Yes |
Content (score includes all organs?) | No | Yes | No | No | Yes | Yes | No | Yes | Yes | No | No | No | No | Yes | Yes |
Criterion (↑score → ↑ death) | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
Discrimination | |||||||||||||||
AUROC | Poor–good | Mod–good | Good | Good | Mod–good | Good | Good | Good | Mod–good | Mod–good | Good | Mod | Mod | Mod | NR |
Calibration | Good | Good | Poor to good | Good | Mod to good | Good | Good | Good | Poor to good | Poor to good | Good | Unknown | Unknown | Good | Unknown |
Reliability | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown | Unknown |
Ease of Use | Mod | Mod | Mod | Good | Good | Good | — | Good | — | — | — | Good | — | — | — |
Ease of interpretation | Mod | Mod | Good | Good | Good | Good | — | Good | — | — | — | Good | — | — | — |
External validity/ generalizability | Unknown | Unknown | Unknown | Unknown | Mod | Unknown | Unknown | Good | Unknown | Mod | Unknown | Unknown | Unknown | Unknown | Unknown |
dPELOD-2, Pediatric Logistic Organ Dysfunction on day 1; SOFA, Sequential Organ Failure Assessment; mSOFA, modified sequential organ failure assessment; LODS, logistic organ dysfunction score; MOSF, multiple organ system failure; PeRF, pediatric respiratory failure score; SICK, signs of inflammation in children that can kill score; PEDIA, pediatric early death index for Africa score; BEP, base excess and platelet count at presentation score; TISS, therapeutic intervention scoring system; RRT, renal replacement therapy; CICU, cardiac ICU; Hosp fever, hospitalized patients with fever; ARDS, acute respiratory distress syndrome; Mening, meningococcal disease; AUROC, area under the receiver operating characteristic; Mod, moderate; NR, not reported; —, not applicable.
Comparison of Characteristics of Main Organ Dysfunction Assessment Tools
Scores | PIM-3 | PRISM-III and IV | PELOD-2 (dPELOD-2) | qPELOD | pSOFA | 2005 IPSDCC (Goldstein) |
---|---|---|---|---|---|---|
Organ systems | Not specific to organs | Not specific to organs | Cardiovascular system, plus metabolic (lactate), respiratory, hematologic, renal, CNS | Cardiovascular system, CNS | Cardiovascular system, respiratory, hematologic, hepatic, renal, CNS | Cardiovascular system, respiratory, hematologic, hepatic, renal, CNS |
Main purpose (at time of design of the score) | Prediction of mortality, ICU benchmarking | Prediction of mortality, ICU benchmarking | Description of severity | Prediction of mortality in sepsis | Description of severity | Diagnosis of organ dysfunction |
Number of items | 11 | 17 | 10 | 3 | 8 (12 if counting Spo2 and individual inotropes) | 18 |
Number of laboratory items (number of laboratory items available as POC, ie, blood gas components, glucose, and lactate) | 2 (2) | 12 (5) | 6 (3) | 0 | 3 (1) | 9 (3) |
Development methods | Derivation cohort Australia, New Zealand, Ireland, and the United Kingdom n = 53 112 in 2010–2011 | Derivation cohort United States n = 10 078 in 2011–2013 | Derivation cohort French/Belgium n = 3671 in 2006–2007 | A priori (aligned with qSOFA and PELOD-2) | A priori (aligned with SOFA and PELOD-2) | A priori (expert statement) |
Validation/calibration | Multiple validations | Multiple validations | Multiple validations | NA | Multiple validations | NA |
Time frame | Within 60 min of admission, including first contact outside PICU by PICU team | First 4 h of PICU admission, minus 2 h to 4 h for laboratory variables | Daily every 24 h of PICU admission | First 24 h of PICU admission | Daily every 24 h of PICU admission | Not specified |
Patient information | Yes | Yes in PRISM-IV | No | No | No | No |
Treatment information | Yes (ventilation) | No | Yes (ventilation) | No | Yes (ventilation, vasoactives) | Yes (ventilation, vasoactives) |
Applicability outside PICU | Poor | Poor | Moderate | Very good | Moderate | Good |
Applicability in resource-limited setting | Good | Poor | Poor | Very good | Moderate | Moderate |
dPELOD-2, Pediatric Logistic Organ Dysfunction on day 1; qPELOD, Quick Pediatric Logistic Organ Dysfunction; CNS, central nervous system; Spo2, pulse oxygen saturation; POC, point of care; NA, not applicable.
In terms of predictive scores, the Pediatric Index of Mortality-3 (PIM-3)4 and the Pediatric Risk of Mortality-IV (PRISM-IV)5 scores (and their predecessors) are the most commonly used. However, because these scores are intended for PICU patients with and without organ dysfunctions, they may have limited applicability for assessment specific to patients with MODS.4,5 Although PIM-3 contains information on cardiovascular (systolic blood pressure), respiratory (need for mechanical ventilation) and neurologic dysfunction (dilated pupils), it does not lend itself to assessment of individual organ dysfunctions or MODS. The PRISM-IV physiologic score contains information on cardiac (heart rate, systolic blood pressure, temperature), neurologic (pupillary reactivity, mental status), respiratory (arterial PO2, pH, PCO2, total bicarbonate), hematologic (white blood cell count, platelet count, prothrombin, and partial thromboplastin time) and chemical score components (glucose, potassium, blood urea nitrogen, creatinine).
Commonly used descriptive scores include the Pediatric Logistic Organ Dysfunction Score-2 (PELOD-2)6 and, more recently, the pediatric Sequential Organ Failure Assessment (pSOFA).7 PELOD-2 assesses 5 (neurologic, cardiovascular, renal, respiratory, and hematologic) organ dysfunctions, and pSOFA includes 6 (including hepatic) organ dysfunctions. PELOD-2 was derived from a multicenter European PICU cohort. In contrast, pSOFA was constructed as a modification from the adult Sequential Organ Failure Assessment score with application of age-specific thresholds based on PELOD-2.
Diagnostic scores are designed to characterize presence of (multi)organ dysfunction for the purpose of correct classification and/or selection for clinical studies. Although not a score in the strict sense, the 2005 International Pediatric Sepsis Definition Consensus Conference (IPSDCC)8,9 statement defined criteria for 6 organ dysfunctions which have been widely used, both in patients with and without sepsis.
In addition to these more commonly used scores, the literature search identified a number of articles proposing other approaches to assess organ dysfunctions in both broad and specific patient populations (Table 1 and Supplemental Tables 1 and 2).
Predictive Scores
Predictive scores such as PIM-34 or PRISM-IV5 describe the severity of illness at a defined baseline time point, which is often a time window around PICU admission, or time of randomization in clinical trials. The premise of predictive scores is founded on predicting the outcome with minimal influence by therapies provided to treat the condition (ie, is the observed severity of illness attributable to the disease that brings the patient to the PICU or to treatment given after PICU admission?), and on a temporal separation between the prediction and the outcome (ie, is the score predicting, rather than describing, death?). The reliability of a predictive score is better if the data are collected before any care is given or if the data are unresponsive to care. The discriminative value of a score is estimated by measuring its area under the receiver operating characteristic curve, with death used most commonly as the outcome. Good calibration refers to the agreement between predicted and observed rates of death across the spectrum of the score and may be measured by the Hosmer-Lemeshow goodness-of-fit test, Cox calibration regression, or other techniques. Reproducibility across different sites and health care settings is desirable to enable comparison of baseline risk of death for benchmarking. Predictive scores are not intended to be used in individual patients to guide treatment or to inform end-of-life decisions because they were validated in whole PICU populations, not in single patients. These scores need to be updated regularly because the population of PICU patients changes over time and because the risk of mortality changes over time for many specific diseases.
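To make the distinction between discrimination and calibration concrete, the following is a minimal sketch in Python rather than a method taken from any of the reviewed studies. It assumes that each patient has a predicted probability of death and an observed outcome (1 = died, 0 = survived). The function names, the grouping into equal-sized risk strata, and the example values are illustrative assumptions. The area under the receiver operating characteristic curve is computed as the proportion of non-survivor/survivor pairs in which the non-survivor has the higher predicted risk, and the Hosmer-Lemeshow statistic is used here as one possible calibration check.

```python
from typing import Sequence


def auroc(predicted: Sequence[float], died: Sequence[int]) -> float:
    """Discrimination: probability that a randomly chosen non-survivor has a
    higher predicted risk than a randomly chosen survivor (ties count 0.5)."""
    deaths = [p for p, d in zip(predicted, died) if d == 1]
    survivors = [p for p, d in zip(predicted, died) if d == 0]
    if not deaths or not survivors:
        raise ValueError("need at least one death and one survivor")
    wins = 0.0
    for risk_death in deaths:
        for risk_survivor in survivors:
            if risk_death > risk_survivor:
                wins += 1.0
            elif risk_death == risk_survivor:
                wins += 0.5
    return wins / (len(deaths) * len(survivors))


def hosmer_lemeshow(predicted: Sequence[float], died: Sequence[int],
                    groups: int = 10) -> float:
    """Calibration: Hosmer-Lemeshow chi-square statistic comparing observed
    and expected deaths (and survivors) within risk-ordered groups.
    Smaller values indicate closer agreement."""
    pairs = sorted(zip(predicted, died))
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        expected = sum(p for p, _ in chunk)
        observed = sum(d for _, d in chunk)
        # Skip empty groups and groups where the expected count is degenerate.
        if size == 0 or not 0 < expected < size:
            continue
        statistic += (observed - expected) ** 2 / expected
        statistic += (observed - expected) ** 2 / (size - expected)
    return statistic


if __name__ == "__main__":
    # Hypothetical predicted risks and outcomes, for illustration only.
    risks = [0.05, 0.10, 0.20, 0.40, 0.70, 0.90]
    outcomes = [0, 0, 0, 1, 0, 1]
    print("AUROC:", auroc(risks, outcomes))
    print("Hosmer-Lemeshow statistic:", hosmer_lemeshow(risks, outcomes, groups=2))
```

A score can rank patients well (high area under the curve) while systematically over- or underestimating absolute risk, which is why the two properties are reported separately in Table 1.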
There are no predictive scores specific to patients with MODS at PICU admission or at randomization. Although PIM and PRISM represent the most frequently used predictive scores, organ dysfunction scores such as PELOD or pSOFA, obtained in a time window around PICU admission (such as day 1), also have predictive value for mortality. In addition, PELOD-2 on day of admission and maximum and cumulative PELOD-2 scores were associated with health-related quality of life 3 months postdischarge in a recent pediatric sepsis cohort.10
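As an illustration of how a score measured daily can be condensed into the summary measures mentioned above, the sketch below derives the day 1, maximum, and cumulative values from a list of daily scores. It is a hypothetical helper, not the procedure of the cited cohort study; in particular, taking "cumulative" to mean the simple sum of daily scores is an assumption, and the example values are invented.

```python
from typing import Dict, Sequence


def summarize_daily_scores(daily_scores: Sequence[int]) -> Dict[str, int]:
    """Summarize one patient's daily organ dysfunction scores.

    daily_scores[0] is assumed to be the score on the day of admission.
    """
    if not daily_scores:
        raise ValueError("at least one daily score is required")
    return {
        "day1": daily_scores[0],           # score on day of admission
        "maximum": max(daily_scores),      # worst score during the stay
        "cumulative": sum(daily_scores),   # assumed: sum over all days
    }


if __name__ == "__main__":
    # Hypothetical daily scores over a 4-day PICU stay.
    print(summarize_daily_scores([4, 7, 5, 2]))
```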
Descriptive Scores
Descriptive organ dysfunction scores estimate the severity of cases at defined time points or time intervals. Descriptive scores focus on the differentiation between patients with mild versus severe illness. Descriptive scores should reliably capture (un)responsiveness to care, as well as disease progression or resolution, and may thereby provide additional information not reflected in baseline prediction.11 Although simplicity is desirable to facilitate clinical application, descriptive scores aim to characterize the number and severity of organ dysfunctions. For example, the final PELOD-2 score utilizes 10 out of 17 criteria assessed in the derivation6,12 because these 10 were sufficient to explain the statistical variability related to the risk of death observed in the index population. The discriminative value of descriptive scores is estimated by measuring their area under the receiver operating characteristic curve to differentiate death and/or severe adverse outcomes. The calibration of a descriptive score to predict the risk of adverse outcomes should be excellent in the index population used to create and validate the score. On the other hand, calibration in other populations is less important because comorbidities and medical practice can differ significantly in different PICUs and in different countries. Updating descriptive scores over time is somewhat less important compared with predictive scores. Descriptive scores, like predictive scores, have been validated in large populations, not in individual patients or in all subpopulations; thus, they should not be used to guide treatment or inform end-of-life decisions at the bedside. For example, the PELOD-2 score can be used in critically ill children with respiratory problems,13 as well as children with suspected infection,14 but we do not know how reliable the score is in other subpopulations of PICU patients, such as trauma patients.
Diagnostic Criteria for MODS
MODS represents a syndrome, not a specific disease entity, because MODS reflects a group of symptoms and signs that consistently occur together, the combination of which is associated with predictable outcomes. Diagnostic criteria are important to enable correct classification for (1) selecting specific monitoring, interventions and clinical pathways, (2) prognostication, and (3) reliably characterizing epidemiology. Contrary to a syndrome such as trisomy 21, where a consistent list of symptoms and signs relates to one common, genetic finding that defines the “gold standard,” the diagnosis of many conditions often depends on “the subjective interaction of an observer, and its defining boundaries are both arbitrary and a little fuzzy.”15 Presently, there is no reference standard for MODS, and diagnosis is based on different approaches to physiologic data, such as blood pressure, interventions (such as ventilation), and laboratory parameters (such as creatinine concentrations). Since formal criteria for pediatric organ dysfunction were first proposed in 1987 by Wilkinson,16 subsequent iterations, such as criteria proposed in 1996 by Proulx17 and in 2005 by Goldstein,8 were largely independent rather than the result of a consistent, iterative revision process. Importantly, these initial criteria for MODS were not data-driven but defined by expert consensus opinion. Although there is ample evidence for the association of increasing MODS severity with risk of death in critically ill children,1,18 the diagnostic performance of these (multi)organ dysfunction criteria in terms of sensitivity and specificity has been understudied. When it was studied, results indicated substantially worse performance compared with descriptive scores.19,20
Limitations of Current (Multi)Organ Dysfunction Scores
Clinicians and researchers base management and diagnostic decisions at least partially on objective physiologic parameters such as blood pressure, heart rate, or neurologic state. Although there is ample observational evidence to support the relevance of individual organs in relation to outcomes, a closer look reveals substantial differences in thresholds applied. For example, an adolescent with a creatinine of 100 micromol/L will score 2 points for renal dysfunction in PELOD-2 and pSOFA but not be counted as kidney dysfunction by IPSDCC. To complicate the matter further, ICUs internationally care for an increasing proportion of children with complex chronic health care conditions, and only some scores incorporate changes from baseline.21 In addition, thresholds may vary with concomitantly administered therapy; for example, Glasgow Coma Scale in presence of sedation and/or neuromuscular blockade.
Furthermore, the comparison of scores (Table 2, Fig 3) reveals inconsistencies in terms of which organs are included; for example, lactate is measured in PELOD-2 and IPSDCC only, whereas hepatic dysfunction is not included in PELOD-2. Although some of these differences stem from score design methodology (a priori versus derivation), they may simply relate to whether some of the criteria were available in the databases used for derivation/validation. The issue is further accentuated as scores variably consider the level of support provided to an organ; only pSOFA and IPSDCC consider vasoactive-inotrope support, for example, and none consider renal replacement therapies or extracorporeal membrane oxygenation. A child may thus exhibit severe MODS requiring extracorporeal membrane oxygenation and renal replacement therapies resulting in normal blood pressure, blood gases, and creatinine, yet only be scored for the mechanical ventilation component by some tools. In addition, scores often do not fully account for the evolution of critical care. For example, pSOFA includes vasoactive-inotropic support, but milrinone, vasopressin, and angiotensin-II analogs are not included.22 Importantly, the different approaches used to classify organ dysfunction severity (ie, binary in IPSDCC, weighted score in PELOD-2, discrete score in pSOFA) hinder direct comparisons of absolute score levels in relation to MODS. For example, a score as high as 4 in either PELOD-2 or pSOFA may reflect either severe dysfunction in a single organ system or mild dysfunction across several organ systems.
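The last point can be illustrated with a small sketch. The organ-level points below are invented for illustration and do not reproduce the PELOD-2, pSOFA, or IPSDCC scoring tables; the function names and the threshold of 1 point for counting an organ as dysfunctional are likewise assumptions. The sketch contrasts a summed score with a count of affected organs for two hypothetical patients.

```python
from typing import Dict


def total_score(organ_points: Dict[str, int]) -> int:
    """Summed score across organ systems (as in additive severity scores)."""
    return sum(organ_points.values())


def organs_affected(organ_points: Dict[str, int], threshold: int = 1) -> int:
    """Count of organ systems at or above a dysfunction threshold
    (as in binary, count-based criteria)."""
    return sum(1 for points in organ_points.values() if points >= threshold)


# Hypothetical patient with severe dysfunction in a single organ system.
single_organ = {"respiratory": 4, "cardiovascular": 0, "renal": 0, "neurologic": 0}
# Hypothetical patient with mild dysfunction across several organ systems.
several_organs = {"respiratory": 1, "cardiovascular": 1, "renal": 1, "neurologic": 1}

if __name__ == "__main__":
    for label, patient in [("single organ", single_organ),
                           ("several organs", several_organs)]:
        print(label, "total:", total_score(patient),
              "organs affected:", organs_affected(patient))
```

Both patients have the same total, yet a count-based approach classifies only the second as having multiple organ dysfunction, which is one reason absolute score levels cannot be compared directly across tools.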
It is also important to consider that, despite the merits of the procedures applied for derivation, validation, and calibration, the patient cohorts used should be considered historic and were almost exclusively biased toward PICUs in the United States, Canada, Western Europe, Australia, and New Zealand. Considering the expansion of PICU services around the world over the past 2 decades, it is imperative to ensure scores, or adapted versions, are applicable to different health care settings, some of which may have different resource levels. Finally, the focus of organ dysfunction scores has inevitably been on children admitted to PICUs that have the capacity to collect data on organ dysfunctions and severity. Yet, patient care represents a continuum, including emergency department to PICU, operating theater to PICU, or interhospital transfer to PICU journeys; hence, it would be desirable if scores could be readily applied outside the PICU environment.
Future Developments
An ideal MODS reference score should be specific to MODS and fulfill the following quality criteria:
highly sensitive and specific performance for clinically relevant outcomes such as mortality;
operator-independence;
criteria should be met as soon as possible while MODS is developing;
good reproducibility;
readily available in diverse PICU and non-PICU settings; and
good performance, as well, for nonmortality outcomes such as prolonged dependency on ICU support and mid- to long-term quality of life and functional status.
In the era of electronic health records (EHRs), the availability of multisite, multinational, and preferably, multisetting granular data from initial presentation throughout intensive care stay to discharge or death is promising and will enable better data-driven, rather than purely expert-based, approaches.23 Acknowledging that developing new scores from EHR data will inherently result in a bias toward high-resource settings such as selected PICUs in the United States, the international shift toward EHR in many countries, and the creation of high-quality databases in resource-limited settings, opens new opportunities for validation and adaptation to meet the requirements of different settings. Data-driven approaches should rigorously derive and validate criteria in large and independent databases, assess the intrarater and interrater reproducibility of the new list of diagnostic criteria, and compare the discriminative capacity of any new diagnostic criteria for MODS versus existing scoring systems.
Recent developments in clinician-driven approaches that build on the more traditional Delphi study are worth considering because they have the potential to be pragmatic and to combine both the clinician’s perception of the gestalt of a condition and the data collected during the course of a disease. For example, a “temporary (presumptive) diagnosis” of MODS put forward by one or more independent clinicians can be compared with a “confirmatory (post hoc) diagnosis” of MODS, the latter diagnosis being made by an adjudicating committee.24,25 The analysis can then assess the sensitivity and specificity of items used for temporary and confirmatory diagnosis. Using a Bayesian strategy and likelihood ratio to ascertain the diagnosis can further improve the reliability of the diagnosis of MODS made by the members of the adjudicating committee.26
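The quantities involved in such an analysis can be sketched as follows. This is a generic illustration of sensitivity, specificity, likelihood ratios, and the Bayesian update from pretest to posttest probability; it is not the procedure used in the cited studies. It assumes the adjudicated (confirmatory) diagnosis serves as the reference, and the counts in the example are invented.

```python
from typing import Dict


def diagnostic_accuracy(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy of a candidate criterion against the adjudicated diagnosis.

    tp/fn: reference-positive patients with/without the criterion;
    fp/tn: reference-negative patients with/without the criterion.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Likelihood ratios; guard against division by zero at the extremes.
    lr_positive = (sensitivity / (1 - specificity)
                   if specificity < 1 else float("inf"))
    lr_negative = ((1 - sensitivity) / specificity
                   if specificity > 0 else float("inf"))
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_positive": lr_positive,
        "lr_negative": lr_negative,
    }


def post_test_probability(pre_test_probability: float,
                          likelihood_ratio: float) -> float:
    """Bayesian update: convert probability to odds, multiply by the
    likelihood ratio, and convert back to probability."""
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


if __name__ == "__main__":
    # Hypothetical 2x2 table: 50 adjudicated MODS cases, 100 non-cases.
    accuracy = diagnostic_accuracy(tp=40, fp=20, fn=10, tn=80)
    print(accuracy)
    # Hypothetical pretest probability of 20% and a positive criterion.
    print(post_test_probability(0.20, accuracy["lr_positive"]))
```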
The challenge of developing 1 universal MODS gold standard can be partially overcome with appropriate methodologies. For example, latent class analysis may serve to identify surrogate gold standards. Supervised and unsupervised learning algorithms can identify clusters of patients with similar features and outcomes, which may serve to characterize phenotypes more likely to respond to certain interventions.27 Importantly, computational approaches carry enormous promise to overcome the limitations of static (1 assessment in a time window) score measures because dynamic measures of change over time may be more informative on a patient's pattern of disease, response to disease, and response to treatment.
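As a simplified illustration of unsupervised clustering, the sketch below groups hypothetical patients with a basic k-means procedure. Note that k-means is used here as a stand-in for the general idea; it is not latent class analysis and is not the method of the cited work. The two features (a day 1 score and the change in score over the following 48 hours, which adds a dynamic element), the deterministic initialization from the first k patients, and all values are assumptions made for the example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points: Sequence[Point], k: int, iterations: int = 20) -> List[int]:
    """Assign each point to one of k clusters and return the cluster labels.

    Centroids are initialized from the first k points so that the result is
    deterministic; real analyses would typically use repeated random starts.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members)
                                     for dim in zip(*members))
    return labels


if __name__ == "__main__":
    # Hypothetical (day 1 score, change over 48 h) pairs for six patients.
    patients = [(1.0, 0.0), (9.0, 3.0), (1.5, 0.0),
                (8.0, 4.0), (0.5, 1.0), (9.5, 3.0)]
    print(k_means(patients, k=2))
```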
It is important to note that even future large-scale, data-derived criteria and scores for MODS are likely to fall short from a number of perspectives. First, recent developments such as cytokine and gene expression profiles, proteomics, genomics, or highly granular analysis of EHR data, such as heart rate variability, may lead to improved tools. However, their clinical usefulness remains to be determined, and applicability to different settings will pose major challenges because of the resources and expertise involved. Second, although it is feasible to derive best cutoffs for individual organ dysfunction based on optimal performance in terms of sensitivity and specificity, we are currently unable to delineate when organ dysfunction begins (for example, is a slightly elevated creatinine in a child with gastroenteritis receiving enteral rehydration equivalent to the same creatinine concentration in a child heading toward sepsis-related MODS?). Third, some alterations in physiology may reflect adaptive hypo- (eg, hibernation) or hyperfunction (tachycardia to meet increased cardiac output requirements), but current approaches struggle to discriminate these from dysfunction associated with worse outcomes.
Conclusions
After more than 3 decades of MODS research in critically ill children,16 and a large body of observational data demonstrating worse short- and long-term outcomes in children with MODS, present approaches remain hampered by lack of validation, standardization, and applicability, indicating an urgent need for revised MODS criteria. The creation of large international research networks contributing high-resolution data and the advances in computational science are expected to lead to a paradigm shift in the development and application of organ dysfunction scores. It is highly desirable to combine efforts aiming to yield data-driven criteria for organ and multiorgan dysfunction. In addition, these efforts should aim to derive parsimonious scores for easier application at an early stage, even in settings where resources are limited, to pave the way toward interventions more likely to improve outcomes for children with MODS globally.
FUNDING: Dr Schlapbach was supported by a National Health and Medical Research Council practitioner fellowship and by the Children's Hospital Foundation, Brisbane, Australia. This work was supported by National Institutes of Health, National Institute of Neurological Disorders and Stroke, grant R01 NS106292 to Dr Bembea. The funders had no role in the design and conduct of the study. Funded by the National Institutes of Health (NIH).
Drs Schlapbach and Weiss conceptualized and designed this review, drafted the initial manuscript, and reviewed and revised the manuscript; Drs Bembea, Lacroix, and Zimmerman designed, led, and supervised the Pediatric Organ Dysfunction Information Update Mandate project and contributed to sections of the manuscript; Drs Carcillo, Leclerc, Leteurtre, Tissieres, and Wynn contributed to sections of the manuscript; and all authors approved the final manuscript as submitted and agree to be accountable for all aspects of the work.
The guidelines/recommendations in this article are not American Academy of Pediatrics policy, and publication herein does not imply endorsement.
- EHR: electronic health record
- IPSDCC: International Pediatric Sepsis Definition Consensus Conference
- MODS: multiple organ dysfunction syndrome
- PELOD: Pediatric Logistic Organ Dysfunction score
- PIM: Pediatric Index of Mortality
- PODIUM: Pediatric Organ Dysfunction Information Update Mandate
- PRISM: Pediatric Risk of Mortality
- pSOFA: pediatric Sequential Organ Failure Assessment
References
Competing Interests
FINANCIAL DISCLOSURE: The authors have indicated they have no financial relationships relevant to this article to disclose.
CONFLICT OF INTEREST: The authors have indicated they have no potential conflicts of interest to disclose.