The goal of a diagnostic test is to provide information on the probability of disease. In this article, we review the principles of diagnostic test characteristics, including sensitivity, specificity, positive and negative predictive value, receiver operating characteristics curves, likelihood ratios, and interval likelihood ratios. We illustrate how interval likelihood ratios optimize the information that can be obtained from test results that can take on >2 values, how they are reflected in the slope of the receiver operating characteristics curve, and how they can be easily calculated from published data.
Pediatric hospitalists use diagnostic tests and clinical prediction rules to decrease diagnostic uncertainty and inform a child’s management. Nevertheless, health care providers often recommend tests without considering each test’s diagnostic characteristics,1 and overtesting can lead to false-positives and -negatives, incorrect diagnoses, and overtreatment.2 Understanding test characteristics can enhance pediatric hospitalists’ ability to practice evidence-based medicine (Table 1).
Principles of Diagnostic Tests
Some diagnostic tests have naturally dichotomous results, whereas other tests can be made into dichotomous tests by selecting a cutoff value. Dichotomous tests provide a binary answer to the question of whether a patient has the disease. A 2 × 2 table summarizes the 4 outcomes of a dichotomous test in relation to the true status of the patient (Fig 1):
True-positive (TP): patient has the disease, and the test correctly identifies the patient as positive.
False-positive (FP): patient does not have the disease, but the test incorrectly identifies the patient as positive.
True-negative (TN): patient does not have the disease, and the test correctly identifies the patient as negative.
False-negative (FN): patient has the disease, but the test incorrectly identifies the patient as negative.
We will first review sensitivity, specificity, and positive and negative predictive values, which are test characteristics best reserved for dichotomous tests. We will use clinical examples of nitrite tests to detect urinary tract infections (UTIs) and procalcitonin to detect invasive bacterial infections in young infants to illustrate these concepts.
Among those with the disease, sensitivity is the probability that the diagnostic test will be positive. Sensitivity is calculated as TP/(TP + FN). For example, if the nitrite test has a 40% sensitivity in detecting children with UTIs, this means that, among 100 children with UTIs, 40 will have positive nitrite tests. A test with 100% sensitivity detects every person with the disease, whereas a test with low sensitivity can be falsely reassuring as it may give negative results in some individuals with the disease. Sensitivity reflects how good a test is only among individuals with the disease.
Among individuals without the disease, specificity is the probability that a test will be negative. Specificity is calculated as TN/(TN + FP). If the nitrite test has a 98% specificity, this means that, among 100 children who do not have UTIs, 98 will have a negative nitrite test and be correctly identified as not having a UTI and 2 will have a falsely positive result. A false-positive result may worry the individual, waste limited resources, and lead to unnecessary additional tests or treatments. Tests that have perfect specificity will not have any false-positives. Specificity can only be calculated among people who do not have the disease. Sensitivity and specificity are generally assumed unaffected by the pretest probability (the probability of disease before learning the test result), although this is not always the case.3
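The two definitions above can be sketched in a few lines of Python. This is a minimal illustration, not a validated tool; the function names and the counts (100 children with UTIs and 100 without) are hypothetical and chosen only to reproduce the 40% sensitivity and 98% specificity used in the nitrite example.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Probability of a positive test among those WITH the disease."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Probability of a negative test among those WITHOUT the disease."""
    return tn / (tn + fp)


# Hypothetical counts: 100 children with UTIs, 100 children without UTIs.
nitrite_sensitivity = sensitivity(tp=40, fn=60)
nitrite_specificity = specificity(tn=98, fp=2)

print(f"sensitivity = {nitrite_sensitivity:.0%}")  # 40%
print(f"specificity = {nitrite_specificity:.0%}")  # 98%
```

Each function uses only one column of the 2 × 2 table, which is why neither value depends directly on how common the disease is in the tested population.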
Positive and Negative Predictive Value
Positive predictive value (PPV) is the probability that a person with a positive test has the disease and represents the proportion of true-positives out of all positive tests. PPV is calculated as TP/(TP + FP). Negative predictive value (NPV) is the probability that a person with a negative test does not have the disease. NPV represents the proportion of true-negatives out of all individuals who test negative for the disease, which is calculated as TN/(FN + TN).
For example, in a hypothetical population with a 9.1% (100/1100) pretest probability of UTIs, the PPV and NPV of the nitrite test with 40% sensitivity and 98% specificity can be calculated from these values (Fig 1). A 67% PPV means that, among 100 children who have positive nitrites, 67 children will have UTIs. A 94% NPV means that, among 100 children who have negative nitrites, 94 children will not have UTIs.
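The arithmetic behind these predictive values can be reproduced with a short Python sketch. The helper name `two_by_two` is hypothetical; the population (100 children with UTIs, 1000 without) and the test characteristics are those of the example above.

```python
def two_by_two(n_disease, n_no_disease, sens, spec):
    """Expected cell counts of a 2 x 2 table for a dichotomous test."""
    tp = sens * n_disease        # diseased, test positive
    fn = n_disease - tp          # diseased, test negative
    tn = spec * n_no_disease     # not diseased, test negative
    fp = n_no_disease - tn       # not diseased, test positive
    return tp, fp, tn, fn


# 100 of 1100 children have a UTI (pretest probability of about 9.1%).
tp, fp, tn, fn = two_by_two(100, 1000, sens=0.40, spec=0.98)

ppv = tp / (tp + fp)  # 40 / (40 + 20)
npv = tn / (tn + fn)  # 980 / (980 + 60)

print(f"PPV = {ppv:.0%}")  # 67%
print(f"NPV = {npv:.0%}")  # 94%
```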
The PPV and NPV are both influenced by the pretest (or “prior”) probability of the disease. For diseases with a higher pretest probability, the PPV will be higher. On the other hand, even an excellent diagnostic test that is used to detect a rare disease may have a low PPV.
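To make this dependence concrete, the following Python sketch recomputes the PPV and NPV of the nitrite test (40% sensitivity, 98% specificity) across several pretest probabilities. Only the 100/1100 value comes from the example above; the other pretest probabilities and the function names are hypothetical, chosen for illustration.

```python
def ppv(pretest, sens, spec):
    """Positive predictive value at a given pretest probability."""
    true_pos = sens * pretest
    false_pos = (1 - spec) * (1 - pretest)
    return true_pos / (true_pos + false_pos)


def npv(pretest, sens, spec):
    """Negative predictive value at a given pretest probability."""
    true_neg = spec * (1 - pretest)
    false_neg = (1 - sens) * pretest
    return true_neg / (true_neg + false_neg)


# Only 100/1100 is taken from the text; the rest are illustrative.
for pretest in (0.001, 0.01, 100 / 1100, 0.5):
    print(
        f"pretest {pretest:6.1%}: "
        f"PPV = {ppv(pretest, 0.40, 0.98):.1%}, "
        f"NPV = {npv(pretest, 0.40, 0.98):.1%}"
    )
```

With sensitivity and specificity held fixed, the PPV rises and the NPV falls as the pretest probability increases.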
When determining the acceptability of PPV or NPV for a diagnostic test and disease, it is important to consider the implications of false-positives and -negatives to the patient and population. Diagnostic tests with many false-negatives for contagious, fatal, or treatable diseases are undesirable because of the clinical consequences of misclassification. Conversely, tests with many false-positives that lead to invasive additional tests or risky treatments will be undesirable.
Receiver Operating Characteristics Curve
For nondichotomous test results, the sensitivity and specificity depend on the chosen cutoff for a positive result. The receiver operating characteristics (ROC) curve plots the true-positive rate, or sensitivity, against the false-positive rate, or 1 minus specificity, at multiple possible cutoffs for classifying a test as positive (Fig 2A).4 In general, there is a tradeoff between sensitivity and specificity because it is rare that a test is perfectly sensitive and specific. Two hypothetical cutoffs are displayed in Fig 2A: one with low sensitivity/high specificity and another with high sensitivity/low specificity. For a test in which higher values indicate disease, the sensitivity can be increased by decreasing the cutoff for a positive test, which leads to more true-positives. However, this generally decreases the test’s specificity by increasing false-positives.
The area under the ROC curve (AUROC) quantifies the discrimination of the diagnostic test and can be used to compare 2 or more tests. The closer the ROC curve is to the 45-degree diagonal line, the worse the test’s discrimination.5 An AUROC of 0.5 means the diagnostic test has no discrimination, similar to tossing a coin.5 (Note that the AUROC is not a good measure of discrimination for tests that may indicate disease with both high and low values, such as white blood cell count in young infants at risk for sepsis.)
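As a rough illustration of how a ROC curve and its area are built, the Python sketch below sweeps every possible cutoff over two small sets of test values and sums the area with the trapezoidal rule. The data are entirely hypothetical, the function names are ours, and the code assumes that higher values indicate disease.

```python
def roc_points(diseased, non_diseased):
    """(false-positive rate, true-positive rate) at every cutoff,
    treating 'value >= cutoff' as a positive test."""
    cutoffs = sorted(set(diseased) | set(non_diseased), reverse=True)
    points = [(0.0, 0.0)]  # cutoff above every value: nobody is positive
    for c in cutoffs:
        tpr = sum(v >= c for v in diseased) / len(diseased)
        fpr = sum(v >= c for v in non_diseased) / len(non_diseased)
        points.append((fpr, tpr))
    return points  # ends at (1.0, 1.0): everybody is positive


def auroc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical test values, for illustration only.
with_disease = [0.4, 0.9, 2.5, 3.1, 6.0]
without_disease = [0.1, 0.2, 0.2, 0.6, 1.0]

curve = roc_points(with_disease, without_disease)
print(f"AUROC = {auroc(curve):.2f}")
```

Two sets of values that do not overlap at all give an area of 1.0, and two identical sets give 0.5, matching the coin-toss interpretation above.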
The likelihood ratio of a test result is the probability of that result in individuals with the disease divided by the probability of the same result in individuals without the disease. A likelihood ratio of 1 indicates that a test result provides no information on the probability of disease because that result is equally likely in those with and without the disease. A value >1 suggests that the result is associated with the disease; the higher the likelihood ratio, the stronger the association with the disease. Likelihood ratios <1 are associated with the absence of the disease; the closer the likelihood ratio is to 0, the stronger the association with the absence of disease.6 The magnitude of the change from pretest to posttest probability at a certain likelihood ratio depends on the pretest probability. For example, with a pretest probability of 50%, tests with likelihood ratios of 10 and 0.1 would result in posttest probabilities of 91% and 9%, respectively. However, with a pretest probability of 1%, those same likelihood ratios would result in posttest probabilities of 9% and 0.1%, respectively.
When test results are reported as positive or negative, there are only 2 likelihood ratios: a positive likelihood ratio (sensitivity/[1 − specificity]) and a negative likelihood ratio ([1 − sensitivity]/specificity).
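These two formulas translate directly into code. The sketch below applies them to the nitrite test from the earlier example (40% sensitivity, 98% specificity); the function names are hypothetical.

```python
def positive_lr(sens, spec):
    """Likelihood ratio of a positive result: sensitivity / (1 - specificity)."""
    return sens / (1 - spec)


def negative_lr(sens, spec):
    """Likelihood ratio of a negative result: (1 - sensitivity) / specificity."""
    return (1 - sens) / spec


lr_pos = positive_lr(0.40, 0.98)  # 0.40 / 0.02
lr_neg = negative_lr(0.40, 0.98)  # 0.60 / 0.98

print(f"LR+ = {lr_pos:.1f}")
print(f"LR- = {lr_neg:.2f}")
```

For this test, a positive result is strongly associated with disease (likelihood ratio of about 20), whereas a negative result shifts the probability only modestly (likelihood ratio of about 0.6).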
Using likelihood ratios to determine posttest probabilities requires converting probabilities to odds and can be simplified by using online calculators.7,8 Probability in this case is the measure of the likelihood of disease, whereas odds represent a ratio of the likelihood of disease to the likelihood of no disease (odds = probability/[1 − probability]). Posttest probabilities can also be calculated manually by using the following steps3:
Convert pretest probability to pretest odds: pretest odds = pretest probability/(1 − pretest probability).
Calculate posttest odds: posttest odds = pretest odds × likelihood ratio.
Convert posttest odds to posttest probability: posttest probability = posttest odds/(1 + posttest odds).
When probabilities and odds are low (<5% to 10%), their values are similar, so skipping steps involving their conversion will produce similar results.
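The three steps above can be sketched as a small Python function. The function names are hypothetical; the checks at the end use the pretest probabilities and likelihood ratios quoted earlier in this section (50% and 1%, with likelihood ratios of 10 and 0.1).

```python
def probability_to_odds(probability):
    return probability / (1 - probability)


def odds_to_probability(odds):
    return odds / (1 + odds)


def posttest_probability(pretest_probability, likelihood_ratio):
    pretest_odds = probability_to_odds(pretest_probability)   # step 1
    posttest_odds = pretest_odds * likelihood_ratio           # step 2
    return odds_to_probability(posttest_odds)                 # step 3


for pretest in (0.50, 0.01):
    for lr in (10, 0.1):
        post = posttest_probability(pretest, lr)
        print(f"pretest {pretest:.0%}, LR {lr}: posttest {post:.1%}")

# At low probabilities, odds and probability are nearly the same number.
print(f"odds at 2% probability = {probability_to_odds(0.02):.4f}")
```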
Procalcitonin is an inflammatory marker recommended for the risk stratification of febrile young infants, who are at risk for invasive bacterial infections (IBI; ie, bacteremia and bacterial meningitis).9 Milcent et al assessed the diagnostic characteristics of procalcitonin for the detection of IBI in >2000 infants 7 to 91 days old.10 Procalcitonin had an excellent AUROC of 0.91 to detect IBI.10 The sensitivity of a procalcitonin level ≥0.3 ng/mL for the outcome of IBI was 90%, whereas for a level ≥0.5 ng/mL, it decreased to 85%, and for a level ≥2.0 ng/mL, it decreased to 60%.10 This means that, among 100 febrile infants with IBI, 90 will have a procalcitonin level ≥0.3 ng/mL, 85 will have a procalcitonin level ≥0.5 ng/mL, and 60 will have a procalcitonin level ≥2.0 ng/mL. Additionally, among infants who did not have IBI, the specificity of a procalcitonin level <0.3 ng/mL was 78%, whereas for a level <0.5 ng/mL, it increased to 85%, and for a level <2.0 ng/mL, it increased to 94%.10 This means that, among 100 febrile infants who do not have IBI, 78 infants will have a procalcitonin level <0.3 ng/mL and 22 infants will have a procalcitonin level ≥0.3 ng/mL, suggesting the test at this cutoff has a moderate rate of false-positives.
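The tradeoff across cutoffs can be summarized by computing the positive and negative likelihood ratios at each one. The Python sketch below uses only the rounded sensitivities and specificities quoted above, so its results may differ slightly from likelihood ratios published from the unrounded study data; the variable names are hypothetical.

```python
# (cutoff in ng/mL, sensitivity, specificity), as quoted in the text.
cutoffs = [(0.3, 0.90, 0.78), (0.5, 0.85, 0.85), (2.0, 0.60, 0.94)]

rows = []
for cutoff, sens, spec in cutoffs:
    lr_pos = sens / (1 - spec)   # result at or above the cutoff
    lr_neg = (1 - sens) / spec   # result below the cutoff
    rows.append((cutoff, lr_pos, lr_neg))
    print(f">= {cutoff} ng/mL: LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
```

Raising the cutoff increases the positive likelihood ratio (a high value becomes more convincing) but also moves the negative likelihood ratio closer to 1 (a value below the cutoff becomes less reassuring).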
Applying Likelihood Ratios
If a 35-day-old, well-appearing term infant presents with a fever and procalcitonin level of 2.5 ng/mL, what are the chances that the infant has an IBI? Let us assume the pretest probability of IBI among febrile infants aged 29 to 56 days is ∼2%.9,11 Following the steps above, we will calculate the posttest probability.
Pretest odds = pretest probability/(1 − pretest probability) = 0.02/(1 − 0.02) = 0.02. The pretest odds are similar to the pretest probability because of the low probability of IBI.
Posttest odds = pretest odds × likelihood ratio for procalcitonin of 2.5 ng/mL = 0.02 × 9.6 (likelihood ratio for a procalcitonin level ≥2.0 ng/mL from Milcent et al; close to the value of 10 implied by the rounded sensitivity of 60% and specificity of 94% quoted above)10 = 0.192.
Posttest probability = posttest odds/(1 + posttest odds) = 0.192/(1 + 0.192) = 0.16.
Hence, assuming a pretest probability of 2%, the infant with a procalcitonin level of 2.5 ng/mL has a 16% posttest probability of an IBI. Using a different pretest probability of IBI or likelihood ratio will change the posttest probability.
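The same calculation can be repeated for other inputs with a short Python sketch. The likelihood ratio of 9.6 and the 2% pretest probability come from the worked example; the alternative pretest probabilities of 1% and 5% are hypothetical and included only to show how much the result moves.

```python
def posttest_probability(pretest_probability, likelihood_ratio):
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)


LR_PROCALCITONIN_GE_2 = 9.6  # likelihood ratio used in the worked example

infant = posttest_probability(0.02, LR_PROCALCITONIN_GE_2)
print(f"pretest 2%: posttest {infant:.0%}")  # 16%

# Hypothetical alternative pretest probabilities, for comparison only.
for pretest in (0.01, 0.05):
    post = posttest_probability(pretest, LR_PROCALCITONIN_GE_2)
    print(f"pretest {pretest:.0%}: posttest {post:.0%}")
```

Keeping the unrounded pretest odds (0.0204 rather than 0.02) changes the result only in the third decimal place, which is why the manual shortcut is acceptable at low probabilities.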
Interval Likelihood Ratios for Multilevel Tests
Diagnostic tests that have continuous rather than dichotomous results are often presented with sensitivity and specificity at different cutoffs. In the example by Milcent et al, data are presented for procalcitonin cutoffs of ≥0.3, ≥0.5, and ≥2.0 ng/mL (Fig 3A).10 The interpretation of a result that falls between presented cutoffs is challenging because the result would be considered a “positive” result using one cutoff and a “negative” result using another. For example, to determine the posttest probability of a procalcitonin value of 0.8 ng/mL, one could use either the positive likelihood ratio for ≥0.5 ng/mL (5.6) or the negative likelihood ratio for ≥2.0 ng/mL (0.4), which would result in different conclusions about the probability of the outcome.10 Interval likelihood ratios, which are likelihood ratios calculated for an interval of test results, offer more granular data for clinical applications.12 If interval likelihood ratios are not presented, they can be estimated from published data on sensitivity and specificity from multiple cutoffs using a strategy detailed in Fig 3B.3
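One way to carry out such an estimate is sketched below in Python. It is our reading of the general approach, not a reproduction of Fig 3B: the proportion of infants with IBI whose result falls in an interval is the difference in sensitivity between the 2 bounding cutoffs, the corresponding proportion without IBI is the difference in (1 − specificity), and the interval likelihood ratio is the first divided by the second. The function name is hypothetical, and the inputs are the rounded values quoted earlier.

```python
# (cutoff in ng/mL, sensitivity, specificity), as quoted in the text.
published = [(0.3, 0.90, 0.78), (0.5, 0.85, 0.85), (2.0, 0.60, 0.94)]


def interval_likelihood_ratios(published):
    """Estimate (lower bound, upper bound, interval LR) for each interval."""
    rows = sorted(published)
    # Fraction with a result >= each cutoff; 1.0 and 0.0 close the ends.
    diseased = [1.0] + [sens for _, sens, _ in rows] + [0.0]
    non_diseased = [1.0] + [1 - spec for _, _, spec in rows] + [0.0]
    edges = [None] + [cutoff for cutoff, _, _ in rows] + [None]
    out = []
    for i in range(len(rows) + 1):
        p_diseased = diseased[i] - diseased[i + 1]
        p_non_diseased = non_diseased[i] - non_diseased[i + 1]
        out.append((edges[i], edges[i + 1], p_diseased / p_non_diseased))
    return out


results = interval_likelihood_ratios(published)
for low, high, ilr in results:
    if low is None:
        label = f"< {high}"
    elif high is None:
        label = f">= {low}"
    else:
        label = f"{low} to < {high}"
    print(f"{label} ng/mL: interval LR = {ilr:.2f}")
```

With these rounded inputs, a procalcitonin of 0.8 ng/mL falls in the 0.5 to <2.0 ng/mL interval, whose estimated interval likelihood ratio is about 2.8 (0.25/0.09), a value between the 2 dichotomous likelihood ratios of 0.4 and 5.6.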
Interval likelihood ratios are also related to the shape of the ROC curve. An interval likelihood ratio is equal to the slope of the ROC curve over that interval.3 In Fig 2B, we created a ROC curve using data from 3 of the cutoffs published by Milcent et al. The likelihood ratios of various intervals can be estimated by looking at how the slope of the ROC curve changes.
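The relationship between slope and interval likelihood ratio can be checked numerically. The Python sketch below builds ROC coordinates from the rounded procalcitonin sensitivities and specificities quoted earlier and computes the slope of each straight segment; it is an illustration under those rounded values, not a reproduction of Fig 2B, and the variable names are hypothetical.

```python
# ROC coordinates (false-positive rate, true-positive rate), ordered from
# the strictest cutoff to the most lenient, with the 2 fixed end points.
roc = [
    (0.0, 0.0),        # cutoff above every value
    (1 - 0.94, 0.60),  # >= 2.0 ng/mL
    (1 - 0.85, 0.85),  # >= 0.5 ng/mL
    (1 - 0.78, 0.90),  # >= 0.3 ng/mL
    (1.0, 1.0),        # every result counted as positive
]
labels = [">= 2.0", "0.5 to < 2.0", "0.3 to < 0.5", "< 0.3"]

slopes = []
for (x0, y0), (x1, y1) in zip(roc, roc[1:]):
    # Rise = share of diseased in the interval; run = share of non-diseased.
    slopes.append((y1 - y0) / (x1 - x0))

for label, slope in zip(labels, slopes):
    print(f"{label} ng/mL: slope = {slope:.2f}")
```

The curve is steepest near the origin, where the highest procalcitonin values lie, and flattens toward the upper right, so the slope falls from well above 1 to well below 1 as the interval of test values decreases.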
Limitations of Diagnostic Test Characteristics
Limitations to consider include variability in the quality and relevance of the studies used to generate estimates, accuracy only for the population studied, and variability in the ability of gold standards to incontrovertibly distinguish disease from no disease.13 Obtaining the data required for some gold standards may be cost-prohibitive or introduce more risks than benefits to patients. Moreover, gold standards may not exist for some diseases.
Calculating and interpreting sensitivity, specificity, and predictive values are essential in understanding diagnostic test characteristics and practicing evidence-based medicine. For diagnostic tests with continuous values, ROC curves reveal the tradeoffs between sensitivity and specificity at different cut points. Likelihood ratios are a powerful way to apply diagnostic test characteristics to daily practice. Interval likelihood ratios can be calculated from presented data even if not published, further empowering hospitalists to understand how a specific test result alters the probability of disease.
FUNDING: No external funding.
CONFLICT OF INTEREST DISCLOSURES: The authors have indicated they have no potential conflicts of interest to disclose.
Drs Mediratta and Wang conceptualized the manuscript, drafted the initial manuscript, and reviewed and revised the manuscript; Dr Newman conceptualized the manuscript and critically reviewed and revised the manuscript; and all authors approved the final manuscript as submitted and agree to be accountable for all aspects of the work.