The goal of a diagnostic test is to provide information on the probability of disease. In this article, we review the principles of diagnostic test characteristics, including sensitivity, specificity, positive and negative predictive value, receiver operating characteristic (ROC) curves, likelihood ratios, and interval likelihood ratios. We illustrate how interval likelihood ratios optimize the information that can be obtained from test results that can take on >2 values, how they are reflected in the slope of the ROC curve, and how they can be easily calculated from published data.

Pediatric hospitalists use diagnostic tests and clinical prediction rules to decrease diagnostic uncertainty and inform a child’s management. Nevertheless, health care providers often recommend tests without considering each test’s diagnostic characteristics,1  and overtesting can lead to false-positives and -negatives, incorrect diagnoses, and overtreatment.2  Understanding test characteristics can enhance pediatric hospitalists’ ability to practice evidence-based medicine (Table 1).

TABLE 1

Take Home Points

1. Sensitivity is the probability that a test will be positive for someone with the disease.

2. Specificity is the probability that a test will be negative for someone without the disease.

3. Different cutoffs for continuous diagnostic tests can be visualized in a ROC curve, in which the tradeoff between true-positives (sensitivity) and false-positives (1 − specificity) is displayed.

4. Likelihood ratios indicate how much a particular test result alters the probability of disease and, unlike sensitivity and specificity, can take full advantage of the information available from tests that are not naturally dichotomous.

5. Interval likelihood ratios allow for more detailed guidance on how to interpret test results in specific intervals and can often be easily calculated from published data on sensitivity and specificity at different cutoffs for defining abnormality.

Some diagnostic tests have naturally dichotomous results, whereas other tests can be made into dichotomous tests by selecting a cutoff value. Dichotomous tests provide a binary answer to the question of whether a patient has the disease. A 2 × 2 table summarizes the 4 outcomes of a dichotomous test in relation to the true status of the patient (Fig 1):

FIGURE 1

Principles of diagnostic tests. The numerators and denominators for the definitions of sensitivity, specificity, positive predictive value, and negative predictive value are shown. A hypothetical example is shown for a test with 40% sensitivity, 98% specificity, and a pretest probability of 100/1100 = 9.1%.


True-positive (TP): patient has the disease, and the test correctly identifies the patient as positive.

False-positive (FP): patient does not have the disease, but the test incorrectly identifies the patient as positive.

True-negative (TN): patient does not have the disease, and the test correctly identifies the patient as negative.

False-negative (FN): patient has the disease, but the test incorrectly identifies the patient as negative.

We will first review sensitivity, specificity, and positive and negative predictive values, which are test characteristics best reserved for dichotomous tests. We will use clinical examples of nitrite tests to detect urinary tract infections (UTIs) and procalcitonin to detect invasive bacterial infections in young infants to illustrate these concepts.

Among those with the disease, sensitivity is the probability that the diagnostic test will be positive. Sensitivity is calculated as TP/(TP + FN). For example, if the nitrite test has a 40% sensitivity in detecting children with UTIs, this means that, among 100 children with UTIs, 40 will have positive nitrite tests. A test with 100% sensitivity detects every person with the disease, whereas a test with low sensitivity can be falsely reassuring as it may give negative results in some individuals with the disease. Sensitivity reflects how good a test is only among individuals with the disease.

Among individuals without the disease, specificity is the probability that a test will be negative. Specificity is calculated as TN/(TN + FP). If the nitrite test has a 98% specificity, this means that, among 100 children who do not have UTIs, 98 will have a negative nitrite test and be correctly identified as not having a UTI and 2 will have a falsely positive result. A false-positive result may worry the individual, waste limited resources, and lead to unnecessary additional tests or treatments. Tests that have perfect specificity will not have any false-positives. Specificity can only be calculated among people who do not have the disease. Sensitivity and specificity are generally assumed unaffected by the pretest probability (the probability of disease before learning the test result), although this is not always the case.3

Positive predictive value (PPV) is the probability that a person with a positive test has the disease and represents the proportion of true-positives out of all positive tests. PPV is calculated as TP/(TP + FP). Negative predictive value (NPV) is the probability that a person with a negative test does not have the disease. NPV represents the proportion of true-negatives out of all individuals who test negative for the disease, which is calculated as TN/(FN + TN).

For example, in a hypothetical population with a 9.1% (100/1100) pretest probability of UTIs, the PPV and NPV of the nitrite test with 40% sensitivity and 98% specificity can be calculated from these values (Fig 1). A 67% PPV means that, among 100 children who have positive nitrites, 67 children will have UTIs. A 94% NPV means that, among 100 children who have negative nitrites, 94 children will not have UTIs.
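The arithmetic behind Fig 1 can be reproduced in a few lines of code; this sketch builds the 2 × 2 table implied by the hypothetical counts quoted above (1100 children, 40% sensitivity, 98% specificity):

```python
def predictive_values(sensitivity, specificity, pretest_prob, n=1100):
    """Build the 2 x 2 table implied by sensitivity, specificity,
    and pretest probability, then compute PPV and NPV."""
    diseased = n * pretest_prob       # 100 children with UTI
    healthy = n - diseased            # 1000 children without UTI
    tp = sensitivity * diseased       # 40 true-positives
    fn = diseased - tp                # 60 false-negatives
    tn = specificity * healthy        # 980 true-negatives
    fp = healthy - tn                 # 20 false-positives
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv

ppv, npv = predictive_values(0.40, 0.98, 100 / 1100)
print(f"PPV = {ppv:.0%}, NPV = {npv:.0%}")  # PPV = 67%, NPV = 94%
```

Changing `pretest_prob` in this sketch shows how strongly the predictive values depend on the pretest probability, as discussed next.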

The PPV and NPV are both influenced by the pretest (or “prior”) probability of the disease. For diseases with a higher pretest probability, the PPV will be higher. On the other hand, even an excellent diagnostic test that is used to detect a rare disease may have a low PPV.

When determining the acceptability of PPV or NPV for a diagnostic test and disease, it is important to consider the implications of false-positives and -negatives to the patient and population. Diagnostic tests with many false-negatives for contagious, fatal, or treatable diseases are undesirable because of the clinical consequences of misclassification. Conversely, tests with many false-positives that lead to invasive additional tests or risky treatments will be undesirable.

For nondichotomous test results, the sensitivity and specificity depend on the chosen cutoff for a positive result. The receiver operating characteristic (ROC) curve plots the true-positive rate (sensitivity) against the false-positive rate (1 − specificity) at multiple possible cutoffs for classifying a test as positive (Fig 2A).4  There is generally a tradeoff between sensitivity and specificity because a test is rarely both perfectly sensitive and perfectly specific. Two hypothetical cutoffs are displayed in Fig 2A: one with low sensitivity/high specificity and another with high sensitivity/low specificity. The sensitivity can be increased by lowering the cutoff for a positive test, which yields more true-positives. However, this generally decreases the test’s specificity by increasing false-positives.
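The tradeoff can be demonstrated with a small sketch; the test scores below are invented for illustration and are not from any study:

```python
# Hypothetical continuous test scores for 5 diseased and 5 healthy children
# (invented numbers for illustration only).
diseased = [0.9, 0.8, 0.7, 0.6, 0.4]
healthy = [0.5, 0.4, 0.3, 0.2, 0.1]

results = {}
for cutoff in (0.6, 0.4):
    # A score at or above the cutoff counts as a positive test.
    sens = sum(s >= cutoff for s in diseased) / len(diseased)
    spec = sum(s < cutoff for s in healthy) / len(healthy)
    results[cutoff] = (sens, spec)
    print(f"cutoff {cutoff}: sensitivity {sens:.0%}, specificity {spec:.0%}")

# Lowering the cutoff from 0.6 to 0.4 raises sensitivity (80% -> 100%)
# but lowers specificity (100% -> 60%).
```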

FIGURE 2

ROC curves. A, Sample ROC curves. An AUROC of >0.9 is excellent (line A), 0.8–0.9 is good, 0.7–0.8 is acceptable (line B), 0.5–0.7 is poor, and an AUROC of 0.5 means the diagnostic test has no discrimination or could be due to chance (line C).14  The triangle refers to a cut point with high sensitivity/low specificity, and the circle refers to a cut point with low sensitivity/high specificity. B, Calculating interval likelihood ratios from ROC curves. Each dot represents a different cut point for considering the procalcitonin result as being positive for an invasive bacterial illness (based on data from Milcent et al10 ). The ROC curve reveals how the sensitivity and specificity change with different cut points, which create binary results for continuous data. The cut points of procalcitonin ≥2.0 ng/mL, ≥0.5 ng/mL, and ≥0.3 ng/mL for predicting invasive bacterial infection are plotted. The interval likelihood ratio between 0.5 ng/mL and 2.0 ng/mL can be calculated by dividing the change in sensitivity over the change in 1 minus specificity = 25%/9% = 2.8.


The area under the ROC curve (AUROC) quantifies the discrimination of the diagnostic test and can be used to compare 2 or more tests. The closer the ROC curve is to the 45-degree diagonal line, the worse the test’s discrimination.5  An AUROC of 0.5 means the diagnostic test has no discrimination, similar to tossing a coin.5  (Note that the AUROC is not a good measure of discrimination for tests that may indicate disease with both high and low values, such as white blood cell count in young infants at risk for sepsis.)
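When a ROC curve is available only as a set of (1 − specificity, sensitivity) points, the AUROC can be approximated with the trapezoidal rule. This is a sketch with made-up points, not a substitute for a proper statistical package:

```python
def auroc(points):
    """Trapezoidal area under a ROC curve given (FPR, TPR) points,
    which must include (0, 0) and (1, 1) and be sorted by FPR."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# A useless test sitting on the 45-degree diagonal scores 0.5 ...
print(auroc([(0, 0), (0.5, 0.5), (1, 1)]))  # 0.5
# ... while a curve bowed toward the upper left scores higher.
print(auroc([(0, 0), (0.1, 0.8), (1, 1)]))  # 0.85
```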

Likelihood ratios are a valuable way to quantify how test results alter the probability of disease. Unlike sensitivity and specificity, likelihood ratios do not require dichotomizing test results into positive and negative and can be used for tests that are not naturally dichotomous.3  A likelihood ratio is calculated using the following formula:
$$\text{Likelihood ratio for a given test result} = \frac{\text{probability of the result in those with the disease}}{\text{probability of the result in those without the disease}}$$

A likelihood ratio of 1 indicates that a test result provides no information on the probability of disease because that result is equally likely in those with and without the disease. A value >1 suggests that the result is associated with the disease; the higher the likelihood ratio, the stronger the association with the disease. Likelihood ratios <1 are associated with the absence of the disease; the closer the likelihood ratio is to 0, the stronger the association with the absence of disease.6  The magnitude of the change from pretest to posttest probability at a certain likelihood ratio depends on the pretest probability. For example, with a pretest probability of 50%, tests with likelihood ratios of 10 and 0.1 would result in posttest probabilities of 91% and 9%, respectively. However, with a pretest probability of 1%, those same likelihood ratios would result in posttest probabilities of 9% and 0.1%, respectively.
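The example figures above can be checked with a short function that applies a likelihood ratio through the odds form of Bayes’ theorem (the conversion steps are detailed below):

```python
def posttest_probability(pretest_prob, likelihood_ratio):
    """Apply a likelihood ratio via the odds form of Bayes' theorem."""
    pretest_odds = pretest_prob / (1 - pretest_prob)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)

for pretest in (0.50, 0.01):
    for lr in (10, 0.1):
        post = posttest_probability(pretest, lr)
        print(f"pretest {pretest:.0%}, LR {lr}: posttest {post:.1%}")
```

With a 50% pretest probability, LRs of 10 and 0.1 yield posttest probabilities of about 91% and 9%; with a 1% pretest probability, the same LRs yield about 9% and 0.1%.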

When test results are reported as positive or negative, there are only 2 likelihood ratios: a positive likelihood ratio (sensitivity/[1 − specificity]) and a negative likelihood ratio ([1 − sensitivity]/specificity).
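For example, the positive and negative likelihood ratios of the nitrite test described earlier (40% sensitivity, 98% specificity) work out as follows:

```python
sensitivity, specificity = 0.40, 0.98  # nitrite test values from the text

lr_positive = sensitivity / (1 - specificity)   # 0.40 / 0.02 = 20
lr_negative = (1 - sensitivity) / specificity   # 0.60 / 0.98 ~ 0.61

print(f"LR+ = {lr_positive:.0f}, LR- = {lr_negative:.2f}")  # LR+ = 20, LR- = 0.61
```

The high LR+ means a positive nitrite result strongly raises the probability of UTI, whereas the LR− close to 1 means a negative result lowers it only modestly, consistent with the test’s low sensitivity.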

Using likelihood ratios to determine posttest probabilities requires converting probabilities to odds and can be simplified by using online calculators.7,8  Probability in this case is the measure of the likelihood of disease, whereas odds represent a ratio of the likelihood of disease to the likelihood of no disease (odds = probability/[1 − probability]). Posttest probabilities can also be calculated manually by using the following steps3 :

1. Convert pretest probability to pretest odds: pretest odds = pretest probability/(1 − pretest probability).

2. Calculate posttest odds: posttest odds = pretest odds × likelihood ratio.

3. Convert posttest odds to posttest probability: posttest probability = posttest odds/(1 + posttest odds).

When probabilities and odds are low (<5% to 10%), their values are numerically similar, so the conversion steps can be skipped with little effect on the result.

Procalcitonin is an inflammatory marker recommended for the risk stratification of febrile young infants, who are at risk for invasive bacterial infections (IBI; ie, bacteremia and bacterial meningitis).9  Milcent et al assessed the diagnostic characteristics of procalcitonin for the detection of IBI in >2000 infants 7 to 91 days old.10  Procalcitonin had an excellent AUROC of 0.91 to detect IBI.10  The sensitivity of a procalcitonin cutoff of ≥0.3 ng/mL for the outcome of IBI was 90%, whereas for a level ≥0.5 ng/mL, it decreased to 85%, and for a level ≥2.0 ng/mL, it decreased to 60%.10  This means that, among 100 febrile infants with IBI, 90 will have a procalcitonin level ≥0.3 ng/mL, 85 will have a procalcitonin level ≥0.5 ng/mL, and 60 will have a procalcitonin level ≥2.0 ng/mL. Additionally, among children who did not have IBI, the specificity of a procalcitonin <0.3 ng/mL was 78%, whereas for a level of <0.5 ng/mL, it increased to 85%, and for a level <2.0 ng/mL, it increased to 94%.10  This means that, among 100 febrile infants who do not have IBI, 78 infants will have a procalcitonin level <0.3 ng/mL and 22 infants will have a procalcitonin level ≥0.3 ng/mL, suggesting the test at this cutoff has a moderate rate of false-positives.

If a 35-day-old, well-appearing term infant presents with a fever and procalcitonin level of 2.5 ng/mL, what are the chances that the infant has an IBI? Let us assume the pretest probability of IBI among febrile infants aged 29 to 56 days is ∼2%.9,11  Following the steps above, we will calculate the posttest probability.

1. Pretest odds = pretest probability/(1 − pretest probability) = 0.02/(1 − 0.02) = 0.02. The pretest odds are similar to the pretest probability because of the low probability of IBI.

2. Posttest odds = pretest odds × likelihood ratio for procalcitonin of 2.5 ng/mL = 0.02 × 9.6 (likelihood ratio for a procalcitonin level ≥2.0 ng/mL from Milcent et al; consistent with the sensitivity of 60% and specificity of 94% quoted above)10  = 0.192.

3. Posttest probability = posttest odds/(1 + posttest odds) = 0.192/(1 + 0.192) = 0.16.

Hence, assuming a pretest probability of 2%, the infant with a procalcitonin level of 2.5 ng/mL has a 16% posttest probability of an IBI. Using a different pretest probability of IBI or likelihood ratio will change the posttest probability.
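The same 3 steps in code (the exact pretest odds of ∼0.0204 are kept rather than the rounded 0.02, so the result of ∼16% matches the hand calculation):

```python
pretest_prob = 0.02   # assumed pretest probability of IBI (infants 29-56 d)
lr = 9.6              # likelihood ratio for procalcitonin >= 2.0 ng/mL

pretest_odds = pretest_prob / (1 - pretest_prob)      # step 1: ~0.0204
posttest_odds = pretest_odds * lr                     # step 2: ~0.196
posttest_prob = posttest_odds / (1 + posttest_odds)   # step 3
print(f"posttest probability = {posttest_prob:.0%}")  # posttest probability = 16%
```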

Diagnostic tests that have continuous rather than dichotomous results are often presented with sensitivity and specificity at different cutoffs. In the example by Milcent et al, data are presented for procalcitonin cutoffs of ≥0.3, ≥0.5, and ≥2.0 ng/mL (Fig 3A).10  The interpretation of a result that falls between presented cutoffs is challenging because the result would be considered a “positive” result using one cutoff and a “negative” result using another. For example, to determine the posttest probability of a procalcitonin value of 0.8 ng/mL, one could either use the positive likelihood ratio for ≥0.5 (5.6) or the negative likelihood ratio for ≥2.0 (0.4), which would result in different conclusions about the probability of the outcome.10  Interval likelihood ratios, which are likelihood ratios calculated for an interval of test results, offer more granular data for clinical applications.12  If interval likelihood ratios are not presented, they can be estimated from published data on sensitivity and specificity from multiple cutoffs using a strategy detailed in Fig 3B.3
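The strategy in Fig 3B amounts to dividing the change in sensitivity by the change in 1 − specificity between adjacent cutoffs. A sketch using the procalcitonin values quoted above:

```python
# (cutoff in ng/mL, sensitivity, specificity) at each published threshold,
# ordered from highest to lowest cutoff (values quoted in the text).
cutoffs = [
    (2.0, 0.60, 0.94),
    (0.5, 0.85, 0.85),
    (0.3, 0.90, 0.78),
]

# Interval LR between adjacent cutoffs =
#   (change in sensitivity) / (change in 1 - specificity)
interval_lrs = {}
for (hi, sens_hi, spec_hi), (lo, sens_lo, spec_lo) in zip(cutoffs, cutoffs[1:]):
    ilr = (sens_lo - sens_hi) / ((1 - spec_lo) - (1 - spec_hi))
    interval_lrs[(lo, hi)] = ilr
    print(f"interval [{lo}, {hi}) ng/mL: LR = {ilr:.1f}")
```

For the interval from 0.5 to 2.0 ng/mL this gives 25%/9% ≈ 2.8, matching Fig 2B.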

FIGURE 3

Clinical example: the use of procalcitonin in the detection of invasive bacterial infection in febrile young infants. A, Diagnostic test characteristics for procalcitonin and the outcome of IBI in febrile infants from Milcent et al.10  B, Calculating interval likelihood ratios from published data.


Interval likelihood ratios are also related to the shape of the ROC curve. An interval likelihood ratio is equal to the slope of the ROC curve over that interval.3  In Fig 2B, we created a ROC curve using data from 3 of the cutoffs published by Milcent et al. The likelihood ratios of various intervals can be estimated by looking at how the slope of the ROC curve changes.

Limitations to consider include variability in the quality and relevance of the studies used to generate estimates, accuracy only for populations similar to those studied, and variability in the ability of gold standards to incontrovertibly distinguish disease from no disease.13  Obtaining data required for some gold standards may be cost-prohibitive or introduce more risks than benefits to patients. Moreover, gold standards may not exist for some diseases.

Calculating and interpreting sensitivity, specificity, and predictive values are essential in understanding diagnostic test characteristics and practicing evidence-based medicine. For diagnostic tests with continuous values, ROC curves reveal the tradeoffs between sensitivity and specificity at different cut points. Likelihood ratios are a powerful way to apply diagnostic test characteristics to daily practice. Interval likelihood ratios can be calculated from presented data even if not published, further empowering hospitalists to understand how a specific test result alters the probability of disease.

FUNDING: No external funding.

CONFLICT OF INTEREST DISCLOSURES: The authors have indicated they have no potential conflicts of interest to disclose.

Drs Mediratta and Wang conceptualized the manuscript, drafted the initial manuscript, and reviewed and revised the manuscript; Dr Newman conceptualized the manuscript and critically reviewed and revised the manuscript; and all authors approved the final manuscript as submitted and agree to be accountable for all aspects of the work.

1. Whiting PF, Davenport C, Jameson C, et al. How well do health professionals interpret diagnostic information? A systematic review. BMJ Open. 2015;5(7):e008155

2. Lam JH, Pickles K, Stanaway FF, Bell KJL. Why clinicians overtest: development of a thematic framework. BMC Health Serv Res. 2020;20(1):1011

3. Newman TB, Kohn MA. Evidence-Based Diagnosis: An Introduction to Clinical Epidemiology. 2nd ed. Cambridge, UK: Cambridge University Press; 2020

4. Gallagher EJ. Evidence-based emergency medicine/editorial. The problem with sensitivity and specificity. Ann Emerg Med. 2003;42(2):298–303

5. Hajian-Tilaki K. Receiver operating characteristic (ROC) curve analysis for medical diagnostic test evaluation. Caspian J Intern Med. 2013;4(2):627–635

6. Deeks JJ, Altman DG. Diagnostic tests 4: likelihood ratios. BMJ. 2004;329(7458):168–169

7. Kohn MA, Senyak J. Post-test probability calculators. Available at: https://sample-size.net/post-probability-calculator-test-new/. Accessed December 12, 2022

8. MedCalc Software Ltd. Diagnostic test evaluation calculator. Available at: https://www.medcalc.org/calc/diagnostic_test.php. Accessed February 28, 2023

9. Pantell RH, Roberts KB, Adams WG, et al; Subcommittee on Febrile Infants. Evaluation and management of well-appearing febrile infants 8 to 60 days old. Pediatrics. 2021;148(2):e2021052228

10. Milcent K, Faesch S, Gras-Le Guen C, et al. Use of procalcitonin assays to predict serious bacterial infection in young febrile infants. JAMA Pediatr. 2016;170(1):62–69

11. Burstein B, Anderson G, Yannopoulos A. Prevalence of serious bacterial infections among febrile infants 90 days or younger in a Canadian urban pediatric emergency department during the COVID-19 pandemic. JAMA Netw Open. 2021;4(7):e2116919

12. Brown MD, Reeves MJ. Evidence-based emergency medicine/skills for evidence-based emergency care. Interval likelihood ratios: another advantage for the evidence-based diagnostician. Ann Emerg Med. 2003;42(2):292–297

13. Trevethan R. Sensitivity, specificity, and predictive values: foundations, pliabilities, and pitfalls in research and practice. Front Public Health. 2017;5:307

14. Hosmer DW, Lemeshow S, Sturdivant RX. Chapter 5: Assessing the Fit of the Model. In: Applied Logistic Regression. 3rd ed. Hoboken, NJ: Wiley; 2013:153–225