Sensitivity, Specificity, PPV/NPV, Likelihood Ratios & ROC
- All diagnostic-test statistics derive from the 2x2 contingency table comparing the test against a reference (gold) standard: true positives, false positives, false negatives, true negatives.
- SENSITIVITY = TP/(TP+FN) and SPECIFICITY = TN/(TN+FP) are intrinsic test properties (read DOWN the disease columns) and are largely INDEPENDENT of prevalence.
- SnNout / SpPin: a highly SENSITIVE test that is NEGATIVE rules disease OUT; a highly SPECIFIC test that is POSITIVE rules disease IN.
- POSITIVE and NEGATIVE PREDICTIVE VALUES answer the clinically useful question ('given this result, does the patient have disease?') but DEPEND ON PREVALENCE (pre-test probability) - the same test has a lower PPV in a low-prevalence population.
- LIKELIHOOD RATIOS convert a result into a change in the odds of disease and are prevalence-independent: LR+ = Sn/(1-Sp), LR- = (1-Sn)/Sp; an LR+ greater than 10 or LR- less than 0.1 produces large, often conclusive shifts in probability.
- The ROC curve plots sensitivity against (1 - specificity) over all thresholds; the AREA UNDER THE CURVE (AUC) summarises discrimination (0.5 = no better than chance, 1.0 = perfect).
- Read the 2x2 the right way: sensitivity/specificity go DOWN the columns (by true disease status); predictive values go ACROSS the rows (by test result).
- If asked 'why is PPV low in screening?' - because PPV falls as prevalence falls, so even a good test throws up many false positives in a low-prevalence population.
- Use likelihood ratios at the bedside: they combine sensitivity and specificity into one number and let you move from pre-test to post-test probability.
Sensitivity and specificity are calculated vertically, within each true-disease column. They describe the test itself and are largely independent of prevalence (they do change with disease spectrum/severity).
PPV and NPV are calculated horizontally, within each test-result row. They answer the patient's question ("do I have it?") but depend on prevalence - so they are population-specific.
The 2x2 Contingency Table
Every diagnostic statistic is built from a 2x2 table comparing the test result against the reference (gold) standard:
| Test result | Disease PRESENT (reference standard) | Disease ABSENT (reference standard) | Row total |
|---|---|---|---|
| Test POSITIVE | True Positive (TP) | False Positive (FP) | TP + FP -> used for PPV |
| Test NEGATIVE | False Negative (FN) | True Negative (TN) | FN + TN -> used for NPV |
| Column total | TP + FN (all diseased) | FP + TN (all well) | N |
- Sensitivity = TP / (TP + FN) - of all who HAVE the disease, the fraction the test catches (down the "disease present" column).
- Specificity = TN / (TN + FP) - of all who do NOT have the disease, the fraction the test clears (down the "disease absent" column).
- Positive predictive value (PPV) = TP / (TP + FP) - of all who test POSITIVE, the fraction truly diseased (across the "test positive" row).
- Negative predictive value (NPV) = TN / (TN + FN) - of all who test NEGATIVE, the fraction truly well (across the "test negative" row).
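The four definitions above can be sketched in a few lines of Python. This is a minimal illustration, not a validated tool: the function name `diagnostic_metrics` and the counts are hypothetical, chosen only to show which cells feed which statistic.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return the four basic statistics from the counts in a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),  # down the disease-present column
        "specificity": tn / (tn + fp),  # down the disease-absent column
        "ppv": tp / (tp + fp),          # across the test-positive row
        "npv": tn / (tn + fn),          # across the test-negative row
    }

# Hypothetical counts: 100 diseased and 100 well patients.
metrics = diagnostic_metrics(tp=85, fp=6, fn=15, tn=94)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that the two column statistics share no cells with each other, whereas each predictive value mixes one cell from each column, which is why predictive values shift when the column totals (prevalence) change.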
Sensitivity & Specificity (SnNout / SpPin)
A test with high sensitivity has few false negatives, so a NEGATIVE result reliably rules the disease OUT: SnNout. High-sensitivity tests are the screening/'safety-net' tests you want negative.
A test with high specificity has few false positives, so a POSITIVE result reliably rules the disease IN: SpPin. High-specificity tests are the confirmatory tests you trust when positive.
A meta-analysis of clinical examination for ACL rupture provides a perfect illustration. The Lachman test is highly sensitive (pooled sensitivity 85%, specificity 94%) - a good screening test you want negative (SnNout). The pivot-shift test is highly specific (specificity 98%) but insensitive (sensitivity 24%) - so a positive pivot shift effectively rules ACL rupture IN (SpPin), but a negative one does not rule it out. The anterior drawer performs well in chronic injury (sensitivity 92%, specificity 91%) but poorly when acute. This is why the recommendation is to perform the Lachman test (to screen) and the pivot shift (to confirm).
For a test read on a continuous scale, moving the cut-off trades sensitivity against specificity: lowering the threshold catches more true positives (higher sensitivity) but creates more false positives (lower specificity), and vice versa. There is no free lunch - the ROC curve displays this entire trade-off (below). The "best" threshold depends on the cost of a missed case versus a false alarm.
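The trade-off can be seen by sweeping a cut-off over a small set of scores. The sketch below assumes a higher score means "more likely diseased" and calls the test positive at or above the cut-off; the scores and the helper `sens_spec_at` are invented for illustration.

```python
def sens_spec_at(threshold, diseased_scores, well_scores):
    """Sensitivity and specificity when 'score >= threshold' counts as positive."""
    true_positives = sum(s >= threshold for s in diseased_scores)
    true_negatives = sum(s < threshold for s in well_scores)
    return true_positives / len(diseased_scores), true_negatives / len(well_scores)

# Hypothetical scores with overlapping distributions.
diseased = [4, 6, 7, 8, 9]
well = [1, 2, 3, 5, 6]

for cutoff in (3, 5, 7):
    sn, sp = sens_spec_at(cutoff, diseased, well)
    print(f"cut-off {cutoff}: sensitivity {sn:.2f}, specificity {sp:.2f}")
```

With these made-up scores, raising the cut-off from 3 to 7 drops sensitivity from 1.00 to 0.60 while specificity climbs from 0.40 to 1.00.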
Predictive Values & Prevalence
Sensitivity and specificity are properties of the test; predictive values are properties of the test applied to a particular population. As prevalence (pre-test probability) falls, the pool of truly diseased people shrinks relative to the well, so even a small false-positive rate generates many false positives - and PPV falls (while NPV rises). The identical test therefore has a high PPV in a high-prevalence (specialist clinic) setting and a low PPV in a low-prevalence (population screening) setting. This is the single most important caveat when applying published test data to your own patients.
| Setting | Prevalence | PPV | NPV |
|---|---|---|---|
| Population screening | Low | PPV LOW (many false positives) | NPV high |
| Symptomatic clinic referral | Moderate-high | PPV higher | NPV lower |
| Tertiary/specialist with classic features | High | PPV HIGH | NPV lower |
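The effect of prevalence can be checked numerically by holding sensitivity and specificity fixed and recomputing the predictive values. The sketch below uses the Lachman figures quoted in this section (sensitivity 85%, specificity 94%); the three prevalences and the function name `predictive_values` are assumptions made for illustration, not values from the cited studies.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given prevalence."""
    tp = sensitivity * prevalence                # diseased, test positive
    fn = (1 - sensitivity) * prevalence          # diseased, test negative
    fp = (1 - specificity) * (1 - prevalence)    # well, test positive
    tn = specificity * (1 - prevalence)          # well, test negative
    return tp / (tp + fp), tn / (tn + fn)

# Hypothetical prevalences: screening, clinic referral, specialist setting.
for prevalence in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(0.85, 0.94, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

Under these assumptions the same test gives a PPV of about 12.5% at 1% prevalence and about 93.4% at 50% prevalence, while NPV moves in the opposite direction.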
Likelihood Ratios & Post-test Probability
Likelihood ratios (LRs) combine sensitivity and specificity into a single number that tells you how much a given result changes the odds of disease - and, unlike predictive values, they are independent of prevalence:
- LR+ = sensitivity / (1 - specificity) - how much MORE likely a positive result is in disease than in health.
- LR- = (1 - sensitivity) / specificity - how much LESS likely a negative result is in disease than in health.
You apply them as: pre-test odds x LR = post-test odds (convert probability to odds, multiply, convert back; a Fagan nomogram does this graphically).
| LR+ | LR- | Effect on post-test probability |
|---|---|---|
| greater than 10 | less than 0.1 | Large, often conclusive change |
| 5 to 10 | 0.1 to 0.2 | Moderate change |
| 2 to 5 | 0.2 to 0.5 | Small change |
| 1 to 2 | 0.5 to 1 | Minimal / rarely important change |
| 1 | 1 | No change (test useless at that result) |
Using the Lachman data (sensitivity 85%, specificity 94%): LR+ = 0.85 / (1 - 0.94) = 0.85 / 0.06 = ~14 (a positive Lachman strongly raises the probability of ACL rupture), and LR- = (1 - 0.85) / 0.94 = 0.15 / 0.94 = ~0.16 (a negative Lachman meaningfully lowers it). The high LR+ explains why a positive Lachman is so persuasive.
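The odds conversion can be carried through to a post-test probability in a short sketch. The Lachman sensitivity and specificity are taken from the text; the 30% pre-test probability and the function name `post_test_probability` are assumptions chosen purely for illustration.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert probability to odds, multiply by the LR, convert back."""
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)

sensitivity, specificity = 0.85, 0.94      # Lachman figures from the text
lr_positive = sensitivity / (1 - specificity)
lr_negative = (1 - sensitivity) / specificity

pre_test = 0.30                            # hypothetical pre-test probability
after_positive = post_test_probability(pre_test, lr_positive)
after_negative = post_test_probability(pre_test, lr_negative)
print(f"LR+ {lr_positive:.1f}: {pre_test:.0%} -> {after_positive:.0%}")
print(f"LR- {lr_negative:.2f}: {pre_test:.0%} -> {after_negative:.0%}")
```

Starting from the assumed 30%, a positive Lachman moves the probability to about 86% and a negative one to about 6%, which is the same calculation a Fagan nomogram performs graphically.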
The ROC Curve & AUC
For a test measured on a continuous (or ordinal) scale, the ROC curve plots sensitivity (true positive rate) on the y-axis against 1 - specificity (false positive rate) on the x-axis as the threshold is varied across its full range. A test with no discriminating ability follows the diagonal (the line of chance); a good test bows toward the top-left corner. The area under the curve (AUC) summarises overall discrimination in a single number:
| AUC | Interpretation |
|---|---|
| 0.5 | No better than chance |
| 0.7 to 0.8 | Acceptable |
| 0.8 to 0.9 | Excellent |
| greater than 0.9 | Outstanding |
| 1.0 | Perfect separation |

The ROC curve makes the threshold-independent performance of a test visible. It lets you compare two tests (the one with the larger AUC discriminates better overall) and choose an operating point that balances the costs of false negatives and false positives for your clinical question.
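One way to make the AUC concrete is its rank interpretation: the area under the empirical ROC curve equals the probability that a randomly chosen diseased patient scores higher than a randomly chosen well patient, with ties counted as half. The sketch below computes it by comparing every diseased/well pair; the scores and the function name `auc_by_pairs` are hypothetical, and higher scores are assumed to indicate disease.

```python
def auc_by_pairs(diseased_scores, well_scores):
    """AUC as the fraction of diseased/well pairs ranked correctly (ties = 0.5)."""
    wins = 0.0
    for d in diseased_scores:
        for w in well_scores:
            if d > w:
                wins += 1.0      # diseased patient correctly ranked higher
            elif d == w:
                wins += 0.5      # a tie counts as half
    return wins / (len(diseased_scores) * len(well_scores))

# Hypothetical scores with overlapping distributions.
diseased = [4, 6, 7, 8, 9]
well = [1, 2, 3, 5, 6]
print(f"AUC = {auc_by_pairs(diseased, well):.2f}")
```

Because every pair is compared, this brute-force approach is only practical for small samples, but it shows why AUC is independent of any single cut-off: it depends only on how the two groups are ranked against each other.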
Evidence & Key Studies
Clinical diagnosis of an anterior cruciate ligament rupture: a meta-analysis
- Pooled across 28 studies: the Lachman test had sensitivity 85% (95% CI 83-87) and specificity 94% (92-95) - a strong screening test (SnNout).
- The pivot-shift test was highly specific (98%, 96-99) but insensitive (24%, 21-27) - a confirmatory test (SpPin); the anterior drawer performed well in chronic but not acute injury.
- Demonstrates how sensitivity, specificity and the SnNout/SpPin principle guide which clinical tests to combine.
Determining risk of falls in community-dwelling older adults: a systematic review and meta-analysis using post-test probability
- Calculated sensitivity, specificity, likelihood ratios and post-test probability for many fall-risk measures, showing how these statistics are applied to real clinical tests.
- No single test had strong post-test probability; cumulative (combined) measures performed better - illustrating how likelihood ratios chain from pre-test to post-test probability.
- Identifies the best-supported functional measures (e.g. Timed Up and Go, Berg Balance Scale, 5x sit-to-stand) by their diagnostic statistics.
According to PubMed, the orthopaedic test-performance figures used as worked examples (Lachman, pivot shift, anterior drawer) come from the cited ACL meta-analysis, and the likelihood-ratio/post-test-probability application from the cited falls meta-analysis. The formulas and definitions are standard biostatistics and are presented as mathematical relationships, not empirical claims.
Clinical Decision Scenarios
Practise clinical reasoning and management decisions out loud
“Draw a 2x2 table for a diagnostic test against a gold standard and define sensitivity, specificity, and the positive and negative predictive values. Which depend on prevalence?”
“What is a likelihood ratio, why is it useful, and what is a ROC curve?”
Mnemonics & Memory Aids
SnNout / SpPin
Hook: SnNout and SpPin: sensitive-negative rules out, specific-positive rules in.
COLUMNS
Hook: Read sensitivity/specificity down the COLUMNS, predictive values across the rows.
From the 2x2 table
- Sensitivity = TP/(TP+FN); Specificity = TN/(TN+FP) - down the columns, largely prevalence-independent
- PPV = TP/(TP+FP); NPV = TN/(TN+FN) - across the rows, prevalence-DEPENDENT
- SnNout: sensitive + negative rules OUT; SpPin: specific + positive rules IN
Predictive values
- Answer the patient's question but depend on prevalence (pre-test probability)
- Low prevalence (screening) -> low PPV, high NPV
- High prevalence (specialist) -> high PPV
Likelihood ratios
- LR+ = Sn/(1-Sp); LR- = (1-Sn)/Sp; prevalence-independent
- Pre-test odds x LR = post-test odds (Fagan nomogram)
- LR+ >10 or LR- <0.1 = large, often conclusive change
ROC / AUC
- ROC = sensitivity vs (1 - specificity) across thresholds
- Good test bows to top-left; diagonal = chance
- AUC 0.5 chance, 0.8-0.9 excellent, 1.0 perfect; compare tests by AUC