Patient-Reported Outcomes | Measurement Properties | Clinical Application
Outcome Measure Types
Critical Must-Knows
- PROM: Patient-Reported Outcome Measure - patient completes without clinician interpretation. Captures patient perspective.
- MCID: Smallest change in score that patients perceive as meaningful benefit. Essential for clinical interpretation.
- Validity: Does the measure assess what it claims to assess? (content, construct, criterion validity)
- Reliability: Does the measure give consistent results? (test-retest, inter-rater, internal consistency)
- Responsiveness: Can the measure detect clinically meaningful change over time? (ceiling/floor effects)
Clinical Pearls
- SF-36 has 8 subscales and 2 summary components: Physical (PCS) and Mental (MCS) - scored 0-100, higher is better
- WOMAC assesses 3 domains: Pain, Stiffness, Function - scored 0-96, lower is better (or normalized 0-100)
- DASH measures upper extremity disability - 0-100 scale, 0 = no disability
- Floor/ceiling effects over 15% indicate the measure may not detect worsening (floor) or improvement (ceiling)
Critical PROM Concepts
Why PROMs Matter
Patient-Centered Care: Surgeon assessment may not match patient experience. PROMs capture what matters to patients - pain, function, quality of life. Required for value-based care.
MCID is Essential
Clinical Significance: A statistically significant change (p less than 0.05) may not matter to patients. MCID defines meaningful improvement. Compare observed change to MCID, not just p-value.
Generic vs Specific
Trade-off: Generic (SF-36) allows comparison across conditions but less sensitive. Specific (WOMAC) highly sensitive to joint pathology but cannot compare to other joints.
Measurement Properties
Quality Assessment: Valid (measures what it claims), Reliable (consistent results), Responsive (detects change). Poor properties = unreliable conclusions.
At a Glance
Patient-Reported Outcome Measures (PROMs) capture the patient's perspective on pain, function, and quality of life without clinician interpretation. The MCID (Minimal Clinically Important Difference) defines the smallest change that patients perceive as meaningful—compare observed change to MCID, not just p-values. PROMs are classified as generic (SF-36, EQ-5D—compare across conditions), region-specific (DASH for upper limb, LEFS for lower limb), or joint/disease-specific (WOMAC for hip/knee, ODI for spine—most sensitive to pathology). Key measurement properties are validity (measures what it claims), reliability (consistent results), and responsiveness (detects change over time). Floor and ceiling effects greater than 15% indicate the measure cannot detect deterioration or improvement respectively.
VRR: Measurement Properties (PROM Quality)
| Letter | Property | Question | Components |
|---|---|---|---|
| V | Validity | Does it measure what it claims? | Content, Construct, Criterion |
| R | Reliability | Consistent results? | Test-retest, Inter-rater, Internal consistency |
| R | Responsiveness | Detects change over time? | Minimal floor/ceiling effects |
Hook: VRR your PROMs - Validity, Reliability, Responsiveness ensure high-quality outcome measurement!
SWANK: Common Orthopaedic PROMs by Region
| Letter | Region | PROMs | Note |
|---|---|---|---|
| S | Shoulder | ASES, Constant | ASES = American Shoulder and Elbow Surgeons score |
| W | Wrist/Hand | DASH, QuickDASH | DASH = Disabilities of Arm, Shoulder, and Hand |
| A | All Regions | SF-36, EQ-5D | Generic health status measures |
| N | kNee/Hip | WOMAC, OKS/OHS | WOMAC most common for hip/knee arthritis |
| K | bacK/Spine | ODI, NDI | ODI = Oswestry Disability Index for lumbar spine |
Hook: SWANK PROMs cover all major orthopaedic regions - memorize these for exams!
Overview and Introduction
What are PROMs?
Patient-Reported Outcome Measures (PROMs) are standardized, validated questionnaires that patients complete without clinician interpretation. They capture the patient perspective on health status, symptoms, function, and quality of life.
Why PROMs Matter:
- Patient-Centered Care: Surgeon assessment may not match patient experience
- Quantifies Subjective Outcomes: Pain, function, satisfaction cannot be objectively measured
- Value-Based Care: Payers increasingly link reimbursement to patient-reported outcomes
- Quality Improvement: Registries (AOANJRR) use PROMs to benchmark performance
- Research: Essential for clinical trials to demonstrate treatment efficacy
PROM vs Clinician-Reported Outcomes:
- PROMs capture what matters to patients (pain, daily activities, quality of life)
- Clinician measures (ROM, strength) are important but may not correlate with patient satisfaction
- Best practice: Use both PROMs and objective measures
Principles of Outcome Measurement
Outcome measurement rests on a hierarchy: what you measure, how you measure it, and how you interpret it. The WHO ICF framework (body structure/function, activity, participation) is a useful map - PROMs predominantly capture the activity and participation levels, while clinician measures (range of motion, strength, radiographs) capture body structure and function.
Types of outcome:
- Patient-Reported Outcome Measures (PROMs) - the patient's own rating of symptoms, function and quality of life, with no clinician interpretation.
- Clinician-Reported Outcomes (ClinROs) - examiner-derived (range of motion, Constant strength, neurological grade).
- Performance Outcomes (PerfOs) - observed task performance (Timed Up-and-Go, six-minute walk).
- Composite scores - blend domains (e.g. Constant-Murley combines patient pain with examiner-measured strength and range), which improves breadth but can obscure which domain drives a change.
Anchoring concepts for interpretation:
- MCID - smallest change a patient perceives as worthwhile (see dedicated section).
- PASS (Patient Acceptable Symptom State) - the post-treatment score above which a patient considers their state satisfactory; increasingly preferred to MCID because it reports an attainable end state rather than a change.
- SCB (Substantial Clinical Benefit) - a higher threshold than MCID denoting a large, clearly meaningful improvement.
- Floor and ceiling effects - distort responsiveness when too many patients cluster at the extremes.
A good study pre-specifies a single primary PROM, justifies it on measurement properties, and reports both mean change versus MCID and the proportion of patients reaching MCID or PASS.
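The "proportion of patients reaching MCID or PASS" can be sketched as a simple responder analysis. This is a minimal illustration, not a validated scoring routine: the function name `responder_proportions`, the scores and both thresholds are hypothetical, and it assumes a scale where a higher score is better.

```python
def responder_proportions(baseline, follow_up, mcid, pass_threshold):
    """Share of patients whose change reaches MCID and whose final score reaches PASS.

    Assumes a scale where a higher score is better.
    """
    if len(baseline) != len(follow_up) or not baseline:
        raise ValueError("baseline and follow_up must be paired and non-empty")
    n = len(baseline)
    # MCID is judged on the change from baseline.
    reached_mcid = sum(1 for b, f in zip(baseline, follow_up) if f - b >= mcid)
    # PASS is judged on the final (post-treatment) score alone.
    reached_pass = sum(1 for f in follow_up if f >= pass_threshold)
    return reached_mcid / n, reached_pass / n


# Hypothetical paired scores for five patients (0-100, higher = better).
baseline = [40, 55, 30, 62, 48]
follow_up = [58, 60, 45, 90, 50]

# Illustrative thresholds only; real values depend on instrument and population.
mcid_rate, pass_rate = responder_proportions(
    baseline, follow_up, mcid=10, pass_threshold=60
)
print(f"Reached MCID: {mcid_rate:.0%}, reached PASS: {pass_rate:.0%}")
```

In this made-up cohort the first patient improves by 18 points (above the illustrative MCID) yet finishes at 58, below the illustrative PASS, which is why the two proportions are reported separately.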
Types of Outcome Measures
Generic PROMs
Purpose: Assess overall health status across any condition. Allow comparison between different diseases and populations.
SF-36 (Short Form-36 Health Survey)
Description: 36-item generic health status measure.
Domains (8 subscales):
- Physical Functioning
- Role Physical (work/activities due to physical health)
- Bodily Pain
- General Health
- Vitality (energy/fatigue)
- Social Functioning
- Role Emotional (work/activities due to emotional problems)
- Mental Health
Scoring:
- Each subscale: 0-100 (higher = better health)
- Physical Component Summary (PCS): Aggregate of physical domains
- Mental Component Summary (MCS): Aggregate of mental domains
MCID: Approximately 5 points for PCS and MCS.
Advantages: Population norms available, allows cross-disease comparison.
Limitations: Less sensitive to specific musculoskeletal pathology than joint-specific measures.
SF-36 is the most widely used generic PROM in orthopaedic research.
Joint-Specific PROMs
WOMAC (Western Ontario and McMaster Universities Arthritis Index)
Description: Most widely used PROM for hip and knee osteoarthritis.
Domains (24 items):
- Pain (5 items): Pain with various activities
- Stiffness (2 items): Morning and later-day stiffness
- Physical Function (17 items): Difficulty with daily activities
Scoring Options:
- Likert Scale: 0-4 per item, total 0-96 (lower = better)
- VAS: 0-100mm per item
- Often normalized to a 0-100 scale; direction depends on version (some report higher = better, others lower = better)
MCID: Approximately 10-15 points (on 100-point scale).
Advantages: Excellent validity and reliability for hip/knee OA, widely used in arthroplasty research.
Limitations: Designed for arthritis - less applicable to ligament injuries, fractures.
WOMAC is the gold standard for hip and knee arthroplasty outcome assessment.
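The Likert scoring and 0-100 normalization described above can be sketched as follows. This is an illustrative calculation only, assuming the 24-item, 0-4 Likert version; the function name `womac_normalised` and the `higher_is_better` flag are hypothetical, and licensed scoring manuals should be followed in practice.

```python
N_ITEMS = 24          # 5 pain + 2 stiffness + 17 physical function
MAX_PER_ITEM = 4      # Likert 0-4
MAX_RAW = N_ITEMS * MAX_PER_ITEM  # 96


def womac_normalised(item_scores, higher_is_better=False):
    """Convert 24 Likert item scores (0-4) to a 0-100 scale.

    By default lower = better, matching the raw 0-96 total.
    Set higher_is_better=True for the reversed convention.
    """
    if len(item_scores) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item scores")
    if any(not 0 <= s <= MAX_PER_ITEM for s in item_scores):
        raise ValueError("each item must be scored 0-4")
    raw = sum(item_scores)
    lower_better = raw / MAX_RAW * 100
    return 100 - lower_better if higher_is_better else lower_better


# Hypothetical patient answering 1 ("mild") on every item: raw total 24/96.
answers = [1] * N_ITEMS
score_lower_better = womac_normalised(answers)
score_higher_better = womac_normalised(answers, higher_is_better=True)
print(score_lower_better, score_higher_better)
```

Making the direction an explicit argument forces the reader of any result to state which convention was used, which is the usual source of confusion when comparing WOMAC values across papers.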
Choosing Between PROMs: A Comparison
The most common exam error is treating all PROMs as interchangeable. The table below contrasts the major instrument types so you can justify a choice under viva pressure.
PROM Types: Strengths, Limitations and Best Use
| Instrument Type | Examples | Key Strength | Key Limitation | Best Use |
|---|---|---|---|---|
| Generic profile | SF-36, SF-12 | Cross-disease comparison, population norms, captures whole-person health | Lower responsiveness to focal joint pathology | Secondary outcome; comparing burden across conditions |
| Generic utility | EQ-5D, SF-6D | Generates QALY utility (0 to 1) for cost-utility analysis | Coarse (few levels); ceiling effects in healthy people | Health-economic evaluation, payer/HTA submissions |
| Region-specific | DASH/QuickDASH, LEFS | One score across a whole limb when pathology spans joints | Less sensitive than single-joint scores | Multi-level or undefined upper/lower limb pathology |
| Joint/disease-specific | WOMAC, OHS/OKS, ASES, ODI | Highest responsiveness to the target joint or disease | Cannot compare across joints or to general population | Primary outcome in arthroplasty/disease-specific trials and registries |
Controversies and Areas of Uncertainty
PROM science is evolving and several issues remain genuinely unsettled - useful "areas of debate" answers in a viva.
MCID is not a single number
The same PROM yields different MCIDs depending on whether an anchor-based or distribution-based method is used, the anchor question, baseline severity, and follow-up timing. Quoting "the MCID" as if fixed is a recognised pitfall - always state the method and population.
MCID versus PASS
MCID reports change; PASS reports an acceptable end state. A patient can exceed the MCID yet remain symptomatic and dissatisfied. Many groups now favour PASS or the proportion reaching a "good outcome" as more patient-relevant than mean change.
Ceiling effects and legacy scores
Widely used scores (Constant, Harris Hip, some Oxford items) show marked ceiling effects in well-functioning patients, masking further improvement and biasing comparisons of already good results.
Response shift and missing data
Patients recalibrate their internal standard for "good" over time (response shift), complicating before/after comparisons. Differential loss to follow-up - sicker patients dropping out - inflates apparent improvement; complete-case analysis is a common source of bias.
CAT and item-response theory
Computer-adaptive testing (e.g. PROMIS) tailors items to the respondent, reducing burden and floor/ceiling effects, but legacy thresholds (MCID, ICC targets) do not transfer directly and cross-walks are imperfect.
Linking PROMs to payment
Using PROMs for reimbursement or surgeon-level ranking risks gaming, risk-aversion (avoiding complex patients) and inadequate case-mix adjustment - reasons several systems publish PROMs for benchmarking rather than direct pay-for-performance.
Measurement Properties
Validity
Definition: Does the measure assess what it claims to assess?
Types of Validity
| Type | Definition | How to Assess | Example |
|---|---|---|---|
| Content Validity | Covers all relevant aspects of construct | Expert panel review, patient input | WOMAC includes pain, stiffness, function for OA |
| Construct Validity | Correlates with related measures, discriminates from unrelated | Correlation with similar PROMs (convergent), lack of correlation with dissimilar (discriminant) | WOMAC correlates with knee ROM (convergent) but not with mental health scores (discriminant) |
| Criterion Validity | Correlates with gold standard | Compare to established measure | New knee score correlates with WOMAC |
Reliability
Definition: Does the measure give consistent results when condition is stable?
Types of Reliability
| Type | Definition | How to Assess | Target |
|---|---|---|---|
| Test-Retest | Same result when repeated in stable patients | Intraclass Correlation Coefficient (ICC) | ICC greater than 0.70 |
| Inter-Rater | Different raters get same result | ICC for clinician-administered measures | ICC greater than 0.70 |
| Internal Consistency | Items within scale measure same construct | Cronbach alpha | Alpha 0.70 to 0.95 (too high suggests redundancy) |
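Internal consistency from the table above can be illustrated with a short Cronbach alpha calculation. This is a sketch using the standard formula alpha = k/(k-1) × (1 - sum of item variances / variance of total scores); the function name `cronbach_alpha` and the response data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach alpha for a list of per-patient item-score lists."""
    if len(responses) < 2:
        raise ValueError("need at least two respondents")
    k = len(responses[0])
    if k < 2 or any(len(row) != k for row in responses):
        raise ValueError("need at least two items, same count for every respondent")
    # Sample variance of each item (column) across respondents.
    item_var_sum = sum(variance(col) for col in zip(*responses))
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical 4-item scale completed by five patients (items scored 0-4).
responses = [
    [3, 4, 3, 4],
    [1, 2, 1, 1],
    [4, 4, 3, 4],
    [2, 2, 2, 3],
    [0, 1, 1, 0],
]
alpha = cronbach_alpha(responses)
print(f"Cronbach alpha = {alpha:.2f}")
```

With only five made-up respondents the estimate is unstable; the point is the mechanics, and that a value above 0.95 would prompt a check for redundant items.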
Responsiveness
Definition: Can the measure detect clinically meaningful change over time?
Floor Effect: High proportion (over 15%) score at minimum (worst possible).
- Problem: Cannot detect worsening in these patients.
Ceiling Effect: High proportion (over 15%) score at maximum (best possible).
- Problem: Cannot detect improvement in these patients.
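Checking a dataset against the 15% threshold is a simple count. The sketch below assumes the caller states which end of the scale is worst and which is best; the function name `floor_ceiling` and the scores (on a hypothetical 0-48 scale where higher is better) are illustrative.

```python
def floor_ceiling(scores, worst, best, threshold=0.15):
    """Proportion of patients at the worst and best possible score, with flags."""
    if not scores:
        raise ValueError("scores must be non-empty")
    n = len(scores)
    floor = sum(1 for s in scores if s == worst) / n
    ceiling = sum(1 for s in scores if s == best) / n
    return {
        "floor": floor,
        "ceiling": ceiling,
        "floor_effect": floor > threshold,      # cannot detect worsening
        "ceiling_effect": ceiling > threshold,  # cannot detect improvement
    }


# Hypothetical post-operative scores: 2 of 10 patients at the maximum of 48.
scores = [48, 48, 30, 41, 36, 44, 39, 45, 33, 42]
result = floor_ceiling(scores, worst=0, best=48)
print(result)
```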
Responsiveness Index: Standardized Response Mean (SRM = mean change / SD of change) or Effect Size.
- SRM greater than 0.8: Large responsiveness (good)
- SRM 0.5 to 0.8: Moderate responsiveness
- SRM less than 0.5: Small responsiveness (may miss change)
Understanding responsiveness prevents choosing measures that cannot detect improvement.
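The two responsiveness indices can be computed from paired scores as sketched below. SRM divides mean change by the SD of the change scores; the effect size shown here divides by the SD of baseline scores, which is one common convention and is stated as an assumption. The function name `responsiveness` and the data are hypothetical.

```python
from statistics import mean, stdev


def responsiveness(baseline, follow_up):
    """Return (SRM, effect size) for paired baseline and follow-up scores."""
    if len(baseline) != len(follow_up) or len(baseline) < 2:
        raise ValueError("need at least two paired observations")
    changes = [f - b for b, f in zip(baseline, follow_up)]
    sd_change = stdev(changes)
    sd_baseline = stdev(baseline)
    if sd_change == 0 or sd_baseline == 0:
        raise ValueError("standard deviation is zero; index undefined")
    srm = mean(changes) / sd_change            # SRM = mean change / SD of change
    effect_size = mean(changes) / sd_baseline  # assumed: SD of baseline scores
    return srm, effect_size


# Hypothetical scores for four patients (higher = better).
baseline = [10, 20, 30, 40]
follow_up = [20, 32, 38, 50]
srm, effect_size = responsiveness(baseline, follow_up)
print(f"SRM = {srm:.2f}, effect size = {effect_size:.2f}")
```

The two indices can diverge sharply, as here: a very consistent change gives a large SRM even when the change is modest relative to the spread of baseline scores.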
Minimal Clinically Important Difference (MCID)
What is MCID?
Definition: The smallest change in PROM score that patients perceive as beneficial and would mandate a change in management.
Purpose: Distinguish statistically significant from clinically meaningful change.
How MCID is Determined
Methods:
- Anchor-Based: Compare PROM change to external anchor (patient global assessment)
  - "Compared to before surgery, how would you rate your improvement: Much better, Better, Same, Worse?"
  - Calculate MCID as mean change for "Better" group.
- Distribution-Based: Use statistical thresholds (0.5 SD, Standard Error of Measurement)
  - MCID = 0.5 × standard deviation
  - Less clinically intuitive than anchor-based.
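Both methods above can be sketched in a few lines. This is an illustration under stated assumptions: the anchor-based version takes the mean change in the "Better" group, and the distribution-based version applies 0.5 × SD to baseline scores (the text does not say which SD, so baseline is assumed). Function names and data are hypothetical.

```python
from statistics import mean, stdev


def mcid_anchor(changes, anchor_ratings, target="Better"):
    """Anchor-based MCID: mean score change among patients rating the target anchor."""
    group = [c for c, r in zip(changes, anchor_ratings) if r == target]
    if not group:
        raise ValueError(f"no patients rated '{target}'")
    return mean(group)


def mcid_distribution(baseline_scores):
    """Distribution-based MCID: 0.5 x SD (assumed here to be the baseline SD)."""
    return 0.5 * stdev(baseline_scores)


# Hypothetical change scores with each patient's global rating of change.
changes = [20, 12, 8, 1, -3]
ratings = ["Much better", "Better", "Better", "Same", "Worse"]
anchor_mcid = mcid_anchor(changes, ratings)

# Hypothetical baseline scores for the distribution-based estimate.
baseline = [10, 20, 30, 40]
dist_mcid = mcid_distribution(baseline)

print(f"Anchor-based: {anchor_mcid}, distribution-based: {dist_mcid:.1f}")
```

The two estimates differ even on the same toy data, which is why a quoted MCID should always name the method and population it came from.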
Clinical Application:
- If mean improvement = 8 points and MCID = 10 points → even if statistically significant, the improvement falls short of the MCID and is NOT clinically meaningful.
- If 95% CI = 12 to 18 points and MCID = 10 → Entire CI exceeds MCID → Clinically meaningful improvement.
Always compare treatment effects to MCID, not just p-values.
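The comparison of a confidence interval with the MCID can be written as a three-way rule. This is a simplified sketch: the function name `interpret_effect` and the wording of the categories are hypothetical, and it assumes a positive effect means improvement.

```python
def interpret_effect(ci_low, ci_high, mcid):
    """Classify a treatment effect by where its 95% CI sits relative to the MCID."""
    if ci_low > ci_high:
        raise ValueError("ci_low must not exceed ci_high")
    if ci_low > mcid:
        return "clinically meaningful"         # entire CI exceeds MCID
    if ci_high < mcid:
        return "not clinically meaningful"     # entire CI falls below MCID
    return "uncertain clinical significance"   # CI crosses MCID


# Example from the text: 95% CI 12 to 18 with MCID 10.
whole_ci_above = interpret_effect(12, 18, mcid=10)
# Hypothetical contrasting cases.
whole_ci_below = interpret_effect(2, 8, mcid=10)
ci_crosses = interpret_effect(6, 13, mcid=10)

print(whole_ci_above, whole_ci_below, ci_crosses, sep=" | ")
```

Note that the second case could still be statistically significant (the interval excludes zero) while being clinically unimportant.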
Clinical Application and Relevance
Choosing the Right PROM
Joint-specific for sensitivity (WOMAC for THA). Generic for cross-disease comparison and population norms (SF-36). Use both when possible to capture joint-specific and overall health.
Interpreting PROM Data
Compare change to MCID, not just statistical significance. Check floor/ceiling effects - over 15% suggests measure may not detect change. Report mean change AND proportion exceeding MCID.
Registry Requirements
Major arthroplasty registries (NJR, AJRR, AOANJRR, SHAR) collect PROMs. Pre-operative baseline and post-operative follow-up (1 year, 5 year). Allows benchmarking and quality improvement.
Value-Based Care
Payers increasingly link reimbursement to PROMs. Demonstrating patient-reported improvement justifies procedures. PROMs essential for value-based contracts.
Evidence Base
WOMAC: Original Validation (Landmark)
- WOMAC developed and validated within a double-blind RCT of two NSAIDs in hip and knee osteoarthritis
- Self-administered, disease-specific instrument with pain, stiffness and physical-function subscales
- Subscales fulfilled conventional criteria for face, content and construct validity
- Demonstrated reliability, responsiveness and relative efficiency
- Described by the authors as a 'high-performance' instrument for evaluative OA research
SF-36: Conceptual Framework (Landmark Generic PROM)
- Introduced the 36-item Short Form (SF-36) from the Medical Outcomes Study
- Surveys eight health concepts spanning physical and mental health domains
- Designed for self-administration in people aged 14 years and over, or by trained interviewer
- Built for clinical practice, research, health-policy evaluation and population surveys
- Established the template for generic, profile-based health-status measurement
DASH: Development of an Upper-Extremity PROM
- Joint AAOS, COMSS and Institute for Work and Health initiative to create a region-wide upper-limb measure
- Item generation produced 821 candidate items reduced to a focused symptom and function set
- Single questionnaire spans the whole upper limb rather than an isolated joint
- Field tested across centres in the United States, Canada and Australia
- Provided the basis for the validated DASH and the shortened QuickDASH
Oxford Hip Score: A Joint-Specific Registry PROM
- Developed a 12-item patient-completed questionnaire for total hip replacement (n=220, prospective)
- High internal consistency and satisfactory test-retest reproducibility
- Validity confirmed by correlation with Charnley score, SF-36 and AIMS
- Standardised effect size (responsiveness) compared favourably with SF-36 and AIMS
- Short, practical and sensitive to clinically important change after THR
Exam Viva Scenarios
Use these scenarios to practise clinical reasoning and management decisions
Scenario 1: PROM Selection
"You are planning an RCT comparing cemented vs uncemented THA. What outcome measures would you use and why?"
Scenario 2: MCID Interpretation
"An RCT of 200 patients found that new rehab protocol improved WOMAC score by mean 8 points (95% CI 5 to 11 points, p = 0.001) compared to standard protocol. The MCID for WOMAC is 10 points. How do you interpret this result?"
Scenario 3: Measurement Properties and PROM Appraisal
"A colleague proposes adopting a brand-new shoulder PROM for your unit. How would you appraise whether it is fit for purpose, and what numbers would you want to see?"
MCQ Practice Points
PROM Types
Q: What is the difference between a generic PROM (SF-36) and a joint-specific PROM (WOMAC)? A: Generic PROMs assess overall health status across any condition, allow comparison between diseases and to population norms, but are less sensitive to specific joint pathology. Joint-specific PROMs are highly sensitive to pathology in a single joint but cannot compare across different joints or to general population.
MCID Importance
Q: Why is MCID important when interpreting PROM changes? A: MCID defines clinically meaningful change - the smallest improvement that patients perceive as beneficial. Statistically significant changes (p less than 0.05) may not exceed MCID and thus not be clinically important. Always compare observed change to MCID, not just p-value.
Floor and Ceiling Effects
Q: What is a ceiling effect and why does it matter? A: Ceiling effect occurs when high proportion (over 15%) of patients score at maximum (best possible score). This prevents the measure from detecting improvement in these patients and reduces responsiveness. Choose a different measure or add a more challenging domain if ceiling effects are problematic.
Validity vs Reliability
Q: What is the difference between validity and reliability? A: Validity = Does the measure assess what it claims to assess? (accuracy). Reliability = Does the measure give consistent results when repeated in stable patients? (precision). A measure can be reliable but not valid (consistently wrong), but cannot be valid without being reliable.
Test-Retest Reliability
Q: What ICC value indicates good test-retest reliability? A: ICC greater than 0.70 indicates acceptable reliability. ICC (Intraclass Correlation Coefficient) ranges 0-1. ICC greater than 0.90 is excellent, 0.70-0.90 is good, less than 0.70 is poor. This measures consistency when same patient completes PROM twice with stable condition.
Responsiveness Measures
Q: How is responsiveness quantified? A: Standardized Response Mean (SRM) or Effect Size. SRM = mean change / SD of change. SRM greater than 0.8 = large responsiveness (good), 0.5-0.8 = moderate, less than 0.5 = small (may miss clinically important change). Responsiveness is essential for detecting treatment effects.
Guidelines, Registries & Global Practice
PROM collection has shifted from research tool to routine quality infrastructure worldwide, but the chosen instruments, mandate and uptake vary by region.
Global epidemiology and uptake. Joint arthroplasty is among the most-studied PROM settings: large registry programmes consistently show that the majority of hip and knee replacement patients achieve improvements exceeding the MCID at one year, while a meaningful minority (commonly cited around one in five for knees) report being unsatisfied - a gap PROMs make visible that complication rates alone miss. Uptake is high in publicly funded, registry-linked systems and far patchier where collection is voluntary or unfunded.
Guidance and Registry Programmes Side by Side
How Major Bodies Approach PROMs
| Body / Region | Stance on PROMs | Typical Instruments |
|---|---|---|
| NHS / NJR (UK) | National PROMs programme historically mandated pre/post hip and knee replacement; NJR links implant survival to outcomes | Oxford Hip/Knee Score, EQ-5D |
| AOANJRR (Australia) | Registry-integrated PROM collection at standardised intervals for benchmarking | Oxford Hip/Knee Score, EQ-5D, VAS |
| AAOS / registries (US) | AAOS promotes PROMs and CMS value-based programmes increasingly require them; AJRR collects PROMs | HOOS/KOOS JR, PROMIS, VR-12 |
| Nordic registries (SHAR, NAR) | Long-standing registry PROM collection underpinning revision and bearing comparisons | EQ-5D, joint-specific scores, satisfaction VAS |
| ICHOM (international) | Defines standard outcome sets to harmonise PROMs across countries for a given condition | Condition-specific standard sets (e.g. hip/knee OA) |
Methodological standards. The COSMIN initiative (Mokkink and colleagues) provides the most widely cited international consensus framework for selecting and appraising PROMs, complementing earlier quality criteria. ICHOM standard sets push toward globally comparable outcome reporting.
High- versus limited-resource practice variation. In well-resourced, registry-linked systems, electronic PROM (ePROM) capture is increasingly embedded in routine care, enabling case-mix-adjusted benchmarking. In limited-resource settings, barriers include literacy and language (validated translations are not universal), lack of electronic infrastructure, staffing for follow-up, and the cost of licensed instruments - so brief, free, culturally validated tools (and pragmatic VAS/EQ-5D use) are favoured. The principle that statistical significance does not equal clinical significance, and that MCID/PASS should anchor interpretation, applies in every setting regardless of resource level.
OUTCOME MEASURES AND PROMs
Clinical summary
Common Orthopaedic PROMs
- Generic: SF-36 (PCS/MCS, 0-100, higher better), EQ-5D (utility 0-1)
- Hip/Knee: WOMAC (pain/stiffness/function, 0-96 or 0-100, lower or higher better depending on version)
- Upper Extremity: DASH (0-100, 0 = no disability), QuickDASH (11 items)
- Shoulder: ASES (0-100, higher better), Constant score
- Spine: ODI (Oswestry 0-100%, lower better), NDI (Neck Disability Index)
MCID Values
- SF-36 PCS/MCS: MCID approximately 5 points
- WOMAC: MCID 10-15 points (on 100-point scale)
- DASH: MCID 10-15 points
- VAS Pain: MCID 15-20mm (on 100mm scale)
- Always compare treatment effect to MCID for clinical significance
Measurement Properties
- Validity = Does it measure what it claims? (content, construct, criterion)
- Reliability = Consistent results? (test-retest ICC greater than 0.70, Cronbach alpha 0.70-0.95)
- Responsiveness = Detects change? (SRM greater than 0.8 = large, less than 15% floor/ceiling effects)
- Floor effect = Too many at minimum (cannot detect worsening)
- Ceiling effect = Too many at maximum (cannot detect improvement)
PROM Selection
- Joint-specific for sensitivity (WOMAC for THA trial)
- Generic for cross-disease comparison and population norms (SF-36)
- Utility measure for cost-effectiveness (EQ-5D)
- Use combination: Joint-specific (primary) + Generic (secondary)
- Check floor/ceiling effects (over 15% problematic)
Interpreting PROM Data
- Compare mean change to MCID, not just p-value
- Check whether the whole 95% CI lies above the MCID threshold
- Report proportion of patients achieving MCID
- Wide CI crossing MCID = uncertain clinical significance
- Large sample with trivial effect (below MCID) = not clinically important
Clinical Application
- Major registries (NJR, AJRR, AOANJRR) collect PROMs (baseline and follow-up)
- Value-based care links reimbursement to PROM improvement
- Statistical significance ≠ Clinical significance
- Generic vs Specific trade-off: Comparison vs Sensitivity
- Pre-specify primary PROM and timing in study protocol