High-yield overview

Patient-Reported Outcomes | Measurement Properties | Clinical Application

PROM: Patient-Reported Outcome Measure

MCID: Minimal Clinically Important Difference

VAS: Visual Analog Scale (0-100 mm)

SF-36: Generic Health Status Measure

Outcome Measure Types

Generic PROMs

Examples: SF-36, EQ-5D - any condition

Use: Compare across diseases, population norms

Region-Specific PROMs

Examples: DASH (arm), LEFS (leg) - anatomic region

Use: Sensitive to regional pathology

Joint-Specific PROMs

Examples: WOMAC (hip/knee), ASES (shoulder) - single joint

Use: Most sensitive to joint pathology

Disease-Specific PROMs

Examples: ODI (spine), FAAM (ankle) - specific condition

Use: Tailored to disease features

Critical Must-Knows

PROM: Patient-Reported Outcome Measure - patient completes without clinician interpretation. Captures patient perspective.
MCID: Smallest change in score that patients perceive as meaningful benefit. Essential for clinical interpretation.
Validity: Does the measure assess what it claims to assess? (content, construct, criterion validity)
Reliability: Does the measure give consistent results? (test-retest, inter-rater, internal consistency)
Responsiveness: Can the measure detect clinically meaningful change over time? (ceiling/floor effects)

Clinical Pearls

SF-36 has 2 summary components: Physical (PCS) and Mental (MCS) - its eight subscales are each scored 0-100, higher is better
WOMAC assesses 3 domains: Pain, Stiffness, Function - scored 0-96, lower is better (or normalized 0-100)
DASH measures upper extremity disability - 0-100 scale, 0 = no disability
Floor/ceiling effects over 15% indicate the measure may not detect worsening or improvement

Clinical Imaging

Imaging Gallery

SF12v2 PCS and HOOS threshold values (represented by dashed lines) are dependent on preoperative MCS score and demonstrate a linear relationship. Postoperative data are plotted in a binned fashion, wh[...] Credit: Berliner JL et al. via Clin. Orthop. Relat. Res. via Open-i (NIH) (Open Access (CC BY))

A diagrammatic representation of different alignment parameters based on The Knee Society Total Knee Arthroplasty Roentgenographic Evaluation and Scoring System (Viswanathan et al. 2008a). The Coronal[...] Credit: Hadi M et al. via Springerplus via Open-i (NIH) (Open Access (CC BY))

1-year postoperative estimated point deficit in quality of life estimated by SF-36, in elderly and younger patients operated for LDH compared to a published age-matched reference data population (Sul[...] Credit: Open-i / NIH via Open-i (NIH) (Open Access (CC BY))

(A) A photograph of the electronic Patient-Reported Outcome Measures (ePROMs) portal being used on a tablet device in the outpatient setting. (B) A photograph of a patient completing an ePROMs quality[...] Credit: Malhotra K et al. via BMJ Open via Open-i (NIH) (Open Access (CC BY))

Critical PROM Concepts

Why PROMs Matter

Patient-Centered Care: Surgeon assessment may not match patient experience. PROMs capture what matters to patients - pain, function, quality of life. Required for value-based care.

MCID is Essential

Clinical Significance: A statistically significant change (p less than 0.05) may not matter to patients. MCID defines meaningful improvement. Compare observed change to MCID, not just p-value.

Generic vs Specific

Trade-off: Generic (SF-36) allows comparison across conditions but less sensitive. Specific (WOMAC) highly sensitive to joint pathology but cannot compare to other joints.

Measurement Properties

Quality Assessment: Valid (measures what it claims), Reliable (consistent results), Responsive (detects change). Poor properties = unreliable conclusions.

At a Glance

Patient-Reported Outcome Measures (PROMs) capture the patient's perspective on pain, function, and quality of life without clinician interpretation. The MCID (Minimal Clinically Important Difference) defines the smallest change that patients perceive as meaningful—compare observed change to MCID, not just p-values. PROMs are classified as generic (SF-36, EQ-5D—compare across conditions), region-specific (DASH for upper limb, LEFS for lower limb), or joint/disease-specific (WOMAC for hip/knee, ODI for spine—most sensitive to pathology). Key measurement properties are validity (measures what it claims), reliability (consistent results), and responsiveness (detects change over time). Floor and ceiling effects greater than 15% indicate the measure cannot detect deterioration or improvement respectively.

Mnemonic

VRR: Measurement Properties (PROM Quality)

V	Validity: Does it measure what it claims? (Content, Construct, Criterion)
R	Reliability: Consistent results? (Test-retest, Inter-rater, Internal consistency)
R	Responsiveness: Detects change over time? (Minimal floor/ceiling effects)

Hook: VRR your PROMs - Validity, Reliability, Responsiveness ensure high-quality outcome measurement!

Mnemonic

SWANK: Common Orthopaedic PROMs by Region

S	Shoulder: ASES, Constant (ASES = American Shoulder and Elbow Surgeons score)
W	Wrist/Hand: DASH, QuickDASH (DASH = Disabilities of Arm, Shoulder, and Hand)
A	All Regions: SF-36, EQ-5D (generic health status measures)
N	kNee/Hip: WOMAC, OKS/OHS (WOMAC most common for hip/knee arthritis)
K	bacK/Spine: ODI, NDI (ODI = Oswestry Disability Index for lumbar spine)

Hook: SWANK PROMs cover all major orthopaedic regions - memorize these for exams!

Overview and Introduction

What are PROMs?

Patient-Reported Outcome Measures (PROMs) are standardized, validated questionnaires that patients complete without clinician interpretation. They capture the patient perspective on health status, symptoms, function, and quality of life.

Why PROMs Matter:

Patient-Centered Care: Surgeon assessment may not match patient experience
Quantifies Subjective Outcomes: Pain, function, satisfaction cannot be objectively measured
Value-Based Care: Payers increasingly link reimbursement to patient-reported outcomes
Quality Improvement: Registries (AOANJRR) use PROMs to benchmark performance
Research: Essential for clinical trials to demonstrate treatment efficacy

PROM vs Clinician-Reported Outcomes:

PROMs capture what matters to patients (pain, daily activities, quality of life)
Clinician measures (ROM, strength) important but may not correlate with patient satisfaction
Best practice: Use both PROMs and objective measures

Principles of Outcome Measurement

Outcome measurement rests on a hierarchy: what you measure, how you measure it, and how you interpret it. The WHO ICF framework (body structure/function, activity, participation) is a useful map - PROMs predominantly capture the activity and participation levels, while clinician measures (range of motion, strength, radiographs) capture body structure and function.

Types of outcome:

Patient-Reported Outcome Measures (PROMs) - the patient's own rating of symptoms, function and quality of life, with no clinician interpretation.
Clinician-Reported Outcomes (ClinROs) - examiner-derived (range of motion, Constant strength, neurological grade).
Performance Outcomes (PerfOs) - observed task performance (Timed Up-and-Go, six-minute walk).
Composite scores - blend domains (e.g. Constant-Murley combines patient pain with examiner-measured strength and range), which improves breadth but can obscure which domain drives a change.

Anchoring concepts for interpretation:

MCID - smallest change a patient perceives as worthwhile (see dedicated section).
PASS (Patient Acceptable Symptom State) - the post-treatment score above which a patient considers their state satisfactory; increasingly preferred to MCID because it reports an attainable end state rather than a change.
SCB (Substantial Clinical Benefit) - a higher threshold than MCID denoting a large, clearly meaningful improvement.
Floor and ceiling effects - distort responsiveness when too many patients cluster at the extremes.

A good study pre-specifies a single primary PROM, justifies it on measurement properties, and reports both mean change versus MCID and the proportion of patients reaching MCID or PASS.
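The reporting advice above (mean change plus the proportion reaching MCID or PASS) can be sketched in a few lines of Python. This is only an illustration: the function name, the five patient scores and the thresholds (MCID of 5, PASS of 37 on a 0-48, higher-is-better scale) are hypothetical values chosen for the example, not published figures.

```python
from statistics import mean


def responder_summary(baseline, followup, mcid, pass_threshold, higher_is_better=True):
    """Summarise mean improvement and responder proportions for paired PROM scores."""
    if len(baseline) != len(followup) or not baseline:
        raise ValueError("baseline and followup must be non-empty and of equal length")
    sign = 1 if higher_is_better else -1
    # Express every change so that a positive number always means improvement.
    improvements = [sign * (post - pre) for pre, post in zip(baseline, followup)]
    n = len(baseline)
    reached_mcid = sum(1 for d in improvements if d >= mcid)
    # PASS is an end-state threshold, so it is checked on the follow-up score only.
    reached_pass = sum(1 for post in followup if sign * post >= sign * pass_threshold)
    return {
        "mean_improvement": mean(improvements),
        "proportion_mcid": reached_mcid / n,
        "proportion_pass": reached_pass / n,
    }


# Hypothetical scores for five patients (illustrative only).
pre = [20, 25, 30, 18, 35]
post = [38, 28, 41, 30, 36]
summary = responder_summary(pre, post, mcid=5, pass_threshold=37)
print(summary)
```

In this made-up cohort the mean improvement exceeds the assumed MCID, yet only three of five patients reach it and only two reach the assumed PASS, which is why both kinds of figure are worth reporting.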

Types of Outcome Measures

Generic PROMs

Purpose: Assess overall health status across any condition. Allow comparison between different diseases and populations.

SF-36 (Short Form-36 Health Survey)

Description: 36-item generic health status measure.

Domains (8 subscales):

Physical Functioning
Role Physical (work/activities due to physical health)
Bodily Pain
General Health
Vitality (energy/fatigue)
Social Functioning
Role Emotional (work/activities due to emotional problems)
Mental Health

Scoring:

Each subscale: 0-100 (higher = better health)
Physical Component Summary (PCS): Aggregate of physical domains
Mental Component Summary (MCS): Aggregate of mental domains

MCID: Approximately 5 points for PCS and MCS.

Advantages: Population norms available, allows cross-disease comparison.

Limitations: Less sensitive to specific musculoskeletal pathology than joint-specific measures.

SF-36 is the most widely used generic PROM in orthopaedic research.

Joint-Specific PROMs

WOMAC (Western Ontario and McMaster Universities Arthritis Index)

Description: Most widely used PROM for hip and knee osteoarthritis.

Domains (24 items):

Pain (5 items): Pain with various activities
Stiffness (2 items): Morning and later-day stiffness
Physical Function (17 items): Difficulty with daily activities

Scoring Options:

Likert Scale: 0-4 per item, total 0-96 (lower = better)
VAS: 0-100mm per item
Often normalized to a 0-100 scale; the direction differs between versions (higher = better in some, higher = worse in others), so always state the convention used
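As a rough illustration of the normalization step, the sketch below rescales a Likert WOMAC total (0-96) to 0-100 and optionally flips the direction. The function name and the `higher_is_better` flag are assumptions made for this example; they are not part of any official WOMAC scoring manual.

```python
def normalise_womac(raw_total, max_raw=96, higher_is_better=False):
    """Rescale a raw WOMAC total to 0-100, optionally inverted so higher = better."""
    if not 0 <= raw_total <= max_raw:
        raise ValueError("raw_total must lie between 0 and max_raw")
    scaled = 100 * raw_total / max_raw
    # Inverting is only a change of convention; the information content is identical.
    return 100 - scaled if higher_is_better else scaled


worse_is_high = normalise_womac(24)                          # 25.0 (lower = better)
better_is_high = normalise_womac(24, higher_is_better=True)  # 75.0 (higher = better)
print(worse_is_high, better_is_high)
```

The same raw total of 24 reads as 25 or 75 depending on the convention, which is why a paper's scoring direction must be checked before comparing results.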

MCID: Approximately 10-15 points (on 100-point scale).

Advantages: Excellent validity and reliability for hip/knee OA, widely used in arthroplasty research.

Limitations: Designed for arthritis - less applicable to ligament injuries, fractures.

WOMAC is the gold standard for hip and knee arthroplasty outcome assessment.

Choosing Between PROMs: A Comparison

The most common exam error is treating all PROMs as interchangeable. The table below contrasts the major instrument types so you can justify a choice under viva pressure.

PROM Types: Strengths, Limitations and Best Use

Instrument Type	Examples	Key Strength	Key Limitation	Best Use
Generic profile	SF-36, SF-12	Cross-disease comparison, population norms, captures whole-person health	Lower responsiveness to focal joint pathology	Secondary outcome; comparing burden across conditions
Generic utility	EQ-5D, SF-6D	Generates QALY utility (0 to 1) for cost-utility analysis	Coarse (few levels); ceiling effects in healthy people	Health-economic evaluation, payer/HTA submissions
Region-specific	DASH/QuickDASH, LEFS	One score across a whole limb when pathology spans joints	Less sensitive than single-joint scores	Multi-level or undefined upper/lower limb pathology
Joint/disease-specific	WOMAC, OHS/OKS, ASES, ODI	Highest responsiveness to the target joint or disease	Cannot compare across joints or to general population	Primary outcome in arthroplasty/disease-specific trials and registries

Controversies and Areas of Uncertainty

PROM science is evolving and several issues remain genuinely unsettled - useful "areas of debate" answers in a viva.

MCID is not a single number

The same PROM yields different MCIDs depending on whether an anchor-based or distribution-based method is used, the anchor question, baseline severity, and follow-up timing. Quoting "the MCID" as if fixed is a recognised pitfall - always state the method and population.

MCID versus PASS

MCID reports change; PASS reports an acceptable end state. A patient can exceed the MCID yet remain symptomatic and dissatisfied. Many groups now favour PASS or the proportion reaching a "good outcome" as more patient-relevant than mean change.

Ceiling effects and legacy scores

Widely used scores (Constant, Harris Hip, some Oxford items) show marked ceiling effects in well-functioning patients, masking further improvement and biasing comparisons of already good results.

Response shift and missing data

Patients recalibrate their internal standard for "good" over time (response shift), complicating before/after comparisons. Differential loss to follow-up - sicker patients dropping out - inflates apparent improvement; complete-case analysis is a common source of bias.

CAT and item-response theory

Computer-adaptive testing (e.g. PROMIS) tailors items to the respondent, reducing burden and floor/ceiling effects, but legacy thresholds (MCID, ICC targets) do not transfer directly and cross-walks are imperfect.

Linking PROMs to payment

Using PROMs for reimbursement or surgeon-level ranking risks gaming, risk-aversion (avoiding complex patients) and inadequate case-mix adjustment - reasons several systems publish PROMs for benchmarking rather than direct pay-for-performance.

Measurement Properties

Validity

Definition: Does the measure assess what it claims to assess?

Types of Validity

Type	Definition	How to Assess	Example
Content Validity	Covers all relevant aspects of construct	Expert panel review, patient input	WOMAC includes pain, stiffness, function for OA
Construct Validity	Correlates with related measures, discriminates from unrelated	Correlation with similar PROMs (convergent), lack of correlation with dissimilar (discriminant)	WOMAC correlates with knee ROM (convergent) but not with mental health scores (discriminant)
Criterion Validity	Correlates with gold standard	Compare to established measure	New knee score correlates with WOMAC

Reliability

Definition: Does the measure give consistent results when condition is stable?

Types of Reliability

Type	Definition	How to Assess	Target
Test-Retest	Same result when repeated in stable patients	Intraclass Correlation Coefficient (ICC)	ICC greater than 0.70
Inter-Rater	Different raters get same result	ICC for clinician-administered measures	ICC greater than 0.70
Internal Consistency	Items within scale measure same construct	Cronbach alpha	Alpha 0.70 to 0.95 (too high suggests redundancy)
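
Internal consistency can be made concrete with a short calculation. The sketch below computes Cronbach alpha from the standard formula, alpha = k/(k-1) × (1 - sum of item variances / variance of total score), using sample variances. The function name and the five-respondent, three-item dataset are hypothetical and serve only to show the arithmetic.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach alpha for a list of respondents, each a list of item scores."""
    if len(responses) < 2 or len(responses[0]) < 2:
        raise ValueError("need at least two respondents and two items")
    k = len(responses[0])
    # Transpose so that each tuple holds every respondent's score on one item.
    items = list(zip(*responses))
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical responses: five patients answering three items (illustrative only).
data = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 1],
    [5, 4, 5],
    [3, 3, 4],
]
alpha = cronbach_alpha(data)
print(round(alpha, 3))
```

For this toy dataset alpha falls just inside the 0.70 to 0.95 target range; values above that range would raise the question of redundant items.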

Responsiveness

Definition: Can the measure detect clinically meaningful change over time?

Floor Effect: High proportion (over 15%) score at minimum (worst possible).

Problem: Cannot detect worsening in these patients.

Ceiling Effect: High proportion (over 15%) score at maximum (best possible).

Problem: Cannot detect improvement in these patients.
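
Checking for floor and ceiling effects is a simple counting exercise. The sketch below applies the 15% threshold quoted above; the function name and the ten scores (on an assumed 0-48, higher-is-better scale) are hypothetical.

```python
def floor_ceiling(scores, worst, best, threshold=0.15):
    """Proportion of patients at the worst and best possible scores, with flags."""
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    floor = sum(1 for s in scores if s == worst) / n
    ceiling = sum(1 for s in scores if s == best) / n
    return {
        "floor": floor,
        "ceiling": ceiling,
        "floor_flag": floor > threshold,      # cannot register further worsening
        "ceiling_flag": ceiling > threshold,  # cannot register further improvement
    }


# Hypothetical post-operative scores on a 0-48 scale (illustrative only).
scores = [48, 48, 48, 40, 35, 30, 22, 48, 45, 12]
result = floor_ceiling(scores, worst=0, best=48)
print(result)
```

Here four of ten patients sit at the maximum, so the ceiling flag is raised and any further improvement in those patients would go undetected.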

Responsiveness Index: Standardized Response Mean (SRM) or Effect Size.

SRM greater than 0.8: Large responsiveness (good)
SRM 0.5 to 0.8: Moderate responsiveness
SRM less than 0.5: Small responsiveness (may miss change)
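
The two indices can be computed directly from paired scores. In the sketch below, SRM is the mean change divided by the standard deviation of the change, and effect size is taken as the mean change divided by the standard deviation of the baseline scores (a common convention, assumed here). The function name and the five patients' scores are hypothetical, and a higher score is assumed to mean better function.

```python
from statistics import mean, stdev


def responsiveness(baseline, followup):
    """Return (SRM, effect size) for paired pre- and post-treatment scores."""
    if len(baseline) != len(followup) or len(baseline) < 2:
        raise ValueError("need at least two paired observations")
    changes = [post - pre for pre, post in zip(baseline, followup)]
    mean_change = mean(changes)
    srm = mean_change / stdev(changes)            # change relative to variability of change
    effect_size = mean_change / stdev(baseline)   # change relative to baseline spread
    return srm, effect_size


# Hypothetical scores for five patients (illustrative only).
pre = [20, 25, 30, 18, 35]
post = [38, 28, 41, 30, 36]
srm, effect_size = responsiveness(pre, post)
print(round(srm, 2), round(effect_size, 2))
```

Both indices exceed 0.8 for this toy cohort, which would fall into the "large" band listed above.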

Understanding responsiveness prevents choosing measures that cannot detect improvement.

Minimal Clinically Important Difference (MCID)

What is MCID?

Definition: The smallest change in PROM score that patients perceive as beneficial and would mandate a change in management.

Purpose: Distinguish statistically significant from clinically meaningful change.

How MCID is Determined

Methods:

Anchor-Based: Compare PROM change to external anchor (patient global assessment)
- "Compared to before surgery, how would you rate your improvement: Much better, Better, Same, Worse?"
- Calculate MCID as mean change for "Better" group.
Distribution-Based: Use statistical thresholds (0.5 SD, Standard Error of Measurement)
- MCID = 0.5 × standard deviation
- Less clinically intuitive than anchor-based.
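
Both approaches reduce to short calculations, sketched below. The anchor-based function takes the mean change in the group answering "Better"; the distribution-based functions use 0.5 × SD and the standard error of measurement, SD × sqrt(1 - ICC). Using the baseline SD is an assumption of this sketch, since the method description does not specify which SD is meant. All names and numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def anchor_based_mcid(changes, anchors, target="Better"):
    """Mean change score among patients whose anchor response equals target."""
    selected = [c for c, a in zip(changes, anchors) if a == target]
    if not selected:
        raise ValueError("no patients gave the target anchor response")
    return mean(selected)


def half_sd_mcid(baseline_scores):
    """Distribution-based estimate: half the SD of baseline scores (assumed)."""
    return 0.5 * stdev(baseline_scores)


def standard_error_of_measurement(baseline_scores, icc):
    """SEM = SD x sqrt(1 - reliability), with test-retest ICC as the reliability."""
    return stdev(baseline_scores) * sqrt(1 - icc)


# Hypothetical change scores and global-rating answers (illustrative only).
changes = [12, 8, 10, 2, -1, 15]
anchors = ["Better", "Better", "Better", "Same", "Worse", "Much better"]
baseline = [40, 50, 60]

anchor_mcid = anchor_based_mcid(changes, anchors)
half_sd = half_sd_mcid(baseline)
sem = standard_error_of_measurement(baseline, icc=0.84)
print(anchor_mcid, half_sd, round(sem, 2))
```

The three estimates differ even on the same toy data, which illustrates why the derivation method should always be stated alongside any quoted MCID.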

Clinical Application:

If mean improvement = 8 points and MCID = 10 points → Even if statistically significant, the improvement falls short of the MCID and is NOT clinically meaningful.
If 95% CI = 12 to 18 points and MCID = 10 → Entire CI exceeds MCID → Clinically meaningful improvement.
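
The comparison of a confidence interval with the MCID can be written as a three-way rule, sketched below. The function name and the category labels are hypothetical, and the sketch assumes the effect is expressed so that a positive value means improvement.

```python
def interpret_against_mcid(ci_low, ci_high, mcid):
    """Classify a treatment effect by where its 95% CI lies relative to the MCID."""
    if ci_low > ci_high:
        raise ValueError("ci_low must not exceed ci_high")
    if ci_low >= mcid:
        return "clinically meaningful"   # whole interval at or above the MCID
    if ci_high < mcid:
        return "below MCID"              # whole interval below the MCID
    return "uncertain"                   # interval straddles the MCID


print(interpret_against_mcid(12, 18, mcid=10))  # clinically meaningful
print(interpret_against_mcid(5, 11, mcid=10))   # uncertain
print(interpret_against_mcid(2, 6, mcid=10))    # below MCID
```

The middle case is the borderline pattern: a result can be statistically significant while its interval still straddles the MCID.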

Always compare treatment effects to MCID, not just p-values.

Clinical Application and Relevance

Choosing the Right PROM

Joint-specific for sensitivity (WOMAC for THA). Generic for cross-disease comparison and population norms (SF-36). Use both when possible to capture joint-specific and overall health.

Interpreting PROM Data

Compare change to MCID, not just statistical significance. Check floor/ceiling effects - over 15% suggests measure may not detect change. Report mean change AND proportion exceeding MCID.

Registry Requirements

Major arthroplasty registries (NJR, AJRR, AOANJRR, SHAR) collect PROMs. Pre-operative baseline and post-operative follow-up (1 year, 5 year). Allows benchmarking and quality improvement.

Value-Based Care

Payers increasingly link reimbursement to PROMs. Demonstrating patient-reported improvement justifies procedures. PROMs essential for value-based contracts.

Evidence Base

WOMAC: Original Validation (Landmark)

Bellamy N, Buchanan WW, Goldsmith CH, Campbell J, Stitt LW • Journal of Rheumatology (1988)

Key Findings:

WOMAC developed and validated within a double-blind RCT of two NSAIDs in hip and knee osteoarthritis
Self-administered, disease-specific instrument with pain, stiffness and physical-function subscales
Subscales fulfilled conventional criteria for face, content and construct validity
Demonstrated reliability, responsiveness and relative efficiency
Described by the authors as a 'high-performance' instrument for evaluative OA research

Clinical Implication: WOMAC remains the reference joint-specific PROM for hip and knee osteoarthritis and arthroplasty research.

Limitation: Designed for arthritis - less applicable to trauma, ligament injury or non-arthritic hip pain.

Verify on PubMed (PMID 3068365)

SF-36: Conceptual Framework (Landmark Generic PROM)

Ware JE, Sherbourne CD • Medical Care (1992)

Key Findings:

Introduced the 36-item Short Form (SF-36) from the Medical Outcomes Study
Surveys eight health concepts spanning physical and mental health domains
Designed for self-administration in people aged 14 years and over, or by trained interviewer
Built for clinical practice, research, health-policy evaluation and population surveys
Established the template for generic, profile-based health-status measurement

Clinical Implication: SF-36 is the most widely used generic PROM; its eight domains aggregate into Physical (PCS) and Mental (MCS) component summaries that allow cross-disease and population-norm comparison.

Limitation: Generic by design - less responsive to focal joint pathology than disease-specific measures, with ceiling effects in healthy populations.

Verify on PubMed (PMID 1593914)

DASH: Development of an Upper-Extremity PROM

Hudak PL, Amadio PC, Bombardier C (Upper Extremity Collaborative Group) • American Journal of Industrial Medicine (1996)

Key Findings:

Joint AAOS, COMSS and Institute for Work and Health initiative to create a region-wide upper-limb measure
Item generation produced 821 candidate items reduced to a focused symptom and function set
Single questionnaire spans the whole upper limb rather than an isolated joint
Field tested across centres in the United States, Canada and Australia
Provided the basis for the validated DASH and the shortened QuickDASH

Clinical Implication: DASH allows a single, comparable disability score across any upper-limb condition, useful when pathology spans more than one joint.

Limitation: Region-specific breadth means it is less sensitive than joint-specific scores (for example ASES) for isolated shoulder or elbow problems.

Verify on PubMed (PMID 8773720)

Oxford Hip Score: A Joint-Specific Registry PROM

Dawson J, Fitzpatrick R, Carr A, Murray D • Journal of Bone and Joint Surgery (British) (1996)

Key Findings:

Developed a 12-item patient-completed questionnaire for total hip replacement (n=220, prospective)
High internal consistency and satisfactory test-retest reproducibility
Validity confirmed by correlation with Charnley score, SF-36 and AIMS
Standardised effect size (responsiveness) compared favourably with SF-36 and AIMS
Short, practical and sensitive to clinically important change after THR

Clinical Implication: The Oxford Hip Score (and its knee counterpart, OKS) is brief, responsive and is the joint-specific PROM used by major arthroplasty registries such as the UK NJR for hip and knee outcomes.

Limitation: Designed for arthroplasty outcome assessment; less suited to non-arthritic or paediatric hip conditions.

Verify on PubMed (PMID 8666621)

Exam Viva Scenarios

Use these scenarios to practise clinical reasoning and management decisions

CLINICAL SCENARIO: Standard

Scenario 1: PROM Selection

CLINICAL PROMPT

"You are planning an RCT comparing cemented vs uncemented THA. What outcome measures would you use and why?"

PRACTICAL APPROACH

For a THA trial, I would use a combination of joint-specific and generic PROMs to comprehensively assess outcomes. My primary outcome would be the **WOMAC score**, which is the gold standard patient-reported measure for hip osteoarthritis and arthroplasty. WOMAC assesses three domains: pain, stiffness, and physical function, with 24 items total. It is highly validated, reliable, and responsive to change after THA, with an MCID of approximately 10-15 points on a 100-point scale. As a secondary outcome, I would include the **SF-36**, particularly the Physical Component Summary (PCS), to assess overall health status and quality of life. This allows comparison to population norms and captures health benefits beyond the hip joint. I would also measure the **EQ-5D** to generate utility scores for cost-effectiveness analysis, which is increasingly required by payers and health systems for value-based care. Additionally, I would include objective measures such as **range of motion** and **radiographic assessment of component positioning and osseointegration**. I would collect PROMs at baseline (pre-operative), and post-operatively at 6 weeks, 3 months, 6 months, 1 year, and annually thereafter. This comprehensive approach captures patient-reported outcomes (WOMAC), overall health (SF-36), economic value (EQ-5D), and clinical success (ROM, radiographs).

KEY CLINICAL POINTS

Primary outcome: WOMAC (joint-specific, most sensitive to hip pathology)

Secondary outcomes: SF-36 PCS (generic health), EQ-5D (utility for cost-effectiveness)

Include objective measures: ROM, radiographs

Timing: Baseline and multiple post-op timepoints (6 weeks, 3 months, 6 months, 1 year, annually)

COMMON PITFALLS

Using only generic measures (less sensitive to joint-specific change)

Not mentioning MCID or how to interpret clinical significance

Not including economic outcome measure (EQ-5D for utility)

Not specifying outcome timing (baseline and follow-up intervals)

FURTHER QUESTIONS

"What is the MCID for WOMAC and why does it matter?"

"What is the difference between WOMAC and SF-36?"

"How would you handle missing PROM data in your analysis?"

CLINICAL SCENARIO: Challenging

Scenario 2: MCID Interpretation

CLINICAL PROMPT

"An RCT of 200 patients found that new rehab protocol improved WOMAC score by mean 8 points (95% CI 5 to 11 points, p = 0.001) compared to standard protocol. The MCID for WOMAC is 10 points. How do you interpret this result?"

PRACTICAL APPROACH

This result requires careful interpretation because it demonstrates statistical significance without clear clinical significance. Let me analyze each component. First, **statistical significance**: p = 0.001 is highly statistically significant, well below the conventional 0.05 threshold, and the 95% CI of 5 to 11 points excludes zero, confirming a real difference exists. Second, **effect size**: the mean improvement is 8 points. Third, **clinical significance**: the MCID for WOMAC is 10 points, meaning patients perceive a 10-point change as meaningful benefit. The observed 8-point improvement falls short of this threshold. Fourth, **confidence interval assessment**: the CI ranges from 5 to 11 points. The lower bound (5 points) is well below the MCID, but the upper bound (11 points) exceeds it. This creates uncertainty - the true effect could be clinically meaningful (if at the upper end of CI) or not (if at the lower end). **Interpretation**: While statistically significant, this result is clinically uncertain. The point estimate of 8 points suggests the benefit may not be meaningful to most patients. However, the CI crossing the MCID threshold indicates we cannot definitively rule out a clinically important effect. **Recommendation**: I would interpret this as weak evidence for clinical benefit. To make a confident recommendation, we would need a larger study with adequate power to detect a 10-point difference, which would narrow the CI and clarify whether the true effect exceeds the MCID. Additionally, I would analyze the proportion of patients who achieved the MCID - if 60-70% of patients improved by at least 10 points, this might be clinically worthwhile despite the mean being below MCID.
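
A study "powered to detect the MCID" implies a sample-size calculation. The sketch below uses the standard normal-approximation formula for comparing two group means, n per group = 2 × (z for alpha + z for power)² × SD² / MCID². The function name is hypothetical, and the SD of 20 points is an assumed value for illustration only; the scenario does not report one.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(mcid, sd, alpha=0.05, power=0.80):
    """Patients per arm needed to detect a between-group difference equal to the MCID."""
    if mcid <= 0 or sd <= 0:
        raise ValueError("mcid and sd must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)
    n = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / mcid ** 2
    return ceil(n)


# Assumed SD of 20 points (hypothetical) with the scenario's MCID of 10 points.
n_80 = sample_size_per_group(mcid=10, sd=20)
n_90 = sample_size_per_group(mcid=10, sd=20, power=0.90)
print(n_80, n_90)
```

A real protocol would also inflate the figure for expected loss to follow-up and justify the SD from pilot or published data.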

KEY CLINICAL POINTS

Statistical significance (p = 0.001) does NOT equal clinical significance

Mean improvement (8 points) below MCID (10 points) suggests limited clinical benefit

CI (5-11) crosses MCID, creating uncertainty about clinical importance

Need larger study to narrow CI and clarify if effect exceeds MCID

Alternative analysis: proportion of patients achieving MCID

COMMON PITFALLS

Concluding treatment is effective based solely on p less than 0.05

Not comparing effect size to MCID

Not interpreting confidence interval in relation to MCID threshold

Not suggesting larger study or alternative analyses

FURTHER QUESTIONS

"What is the Minimal Clinically Important Difference (MCID)?"

"How would you design a study specifically powered to detect the MCID?"

"What other analyses could help interpret this borderline result?"

CLINICAL SCENARIOChallenging

Scenario 3: Measurement Properties and PROM Appraisal

CLINICAL PROMPT

"A colleague proposes adopting a brand-new shoulder PROM for your unit. How would you appraise whether it is fit for purpose, and what numbers would you want to see?"

PRACTICAL APPROACH

I would appraise the instrument against the three pillars of measurement quality - validity, reliability and responsiveness - ideally using a structured framework such as the COSMIN taxonomy or the Terwee quality criteria. First, **validity**: does it measure what it claims? I would look for content validity (developed with patient and clinician input, covering the relevant shoulder constructs of pain and function), construct validity (convergent correlation with established scores such as the ASES or Constant, and discriminant lack of correlation with unrelated constructs like mental-health scores), and ideally criterion validity against an accepted reference. Second, **reliability**: I want test-retest reliability with an intraclass correlation coefficient of at least 0.70 in stable patients, and internal consistency with a Cronbach alpha between 0.70 and 0.95 - too high suggests redundant items. Third, **responsiveness**: can it detect real change after treatment? I would want a reported effect size or standardised response mean (ideally over 0.8 for a large effect) and, critically, a published MCID with the method used to derive it. Fourth, I would check **floor and ceiling effects** - if more than 15% of patients score at the extremes, the measure cannot detect deterioration or improvement in those patients. I would also consider feasibility: completion time, reading level, language validation and licensing cost, and whether it has been validated in a population resembling mine. If it lacks responsiveness data or a defined MCID, I would not adopt it for outcome studies regardless of how face-valid it appears, because I could not interpret a change score. Overall I would be cautious about replacing well-established scores (ASES, Constant) that already permit comparison with the wider literature unless the new measure offers a clear advantage such as reduced burden via computer-adaptive testing.

KEY CLINICAL POINTS

Structure the answer as validity, reliability, responsiveness (COSMIN/Terwee framework)

Reliability targets: test-retest ICC at least 0.70, Cronbach alpha 0.70 to 0.95

Responsiveness: effect size/SRM and a published MCID with its derivation method

Floor/ceiling effects over 15% are a red flag

Feasibility and comparability with existing literature matter for adoption

COMMON PITFALLS

Confusing reliability (consistency) with validity (accuracy)

Accepting a measure on face validity alone without responsiveness or MCID data

Forgetting that a measure can be reliable yet invalid (consistently wrong)

Ignoring floor/ceiling effects and population/language validation

FURTHER QUESTIONS

"Can a measure be reliable but not valid? Give an example."

"What is the difference between Cronbach alpha and the ICC?"

"Why might you keep an older score even if a new one is statistically superior?"

MCQ Practice Points

PROM Types

Q: What is the difference between a generic PROM (SF-36) and a joint-specific PROM (WOMAC)? A: Generic PROMs assess overall health status across any condition, allow comparison between diseases and to population norms, but are less sensitive to specific joint pathology. Joint-specific PROMs are highly sensitive to pathology in a single joint but cannot compare across different joints or to general population.

MCID Importance

Q: Why is MCID important when interpreting PROM changes? A: MCID defines clinically meaningful change - the smallest improvement that patients perceive as beneficial. Statistically significant changes (p less than 0.05) may not exceed MCID and thus not be clinically important. Always compare observed change to MCID, not just p-value.

Floor and Ceiling Effects

Q: What is a ceiling effect and why does it matter? A: Ceiling effect occurs when high proportion (over 15%) of patients score at maximum (best possible score). This prevents the measure from detecting improvement in these patients and reduces responsiveness. Choose a different measure or add a more challenging domain if ceiling effects are problematic.

Validity vs Reliability

Q: What is the difference between validity and reliability? A: Validity = Does the measure assess what it claims to assess? (accuracy). Reliability = Does the measure give consistent results when repeated in stable patients? (precision). A measure can be reliable but not valid (consistently wrong), but cannot be valid without being reliable.

Test-Retest Reliability

Q: What ICC value indicates good test-retest reliability? A: ICC greater than 0.70 indicates acceptable reliability. ICC (Intraclass Correlation Coefficient) ranges 0-1. ICC greater than 0.90 is excellent, 0.70-0.90 is good, less than 0.70 is poor. This measures consistency when same patient completes PROM twice with stable condition.

Responsiveness Measures

Q: How is responsiveness quantified? A: Standardized Response Mean (SRM) or Effect Size. SRM = mean change / SD of change. SRM greater than 0.8 = large responsiveness (good), 0.5-0.8 = moderate, less than 0.5 = small (may miss clinically important change). Responsiveness is essential for detecting treatment effects.

Guidelines, Registries & Global Practice

PROM collection has shifted from research tool to routine quality infrastructure worldwide, but the chosen instruments, mandate and uptake vary by region.

Global epidemiology and uptake. Joint arthroplasty is among the most-studied PROM settings: large registry programmes consistently show that the majority of hip and knee replacement patients achieve improvements exceeding the MCID at one year, while a meaningful minority (commonly cited around one in five for knees) report being unsatisfied - a gap PROMs make visible that complication rates alone miss. Uptake is high in publicly funded, registry-linked systems and far patchier where collection is voluntary or unfunded.

Guidance and Registry Programmes Side by Side

How Major Bodies Approach PROMs

Body / Region	Stance on PROMs	Typical Instruments
NHS / NJR (UK)	National PROMs programme historically mandated pre/post hip and knee replacement; NJR links implant survival to outcomes	Oxford Hip/Knee Score, EQ-5D
AOANJRR (Australia)	Registry-integrated PROM collection at standardised intervals for benchmarking	Oxford Hip/Knee Score, EQ-5D, VAS
AAOS / registries (US)	AAOS promotes PROMs and CMS value-based programmes increasingly require them; AJRR collects PROMs	HOOS/KOOS JR, PROMIS, VR-12
Nordic registries (SHAR, NAR)	Long-standing registry PROM collection underpinning revision and bearing comparisons	EQ-5D, joint-specific scores, satisfaction VAS
ICHOM (international)	Defines standard outcome sets to harmonise PROMs across countries for a given condition	Condition-specific standard sets (e.g. hip/knee OA)

Methodological standards. The COSMIN initiative (Mokkink and colleagues) provides the most widely cited international consensus framework for selecting and appraising PROMs, complementing earlier quality criteria. ICHOM standard sets push toward globally comparable outcome reporting.

High- versus limited-resource practice variation. In well-resourced, registry-linked systems, electronic PROM (ePROM) capture is increasingly embedded in routine care, enabling case-mix-adjusted benchmarking. In limited-resource settings, barriers include literacy and language (validated translations are not universal), lack of electronic infrastructure, staffing for follow-up, and the cost of licensed instruments - so brief, free, culturally validated tools (and pragmatic VAS/EQ-5D use) are favoured. The principle that statistical significance does not equal clinical significance, and that MCID/PASS should anchor interpretation, applies in every setting regardless of resource level.

Management Algorithm

OUTCOME MEASURES AND PROMs

Clinical summary

Common Orthopaedic PROMs

•Generic: SF-36 (PCS/MCS, 0-100, higher better), EQ-5D (utility 0-1)
•Hip/Knee: WOMAC (pain/stiffness/function; raw 0-96, lower better; normalized 0-100 versions may be inverted so higher is better - check the scoring used)
•Upper Extremity: DASH (0-100, 0 = no disability), QuickDASH (11 items)
•Shoulder: ASES (0-100, higher better), Constant score
•Spine: ODI (Oswestry 0-100%, lower better), NDI (Neck Disability)
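Because WOMAC is reported on either a raw 0-96 or a normalized 0-100 scale, in either direction, it helps to make the conversion explicit before comparing studies. The sketch below assumes the Likert version (24 items scored 0-4, raw total 0-96) and simple linear rescaling; the function name and its parameters are hypothetical.

```python
WOMAC_RAW_MAX = 96  # assumed Likert version: 24 items, each scored 0-4


def womac_normalized(raw_total, higher_is_better=False):
    """Rescale a raw WOMAC total (0-96, lower better) to 0-100.

    With higher_is_better=True the scale is inverted so 100 is the best score.
    """
    if not 0 <= raw_total <= WOMAC_RAW_MAX:
        raise ValueError("raw WOMAC total must be between 0 and 96")
    scaled = raw_total / WOMAC_RAW_MAX * 100
    return 100 - scaled if higher_is_better else scaled


print(womac_normalized(48))                          # same direction as the raw score
print(womac_normalized(24, higher_is_better=True))   # inverted reporting
```

The same raw score therefore produces different numbers depending on the convention, which is why the direction should always be stated alongside the value.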

MCID Values

•SF-36 PCS/MCS: MCID approximately 5 points
•WOMAC: MCID 10-15 points (on 100-point scale)
•DASH: MCID 10-15 points
•VAS Pain: MCID 15-20mm (on 100mm scale)
•Always compare treatment effect to MCID for clinical significance

Measurement Properties

•Validity = Does it measure what it claims? (content, construct, criterion)
•Reliability = Consistent results? (test-retest ICC greater than 0.70, Cronbach alpha 0.70-0.95)
•Responsiveness = Detects change? (SRM greater than 0.8 = large, less than 15% floor/ceiling effects)
•Floor effect = Too many at minimum (cannot detect worsening)
•Ceiling effect = Too many at maximum (cannot detect improvement)

PROM Selection

•Joint-specific for sensitivity (WOMAC for THA trial)
•Generic for cross-disease comparison and population norms (SF-36)
•Utility measure for cost-effectiveness (EQ-5D)
•Use combination: Joint-specific (primary) + Generic (secondary)
•Check floor/ceiling effects (over 15% problematic)

Interpreting PROM Data

•Compare mean change to MCID, not just p-value
•Check whether the whole 95% CI lies above the MCID (lower bound exceeds the threshold)
•Report proportion of patients achieving MCID
•Wide CI crossing MCID = uncertain clinical significance
•Large sample with trivial effect (below MCID) = not clinically important
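The checklist above can be sketched as a small routine that reports the mean change, its 95% CI, and the proportion of patients reaching the MCID. This is an illustration under stated assumptions: change scores are coded so that positive means improvement, the CI uses a normal approximation (a t-based interval would be wider in small samples), and the function name, MCID value and data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def summarize_against_mcid(changes, mcid, confidence=0.95):
    """Compare per-patient change scores (positive = improvement) with an MCID."""
    n = len(changes)
    if n < 2:
        raise ValueError("need change scores for at least 2 patients")
    avg = mean(changes)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for a 95% CI
    half_width = z * stdev(changes) / sqrt(n)
    low, high = avg - half_width, avg + half_width
    responders = sum(1 for c in changes if c >= mcid) / n
    if low >= mcid:
        verdict = "mean improvement exceeds MCID"
    elif high < mcid:
        verdict = "mean improvement below MCID"
    else:
        verdict = "uncertain: CI crosses MCID"
    return {
        "mean_change": avg,
        "ci": (low, high),
        "proportion_achieving_mcid": responders,
        "verdict": verdict,
    }


# Hypothetical change scores on a 100-point scale, with an assumed MCID of 10 points
example_changes = [0, 20, 5, 15, 10, 12, 8, 10]
print(summarize_against_mcid(example_changes, mcid=10))
```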

Clinical Application

•Major registries (NJR, AJRR, AOANJRR) collect PROMs (baseline and follow-up)
•Value-based care links reimbursement to PROM improvement
•Statistical significance ≠ Clinical significance
•Generic vs Specific trade-off: Comparison vs Sensitivity
•Pre-specify primary PROM and timing in study protocol

What are PROMs?

Why PROMs Matter:

Patient-Centered Care: Surgeon assessment may not match patient experience

Quantifies Subjective Outcomes: Pain, function, satisfaction cannot be objectively measured

Value-Based Care: Payers increasingly link reimbursement to patient-reported outcomes

Quality Improvement: Registries (AOANJRR) use PROMs to benchmark performance

Research: Essential for clinical trials to demonstrate treatment efficacy

PROM vs Clinician-Reported Outcomes:

PROMs capture what matters to patients (pain, daily activities, quality of life)

Clinician measures (ROM, strength) important but may not correlate with patient satisfaction

Best practice: Use both PROMs and objective measures

PROM Types: Strengths, Limitations and Best Use

Instrument Type	Examples	Key Strength	Key Limitation	Best Use
Generic profile	SF-36, SF-12	Cross-disease comparison, population norms, captures whole-person health	Lower responsiveness to focal joint pathology	Secondary outcome; comparing burden across conditions
Generic utility	EQ-5D, SF-6D	Generates QALY utility (0 to 1) for cost-utility analysis	Coarse (few levels); ceiling effects in healthy people	Health-economic evaluation, payer/HTA submissions
Region-specific	DASH/QuickDASH, LEFS	One score across a whole limb when pathology spans joints	Less sensitive than single-joint scores	Multi-level or undefined upper/lower limb pathology
Joint/disease-specific	WOMAC, OHS/OKS, ASES, ODI	Highest responsiveness to the target joint or disease	Cannot compare across joints or to general population	Primary outcome in arthroplasty/disease-specific trials and registries

Types of Validity

Type	Definition	How to Assess	Example
Content Validity	Covers all relevant aspects of construct	Expert panel review, patient input	WOMAC includes pain, stiffness, function for OA
Construct Validity	Correlates with related measures, discriminates from unrelated	Correlation with similar PROMs (convergent), lack of correlation with dissimilar (discriminant)	WOMAC correlates with knee ROM (convergent) but not with mental health scores (discriminant)
Criterion Validity	Correlates with gold standard	Compare to established measure	New knee score correlates with WOMAC

Types of Reliability

Type	Definition	How to Assess	Target
Test-Retest	Same result when repeated in stable patients	Intraclass Correlation Coefficient (ICC)	ICC greater than 0.70
Inter-Rater	Different raters get same result	ICC for clinician-administered measures	ICC greater than 0.70
Internal Consistency	Items within scale measure same construct	Cronbach alpha	Alpha 0.70 to 0.95 (too high suggests redundancy)
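To make the internal-consistency target concrete, the sketch below computes Cronbach alpha from item-level responses using the standard formula alpha = k/(k-1) x (1 - sum of item variances / variance of total scores). The function names, the band labels and the example responses are hypothetical; the 0.70-0.95 window is the target quoted above.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach alpha from rows of respondents and columns of items."""
    n_respondents = len(item_scores)
    n_items = len(item_scores[0]) if n_respondents else 0
    if n_respondents < 2 or n_items < 2:
        raise ValueError("need at least 2 respondents and 2 items")
    if any(len(row) != n_items for row in item_scores):
        raise ValueError("every respondent must answer every item")
    # Sample variance of each item across respondents, and of the summed score
    item_variances = [variance([row[j] for row in item_scores]) for j in range(n_items)]
    total_variance = variance([sum(row) for row in item_scores])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


def interpret_alpha(alpha):
    """Apply the target window quoted above (0.70 to 0.95)."""
    if alpha < 0.70:
        return "low internal consistency"
    if alpha <= 0.95:
        return "acceptable"
    return "possible item redundancy"


# Hypothetical responses: 5 patients answering 4 items scored 0-4
responses = [
    [0, 1, 0, 1],
    [1, 1, 2, 1],
    [2, 3, 2, 2],
    [3, 3, 4, 3],
    [4, 4, 3, 4],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 2), interpret_alpha(alpha))
```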

WOMAC: Original Validation (Landmark)

Bellamy N, Buchanan WW, Goldsmith CH, Campbell J, Stitt LW • Journal of Rheumatology (1988)

Key Findings:

WOMAC developed and validated within a double-blind RCT of two NSAIDs in hip and knee osteoarthritis
Self-administered, disease-specific instrument with pain, stiffness and physical-function subscales
Subscales fulfilled conventional criteria for face, content and construct validity
Demonstrated reliability, responsiveness and relative efficiency
Described by the authors as a 'high-performance' instrument for evaluative OA research

Clinical Implication: WOMAC remains the reference joint-specific PROM for hip and knee osteoarthritis and arthroplasty research.

Limitation: Designed for arthritis - less applicable to trauma, ligament injury or non-arthritic hip pain.

Verify on PubMed (PMID 3068365)

SF-36: Conceptual Framework (Landmark Generic PROM)

Ware JE, Sherbourne CD • Medical Care (1992)

Key Findings:

Introduced the 36-item Short Form (SF-36) from the Medical Outcomes Study
Surveys eight health concepts spanning physical and mental health domains
Designed for self-administration in people aged 14 years and over, or by trained interviewer
Built for clinical practice, research, health-policy evaluation and population surveys
Established the template for generic, profile-based health-status measurement

Limitation: Generic by design - less responsive to focal joint pathology than disease-specific measures, with ceiling effects in healthy populations.

Verify on PubMed (PMID 1593914)

DASH: Development of an Upper-Extremity PROM

Hudak PL, Amadio PC, Bombardier C (Upper Extremity Collaborative Group) • American Journal of Industrial Medicine (1996)

Key Findings:

Joint AAOS, COMSS and Institute for Work and Health initiative to create a region-wide upper-limb measure
Item generation produced 821 candidate items reduced to a focused symptom and function set
Single questionnaire spans the whole upper limb rather than an isolated joint
Field tested across centres in the United States, Canada and Australia
Provided the basis for the validated DASH and the shortened QuickDASH

Clinical Implication: DASH allows a single, comparable disability score across any upper-limb condition, useful when pathology spans more than one joint.

Limitation: Region-specific breadth means it is less sensitive than joint-specific scores (for example ASES) for isolated shoulder or elbow problems.

Verify on PubMed (PMID 8773720)

Oxford Hip Score: A Joint-Specific Registry PROM

Dawson J, Fitzpatrick R, Carr A, Murray D • Journal of Bone and Joint Surgery (British) (1996)

Key Findings:

Developed a 12-item patient-completed questionnaire for total hip replacement (n=220, prospective)
High internal consistency and satisfactory test-retest reproducibility
Validity confirmed by correlation with Charnley score, SF-36 and AIMS
Standardised effect size (responsiveness) compared favourably with SF-36 and AIMS
Short, practical and sensitive to clinically important change after THR

Limitation: Designed for arthroplasty outcome assessment; less suited to non-arthritic or paediatric hip conditions.

Verify on PubMed (PMID 8666621)

