Critical Appraisal
Question | validity | effect size | applicability
Appraisal Domains
Critical Must-Knows
- Critical appraisal is not a memory test of study designs. It is a structured judgement about whether a result should influence care.
- A randomised trial can still be unreliable. Poor allocation concealment, missing data, crossover and selective reporting can destroy credibility.
- A significant p value is not enough. Report effect size, absolute risk, confidence interval and clinical importance.
- Different questions need different designs. Randomised controlled trials (RCTs) suit treatment efficacy; cohort studies suit prognosis; diagnostic studies need a reference standard.
- Evidence-based practice combines evidence, clinical expertise and patient values. It is not blind obedience to a paper.
Clinical Pearls
- Start journal club by stating the PICO in one sentence.
- Always separate internal validity from external applicability.
- For binary outcomes, ask for absolute risk reduction and number needed to treat or harm.
- A clinically trivial difference can be statistically significant in a large study.
- A negative study may be underpowered rather than proof of no difference.
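The pearl about binary outcomes can be made concrete with a little arithmetic. The sketch below is illustrative only: the helper name `binary_effect` and the trial counts are hypothetical and not taken from any study. It turns event counts into a risk ratio, an absolute risk reduction and a number needed to treat or harm.

```python
from fractions import Fraction
from math import ceil


def binary_effect(events_treated, n_treated, events_control, n_control):
    """Summarise a two-arm binary outcome (hypothetical helper)."""
    # Exact fractions avoid floating-point surprises when rounding the NNT.
    risk_treated = Fraction(events_treated, n_treated)
    risk_control = Fraction(events_control, n_control)

    # Positive ARR: fewer events with treatment. Negative: more events (harm).
    arr = risk_control - risk_treated
    # The risk ratio is undefined when the control arm has no events.
    rr = risk_treated / risk_control if risk_control else None
    # NNT (or NNH) is the reciprocal of the absolute difference, rounded up.
    n_needed = ceil(1 / abs(arr)) if arr else None

    return {
        "risk_treated": float(risk_treated),
        "risk_control": float(risk_control),
        "risk_ratio": float(rr) if rr is not None else None,
        "absolute_risk_reduction": float(arr),
        "number_needed": n_needed,
        "direction": "treat" if arr > 0 else "harm" if arr < 0 else "none",
    }


# Hypothetical trial: 10/200 reoperations with treatment, 20/200 with control.
result = binary_effect(10, 200, 20, 200)
print(result)
```

With these made-up counts the risk ratio is 0.5, which sounds dramatic, while the absolute risk reduction is 5 percentage points and the number needed to treat is 20. Quoting both figures keeps the relative effect in proportion to the baseline risk.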
Do not confuse statistical significance with clinical importance
A p value is the probability of seeing a difference at least as large as the one observed if there were truly no effect, under the assumptions of the statistical model. It does not tell you whether the effect is large enough to matter, whether harms are acceptable, or whether the result applies to your patient.
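A rough sketch of how a large study can produce a small p value for a clinically trivial difference. Everything here is an assumption made for illustration: the function name, the normal approximation with equal group sizes and a shared standard deviation, and the numbers (a 2-point difference on a 100-point score, SD 15, 2000 patients per arm, and a minimum clinically important difference of 8 points).

```python
from math import sqrt
from statistics import NormalDist


def mean_difference_summary(diff, sd, n_per_group, mcid, level=0.95):
    """Compare a mean difference with its p value, CI and an assumed MCID."""
    std_normal = NormalDist()
    # Standard error of a difference between two equal-sized groups.
    se = sd * sqrt(2 / n_per_group)
    z = diff / se
    # Two-sided p value from the normal approximation.
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    crit = std_normal.inv_cdf(0.5 + level / 2)
    lower, upper = diff - crit * se, diff + crit * se
    return {
        "p_value": p_value,
        "ci": (lower, upper),
        "statistically_significant": p_value < 1 - level,
        # True when even the extreme end of the CI is smaller than the MCID.
        "ci_excludes_important_effect": max(abs(lower), abs(upper)) < mcid,
    }


# Hypothetical numbers, chosen only to illustrate the point.
summary = mean_difference_summary(diff=2.0, sd=15.0, n_per_group=2000, mcid=8.0)
print(summary)
```

Under these assumptions the result is statistically significant, yet the whole confidence interval sits well below the assumed threshold for clinical importance. The p value and the decision point in opposite directions.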

PAPER: Read A Paper
Memory Hook: Do not finish the PAPER until you know whether it should change practice.
BIASED: Bias Screen
Memory Hook: A BIASED paper can have a beautiful p value.
CARES: Apply Evidence
Memory Hook: Evidence CARES only when it changes a real decision safely.
Overview
Evidence-based orthopaedics means using the best available research, clinical judgement and patient values to make decisions. It does not mean automatically following the newest paper, the biggest trial, the loudest conference presentation or the most quoted meta-analysis.
Two practical questions always apply:
Can I Believe It?
This is internal validity. Ask whether the methods protected the result from bias, confounding, measurement error and missing data.
Should I Use It?
This is applicability. Ask whether the patient, intervention, comparator, outcome, surgeon skill and health system match your clinical decision.
The one-sentence appraisal opening
Begin by saying: "This paper asks whether intervention X compared with Y improves outcome Z in patients like this, and the key question is whether the methods and effect size are strong enough to change practice."
Concepts and Study Design
PICO before methods
PICO prevents vague appraisal. A claim such as "fixation is better" is not appraisable until you define the patient, intervention, comparator and outcome.
PICO in Orthopaedics
| Element | Question | Example |
|---|---|---|
| Patient | Who exactly is being treated? | Older adults with displaced intracapsular femoral neck fracture who were ambulatory before injury. |
| Intervention | What is the treatment, implant or pathway? | Total hip arthroplasty through a specified approach. |
| Comparator | What is it being compared with? | Hemiarthroplasty, non-operative treatment, another implant or another rehabilitation pathway. |
| Outcome | What matters and when? | Reoperation, function, pain, dislocation, infection, mortality, revision, cost and patient-reported outcome at a defined time. |
Match study design to question
Treatment Questions
| Best Designs | What To Check | Orthopaedic Trap |
|---|---|---|
| Randomised trial or high-quality systematic review | Random sequence, allocation concealment, blinding where possible, intention-to-treat and follow-up. | Surgical trials may be hard to blind, so outcome assessment and crossover matter. |
| Registry or cohort study | Confounding control, selection bias, surgeon/implant learning curve and outcome definition. | Registry survival may not capture pain, function or radiographic failure. |

Clinical Relevance
Internal validity: can I believe the result?

Core Bias Checks
| Bias | Question To Ask | Orthopaedic Example |
|---|---|---|
| Selection bias | Were patients allocated or selected in a way that created unfair groups? | Healthier patients receive surgery while frailer patients receive non-operative care. |
| Performance bias | Were co-interventions and rehabilitation similar? | One ACL group receives more supervised physiotherapy than the other. |
| Detection bias | Were outcomes assessed fairly and blindly? | Surgeon-assessed radiographic union favours their preferred implant. |
| Attrition bias | Who was lost to follow-up? | Painful failures do not return to clinic and are counted as successes. |
| Confounding | What else explains the result? | High-volume surgeons use one implant and low-volume surgeons use another. |
| Reporting bias | Were all prespecified outcomes reported? | The published paper reports range of motion but omits reoperation. |
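The attrition example in the table can be checked with simple bounds. The sketch below uses a hypothetical helper and invented counts. It shows how far a reported success rate could move depending on whether patients lost to follow-up are counted as successes or as failures. It is one simple sensitivity check, not a substitute for a formal missing-data analysis.

```python
def success_rate_bounds(successes_observed, n_followed, n_lost):
    """Bound a success rate under extreme assumptions about lost patients."""
    n_total = n_followed + n_lost
    return {
        # Only patients who returned for follow-up.
        "complete_case": successes_observed / n_followed,
        # Every lost patient assumed to have done well.
        "lost_counted_as_success": (successes_observed + n_lost) / n_total,
        # Every lost patient assumed to have failed.
        "lost_counted_as_failure": successes_observed / n_total,
    }


# Hypothetical arm: 100 enrolled, 80 followed up, 72 successes, 20 lost.
bounds = success_rate_bounds(successes_observed=72, n_followed=80, n_lost=20)
print(bounds)
```

With these invented counts the complete-case success rate is 90%, but the true figure could lie anywhere between 72% and 92%. If the study conclusion survives the pessimistic bound, attrition is less of a worry.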
Effect size: what does the number mean?
Interpreting Common Results
| Reported Result | What To Translate | Decision Question |
|---|---|---|
| Mean difference | Difference in points, degrees, millimetres or time. | Is it greater than the minimum clinically important difference? |
| Risk ratio or odds ratio | Relative change plus absolute baseline risk. | How many events are actually prevented or caused? |
| Absolute risk reduction | Event rate difference between groups. | What is the number needed to treat or harm? |
| Hazard ratio | Relative event rate over time. | Are proportional hazards plausible and follow-up long enough? |
| Sensitivity and specificity | Test performance against reference standard. | How does the result change post-test probability? |
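The last row asks how a test result changes post-test probability. A minimal sketch of that conversion using likelihood ratios is shown below. The function name and the input values (pre-test probability 30%, sensitivity 90%, specificity 80%) are hypothetical, and edge cases such as a specificity of exactly 1 or a pre-test probability of exactly 1 are not handled.

```python
def post_test_probability(pre_test, sensitivity, specificity, test_positive=True):
    """Convert a pre-test probability into a post-test probability."""
    if test_positive:
        # Positive likelihood ratio: true-positive rate / false-positive rate.
        likelihood_ratio = sensitivity / (1 - specificity)
    else:
        # Negative likelihood ratio: false-negative rate / true-negative rate.
        likelihood_ratio = (1 - sensitivity) / specificity
    pre_odds = pre_test / (1 - pre_test)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Hypothetical test characteristics and pre-test probability.
after_positive = post_test_probability(0.30, 0.90, 0.80, test_positive=True)
after_negative = post_test_probability(0.30, 0.90, 0.80, test_positive=False)
print(round(after_positive, 3), round(after_negative, 3))
```

With these assumed values a positive result raises the probability from 30% to about 66%, and a negative result lowers it to about 5%. The same test performance gives very different post-test probabilities when the pre-test probability changes, which is why sensitivity and specificity alone do not settle the decision.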
Applicability: should I use it?
A valid result still may not apply. Check:
- Patient match: age, frailty, bone quality, comorbidity, activity level and pathology severity.
- Intervention match: implant, surgical approach, rehabilitation protocol and perioperative care.
- Surgeon/system match: volume, learning curve, imaging access, theatre resources and follow-up capability.
- Outcome match: patient-reported outcomes, revision, reoperation, complications, cost and survivorship.
- Time horizon: short-term function may conflict with long-term revision risk.
Registry data and RCTs answer different questions
Registry studies often excel at large-scale implant survivorship and rare revision outcomes. Randomised trials better test efficacy in controlled populations. Neither replaces the other.
Evidence Base
Evidence-based medicine definition
- Evidence-based medicine integrates best evidence with clinical expertise and patient values.
- It is not cookbook medicine.
- External evidence can inform but not replace clinical judgement.
GRADE approach
- GRADE separates certainty of evidence from strength of recommendation.
- Evidence can be downgraded for risk of bias, inconsistency, indirectness, imprecision and publication bias.
- Recommendations also depend on values, harms and resource use.
Reporting guidelines
- Different study types require different reporting checklists.
- Transparent reporting helps readers judge bias and applicability.
- Poor reporting does not always mean poor methods, but it prevents confident appraisal.
AMSTAR 2
- AMSTAR 2 provides a structured method for appraising systematic reviews of healthcare interventions.
- It distinguishes critical from non-critical weaknesses.
- A meta-analysis can be misleading if the review question, search, bias assessment or synthesis is weak.
Clinical Scenarios
Use these scenarios to practise clinical reasoning and management decisions:
"You are shown a randomised trial comparing two fixation methods. The conclusion says one implant is statistically superior with p = 0.04."
"A meta-analysis reports that a surgical technique reduces revision risk. The forest plot looks convincing, but the included studies are heterogeneous observational cohorts."
Critical Appraisal Cheat Sheet
Clinical summary
Start
- State the PICO
- Identify study design
- Ask if design matches question
- Find primary outcome
- Check follow-up duration
Believe
- Selection bias
- Allocation concealment
- Blinding/outcome assessment
- Missing data
- Confounding and reporting bias
Use
- Absolute and relative effect
- Confidence interval
- Clinical importance
- Benefits versus harms
- Applicability to patient and setting
"Define the question, test the validity, quantify the effect and decide whether it applies."
