OrthoVellum
Clinical Atlas

Comprehensive orthopaedic learning and teaching for clinical education. Content is educational only and is not a substitute for local supervision, clinical judgement, or institutional policy.


© 2026 OrthoVellum. For educational purposes only.

Not medical advice. Verify clinically important information against current local guidance.

Evidence-Based Orthopaedics and Critical Appraisal


Advanced orthopaedic guide to evidence-based practice and critical appraisal: PICO, study design, bias, effect size, confidence intervals, applicability, reporting guidelines and journal-club decision-making.

Reviewed: 2026-06-02
Maintained by OrthoVellum Medical Education Team

Editorially maintained by OrthoVellum Editorial Team

Clear references, transparent review, and correction process • Published by OrthoVellum Medical Education Team

Educational disclosure

Educational content is reviewed for source visibility, editorial coherence, and correction readiness.

No individual clinician credential is claimed unless a named person is shown.

Verify before clinical use; this is not medical advice or a substitute for local guidance.

High Yield Overview

Critical Appraisal

Question | validity | effect size | applicability

PICO: turns a vague question into an answerable one
Bias: decides whether the estimate can be believed
95% CI: shows precision and plausible effect range
MCID: asks whether the difference matters clinically

Appraisal Domains

Clinical question
Pattern: What problem, patient, intervention, comparator and outcome are being tested?
Treatment: Use PICO before reading methods.

Internal validity
Pattern: Are the methods protected from bias?
Treatment: Assess allocation, blinding, follow-up, measurement and confounding.

Result size
Pattern: How large and precise is the effect?
Treatment: Use absolute effect, relative effect and confidence intervals.

Applicability
Pattern: Does the result apply to the patient, surgeon, implant, system and outcome that matter?
Treatment: Change practice only when benefit, harm and feasibility align.

Critical Must-Knows

  • Critical appraisal is not a memory test of study designs. It is a structured judgement about whether a result should influence care.
  • A randomised trial can still be unreliable. Poor allocation concealment, missing data, crossover and selective reporting can destroy credibility.
  • A significant p value is not enough. Report effect size, absolute risk, confidence interval and clinical importance.
  • Different questions need different designs. RCTs suit treatment efficacy; cohort studies suit prognosis; diagnostic studies need a reference standard.
  • Evidence-based practice combines evidence, clinical expertise and patient values. It is not blind obedience to a paper.

Clinical Pearls

  • Start journal club by stating the PICO in one sentence.
  • Always separate internal validity from external applicability.
  • For binary outcomes, ask for absolute risk reduction and number needed to treat or harm.
  • A clinically trivial difference can be statistically significant in a large study.
  • A negative study may be underpowered rather than proof of no difference.
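
The last pearl can be made concrete with a rough power calculation. The sketch below is illustrative only: it uses a normal approximation for a two-sided comparison of two event rates, and the function name, the event rates and the group sizes are all hypothetical values chosen for the example, not figures from any trial.

```python
import math
from statistics import NormalDist


def power_two_proportions(p_control, p_treatment, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two event rates."""
    normal = NormalDist()
    z_crit = normal.inv_cdf(1 - alpha / 2)
    pooled = (p_control + p_treatment) / 2
    # Standard error under the null (equal rates) and under the alternative.
    se_null = math.sqrt(2 * pooled * (1 - pooled) / n_per_group)
    se_alt = math.sqrt(
        p_control * (1 - p_control) / n_per_group
        + p_treatment * (1 - p_treatment) / n_per_group
    )
    delta = abs(p_control - p_treatment)
    # Chance of a significant result in the direction of the true effect.
    return normal.cdf((delta - z_crit * se_null) / se_alt)


# Hypothetical reoperation rates: 10% with control, 5% with the new treatment.
for n in (100, 500):
    print(n, round(power_two_proportions(0.10, 0.05, n), 2))
```

Under these assumed rates, a trial with 100 patients per group has well under a 50% chance of detecting a true halving of risk, so a "negative" result from a study of that size says little about whether the treatment works.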

Do not confuse statistical significance with clinical importance

A p value answers whether the observed difference is compatible with chance under a statistical model. It does not tell you whether the effect is large enough to matter, whether harms are acceptable, or whether the result applies to your patient.
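
A short sketch of this distinction, under stated assumptions: two equal-sized groups, a common standard deviation, a normal approximation, and a score where higher is better. The scores, sample size and the 8-point MCID are hypothetical numbers picked for illustration.

```python
import math
from statistics import NormalDist


def mean_difference_summary(mean_new, mean_old, sd, n_per_group, mcid, alpha=0.05):
    """Compare a mean difference and its confidence interval with an MCID."""
    normal = NormalDist()
    diff = mean_new - mean_old
    se = sd * math.sqrt(2 / n_per_group)  # equal SD and equal group size assumed
    z_crit = normal.inv_cdf(1 - alpha / 2)
    ci_low, ci_high = diff - z_crit * se, diff + z_crit * se
    p_value = 2 * (1 - normal.cdf(abs(diff) / se))
    return {
        "difference": diff,
        "ci": (ci_low, ci_high),
        "p_value": p_value,
        "significant": p_value < alpha,
        # True when even the most optimistic plausible effect is below the MCID.
        "whole_ci_below_mcid": ci_high < mcid,
    }


# Hypothetical functional scores: 72 vs 70 points, SD 15, 1000 patients per group.
result = mean_difference_summary(72.0, 70.0, sd=15.0, n_per_group=1000, mcid=8.0)
print(result)
```

In this made-up example the 2-point difference is statistically significant, yet the whole confidence interval sits below the assumed MCID, which is the pattern the warning above describes.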

Critical appraisal workflow
A safe appraisal sequence moves from clinical question to PICO, study design, internal validity, effect size, clinical importance and patient applicability.
Credit: Original OrthoVellum illustration
Mnemonic

PAPER: Read A Paper

P: PICO. Define the patient, intervention, comparator and outcome.
A: Appraise validity. Look for bias before trusting the result.
P: Precision. Read the confidence interval, not just the p value.
E: Effect size. Translate relative and absolute effect into clinical terms.
R: Relevance. Decide whether it applies to your patient and setting.

Memory Hook: Do not finish the PAPER until you know whether it should change practice.

Mnemonic

BIASED: Bias Screen

B: Baseline balance. Were groups similar at the start?
I: Intervention fidelity. Were treatments delivered as intended?
A: Allocation concealment. Could enrolment be predicted or manipulated?
S: Selective reporting. Were all prespecified outcomes reported?
E: Endpoint blinding. Were outcomes measured fairly?
D: Dropouts. Was follow-up complete and balanced?

Memory Hook: A BIASED paper can have a beautiful p value.

Mnemonic

CARES: Apply Evidence

C: Clinical importance. Is the effect larger than a meaningful threshold?
A: Applicability. Does the population match the patient?
R: Risks. Balance complications, reoperation and downstream harm.
E: Expertise. Can the technique be delivered safely in your system?
S: Shared decision. Does the evidence fit the patient's values and goals?

Memory Hook: Evidence CARES only when it changes a real decision safely.

Overview

Evidence-based orthopaedics means using the best available research, clinical judgement and patient values to make decisions. It does not mean automatically following the newest paper, the biggest trial, the loudest conference presentation or the most quoted meta-analysis.

The practical question is always:

Can I Believe It?

This is internal validity. Ask whether the methods protected the result from bias, confounding, measurement error and missing data.

Should I Use It?

This is applicability. Ask whether the patient, intervention, comparator, outcome, surgeon skill and health system match your clinical decision.

The one-sentence appraisal opening

Begin by saying: "This paper asks whether intervention X compared with Y improves outcome Z in patients like this, and the key question is whether the methods and effect size are strong enough to change practice."

Concepts and Study Design

PICO before methods

PICO prevents vague appraisal. A claim that "fixation is better" is not appraisable until you define the patient, intervention, comparator and outcome.

PICO in Orthopaedics

Element | Question | Example
Patient | Who exactly is being treated? | Older adults with displaced intracapsular femoral neck fracture who were ambulatory before injury.
Intervention | What is the treatment, implant or pathway? | Total hip arthroplasty through a specified approach.
Comparator | What is it being compared with? | Hemiarthroplasty, non-operative treatment, another implant or another rehabilitation pathway.
Outcome | What matters and when? | Reoperation, function, pain, dislocation, infection, mortality, revision, cost and patient-reported outcome at a defined time.

Match study design to question

Treatment Questions

Best Designs | What To Check | Orthopaedic Trap
Randomised trial or high-quality systematic review | Random sequence, allocation concealment, blinding where possible, intention-to-treat and follow-up. | Surgical trials may be hard to blind, so outcome assessment and crossover matter.
Registry or cohort study | Confounding control, selection bias, surgeon/implant learning curve and outcome definition. | Registry survival may not capture pain, function or radiographic failure.

Prognostic Questions

Best Designs | What To Check | Orthopaedic Trap
Inception cohort | Consecutive patients at a similar disease stage, complete follow-up and meaningful outcomes. | Late referral cohorts can exaggerate poor prognosis.
Risk model | Calibration, discrimination and external validation. | A score may work in its derivation cohort and fail in your population.

Diagnostic Questions

Best Designs | What To Check | Orthopaedic Trap
Cross-sectional diagnostic accuracy study | Independent blind comparison with a valid reference standard. | MRI accuracy depends on disease spectrum and reader expertise.
Clinical test study | Spectrum of patients, reproducibility, likelihood ratios and reference standard. | A special test in a sports clinic may not perform the same in acute trauma.
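
Likelihood ratios link a test result to the probability of disease. The sketch below shows the standard arithmetic; the sensitivity, specificity and pre-test probabilities are hypothetical values for illustration. It assumes sensitivity and specificity stay fixed across settings, which the spectrum effects noted in the table can violate.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a test with the given sensitivity and specificity."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert probability to odds, apply the likelihood ratio, convert back."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Hypothetical clinical test: sensitivity 0.90, specificity 0.80.
lr_pos, lr_neg = likelihood_ratios(0.90, 0.80)

# The same positive result in a low-prevalence and a high-prevalence setting.
for pre_test in (0.20, 0.60):
    print(pre_test, round(post_test_probability(pre_test, lr_pos), 2))
```

With these assumed figures, a positive test raises a 20% pre-test probability to roughly 53% and a 60% pre-test probability to roughly 87%, so the same result can mean different things in different clinics.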

Harm Questions

Best Designs | What To Check | Orthopaedic Trap
Large cohort, registry or case-control study | Exposure definition, confounding, event capture and duration of follow-up. | Rare complications may be invisible in small RCTs.
Case series or alerts | Signal detection, biological plausibility and denominator uncertainty. | A dramatic complication report can identify danger but not incidence.
Effect size reporting in orthopaedic critical appraisal
Effect size should be reported in a form that matches the outcome type. Absolute effect and confidence interval are usually more useful than a p value alone.
Credit: Original OrthoVellum illustration

Clinical Relevance

Internal validity: can I believe the result?

Bias checks before believing a study
Bias checks should be performed before accepting the conclusion. A large number can still be wrong if bias drives the result.
Credit: Original OrthoVellum illustration

Core Bias Checks

Bias | Question To Ask | Orthopaedic Example
Selection bias | Were patients allocated or selected in a way that created unfair groups? | Healthier patients receive surgery while frailer patients receive non-operative care.
Performance bias | Were co-interventions and rehabilitation similar? | One ACL group receives more supervised physiotherapy than the other.
Detection bias | Were outcomes assessed fairly and blindly? | Surgeon-assessed radiographic union favours their preferred implant.
Attrition bias | Who was lost to follow-up? | Painful failures do not return to clinic and are counted as successes.
Confounding | What else explains the result? | High-volume surgeons use one implant and low-volume surgeons use another.
Reporting bias | Were all prespecified outcomes reported? | The published paper reports range of motion but omits reoperation.

Effect size: what does the number mean?

Interpreting Common Results

Reported Result | What To Translate | Decision Question
Mean difference | Difference in points, degrees, millimetres or time. | Is it greater than the minimum clinically important difference?
Risk ratio or odds ratio | Relative change plus absolute baseline risk. | How many events are actually prevented or caused?
Absolute risk reduction | Event rate difference between groups. | What is the number needed to treat or harm?
Hazard ratio | Relative event rate over time. | Are proportional hazards plausible and follow-up long enough?
Sensitivity and specificity | Test performance against reference standard. | How does the result change post-test probability?
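
For binary outcomes, the translations in the table can be worked through from a simple two-by-two count. The following is a minimal sketch using a normal-approximation (Wald) interval for the absolute risk reduction; the function name and the event counts are hypothetical and chosen only to illustrate the arithmetic.

```python
import math


def binary_effect_measures(events_treat, n_treat, events_control, n_control):
    """Relative and absolute effect measures from a two-arm binary outcome."""
    risk_treat = events_treat / n_treat
    risk_control = events_control / n_control
    arr = risk_control - risk_treat  # positive means fewer events with treatment
    se_arr = math.sqrt(
        risk_treat * (1 - risk_treat) / n_treat
        + risk_control * (1 - risk_control) / n_control
    )
    odds_treat = risk_treat / (1 - risk_treat)
    odds_control = risk_control / (1 - risk_control)
    return {
        "risk_ratio": risk_treat / risk_control,
        "odds_ratio": odds_treat / odds_control,
        "arr": arr,
        "arr_ci95": (arr - 1.96 * se_arr, arr + 1.96 * se_arr),
        # Number needed to treat if arr > 0, number needed to harm if arr < 0.
        "nnt_or_nnh": math.inf if arr == 0 else 1 / abs(arr),
    }


# Hypothetical trial: 10/200 reoperations with treatment, 20/200 with control.
measures = binary_effect_measures(10, 200, 20, 200)
print(measures)
```

Here the risk ratio of 0.5 sounds impressive, but the absolute risk reduction is 5 percentage points (number needed to treat about 20) and its 95% interval still crosses zero, which is why relative effects should always be read alongside absolute effects and their precision.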

Applicability: should I use it?

A valid result still may not apply. Check:

  • Patient match: age, frailty, bone quality, comorbidity, activity level and pathology severity.
  • Intervention match: implant, surgical approach, rehabilitation protocol and perioperative care.
  • Surgeon/system match: volume, learning curve, imaging access, theatre resources and follow-up capability.
  • Outcome match: patient-reported outcomes, revision, reoperation, complications, cost and survivorship.
  • Time horizon: short-term function may conflict with long-term revision risk.

Registry data and RCTs answer different questions

Registry studies often excel at large-scale implant survivorship and rare revision outcomes. Randomised trials better test efficacy in controlled populations. Neither replaces the other.

Evidence Base

Evidence-based medicine definition

Foundational article
Key Findings:
  • Evidence-based medicine integrates best evidence with clinical expertise and patient values.
  • It is not cookbook medicine.
  • External evidence can inform but not replace clinical judgement.
Clinical Implication: Treat evidence as decision support, not automatic rule-following.
Limitation: Conceptual article rather than empirical trial.
Source: Sackett et al., BMJ, 1996

GRADE approach

Methodology consensus
Key Findings:
  • GRADE separates certainty of evidence from strength of recommendation.
  • Evidence can be downgraded for risk of bias, inconsistency, indirectness, imprecision and publication bias.
  • Recommendations also depend on values, harms and resource use.
Clinical Implication: A strong recommendation does not follow from a high-level study alone; it is a judgement across benefits, harms and certainty.
Limitation: Methodology framework requiring judgement.
Source: Guyatt et al., BMJ, 2008

Reporting guidelines

Reporting standards
Key Findings:
  • Different study types require different reporting checklists.
  • Transparent reporting helps readers judge bias and applicability.
  • Poor reporting does not always mean poor methods, but it prevents confident appraisal.
Clinical Implication: Use the appropriate checklist when reading a trial, observational study, diagnostic study or systematic review.
Limitation: Reporting standards improve transparency but do not guarantee methodological quality.
Source: CONSORT 2010, STROBE 2007, STARD 2015, PRISMA 2020

AMSTAR 2

Critical appraisal tool
Key Findings:
  • AMSTAR 2 provides a structured method for appraising systematic reviews of healthcare interventions.
  • It distinguishes critical from non-critical weaknesses.
  • A meta-analysis can be misleading if the review question, search, bias assessment or synthesis is weak.
Clinical Implication: Do not trust a forest plot before appraising the review methods.
Limitation: Tool requires informed judgement and does not produce a simple numerical quality score.
Source: Shea et al., BMJ, 2017

Practical reporting checklist

Which Checklist Helps?

Study Type | Useful Standard | Key Questions
Randomised trial | CONSORT | Randomisation, allocation concealment, blinding, flow, analysis and harms.
Observational study | STROBE | Selection, exposure, confounding, missing data and outcome measurement.
Diagnostic accuracy study | STARD | Patient spectrum, index test, reference standard, blinding and timing.
Systematic review | PRISMA and AMSTAR 2 | Question, protocol, search, bias, heterogeneity and synthesis method.
Recommendation | GRADE | Evidence certainty, benefit-harm balance, values and resources.
Systematic review evidence selection flowchart
Open-access flowchart example showing transparent evidence selection. In a review, the search, exclusions and final included studies must be auditable.
Credit: Heselmans A et al. via Journal of Medical Internet Research / Open-i (CC BY)

Key references

  1. Sackett DL, Rosenberg WMC, Gray JAM, Haynes RB, Richardson WS. Evidence based medicine: what it is and what it isn't. BMJ. 1996;312(7023):71-72.
  2. Guyatt GH, Oxman AD, Vist GE, et al. GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. BMJ. 2008;336(7650):924-926.
  3. Schulz KF, Altman DG, Moher D; CONSORT Group. CONSORT 2010 Statement: updated guidelines for reporting parallel group randomised trials. BMJ. 2010;340:c332.
  4. von Elm E, Altman DG, Egger M, et al. The STROBE statement: guidelines for reporting observational studies. PLoS Med. 2007;4(10):e296.
  5. Bossuyt PM, Reitsma JB, Bruns DE, et al. STARD 2015: an updated list of essential items for reporting diagnostic accuracy studies. BMJ. 2015;351:h5527.
  6. Page MJ, McKenzie JE, Bossuyt PM, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372:n71.
  7. Shea BJ, Reeves BC, Wells G, et al. AMSTAR 2: a critical appraisal tool for systematic reviews. BMJ. 2017;358:j4008.
  8. Jaeschke R, Guyatt GH, Sackett DL. Users' guides to the medical literature. III. How to use an article about a diagnostic test. JAMA. 1994;271(5):389-391.

Clinical Scenarios

Use these scenarios to practise clinical reasoning and management decisions

CLINICAL SCENARIO: Standard

CLINICAL PROMPT

"You are shown a randomised trial comparing two fixation methods. The conclusion says one implant is statistically superior with p = 0.04."

PRACTICAL APPROACH
I would first define the PICO: patient group, intervention, comparator and outcome. I would check internal validity: randomisation, allocation concealment, blinding of outcome assessment, baseline balance, crossovers, follow-up and intention-to-treat analysis. Then I would look beyond the p value to the absolute and relative effect size, confidence interval, complications and whether the difference exceeds a clinically meaningful threshold. Finally I would ask whether the patients, surgeons, implants, rehabilitation and follow-up match my setting before changing practice.
KEY CLINICAL POINTS
PICO first.
Bias before belief.
Effect size and applicability before practice change.
COMMON PITFALLS
✗ Accepting p = 0.04 without effect size.
✗ Ignoring missing data or crossover.
✗ Applying a specialist-centre result to a different population without judgement.
FURTHER QUESTIONS
"What is allocation concealment?"
"What is intention-to-treat analysis?"
"What is a confidence interval?"
CLINICAL SCENARIO: Advanced

CLINICAL PROMPT

"A meta-analysis reports that a surgical technique reduces revision risk. The forest plot looks convincing, but the included studies are heterogeneous observational cohorts."

PRACTICAL APPROACH
I would not accept the pooled estimate until I appraise the review and the included studies. I would assess whether the review had a protocol, comprehensive search, clear inclusion criteria, duplicate selection, risk-of-bias assessment and appropriate synthesis. For observational cohorts I would look at confounding, indication bias, surgeon volume, implant differences and follow-up. I would interpret heterogeneity using clinical and statistical judgement rather than only the I squared value. If the direction of effect is consistent but residual confounding is likely, I would treat the result as hypothesis-supporting or moderate certainty at best, depending on methods and effect size.
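
As background to the heterogeneity point, I squared (I²) estimates the percentage of variability between study estimates that is beyond what chance alone would produce. The sketch below computes it from Cochran's Q under a fixed-effect, inverse-variance pooling; the log risk ratios and standard errors are hypothetical and do not come from any real review.

```python
def fixed_effect_heterogeneity(effects, standard_errors):
    """Inverse-variance pooled estimate, Cochran's Q and I squared (percent)."""
    weights = [1 / se ** 2 for se in standard_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I squared = max(0, (Q - df) / Q), expressed as a percentage.
    i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, q, i_squared


# Hypothetical log risk ratios from three cohorts with equal precision.
pooled, q, i_squared = fixed_effect_heterogeneity(
    effects=[-0.5, -0.1, 0.3],
    standard_errors=[0.1, 0.1, 0.1],
)
print(round(pooled, 3), round(q, 2), round(i_squared, 2))
```

In this invented example the pooled estimate suggests modest benefit, yet the studies point in opposite directions and I squared is very high, so the pooled value hides the real question of why the cohorts disagree. The statistic also says nothing about bias within each study.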
KEY CLINICAL POINTS
A forest plot is not proof by itself.
Review quality and primary study bias both matter.
Heterogeneity needs clinical interpretation.
COMMON PITFALLS
✗ Trusting the pooled diamond without appraising included studies.
✗ Ignoring confounding by indication.
✗ Assuming meta-analysis automatically equals highest certainty.
FURTHER QUESTIONS
"What does I squared measure?"
"How does AMSTAR 2 help?"
"When can observational evidence be persuasive?"

Critical Appraisal Cheat Sheet

Clinical summary

Start

  • State the PICO
  • Identify study design
  • Ask if design matches question
  • Find primary outcome
  • Check follow-up duration

Believe

  • Selection bias
  • Allocation concealment
  • Blinding/outcome assessment
  • Missing data
  • Confounding and reporting bias

Use

  • Absolute and relative effect
  • Confidence interval
  • Clinical importance
  • Benefits versus harms
  • Applicability to patient and setting

"Define the question, test the validity, quantify the effect and decide whether it applies."
