© 2026 OrthoVellum. For educational purposes only.

Not affiliated with the Royal Australasian College of Surgeons.

P-Values and Confidence Intervals


Comprehensive guide to understanding p-values, confidence intervals, statistical significance, and clinical interpretation in orthopaedic research.

Updated: 2025-12-24
High Yield Overview


Statistical Significance | Effect Estimation | Clinical Interpretation

  • 0.05 - Conventional alpha threshold
  • 95% - Standard confidence level
  • MCID - Clinical significance threshold
  • CI width - Precision indicator

Interpreting Results

  • p less than 0.05, CI excludes null - Pattern: statistically significant result. Action: check MCID for clinical relevance.
  • p greater than 0.05, CI includes null - Pattern: not statistically significant. Action: may be underpowered or a true null effect.
  • Narrow CI - Pattern: precise estimate, adequate sample. Action: high confidence in the effect size.
  • Wide CI - Pattern: imprecise estimate, small sample. Action: low confidence; a larger study is needed.

Critical Must-Knows

  • P-Value: Probability of observing data as extreme as yours IF null hypothesis is true. NOT probability that null is true.
  • Confidence Interval (95% CI): Range of plausible values for true effect. If repeated many times, 95% of CIs would contain true value.
  • Statistical Significance (p less than 0.05): Does NOT equal clinical importance. Must compare effect to MCID.
  • CI Interpretation: If 95% CI excludes null (0 for difference, 1 for ratio), result is statistically significant at p less than 0.05.
  • CI Width: Narrow CI = precise estimate. Wide CI = imprecise, underpowered study.

Examiner's Pearls

  • "
    p = 0.05 is arbitrary threshold - not magic cutoff between real and unreal
  • "
    p-value depends on sample size - large studies find significance in trivial differences
  • "
    CI provides effect size AND significance - more informative than p-value alone
  • "
    CI that crosses MCID suggests effect may not be clinically meaningful

Critical Interpretation Concepts

What P-Value Is NOT

Common Misconceptions: p-value is NOT (1) probability null is true, (2) probability of Type I error, (3) proof of effect size, or (4) measure of clinical importance.

What P-Value Actually Means

Correct Interpretation: p = 0.03 means if null hypothesis is true, there is 3% chance of observing data this extreme or more extreme by random chance alone.

CI Contains More Information

Advantage: CI shows effect size, direction, precision, and statistical significance. p-value only shows significance, not magnitude.

Statistical vs Clinical Significance

Critical Distinction: p less than 0.05 means statistically significant. Clinical significance requires effect to exceed MCID. Can have one without the other.

Mnemonic

PENT - P-Value Common Misinterpretations

  • P - Probability null is true: WRONG - the p-value is the probability of the data given the null is true, NOT the probability the null is true
  • E - Effect size: WRONG - the p-value does NOT tell you the magnitude of the effect, only significance
  • N - Number needed (power): WRONG - the p-value does NOT indicate whether the study was adequately powered
  • T - Type I error for THIS study: WRONG - the p-value is NOT the probability of a Type I error (that is alpha = 0.05, set before the study)

Memory Hook: Do not get PENT up in p-value misinterpretations - these are what a p-value is NOT!

Mnemonic

REPS - Confidence Interval Interpretation

  • R - Range of plausible values: the CI provides the range where the true effect likely lies
  • E - Effect size estimate: the point estimate (mean/median) is the best guess of the true effect
  • P - Precision: narrow CI = precise, wide CI = imprecise
  • S - Statistical significance: if the CI excludes the null, the result is statistically significant

Memory Hook: Get good REPS with confidence intervals - they build stronger inference than p-values alone!

Overview/Introduction

What Are P-Values and Confidence Intervals?

P-values and confidence intervals (CIs) are the two primary tools for statistical inference in orthopaedic research. They address different but complementary questions:

  • P-value: Tests whether observed data are compatible with the null hypothesis (no effect/difference)
  • Confidence interval: Estimates the range of plausible values for the true effect

Why This Matters: Misinterpretation of p-values is pervasive in medical literature. Understanding these concepts prevents overconfident claims from underpowered studies and helps distinguish statistically significant but clinically trivial findings from truly meaningful results.

Historical Context: Ronald Fisher introduced p-values in the 1920s as a continuous measure of evidence against the null hypothesis. The 0.05 threshold became convention, not scientific law. Jerzy Neyman later developed confidence intervals in the 1930s as a complementary approach to estimation.

Current Emphasis: Modern statistical guidelines (ASA 2016, CONSORT, STROBE) emphasize reporting effect sizes and confidence intervals over dichotomous p-value thresholds. Journals increasingly require CIs alongside or instead of p-values.

Clinical Relevance in Orthopaedics:

  • Distinguishing statistical significance from clinical importance (MCID)
  • Interpreting RCT results for treatment decisions
  • Evaluating diagnostic test accuracy studies
  • Assessing prognostic factor analyses
  • Critical appraisal for exam vivas and clinical practice

Understanding p-values and CIs is essential for evidence-based orthopaedic practice and exam success.

Principles of Statistical Inference

Fundamental Definitions

P-Value (Probability Value):

  • Definition: The probability of observing data as extreme as, or more extreme than, what was observed, assuming the null hypothesis is true
  • Formula: p = P(Data | H₀ is true)
  • NOT: p ≠ P(H₀ is true | Data) - this is the most common error
  • Range: 0 to 1 (often expressed as 0 to 100%)
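
The formula p = P(Data | H₀ is true) can be made concrete with a short Python sketch for a z statistic. The mean difference and standard error below are invented numbers for illustration only:

```python
from statistics import NormalDist

def two_sided_p(z: float) -> float:
    """Two-sided p-value for a z statistic, computed under the null hypothesis."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical example: observed mean difference 8 points, standard error 3.5
z = 8 / 3.5          # z is about 2.29
p = two_sided_p(z)   # p is about 0.022: IF the null is true, data this
                     # extreme would arise by chance roughly 2% of the time
```

Note that the function conditions on the null being true; nothing in it computes P(H₀ is true | Data), which is exactly the misreading warned against above.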

Confidence Interval (CI):

  • Definition: A range of values that, if the study were repeated many times, would contain the true population parameter in 95% of studies (for 95% CI)
  • Components: Point estimate (observed effect) ± margin of error
  • Interpretation: Provides effect size, direction, precision, and significance simultaneously
  • NOT: There is NOT a 95% probability the true value is in this specific CI (frequentist interpretation forbids this)

Null Hypothesis (H₀):

  • Definition: Statement of no effect, no difference, or no association
  • Examples:
    • Mean difference = 0
    • Risk ratio = 1
    • Correlation coefficient = 0

Alternative Hypothesis (H₁):

  • Definition: Statement that there IS an effect, difference, or association
  • Can be: Two-sided (any difference) or one-sided (specific direction)

Relationship Between P-Values and Confidence Intervals

Key Connection: P-values and confidence intervals are mathematically related:

  • For 95% CI: If the CI excludes the null value (0 for differences, 1 for ratios), then p less than 0.05
  • For 99% CI: If the CI excludes the null value, then p less than 0.01
  • For 90% CI: If the CI excludes the null value, then p less than 0.10

Why This Matters: You can determine statistical significance directly from the confidence interval without needing the p-value. This is why modern guidelines emphasize CIs over p-values - they provide MORE information (effect size, precision, AND significance).
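
For a normal-approximation estimate, this duality can be checked directly in code. A minimal sketch (the estimate and standard error are invented for illustration):

```python
from statistics import NormalDist

def ci_and_p(estimate: float, se: float, level: float = 0.95):
    """Confidence interval and two-sided p-value from the same estimate and SE."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)   # 1.96 for a 95% CI
    ci = (estimate - z_crit * se, estimate + z_crit * se)
    p = 2 * (1 - NormalDist().cdf(abs(estimate / se)))
    return ci, p

(lo, hi), p = ci_and_p(8.0, 3.5)
# The 95% CI excludes 0 exactly when p < 0.05 - the two criteria agree
assert (lo > 0 or hi < 0) == (p < 0.05)
```

Because both quantities are derived from the same estimate and standard error, reading significance off the CI is not an approximation; it is the same test expressed more informatively.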

Understanding P-Values

What is a P-Value?

Definition: The probability of observing data as extreme as, or more extreme than, the observed data, assuming the null hypothesis is true.

Null Hypothesis (H₀): There is no difference between groups or no effect.

Formula: p = P(Data | H₀ is true)

Interpreting P-Values

P-Value Interpretation

  • p less than 0.01 - Very strong evidence against the null; highly statistically significant. Action: check effect size and clinical relevance.
  • p = 0.01 to 0.05 - Moderate evidence against the null; statistically significant. Action: check the confidence interval and MCID.
  • p = 0.05 to 0.10 - Weak, borderline evidence; not statistically significant. Action: consider whether the study is underpowered; examine the trend.
  • p greater than 0.10 - Little evidence against the null; not statistically significant. Action: check power; may be a true null or a Type II error.

Key Point: p = 0.051 is NOT fundamentally different from p = 0.049. The 0.05 threshold is arbitrary convention, not natural boundary.

Common P-Value Misconceptions

What P-Value Does NOT Tell You

Misconception 1: p-value is the probability that the null hypothesis is true.

  • WRONG: p assumes null is true, then calculates probability of data.
  • Correct: p is P(Data | Null is true), NOT P(Null is true | Data).

Misconception 2: p-value is the probability of a Type I error.

  • WRONG: Type I error rate is alpha (set before study, usually 0.05).
  • Correct: p-value is calculated from observed data, alpha is pre-set threshold.

Misconception 3: p-value tells you the size of the effect.

  • WRONG: p-value reflects both effect size AND sample size.
  • Correct: Large sample can yield p less than 0.05 for trivial effects.

Misconception 4: p greater than 0.05 proves null hypothesis.

  • WRONG: Failure to reject null does not prove null is true.
  • Correct: May be underpowered (Type II error) or true null.

Understanding these misconceptions prevents misinterpretation.

What P-Value Actually Means

Correct Interpretation: "Assuming there is no true difference between groups, the probability of observing a difference as large as or larger than what we observed, purely by chance, is [p-value]."

Example: p = 0.03 for comparison of two surgical techniques.

Correct Statement: "If the two techniques are truly equivalent, there is a 3% probability of observing a difference this large or larger by random chance alone."

Incorrect Statements:

  • "There is a 3% chance the null hypothesis is true." (NO)
  • "There is a 3% chance this result is a false positive." (NO)
  • "The techniques differ by 3%." (NO)

Understanding proper interpretation prevents overclaiming results.
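
The same definition can be illustrated by simulation: generate many datasets under the null (both techniques drawn from the same outcome distribution) and count how often a difference as extreme as the observed one appears by chance. All numbers here are invented for illustration:

```python
import random
from statistics import mean

random.seed(7)
n, trials = 40, 5000
observed_diff = 2.43   # hypothetical observed mean difference in outcome scores

extreme = 0
for _ in range(trials):
    # Null hypothesis: both techniques draw from the SAME distribution
    a = [random.gauss(50, 5) for _ in range(n)]
    b = [random.gauss(50, 5) for _ in range(n)]
    if abs(mean(a) - mean(b)) >= observed_diff:
        extreme += 1

p_sim = extreme / trials   # fraction of null datasets at least this extreme
```

With these made-up parameters the simulated p-value lands close to 0.03: a frequency statement about data under the null, not a probability that the null itself is true.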

Understanding Confidence Intervals

What is a Confidence Interval?

Definition: A range of values that likely contains the true population parameter.

95% CI Interpretation: If we repeated the study many times, 95% of the confidence intervals calculated would contain the true effect.

NOT: There is a 95% probability the true value is in this CI (frequentist interpretation).
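
The repeated-sampling interpretation can be demonstrated with a simulation (all parameters invented for illustration): build many 95% CIs from samples of a known population and count how many contain the true mean.

```python
import random
from statistics import mean, stdev, NormalDist

random.seed(1)
true_mu, sigma, n, trials = 10.0, 4.0, 50, 2000
z = NormalDist().inv_cdf(0.975)   # 1.96 for a 95% CI

covered = 0
for _ in range(trials):
    sample = [random.gauss(true_mu, sigma) for _ in range(n)]
    m, se = mean(sample), stdev(sample) / n ** 0.5
    if m - z * se <= true_mu <= m + z * se:
        covered += 1

coverage = covered / trials   # long-run proportion, close to 0.95
```

Any single interval either contains `true_mu` or it does not; the "95%" describes the long-run proportion across repeated studies, which is what the simulation recovers.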

Relationship Between CI and P-Value

CI and Significance Connection

For 95% CI: If the confidence interval excludes the null value (0 for differences, 1 for ratios), the result is statistically significant at p less than 0.05.

For 99% CI: Corresponds to p less than 0.01 threshold.

For 90% CI: Corresponds to p less than 0.10 threshold.

CI Components

Confidence Interval Interpretation

  • Point estimate - Best guess of the true effect. Example: mean difference = 8 points (the observed effect in this sample).
  • Lower bound - Minimum plausible effect. Example: 95% CI 2 to 14 points (true effect unlikely to be below 2).
  • Upper bound - Maximum plausible effect. Example: 95% CI 2 to 14 points (true effect unlikely to be above 14).
  • Width - Precision of the estimate. Example: width = 12 points (14 minus 2); wider = less precise, needs a larger sample.

Clinical Application

Statistical vs Clinical Significance

Four Possible Scenarios:

Statistical and Clinical Significance Matrix

  • Ideal - p less than 0.05, CI excludes 0; effect exceeds MCID. Significant AND clinically meaningful: implement.
  • Large sample problem - p less than 0.05, CI excludes 0; effect below MCID. Significant but trivial: do NOT implement.
  • Underpowered study - p greater than 0.05, CI includes 0; point estimate exceeds MCID. Not significant but a trend: needs a larger study.
  • True null - p greater than 0.05, CI includes 0; effect well below MCID. No effect: do not implement.

Key Principle: Always check if effect size (point estimate) and CI bounds exceed MCID, not just if p less than 0.05.

Worked Example: THA Study

Study: Compares cemented vs uncemented THA on WOMAC score at 1 year.

Results:

  • Mean difference = 8 points (cemented better)
  • 95% CI: 1 to 15 points
  • p = 0.02
  • MCID for WOMAC = 10 points

Interpretation:

  1. Statistically Significant: p = 0.02 less than 0.05, CI excludes 0 → Yes
  2. Point Estimate: 8 points less than MCID of 10 → Not clinically meaningful
  3. CI Upper Bound: 15 points greater than MCID → Could be meaningful
  4. CI Lower Bound: 1 point much less than MCID → Could be trivial

Conclusion: Result is statistically significant but clinically uncertain. The CI is wide and crosses the MCID threshold. Point estimate suggests effect may not be clinically important. Need larger study to narrow CI and determine if true effect exceeds 10 points.

Understanding this nuanced interpretation prevents overconfidence in borderline results.
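
The four-way logic walked through above can be sketched as a small helper function. This is a hypothetical illustration, not a standard library routine, and it assumes a difference-type effect with null value 0:

```python
def interpret(point: float, lo: float, hi: float, mcid: float, null: float = 0.0) -> str:
    """Hypothetical classifier: statistical vs clinical significance of a difference."""
    stat_sig = lo > null or hi < null   # CI excludes the null value
    if not stat_sig:
        return "not statistically significant (check power and CI width)"
    if abs(lo) >= mcid and abs(hi) >= mcid:
        return "statistically significant and clinically meaningful"
    if abs(lo) < mcid and abs(hi) < mcid:
        return "statistically significant but clinically trivial"
    return "statistically significant, clinical importance uncertain (CI crosses MCID)"

# THA example above: mean difference 8, 95% CI 1 to 15, WOMAC MCID 10
verdict = interpret(8, 1, 15, 10)
# -> "statistically significant, clinical importance uncertain (CI crosses MCID)"
```

Applied to the THA study, the helper reproduces the conclusion reached above: the CI straddles the MCID, so the result is statistically significant but clinically inconclusive.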

Evidence Base

The ASA Statement on P-Values

Wasserstein RL, Lazar NA • American Statistician (2016)
Key Findings:
  • P-values do NOT measure probability that hypothesis is true
  • P-values do NOT measure size or importance of effect
  • Statistical significance (p less than 0.05) does NOT mean practical importance
  • Recommendations: Report effect sizes, confidence intervals, and avoid dichotomizing at p = 0.05
Clinical Implication: Researchers should report CIs and effect sizes, not just p-values, and avoid treating p = 0.05 as absolute threshold.
Limitation: Recommendations not universally adopted - many journals still emphasize p-values over CIs.

Confidence Intervals vs P-Values in Clinical Trials

Gardner MJ, Altman DG • BMJ (1986)
Key Findings:
  • Confidence intervals provide more information than p-values alone
  • CI shows magnitude of effect, precision, and statistical significance simultaneously
  • P-value only addresses statistical significance, not clinical importance
  • Recommendation: Always present CIs with point estimates
Clinical Implication: Reporting CIs improves interpretation by showing effect size and plausible range, not just yes/no significance.
Limitation: Study from 1986 but principles remain highly relevant - CI reporting now standard in CONSORT.

Misinterpretation of P-Values in Medical Literature

Goodman SN • Seminars in Hematology (2008)
Key Findings:
  • Widespread misinterpretation: p-value as probability null is true
  • Confusion between p-value and Type I error rate (alpha)
  • Over-reliance on p less than 0.05 dichotomy ignores effect size and precision
  • Education needed on proper statistical interpretation
Clinical Implication: Surgeons must understand p-value limitations to avoid misinterpreting research findings.
Limitation: Education efforts ongoing but misinterpretation remains common in clinical practice.

Exam Viva Scenarios

Practice these scenarios to excel in your viva examination

VIVA SCENARIO - Standard

Scenario 1: P-Value Interpretation

EXAMINER

"A colleague shows you an RCT comparing two rehab protocols. The study found no significant difference (p = 0.08). She concludes the protocols are equivalent. How do you respond?"

EXCEPTIONAL ANSWER
I would explain that p = 0.08 does NOT prove the protocols are equivalent - it only means we failed to reject the null hypothesis at the conventional alpha = 0.05 threshold. There are two possible explanations: First, the protocols may truly be equivalent (true null hypothesis). Second, and more likely with p = 0.08, the study may be underpowered - a Type II error where we miss a real difference due to insufficient sample size. p = 0.08 actually suggests a trend toward difference, just not reaching conventional statistical significance. To properly assess this, I would examine three things: First, check the confidence interval - if it is wide and includes both no difference and clinically important differences, the study is inconclusive. Second, check the power calculation - if power is below 80 percent, the negative result cannot be trusted. Third, compare the point estimate to the MCID - if the observed difference approaches the MCID, this suggests clinical importance despite p greater than 0.05. To prove equivalence, we would need a non-inferiority or equivalence trial specifically designed to demonstrate similarity, with adequate power and predefined equivalence margin. Simply failing to reject null in an underpowered superiority trial does NOT prove equivalence.
KEY POINTS TO SCORE
p greater than 0.05 does NOT prove null hypothesis or equivalence
May be Type II error (underpowered study missing real effect)
Examine CI, power, and comparison to MCID
Equivalence requires specific equivalence/non-inferiority trial design
COMMON TRAPS
✗ Concluding treatments are equivalent based on p greater than 0.05
✗ Not mentioning Type II error or power
✗ Not asking to see the confidence interval
✗ Not distinguishing between superiority and equivalence trial designs
LIKELY FOLLOW-UPS
"What is the difference between a superiority trial and a non-inferiority trial?"
"How would you design a study to prove two treatments are equivalent?"
"What is the relationship between p-value and sample size?"
VIVA SCENARIO - Challenging

Scenario 2: Statistical vs Clinical Significance

EXAMINER

"An RCT of 1000 patients found statistically significant improvement in WOMAC score with new treatment: mean difference = 3 points, 95% CI 1 to 5 points, p = 0.003. The MCID for WOMAC is 10 points. How do you interpret this?"

EXCEPTIONAL ANSWER
This is a classic example of statistical significance without clinical significance. Let me interpret each component systematically. First, statistical significance: p = 0.003 is well below 0.05, and the 95% CI of 1 to 5 points excludes 0, so yes, this is statistically significant. Second, effect size: the mean improvement is only 3 points. Third, clinical significance: the MCID for WOMAC is 10 points, meaning patients perceive a 10-point change as meaningful improvement. The observed 3-point difference is well below this threshold. Fourth, confidence interval assessment: even the upper bound of the CI is only 5 points, still below the 10-point MCID. This tells me the entire range of plausible values for the true effect, 1 to 5 points, lies below the clinically important threshold. Conclusion: While this result is highly statistically significant due to the large sample size of 1000 patients, the effect is clinically trivial. I would NOT recommend implementing this treatment based on these results. This demonstrates how large studies can detect statistically significant but clinically meaningless differences. The p-value is misleading here - the confidence interval and comparison to MCID are far more informative for clinical decision-making.
KEY POINTS TO SCORE
Statistical significance (p less than 0.05) does NOT equal clinical importance
Large sample size can detect trivial differences as statistically significant
Effect size (3 points) and entire CI (1-5 points) below MCID (10 points)
Decision: Do NOT implement treatment despite p less than 0.05
COMMON TRAPS
✗ Recommending treatment based solely on p less than 0.05
✗ Not comparing effect size and CI to the MCID
✗ Not explaining how large sample size inflates statistical significance
✗ Not emphasizing that the CI is more informative than the p-value
LIKELY FOLLOW-UPS
"What is the Minimal Clinically Important Difference (MCID)?"
"Can you have clinical significance without statistical significance?"
"How does sample size affect the p-value?"

MCQ Practice Points

P-Value Definition

Q: What does a p-value of 0.04 mean? A: Assuming null hypothesis is true, there is 4% probability of observing data this extreme or more extreme by chance alone. It does NOT mean 4% probability null is true, nor 4% probability of Type I error, nor 4% effect size.

CI and Significance

Q: A 95% CI for mean difference is -2 to 8 points. Is this statistically significant at alpha = 0.05? A: No - the CI includes 0 (no difference), meaning the result is NOT statistically significant. If CI excluded 0, p would be less than 0.05.

Statistical vs Clinical Significance

Q: Can a result be statistically significant but not clinically significant? A: Yes - large studies can detect tiny differences with p less than 0.05 that are below the MCID threshold. Statistical significance depends on sample size; clinical significance depends on whether effect exceeds MCID.

CI Width and Sample Size

Q: What does a wide confidence interval indicate? A: Imprecise estimate due to small sample size or high variability. A wide CI crossing both clinically important and trivial effects means the study is inconclusive - you cannot determine if the true effect is meaningful or not. This indicates the study is underpowered and needs a larger sample.
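
The link between sample size and CI width follows an inverse-square-root law, which a few lines make explicit (sigma is an arbitrary illustrative value):

```python
from statistics import NormalDist

def ci_width(sigma: float, n: int, level: float = 0.95) -> float:
    """Width of a normal-approximation CI for a mean: 2 * z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return 2 * z * sigma / n ** 0.5

# Width shrinks with 1/sqrt(n): quadrupling n halves the CI width
w_25, w_100 = ci_width(10, 25), ci_width(10, 100)
# w_25 is about 7.84, w_100 is about 3.92
```

The practical consequence: halving a CI's width requires roughly four times the sample size, which is why imprecise studies often need substantially larger follow-up trials.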

P-Value and Type I Error

Q: If p = 0.03, what is the probability this result is a false positive (Type I error)? A: Unknown - cannot be determined from p-value alone. Alpha (0.05) is the Type I error rate set BEFORE the study. The p-value (0.03) is calculated FROM the data. Many students confuse these - p-value is NOT the probability of Type I error for THIS specific result.

Non-Significant Results

Q: Study shows no significant difference (p = 0.15) between two treatments. Can you conclude the treatments are equally effective? A: No - failure to reject null does NOT prove null is true. This could be: (1) True null (treatments truly equivalent), OR (2) Type II error (underpowered study missing a real difference). Check the power calculation - if power is below 80%, cannot trust negative result. To prove equivalence, need a specifically designed equivalence or non-inferiority trial.

Management Algorithm

[Figure: Management algorithm for p-values and confidence intervals. Credit: OrthoVellum]


High-Yield Exam Summary

P-Value Interpretation

  • p-value = P(Data | Null is true), NOT P(Null is true | Data)
  • p less than 0.05 = statistically significant (an arbitrary convention)
  • p-value does NOT indicate effect size or clinical importance
  • A large sample can yield p less than 0.05 for trivial effects
  • p greater than 0.05 does NOT prove the null hypothesis (may be underpowered)

Confidence Interval Interpretation

  • 95% CI = range of plausible values for the true effect
  • If the 95% CI excludes the null (0 or 1), p less than 0.05
  • Narrow CI = precise estimate; wide CI = imprecise, underpowered
  • CI provides effect size, precision, AND significance
  • Check whether the entire CI exceeds the MCID for clinical relevance

Statistical vs Clinical Significance

  • Statistical significance = p less than 0.05, CI excludes the null
  • Clinical significance = effect exceeds the MCID
  • Can have statistical significance without clinical importance (large sample, trivial effect)
  • Can have clinical importance without statistical significance (small sample, large effect)
  • Always compare the point estimate AND the CI to the MCID

Common Misconceptions

  • p-value is NOT the probability the null is true
  • p-value is NOT the Type I error for this study (that is alpha)
  • p greater than 0.05 does NOT prove equivalence (may be a Type II error)
  • The 0.05 threshold is an arbitrary convention, not a magic cutoff
  • A CI contains more information than a p-value alone

Clinical Application

  • Report effect sizes and CIs, not just p-values
  • Check whether the CI crosses the MCID threshold; if so, clinical importance is uncertain
  • A wide CI suggests the need for a larger study
  • Borderline p (0.05 to 0.10) may indicate a trend; check power
  • Equivalence and non-inferiority trials are designed to demonstrate similarity; superiority trials cannot prove equivalence