COMMON STATISTICAL TESTS
Test Selection | Parametric vs Non-Parametric | Interpretation
Test Selection by Data Type
Critical Must-Knows
- T-test: Compares means between 2 groups. Assumes normality, equal variance, independence.
- ANOVA: Compares means across 3 or more groups. Post-hoc tests needed to identify which groups differ.
- Chi-square: Tests association between categorical variables. Expected count should be over 5 in each cell.
- Regression: Models relationship between outcome and predictor(s). Linear for continuous outcomes, logistic for binary.
- Parametric vs Non-Parametric: Parametric assumes normal distribution (t-test, ANOVA). Non-parametric does not (Mann-Whitney, Kruskal-Wallis).
Examiner's Pearls
- Use paired t-test for before-after comparisons, independent t-test for separate groups
- ANOVA tells you IF groups differ, not WHICH groups - need post-hoc tests (Tukey, Bonferroni)
- Fisher exact test preferred over chi-square when expected counts under 5
- Correlation does NOT imply causation - confounders may explain association
Critical Test Selection Concepts
Data Type Determines Test
Continuous outcome: t-test, ANOVA, regression. Categorical outcome: Chi-square, Fisher exact, logistic regression. Always match test to data type.
Parametric Assumptions
Requirements: Normal distribution, equal variance, independence. Check normality: Histogram, Q-Q plot, Shapiro-Wilk test. If violated: Use non-parametric alternative.
Paired vs Independent
Paired: Same subjects measured twice (before-after). Use paired t-test. Independent: Different subjects in each group. Use independent t-test. Test choice depends on design.
Multiple Comparisons
Problem: Testing many groups inflates Type I error. Solution: Use ANOVA first (omnibus test), then post-hoc with correction (Tukey, Bonferroni). Do NOT run multiple t-tests.
At a Glance
Statistical test selection depends on data type and study design: t-test compares means between 2 groups (paired for before-after, independent for separate groups), ANOVA compares 3+ groups (requires post-hoc tests like Tukey/Bonferroni to identify which differ), chi-square tests associations between categorical variables (use Fisher exact when expected counts under 5), and regression models relationships between outcomes and predictors. Parametric tests (t-test, ANOVA) assume normal distribution—if violated, use non-parametric alternatives (Mann-Whitney U, Kruskal-Wallis). Key pitfall: running multiple t-tests inflates Type I error; use ANOVA first as an omnibus test. Correlation does not imply causation—confounders may explain observed associations.
DINGO: Choosing the Right Test
Memory Hook: Follow the DINGO trail to find the right statistical test for your data!
NINE: Parametric Test Assumptions
Memory Hook: Check NINE assumptions before using parametric tests - or use non-parametric alternatives!
Overview and Introduction
Statistical tests are the foundation of evidence-based orthopaedics. Understanding when to use each test and how to interpret results is essential for critically appraising literature and conducting research. This topic covers the most common statistical tests used in orthopaedic research.
Concepts and Mechanisms
Fundamental Statistical Concepts
Hypothesis Testing Framework
- Null hypothesis (H0): Assumes no difference or no effect between groups
- Alternative hypothesis (H1): States there IS a difference or effect
- Type I error (α): Rejecting H0 when it's true (false positive) - typically set at 0.05
- Type II error (β): Failing to reject H0 when it's false (false negative)
- Power (1-β): Probability of detecting a true effect - aim for over 80%
Central Limit Theorem
As sample size increases, the sampling distribution of the mean approaches a normal distribution, regardless of the population distribution. This is why parametric tests work with large samples even when data are skewed.
Parametric vs Non-Parametric Tests
| Aspect | Parametric | Non-Parametric |
|---|---|---|
| Assumptions | Normality, equal variance | No distribution assumptions |
| Power | Higher when assumptions met | Lower but more robust |
| Example | t-test | Mann-Whitney U |
Effect Size vs Statistical Significance
- p-value: Probability of observing result if null hypothesis is true
- Effect size: Magnitude of the difference (Cohen's d, odds ratio)
- Clinical significance: Whether the effect matters clinically
- A statistically significant result may not be clinically meaningful!
Tests for Continuous Outcomes
Comparing Two Groups
Independent Samples t-test
Use When:
- Comparing means between 2 independent groups
- Continuous outcome variable
- Data approximately normally distributed
- Equal variance between groups
Example: Comparing WOMAC scores between cemented vs uncemented THA groups.
Null Hypothesis: Mean outcome is equal in both groups.
Assumptions:
- Normal distribution in each group
- Independence of observations
- Equal variance (homoscedasticity)
Interpretation: p less than 0.05 indicates significant difference in means.
Alternatives if Assumptions Violated:
- Non-normal distribution: Mann-Whitney U test (non-parametric)
- Unequal variance: Welch t-test (does not assume equal variance)
The independent t-test is the most common test in orthopaedic research.
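The two-group comparisons above can be sketched in Python with `scipy.stats` (the scores below are invented purely for illustration; this assumes SciPy is installed):

```python
# Illustrative sketch: comparing outcome scores between two independent
# groups. Data are made up for demonstration, not from a real trial.
from scipy import stats

cemented = [72, 68, 75, 80, 66, 74, 71, 78, 69, 73]
uncemented = [70, 77, 74, 81, 72, 79, 75, 83, 76, 78]

# Standard independent t-test (assumes normality and equal variance)
t, p = stats.ttest_ind(cemented, uncemented)

# Welch t-test if variances are unequal
t_welch, p_welch = stats.ttest_ind(cemented, uncemented, equal_var=False)

# Mann-Whitney U if normality is doubtful
u, p_mw = stats.mannwhitneyu(cemented, uncemented)

print(f"t = {t:.2f}, p = {p:.3f}")
```

The choice between the three calls mirrors the decision rules above: check normality first, then equal variance.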
Comparing Three or More Groups
One-Way ANOVA
Use When:
- Comparing means across 3 or more independent groups
- Continuous outcome
- Data approximately normally distributed
- Equal variance across groups
Example: Comparing functional scores across 3 surgical approaches.
Null Hypothesis: All group means are equal.
Key Point: ANOVA tells you IF any groups differ, NOT which specific groups differ.
Post-Hoc Tests (if ANOVA significant):
- Tukey HSD: Compares all pairwise combinations, controls family-wise error
- Bonferroni: Conservative, divides alpha by number of comparisons
- Dunnett: Compares all groups to control group only
Assumptions:
- Normal distribution in each group
- Independence of observations
- Equal variance (homoscedasticity)
Alternative if Violated: Kruskal-Wallis test (non-parametric ANOVA).
Never run multiple independent t-tests instead of ANOVA - inflates Type I error.
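A minimal sketch of the omnibus ANOVA and its non-parametric alternative, again with invented scores and assuming SciPy:

```python
# Three-group comparison: one-way ANOVA (omnibus) and Kruskal-Wallis.
# Functional scores below are fabricated for illustration only.
from scipy import stats

approach_a = [80, 85, 78, 82, 88, 79]
approach_b = [75, 72, 78, 70, 74, 76]
approach_c = [82, 86, 84, 89, 81, 85]

f_stat, p_anova = stats.f_oneway(approach_a, approach_b, approach_c)
h_stat, p_kw = stats.kruskal(approach_a, approach_b, approach_c)

# A significant omnibus p-value says only that SOME groups differ;
# pairwise differences still need a corrected post-hoc test (e.g. Tukey).
print(f"ANOVA: F = {f_stat:.2f}, p = {p_anova:.4f}")
```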
Tests for Categorical Outcomes
Chi-Square Test
Use When:
- Comparing proportions between 2 or more groups
- Categorical outcome
- Independent observations
Example: Comparing complication rates (yes/no) across 3 surgical techniques.
Null Hypothesis: No association between variables (proportions are equal across groups).
Requirement: Expected count greater than 5 in each cell of contingency table.
- If violated: Use Fisher exact test (exact p-value, no expected count requirement).
Chi-Square Interpretation
Chi-Square vs Fisher Exact
| Test | When to Use | Advantage | Limitation |
|---|---|---|---|
| Chi-square | Expected count greater than 5 in all cells | Faster, widely available | Inaccurate with small sample or low expected counts |
| Fisher exact | ANY sample size, especially expected count under 5 | Exact p-value, no assumptions about expected counts | Computationally intensive for large tables |
Clinical Example: Comparing infection rates (categorical outcome) between smokers and non-smokers.
Understanding when to use chi-square vs Fisher exact prevents incorrect p-values.
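The expected-count rule can be checked directly in code. A hedged sketch with a hypothetical 2×2 table (the counts are invented), assuming SciPy:

```python
# Hypothetical 2x2 table: infection (yes/no) by smoking status.
from scipy import stats

#            infection  no infection
table = [[9, 41],    # smokers
         [3, 47]]    # non-smokers

chi2, p_chi, dof, expected = stats.chi2_contingency(table)
odds_ratio, p_fisher = stats.fisher_exact(table)

# If any EXPECTED count (not observed) is under 5, prefer Fisher exact
use_fisher = (expected < 5).any()
```

Note the rule applies to the expected counts returned by `chi2_contingency`, not the observed cell counts.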
Tests for Associations and Relationships
Correlation
Use When: Assessing strength and direction of relationship between 2 continuous variables.
Pearson Correlation (r)
Use When:
- Both variables continuous
- Linear relationship
- Bivariate normal distribution
Range: r = -1 to +1
- r = +1: Perfect positive correlation
- r = 0: No correlation
- r = -1: Perfect negative correlation
Interpretation:
- r = 0.0 to 0.3: Weak correlation
- r = 0.3 to 0.7: Moderate correlation
- r = 0.7 to 1.0: Strong correlation
Example: Correlation between age and functional score after THA.
Key Point: Correlation does NOT imply causation - confounders may explain association.
Pearson correlation is the most common for linear relationships.
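Both correlation coefficients are one-line calls in SciPy; a sketch with invented age and score data:

```python
# Pearson (linear) and Spearman (rank-based) correlation.
# Ages and scores are fabricated for illustration.
from scipy import stats

age =   [55, 60, 62, 68, 70, 72, 75, 78, 80, 83]
score = [88, 85, 86, 80, 78, 75, 74, 70, 68, 64]

r, p_pearson = stats.pearsonr(age, score)       # assumes linearity
rho, p_spearman = stats.spearmanr(age, score)   # robust to skew/outliers
```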
Regression
Linear Regression
Use When:
- Modeling relationship between continuous outcome and predictor(s)
- Predicting outcome value based on predictors
Simple Linear Regression: 1 predictor
- Equation: Y = a + b×X
- b (slope): Change in Y for 1-unit increase in X
Multiple Linear Regression: 2 or more predictors
- Equation: Y = a + b₁×X₁ + b₂×X₂ + ...
- Adjusts for confounders: Each coefficient is adjusted for other variables
Example: Predicting functional score based on age, BMI, comorbidities.
Assumptions:
- Linear relationship
- Normal distribution of residuals
- Homoscedasticity (constant variance of residuals)
- Independence of observations
Interpretation: Coefficient represents change in outcome per unit change in predictor.
Multiple regression allows adjustment for confounders in observational studies.
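Simple linear regression (one predictor) can be sketched with `scipy.stats.linregress`; the data are invented, and multiple regression with confounder adjustment would need a fuller library such as statsmodels:

```python
# Simple linear regression: Y = a + b*X, fitted by least squares.
# BMI and scores below are made up for illustration.
from scipy import stats

bmi = [22, 25, 27, 30, 33, 35, 38, 40]
functional_score = [90, 87, 85, 80, 76, 73, 70, 66]

res = stats.linregress(bmi, functional_score)
# res.slope     : change in score per 1-unit increase in BMI
# res.intercept : predicted score when BMI = 0 (extrapolation!)
# res.rvalue**2 : proportion of variance explained (R-squared)
# res.pvalue    : test of slope = 0
```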
Test Selection Guide
Choosing Statistical Tests
| Outcome Type | Number of Groups | Paired or Independent | Test |
|---|---|---|---|
| Continuous (normal) | 2 groups | Independent | Independent t-test |
| Continuous (normal) | 2 groups | Paired | Paired t-test |
| Continuous (non-normal) | 2 groups | Independent | Mann-Whitney U test |
| Continuous (non-normal) | 2 groups | Paired | Wilcoxon signed-rank test |
| Continuous (normal) | 3+ groups | Independent | One-way ANOVA |
| Continuous (normal) | 3+ groups | Repeated measures | Repeated measures ANOVA |
| Continuous (non-normal) | 3+ groups | Independent | Kruskal-Wallis test |
| Categorical | 2+ groups | Independent | Chi-square or Fisher exact |
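The selection table above can be captured as a small lookup helper. This is a sketch only; the function name and signature are my own:

```python
def choose_test(outcome, groups, paired, normal=True):
    """Suggest a test from the selection table.

    outcome: 'continuous' or 'categorical'; groups: number of groups;
    paired: True for same-subject designs; normal: normality plausible.
    Illustrative only - real selection also considers ordinal data,
    sample size, and study design.
    """
    if outcome == "categorical":
        return "Chi-square or Fisher exact"
    if groups == 2:
        if paired:
            return "Paired t-test" if normal else "Wilcoxon signed-rank test"
        return "Independent t-test" if normal else "Mann-Whitney U test"
    # 3+ groups
    if paired:
        return "Repeated measures ANOVA"
    return "One-way ANOVA" if normal else "Kruskal-Wallis test"
```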
Anatomy of Statistical Tests
Components of a Statistical Test
Test Statistic
The calculated value from your data:
- t-statistic (t-tests)
- F-statistic (ANOVA)
- Chi-square statistic (χ²)
- Z-score (large samples)
Interpretation:
- Larger absolute values = more extreme result
- Compared against critical value or used to calculate p-value
Degrees of Freedom (df)
Number of independent values:
- t-test: df = n₁ + n₂ - 2
- Paired t-test: df = n - 1
- Chi-square: df = (rows-1) × (columns-1)
- ANOVA: df between groups, df within groups
Impact:
- Affects critical value threshold
- More df = narrower confidence intervals
P-value
Probability of obtaining result if null is true:
- p less than 0.05: conventionally "significant"
- p less than 0.01: highly significant
- p less than 0.001: very highly significant
Common misinterpretations:
- NOT probability that null is true
- NOT probability that result is due to chance
Confidence Interval
Range containing true population parameter:
- 95% CI: 95% confidence true value is within range
- If 95% CI excludes null value → significant at p less than 0.05
- Width indicates precision
More informative than p-value alone:
- Shows magnitude and precision
- Aids clinical interpretation
Understanding these components allows proper interpretation of statistical test results.
Classification
Classification of Statistical Tests
Test Selection by Data Type and Design
| Outcome Type | 2 Groups (Independent) | 2 Groups (Paired) | 3+ Groups |
|---|---|---|---|
| Continuous (normal) | Independent t-test | Paired t-test | One-way ANOVA |
| Continuous (non-normal) | Mann-Whitney U | Wilcoxon signed-rank | Kruskal-Wallis |
| Categorical (2×2) | Chi-square or Fisher | McNemar test | Chi-square |
| Ordinal | Mann-Whitney U | Wilcoxon signed-rank | Kruskal-Wallis |
| Time-to-event | Log-rank test | N/A | Log-rank test |
Parametric vs Non-Parametric Classification
Parametric Tests
Assume underlying distribution (usually normal):
- Independent samples t-test
- Paired t-test
- One-way ANOVA
- Two-way ANOVA
- Pearson correlation
- Linear regression
When to use:
- Continuous data
- Normal distribution (or large n)
- Equal variance across groups
Non-Parametric Tests
No distribution assumptions:
- Mann-Whitney U (rank-sum)
- Wilcoxon signed-rank
- Kruskal-Wallis (H test)
- Friedman test
- Spearman correlation
When to use:
- Ordinal data
- Small sample sizes
- Skewed distributions
- Outliers present
Proper test classification ensures appropriate test selection for your research question.
Clinical Applications
Understanding statistical tests allows clinicians to critically appraise orthopaedic literature and make evidence-based decisions. Key applications include:
- Evaluating treatment outcomes: Comparing surgical vs conservative management
- Assessing prognostic factors: Identifying predictors of complications
- Quality improvement: Analyzing registry data for benchmarking
- Research design: Selecting appropriate tests for study protocols
Diagnostic Test Statistics
Evaluating Diagnostic Tests
Sensitivity
True positive rate:
- Proportion of diseased correctly identified
- Formula: TP / (TP + FN)
- High sensitivity = few false negatives
- "Rules OUT disease when negative" (SnNOut)
Example: If sensitivity = 95%, 5% of cases will be missed
Specificity
True negative rate:
- Proportion of non-diseased correctly identified
- Formula: TN / (TN + FP)
- High specificity = few false positives
- "Rules IN disease when positive" (SpPIn)
Example: If specificity = 90%, 10% will be false alarms
Positive Predictive Value
If test positive, probability of disease:
- Formula: TP / (TP + FP)
- Depends on disease prevalence
- Higher PPV with higher prevalence
Clinical meaning: "My patient tested positive - how likely are they to actually have it?"
Negative Predictive Value
If test negative, probability of no disease:
- Formula: TN / (TN + FN)
- Also depends on prevalence
- Higher NPV with lower prevalence
Clinical meaning: "My patient tested negative - how confident am I they're disease-free?"
2×2 Contingency Table
| | Disease Present | Disease Absent | |
|---|---|---|---|
| Test Positive | True Positive (TP) | False Positive (FP) | PPV = TP/(TP+FP) |
| Test Negative | False Negative (FN) | True Negative (TN) | NPV = TN/(TN+FN) |
| | Sens = TP/(TP+FN) | Spec = TN/(TN+FP) | |
Diagnostic test statistics are essential for evaluating imaging studies and clinical tests.
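The four formulas follow directly from the 2×2 table; a minimal sketch with hypothetical counts (the function and numbers are my own):

```python
def diagnostic_stats(tp, fp, fn, tn):
    """Compute the four core diagnostic metrics from 2x2 table counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # prevalence-dependent
        "npv": tn / (tn + fn),          # prevalence-dependent
    }

# Hypothetical worked example: 90 TP, 10 FN, 20 FP, 180 TN
result = diagnostic_stats(tp=90, fp=20, fn=10, tn=180)
# sensitivity = 90/100 = 0.90, specificity = 180/200 = 0.90
```

Note that sensitivity and specificity are fixed properties of the test, while PPV and NPV shift with prevalence, which is why the same test performs differently in screening versus specialist populations.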
Performing Statistical Analysis
Step-by-Step Analysis Workflow
1. Define hypotheses and variables
- Formulate null and alternative hypotheses
- Identify outcome variable(s)
- Identify predictor/exposure variable(s)
- Determine comparison type (difference, association, prediction)
2. Explore the data
- Check data type (continuous, categorical, ordinal)
- Assess distribution (histogram, Q-Q plot)
- Identify outliers and missing data
- Check for data entry errors
3. Select the test
- Use the DINGO mnemonic
- Match test to data type and design
- Choose parametric or non-parametric
- Consider confounders (multivariable analysis)
4. Check assumptions
- Normality (Shapiro-Wilk, Q-Q plot)
- Equal variance (Levene test)
- Independence of observations
- Sample size adequacy
5. Run and report
- Run analysis in software (SPSS, R, Stata)
- Report test statistic, df, p-value
- Include effect size and confidence interval
- Present results clearly (tables, figures)
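The "check normality, then choose the test" step of the workflow can be sketched in Python (invented data, SciPy assumed):

```python
# Normality-driven test choice for two independent groups.
# The measurements below are fabricated for illustration.
from scipy import stats

group_a = [12, 15, 14, 10, 13, 16, 11, 14, 12, 15]
group_b = [18, 20, 17, 22, 19, 21, 18, 23, 20, 19]

# Shapiro-Wilk: p > 0.05 means no evidence AGAINST normality
w_a, p_norm_a = stats.shapiro(group_a)
w_b, p_norm_b = stats.shapiro(group_b)

if p_norm_a > 0.05 and p_norm_b > 0.05:
    stat, p = stats.ttest_ind(group_a, group_b)
    test_used = "Independent t-test"
else:
    stat, p = stats.mannwhitneyu(group_a, group_b)
    test_used = "Mann-Whitney U"
```

As the Evidence Base section notes, Shapiro-Wilk is unreliable at small n, so graphical checks (Q-Q plot, histogram) should accompany any automated rule like this.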
Software Options
Common Statistical Software
| Software | Cost | Learning Curve | Best For |
|---|---|---|---|
| SPSS | Expensive | Easy | Beginners, basic analyses |
| R | Free | Steep | Advanced users, custom analyses |
| Stata | Moderate | Moderate | Epidemiology, panel data |
| Excel | Common | Easy | Simple calculations only |
| SAS | Expensive | Steep | Clinical trials, pharma |
Following a systematic approach ensures rigorous and reproducible statistical analysis.
Practical Examples in Orthopaedics
Common Orthopaedic Research Scenarios
Comparing Two Treatment Groups
Scenario: Cemented vs uncemented THA outcomes
Outcome: Harris Hip Score (continuous, 0-100)
Test: Independent samples t-test
If non-normal: Mann-Whitney U test
Example result: "Mean HHS was 85.2 (SD 12.1) in cemented vs 87.4 (SD 11.8) in uncemented group (t=-1.42, df=98, p=0.16)"
Before-After Comparison
Scenario: Knee ROM before vs after TKA
Outcome: ROM in degrees (continuous)
Design: Same patients at 2 time points
Test: Paired t-test
If non-normal: Wilcoxon signed-rank test
Example result: "ROM improved from 92° (SD 18) to 115° (SD 12), mean difference 23° (95% CI: 18-28, p less than 0.001)"
Comparing Multiple Groups
Scenario: Pain scores across 4 fracture types
Outcome: VAS pain score (continuous)
Groups: 4 fracture classifications
Test: One-way ANOVA
Post-hoc: Tukey or Bonferroni correction
Example result: "Significant difference in VAS between groups (F=5.23, df=3,96, p=0.002). Post-hoc: Type D higher than Types A,B (p less than 0.05)"
Association Between Categories
Scenario: Smoking status and nonunion rate
Outcome: Nonunion yes/no (categorical)
Exposure: Smoker/non-smoker (categorical)
Test: Chi-square test (or Fisher exact if expected count less than 5)
Example result: "Nonunion rate was 15% in smokers vs 5% in non-smokers (χ²=6.8, df=1, p=0.009)"
These examples demonstrate common statistical scenarios in orthopaedic research.
Common Errors and Pitfalls
Statistical Errors to Avoid
Type I Error (False Positive)
Definition: Rejecting null hypothesis when it is true
Causes:
- Multiple comparisons without correction
- P-hacking (testing until p less than 0.05)
- Selective outcome reporting
Prevention:
- Pre-specify primary outcome
- Use Bonferroni or FDR correction for multiple tests
- Register study protocol before data collection
Type II Error (False Negative)
Definition: Failing to reject null when it is false
Causes:
- Underpowered study (sample too small)
- High variability in data
- Small true effect size
Prevention:
- Conduct a priori power calculation
- Aim for power greater than 80%
- Use sensitive outcome measures
Wrong Test Selection
Common mistakes:
- Using t-test when ANOVA needed (multiple comparisons)
- Using parametric test with skewed data
- Using independent test when data is paired
- Using chi-square when expected counts less than 5
Prevention:
- Follow decision tree (DINGO mnemonic)
- Check assumptions before analysis
- Consult statistician if unsure
Assumption Violations
Ignoring assumptions leads to:
- Biased p-values
- Invalid confidence intervals
- Unreliable conclusions
Must check:
- Normality (Q-Q plot, Shapiro-Wilk)
- Equal variance (Levene test)
- Independence (study design)
- Adequate sample size
Awareness of these errors helps avoid common statistical mistakes.
Reporting and Publishing Results
Reporting Statistical Results
CONSORT Guidelines
Randomized Controlled Trials:
- Report participant flow diagram
- State sample size calculation
- Report all outcomes - primary and secondary
- Include confidence intervals for main results
- Report actual p-values (not just p less than 0.05)
Key requirements:
- Intention-to-treat analysis
- Report losses to follow-up
- Baseline characteristics table
STROBE Guidelines
Observational Studies:
- Clear statement of study design
- Describe setting, dates, eligibility
- Report numbers at each stage
- Report outcome data with denominators
- Address confounding
Required elements:
- Case-control, cohort, cross-sectional clearly stated
- Bias assessment
- Sensitivity analyses
Presenting Statistical Results
| Result Type | Reporting Format | Example |
|---|---|---|
| Continuous outcomes | Mean (SD) or median (IQR) | Pain score: 3.2 (SD 1.4) |
| Proportions | n (%) with denominator | 23/50 (46%) achieved union |
| Risk comparison | RR or OR with 95% CI | RR 0.65 (95% CI 0.48-0.88) |
| Time-to-event | HR with 95% CI, survival curve | HR 0.72 (95% CI 0.55-0.94) |
| P-values | Exact value to 2-3 decimal places | p = 0.034 (not p less than 0.05) |
Exam Pearl
FRACS Viva Point: "What must be reported alongside any p-value?" Answer: The effect size (difference between groups) and 95% confidence interval - p-values alone do not indicate clinical importance or precision of the estimate.
Proper statistical reporting enables readers to evaluate findings and enables future meta-analyses.
Interpreting Research Outcomes
Statistical vs Clinical Significance
Statistical Significance
Definition: P-value less than chosen alpha (usually 0.05)
What it tells you:
- The difference is unlikely due to chance alone
- Nothing about magnitude or clinical importance
- Large samples can detect trivial differences
Common misinterpretation:
- "Statistically significant" ≠ "important"
- p = 0.04 is not much different from p = 0.06
Clinical Significance
Definition: Difference is large enough to change practice
Key concept - MCID:
- Minimal Clinically Important Difference
- Patient-centered threshold
- Varies by outcome measure
Examples in orthopaedics:
- VAS pain: 2 points (or 30% change)
- WOMAC: 15 points
- SF-36 Physical: 5 points
Key Outcome Measures
| Measure | Definition | Interpretation |
|---|---|---|
| Relative Risk (RR) | Risk in exposed / Risk in unexposed | RR = 2.0 means double the risk |
| Odds Ratio (OR) | Odds in cases / Odds in controls | Approximates RR when outcome rare (less than 10%) |
| Absolute Risk Reduction (ARR) | Control rate - Treatment rate | Actual percentage point reduction |
| Number Needed to Treat (NNT) | 1 / ARR | Patients to treat to prevent 1 event |
| Hazard Ratio (HR) | Instantaneous risk ratio over time | HR = 0.7 means 30% reduction in hazard |
Exam Pearl
FRACS Viva Question: "A drug reduces DVT risk from 4% to 2%. What is the NNT?" Answer: ARR = 4% - 2% = 2% = 0.02. NNT = 1/0.02 = 50. You need to treat 50 patients to prevent one DVT.
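The arithmetic behind this pearl, written out as a short sketch (variable names are my own):

```python
# NNT calculation from the viva example: DVT risk 4% -> 2%
control_risk = 0.04    # event rate without the drug
treatment_risk = 0.02  # event rate with the drug

arr = control_risk - treatment_risk   # absolute risk reduction = 0.02
nnt = 1 / arr                         # about 50 patients
rr = treatment_risk / control_risk    # relative risk = 0.5

rrr = 1 - rr  # relative risk reduction = 50%, which sounds far more
              # impressive than the 2 percentage point ARR - always ask
              # for absolute numbers when appraising trial reports
```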
Understanding both statistical and clinical significance is essential for evidence-based practice.
Evidence Base
Statistical Tests in Orthopaedic Research
- Survey of statistical methods in orthopaedic journals
- t-tests and ANOVA most common for continuous outcomes
- Chi-square most common for categorical outcomes
- Many studies did not report checking parametric assumptions
- Recommendation: Report tests used, assumptions checked, and justification
Parametric vs Non-Parametric Tests
- Non-parametric tests generally less powerful than parametric if assumptions met
- Power loss is usually modest (5-10%) for non-parametric tests
- When in doubt, use non-parametric - more robust to violations
- For small samples (n under 30), normality testing unreliable - use non-parametric
- For large samples (n greater than 100), parametric tests robust even if non-normal
Regression in Observational Studies
- Regression allows adjustment for confounders in observational studies
- Linear regression for continuous outcomes, logistic for binary
- Cannot adjust for unmeasured confounders - residual confounding remains
- Overfitting occurs when too many predictors relative to sample size
- Rule of thumb: Minimum 10 events per variable in logistic regression
Exam Viva Scenarios
Practice these scenarios to excel in your viva examination
Scenario 1: Test Selection
"You are comparing functional scores between 3 different surgical approaches for rotator cuff repair. What statistical test would you use and why?"
Scenario 2: Regression Interpretation
"A study used logistic regression to identify predictors of nonunion after tibial fracture. Age had an odds ratio of 1.5 (95% CI 1.2 to 1.9, p = 0.001). What does this mean?"
MCQ Practice Points
ANOVA vs Multiple t-tests
Q: Why should you NOT run multiple independent t-tests when comparing 3 or more groups?
A: Inflates Type I error rate. Each t-test has a 5% false positive risk. Three t-tests (Group 1 vs 2, 1 vs 3, 2 vs 3) inflate family-wise error to approximately 14%. ANOVA controls overall Type I error at 5%, then post-hoc tests with correction identify specific differences.
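The "approximately 14%" figure comes from the standard family-wise error formula, shown here as a short calculation:

```python
# Family-wise error rate for k independent tests at alpha = 0.05:
#   P(at least one false positive) = 1 - (1 - alpha)**k
alpha = 0.05

fwer_3 = 1 - (1 - alpha) ** 3    # three pairwise t-tests: about 0.143
fwer_10 = 1 - (1 - alpha) ** 10  # ten tests: about 0.401

# Bonferroni correction: divide alpha by the number of comparisons
bonferroni_alpha = alpha / 3     # about 0.0167 per test
```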
Paired vs Independent t-test
Q: When would you use a paired t-test instead of an independent t-test?
A: When comparing the same subjects at two time points (e.g., before vs after surgery). The paired t-test accounts for within-subject correlation and is more powerful. The independent t-test is for comparing two separate groups of different subjects.
Chi-square Expected Count Rule
Q: When should you use Fisher exact test instead of chi-square?
A: When the expected count is under 5 in any cell of the contingency table. The chi-square approximation is inaccurate with small expected counts. Fisher exact provides an exact p-value for any sample size.
Correlation Coefficient Interpretation
Q: How do you interpret the Pearson correlation coefficient r = 0.7?
A: Strong positive linear relationship. r = 0.7 means 49% of the variance in one variable is explained by the other (r² = 0.49). Clinical significance depends on context. Interpretation: r under 0.3 = weak, 0.3-0.7 = moderate, over 0.7 = strong. Note: correlation does not imply causation.
Non-parametric Test Selection
Q: How do you decide between parametric and non-parametric tests?
A: Use non-parametric tests when: (1) data violate the normality assumption (check with Shapiro-Wilk test), (2) data are ordinal (e.g., Likert scales), (3) sample size is too small to verify normality, (4) extreme outliers are present. Non-parametric tests are more robust but less powerful.
Australian Context
Australian Research Framework
NHMRC Guidelines
National Health and Medical Research Council:
- National Statement on Ethical Conduct in Human Research
- Mandatory ethics approval for human research
- Australian Code for Responsible Conduct of Research
Key requirements:
- Human Research Ethics Committee (HREC) approval
- Informed consent documentation
- Data management plans
- Reporting adverse events
ANZCTR Registration
Australian New Zealand Clinical Trials Registry:
- www.anzctr.org.au
- Mandatory for clinical trials
- Required before participant enrollment
- ICMJE requirement for publication
Registration includes:
- Primary and secondary outcomes
- Sample size calculation
- Statistical analysis plan
Australian Orthopaedic Data Sources
| Resource | Type | Application |
|---|---|---|
| AOANJRR | Registry | Joint replacement outcomes nationally |
| ACSQHC | Quality standards | Clinical care standards, indicators |
| AIHW | Health statistics | National injury and disease data |
| Medicare/PBS data | Administrative | Procedure rates, medication use |
| State trauma registries | Registry | Victoria, NSW trauma outcomes |
Exam Pearl
FRACS Viva Point: "What is the level of evidence of AOANJRR data?" Answer: Level III (retrospective cohort) - but with very high validity due to near-complete capture (greater than 98%) and validated data linkage for revision endpoints.
Australian trainees should be familiar with national research infrastructure and ethics requirements.
COMMON STATISTICAL TESTS
High-Yield Exam Summary
Tests for Continuous Outcomes
- 2 groups, independent, normal = Independent t-test
- 2 groups, paired (before-after), normal = Paired t-test
- 2 groups, independent, non-normal = Mann-Whitney U
- 2 groups, paired, non-normal = Wilcoxon signed-rank
- 3+ groups, independent, normal = One-way ANOVA + post-hoc
- 3+ groups, repeated measures, normal = Repeated measures ANOVA
- 3+ groups, independent, non-normal = Kruskal-Wallis
Tests for Categorical Outcomes
- Comparing proportions, expected count over 5 = Chi-square
- Comparing proportions, expected count under 5 = Fisher exact
- Binary outcome with predictors = Logistic regression
- Multiple categorical outcomes = Chi-square or multinomial regression
Tests for Associations
- Correlation between 2 continuous, normal = Pearson correlation (r)
- Correlation between 2 variables, non-normal or ordinal = Spearman correlation (rho)
- Predicting continuous outcome from predictors = Linear regression
- Predicting binary outcome from predictors = Logistic regression (OR)
- Correlation does NOT imply causation - confounders may explain
Critical Test Selection Rules
- Match test to outcome type (continuous vs categorical)
- Check normality before parametric tests (histogram, Q-Q plot, Shapiro-Wilk)
- Use paired tests for before-after, independent for separate groups
- ANOVA first for 3+ groups, then post-hoc (never multiple t-tests)
- Fisher exact when expected count under 5 (not chi-square)
Interpretation Principles
- ANOVA tells IF groups differ, post-hoc tells WHICH groups
- Pearson r: 0-0.3 weak, 0.3-0.7 moderate, 0.7-1.0 strong correlation
- Logistic regression OR greater than 1 = increased odds, OR less than 1 = decreased odds
- Regression coefficients are adjusted for other variables in model
- Non-parametric tests less powerful but more robust to violations
Common Mistakes
- Multiple t-tests instead of ANOVA (inflates Type I error)
- Independent t-test for paired data (loses power)
- Chi-square with expected count under 5 (inaccurate p-value)
- Not checking parametric assumptions before using t-test or ANOVA
- Confusing correlation with causation (observational data)