Machine learning for fracture detection, measurement and planning - decision support, not decision replacement
AI Application Categories
Detection: Fracture identification, abnormality flagging
Measurement: Automated angles, alignment metrics
Planning: Arthroplasty templating, surgical simulation
Prioritisation: Worklist triage by urgency
Key: AI augments clinical capability but requires human oversight
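As an illustration of the measurement category, automated angle tools largely reduce to geometry on detected landmarks. The sketch below assumes a hypothetical upstream model has already returned two line segments (for example, two vertebral endplates) as pixel coordinates; the function names and coordinates are illustrative only and do not describe any specific product.

```python
import math

def line_angle_deg(p1, p2):
    """Orientation of the line through p1 and p2, in degrees."""
    return math.degrees(math.atan2(p2[1] - p1[1], p2[0] - p1[0]))

def angle_between_lines(a1, a2, b1, b2):
    """Acute angle (0-90 degrees) between line a1-a2 and line b1-b2."""
    diff = abs(line_angle_deg(a1, a2) - line_angle_deg(b1, b2)) % 180.0
    return min(diff, 180.0 - diff)

# Hypothetical landmark coordinates (pixels), not real patient data.
upper_endplate = ((0.0, 0.0), (10.0, 0.0))
lower_endplate = ((0.0, 0.0), (10.0, 10.0))
angle = angle_between_lines(*upper_endplate, *lower_endplate)
print(f"Angle between lines: {angle:.1f} degrees")
```

Because the function returns the acute angle, a curve measuring more than 90 degrees would need a signed-orientation variant. The landmarks themselves remain the weak point: a clinician should still check where the algorithm placed them.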
Critical Must-Knows
- AI tools are decision support - clinician remains responsible
- High sensitivity for fracture detection reduces missed injuries
- Best validated for wrist, hip, and chest radiograph applications
- Cannot replace clinical correlation and physical examination
- Regulatory clearance (FDA 510(k), CE/UKCA, or national equivalent) required for clinical use
Clinical Pearls
- AI assists detection but does not replace clinical decision-making
- Deep learning uses convolutional neural networks (CNNs)
- Performance depends on training data quality and diversity
- Particularly useful for reducing missed fractures in ED
Clinical Warning
AI in radiology is an emerging topic. For fellowship exams, understand the basic concepts (machine learning, deep learning), current validated applications (fracture detection), limitations (training bias, cannot replace clinical judgement), and the medicolegal position (clinician responsibility remains).
VALID: Appraising an Imaging AI Tool
| Letter | Meaning |
|---|---|
| V | Validation - prospective, peer-reviewed, on a population like yours |
| A | Approval - FDA/CE/UKCA or national regulator clearance for your jurisdiction |
| L | Limitations - know the failure modes (occult, out-of-distribution, paediatric) |
| I | Integration - fits PACS/workflow; who reviews the output? |
| D | Drift & audit - ongoing local monitoring of sensitivity/specificity |
Hook: If a tool is not VALID for your jurisdiction and population, a high published AUC is irrelevant - never deploy on the vendor's numbers alone.
SAFE: Why a 'Negative' AI Result Never Excludes a Fracture
| Letter | Meaning |
|---|---|
| S | Sensitivity is not 100% - occult fractures are still missed |
| A | Automation bias - do not let a confident output stop your reasoning |
| F | Findings clinically - examination and mechanism override the algorithm |
| E | Escalate - high suspicion warrants immobilise, repeat imaging or MRI/CT |
Hook: The classic trap: AI says 'no fracture', the patient has snuffbox tenderness - you still treat as a scaphoid fracture. Clinical suspicion always wins.
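The reason a negative output cannot exclude a fracture follows from Bayes' rule: the residual probability depends on how suspicious the clinician was beforehand. The sketch below is a worked illustration only; the pre-test probability, sensitivity and specificity are assumed round numbers chosen for the arithmetic, not figures for any specific tool or injury.

```python
def post_test_probability_negative(pretest, sensitivity, specificity):
    """Probability that a fracture is present despite a negative AI result."""
    false_negative = pretest * (1.0 - sensitivity)   # fracture present, AI negative
    true_negative = (1.0 - pretest) * specificity    # no fracture, AI negative
    return false_negative / (false_negative + true_negative)

# Assumed illustrative inputs: 30% clinical suspicion, 90% specificity.
for assumed_sensitivity in (0.90, 0.70):
    residual = post_test_probability_negative(0.30, assumed_sensitivity, 0.90)
    print(f"sensitivity {assumed_sensitivity:.2f}: "
          f"residual fracture probability {residual:.1%}")
```

With these assumed inputs the residual probability is about 4.5% at 90% sensitivity and 12.5% at 70% sensitivity. Neither is zero, and the lower sensitivity is the more realistic regime for fractures that are barely visible on the initial radiograph.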
Overview & Core Principles

AI Terminology
| Term | Definition | Example |
|---|---|---|
| Artificial Intelligence (AI) | Machines performing tasks requiring human intelligence | Any automated image analysis |
| Machine Learning (ML) | Algorithms that improve through experience | Learning from labelled examples |
| Deep Learning (DL) | Neural networks with multiple layers | Convolutional neural networks |
| Convolutional Neural Network (CNN) | Neural network for image analysis | Fracture detection models |
| Training Data | Labelled examples used to teach the algorithm | Radiographs with/without fractures |
| Inference | Applying trained model to new data | Analysing a new patient radiograph |
How AI Learns to Detect Fractures
Clinical Imaging Applications


AI Fracture Detection Performance
| Body Region | Typical Sensitivity | Clinical Utility |
|---|---|---|
| Wrist/hand | 90-95% | Reduces missed scaphoid, metacarpal fractures |
| Hip | 90-98% | Flags occult neck of femur fractures |
| Chest (ribs) | 85-95% | Detects subtle rib fractures |
| Spine | 85-92% | Identifies vertebral compression fractures |
| Ankle | 88-94% | Assists with subtle malleolar fractures |
| Paediatric elbow | 85-92% | Helps with occult fractures |
ED Workflow Integration
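Worklist triage, one form of ED workflow integration, can be pictured as a priority queue ordered by the model's suspicion score. This is a minimal sketch under stated assumptions: the accession labels and the 0-1 `ai_score` are hypothetical, and real PACS/RIS integrations will differ by vendor.

```python
import heapq

def triage_order(studies):
    """Return accession labels ordered by AI score, highest first.

    studies: list of (accession, ai_score) in arrival order.
    Ties keep arrival order, so equally scored studies are not reshuffled.
    """
    heap = [(-score, arrival, accession)
            for arrival, (accession, score) in enumerate(studies)]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]

# Hypothetical worklist: (accession label, model suspicion score).
worklist = [("A", 0.10), ("B", 0.90), ("C", 0.50), ("D", 0.90)]
print(triage_order(worklist))
```

Triage only reorders the list. Every study, including those with low scores, still reaches a human reader.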
Performance Metrics



Understanding AI Performance
| Metric | Definition | Clinical Interpretation |
|---|---|---|
| Sensitivity | True positive rate (detects fractures) | High = few missed fractures |
| Specificity | True negative rate (correct negatives) | High = few false alarms |
| PPV | Positive predictive value | Probability positive result is true |
| NPV | Negative predictive value | Probability negative result is true |
| AUC-ROC | Area under ROC curve | Overall discriminative ability (0.5-1.0) |
| F1 Score | Harmonic mean of precision/recall | Balanced performance measure |
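The metrics in the table above all derive from the four cells of a 2x2 table. The sketch below computes them from raw counts; the counts are invented for illustration (1000 patients, 10% fracture prevalence) and do not come from any study.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Compute standard diagnostic accuracy metrics from 2x2 counts."""
    sensitivity = tp / (tp + fn)   # fractures correctly flagged
    specificity = tn / (tn + fp)   # normals correctly cleared
    ppv = tp / (tp + fp)           # positive result is a true fracture
    npv = tn / (tn + fn)           # negative result is truly normal
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }

# Invented counts: 100 fractures, 900 normals.
metrics = diagnostic_metrics(tp=90, fp=45, fn=10, tn=855)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note how a tool with 90% sensitivity and 95% specificity still yields a PPV of only about 67% at this assumed prevalence. PPV and NPV shift with the local case mix, which is one reason vendor figures need local validation.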
Sensitivity vs Specificity Trade-off
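The trade-off arises because a model outputs a continuous score and a threshold converts it into "fracture" or "no fracture". A minimal sketch, assuming eight invented cases with hypothetical scores (1 = fracture confirmed, 0 = no fracture):

```python
def sens_spec_at_threshold(scores, labels, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= threshold)
    fn = sum(1 for s, y in pairs if y == 1 and s < threshold)
    tn = sum(1 for s, y in pairs if y == 0 and s < threshold)
    fp = sum(1 for s, y in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

# Invented scores and reference labels, for illustration only.
scores = [0.95, 0.80, 0.60, 0.40, 0.70, 0.30, 0.20, 0.10]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

for threshold in (0.30, 0.50, 0.75):
    sens, spec = sens_spec_at_threshold(scores, labels, threshold)
    print(f"threshold {threshold:.2f}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

Lowering the threshold catches every fracture in this toy set at the cost of more false alarms, while raising it does the opposite. Fracture-detection tools are generally tuned toward the sensitive end because a missed fracture is the costlier error.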
Limitations

AI Limitations in Radiology
| Limitation | Explanation | Mitigation |
|---|---|---|
| Training bias | Model reflects training data characteristics | Diverse, representative datasets |
| Out-of-distribution | Poor performance on unusual cases | Clinical oversight, flag uncertainty |
| Black box | Cannot explain reasoning | Explainability research, heatmaps |
| Data quality | Garbage in, garbage out | Quality training data curation |
| Regulatory lag | Approval slower than development | Use only approved tools clinically |
| Integration challenges | Technical/workflow barriers | PACS integration, user training |
Regulatory and Medicolegal

Regulatory Framework
| Aspect | Requirement | Notes |
|---|---|---|
| Classification | Medical device (software) | SaMD - Software as a Medical Device (IMDRF framework) |
| FDA clearance (US) | Required for clinical use in USA | 510(k) pathway most common for fracture AI |
| CE/UKCA marking (EU/UK) | Required in EU (MDR) and UK (UKCA) | Most fracture tools are MDR Class IIa/IIb |
| National regulators | Required in each jurisdiction | TGA (Australia), Health Canada, PMDA (Japan), CDSCO (India) |
| Clinical validation | Performance data required | Prospective, locally representative studies preferred |
| Post-market surveillance | Ongoing monitoring + drift detection | Report adverse events; monitor performance over time |
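Drift detection in post-market surveillance can be as simple as re-computing sensitivity per audit window against a locally agreed floor. The sketch below assumes a hypothetical audit in which AI outputs are compared with the final confirmed diagnosis; the quarterly counts and the 0.85 floor are invented, and the floor itself is a local governance decision rather than a standard.

```python
def flag_sensitivity_drift(windows, floor):
    """Return audit windows whose sensitivity falls below the agreed floor.

    windows: list of (label, true_positives, false_negatives), counted
             against the final confirmed diagnosis.
    floor:   minimum acceptable sensitivity, set locally.
    """
    flagged = []
    for label, tp, fn in windows:
        confirmed = tp + fn
        if confirmed == 0:
            continue  # no confirmed fractures in this window; nothing to assess
        sensitivity = tp / confirmed
        if sensitivity < floor:
            flagged.append((label, round(sensitivity, 3)))
    return flagged

# Invented quarterly audit counts, for illustration only.
audit = [("Q1", 46, 4), ("Q2", 45, 5), ("Q3", 40, 10)]
print(flag_sensitivity_drift(audit, floor=0.85))
```

Small windows give wide confidence intervals, so a single flagged quarter is a prompt for review rather than proof of drift.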
Medicolegal Position
Guidelines, Registries & Global Practice
Missed fractures are the single most common diagnostic error in musculoskeletal imaging worldwide, and the burden falls hardest where specialist reporting is scarce - this is the global rationale for fracture-detection AI.
Society & Regulatory Positions on AI in Imaging (Side by Side)
| Body / Region | Position on imaging AI | Practical implication |
|---|---|---|
| FDA (US) | Clears most fracture tools via 510(k); evolving framework for adaptive/locked algorithms | Cleared tools are decision support; predicate-based clearance does not prove outcome benefit |
| ACR (US) | Endorses AI as an adjunct; runs the ACR AI registry (Assess-AI) and Data Science Institute use cases | Encourages local performance monitoring rather than blind adoption |
| RCR / NICE / NHS (UK) | RCR cautious endorsement; NICE early value assessment of fracture-detection AI (e.g. ED use) | Permits conditional use with evidence generation; human report still required |
| EFORT / European radiology bodies | Support AI as augmentation; emphasise CE/MDR compliance and explainability | MDR Class IIa/IIb obligations and post-market surveillance |
| AO Foundation | Promotes AI for classification, templating and surgical planning education | Focus on consistency of fracture classification and pre-operative planning |
| WHO / IMDRF | Frameworks for SaMD and ethics of AI in health, relevant to limited-resource scale-up | Stresses equity, validation in local populations, and governance |
High-Resource vs Limited-Resource Practice Variation
| Dimension | High-resource setting | Limited-resource setting |
|---|---|---|
| Primary role of AI | Second-read / worklist triage to support specialist radiologists | Front-line decision support where no radiologist is available |
| Connectivity | Integrated into PACS/RIS, on-premise or cloud inference | May rely on smartphone capture or intermittent connectivity |
| Main benefit | Efficiency, reduced miss rate, faster turnaround | Access to expertise that would otherwise be absent (task-shifting) |
| Main risk | Automation bias, alert fatigue, over-investigation | Deployment of unvalidated tools, distribution shift, no oversight |
| Governance maturity | Formal validation, audit and surveillance pathways | Often absent - the key barrier to safe scale-up |
Registry & Audit Note
Future Directions

Emerging AI Applications
| Area | Application | Potential Impact |
|---|---|---|
| Natural language processing | Automated report generation | Efficiency, consistency |
| Multimodal AI | Combined imaging and clinical data | More holistic assessment |
| Federated learning | Training without sharing data | Privacy-preserving improvement |
| Foundation models | Pre-trained, adaptable models | Faster development of new tools |
| Real-time guidance | Intraoperative AI assistance | Surgical precision |
| Outcome prediction | Predict treatment success | Personalised medicine |
Radiologist-AI Collaboration
Systematic Approach: A Negative AI Result Is Not a Differential
The most dangerous error is treating an AI "no fracture" output as a clinical answer. AI flags a pattern; the clinician must still work through the differential of why a region hurts despite a reassuring algorithm. The table below contrasts the entities a fracture-detection model is and is not built to resolve.
Painful Region with a 'Negative' or Equivocal AI Output - Differential
| Entity | Why AI may miss or mislabel it | Clinical action that overrides AI |
|---|---|---|
| Occult scaphoid fracture | Often invisible on initial radiograph (AI trained on radiographs cannot see what is not yet visible) | Snuffbox tenderness - immobilise, repeat imaging or MRI at 10-14 days |
| Occult hip / femoral neck fracture | Subtle trabecular disruption, frequent false negatives in osteopenic bone | Inability to weight-bear - CT or MRI regardless of AI output |
| Stress / insufficiency fracture | Radiographically silent for 2-3 weeks; outside most training distributions | History (load change, metabolic risk) - MRI or bone scan |
| Pathological fracture / bone lesion | Model trained on traumatic fractures may not flag underlying lesion | Atraumatic mechanism, lytic/blastic clues - cross-sectional imaging, oncology referral |
| Non-accidental injury (paediatric) | AI detects the fracture, not the pattern or social context | Recognise inconsistent history, multiple ages of injury - safeguarding pathway |
| Soft-tissue / ligamentous injury | Fracture model has no class for ligament or tendon | Examination, stress views, MRI / ultrasound |
| Out-of-distribution image | Unusual projection, hardware, paediatric physis, rare anatomy degrades performance | Treat AI output as unreliable; rely on clinical reasoning |
The Automation-Bias Trap
Controversies & Areas of Uncertainty
Evidence Base
Deep Learning Assistance Closes the Accuracy Gap in Fracture Detection Across Clinician Types
- Multi-reader multi-case study: 24 clinicians (radiologists, orthopaedic surgeons, PAs, primary care and emergency physicians) read 175 cases across 12 anatomical regions, aided and unaided by an FDA-cleared deep learning tool
- Reader accuracy rose with AI aid: AUC 0.90 unaided to 0.94 aided (difference 0.04, 95% CI 0.01 to 0.07)
- Sensitivity improved from 82% to 90% and specificity from 89% to 92% with AI assistance
- Clinicians with limited MSK imaging training reduced their fracture miss rate from 20% to 9%, matching radiologist performance (10%)
Deep Learning Tool to Improve Fracture Detection by Radiologists and Emergency Physicians on Extremity Radiographs
- Standalone deep learning performance on 2626 extremity radiographs: accuracy 0.986, sensitivity 0.987, specificity 0.885, with accuracy over 0.95 across body part, age, sex, view and scanner
- Multi-reader study (24 readers): with AI aid, accuracy rose by 0.047 (95% CI 0.034 to 0.061) and sensitivity improved from 0.865 to 0.955
- Average interpretation time fell by 7.1 seconds (27%) per examination
- Diagnostic gain was largest for emergency physicians and non-MSK radiologists
Clinical Decision Scenarios
Use these scenarios to practise clinical reasoning and management decisions

"Your hospital is considering implementing an AI tool for fracture detection on emergency department radiographs. What factors would you consider?"
"An ED registrar reviews a wrist X-ray and the AI tool reports 'no fracture detected'. The patient has snuffbox tenderness."
"You are asked to give a presentation on AI in orthopaedic imaging to your department. What key messages would you convey?"

AI in Orthopaedic Radiology Quick Reference
Clinical summary
Core Concepts
- Deep learning uses CNNs for image analysis
- Trained on labelled examples
- Validated on separate test data
- Regulatory clearance required for clinical use
Current Applications
- Fracture detection (wrist, hip common)
- Automated measurements (Cobb angle)
- Arthroplasty templating
- Worklist prioritisation
Performance
- Sensitivity 90-95% for fracture detection
- High sensitivity prioritised (few missed)
- May have lower specificity (overcalling)
- AI + clinician better than either alone
Key Principles
- Decision support, not replacement
- Clinical correlation essential
- Clinician remains legally responsible
- Negative AI does not exclude pathology