High-yield overview

Machine learning for fracture detection, measurement and planning - decision support, not decision replacement

Fracture Detection: 90-95% sensitivity

FDA Cleared Tools: Multiple available

Common Application: Wrist, hip fractures

Role: Decision support

AI Application Categories

Detection: Fracture identification, abnormality flagging

Measurement: Automated angles, alignment metrics

Planning: Arthroplasty templating, surgical simulation

Prioritisation: Worklist triage by urgency

Key: AI augments clinical capability but requires human oversight

Critical Must-Knows

AI tools are decision support - clinician remains responsible
High sensitivity for fracture detection reduces missed injuries
Best validated for wrist, hip, and chest radiograph applications
Cannot replace clinical correlation and physical examination
Regulatory clearance (FDA 510(k), CE/UKCA, or national equivalent) required for clinical use

Clinical Pearls

AI assists detection but does not replace clinical decision-making
Deep learning uses convolutional neural networks (CNNs)
Performance depends on training data quality and diversity
Particularly useful for reducing missed fractures in ED

Clinical Warning

AI in radiology is an emerging topic. For fellowship exams, understand the basic concepts (machine learning, deep learning), current validated applications (fracture detection), limitations (training bias, cannot replace clinical judgement), and the medicolegal position (clinician responsibility remains).

Mnemonic

VALID: Appraising an Imaging AI Tool

V	Validation - prospective, peer-reviewed, on a population like yours
A	Approval - FDA/CE/UKCA or national regulator clearance for your jurisdiction
L	Limitations - know the failure modes (occult, out-of-distribution, paediatric)
I	Integration - fits PACS/workflow; who reviews the output?
D	Drift & audit - ongoing local monitoring of sensitivity/specificity

Hook: If a tool is not VALID for your jurisdiction and population, a high published AUC is irrelevant - never deploy on the vendor's numbers alone.

Mnemonic

SAFE: Why a 'Negative' AI Result Never Excludes a Fracture

S	Sensitivity is not 100% - occult fractures are still missed
A	Automation bias - do not let a confident output stop your reasoning
F	Findings clinically - examination and mechanism override the algorithm
E	Escalate - high suspicion warrants immobilise, repeat imaging or MRI/CT

Hook: The classic trap: AI says 'no fracture', the patient has snuffbox tenderness - you still treat as a scaphoid fracture. Clinical suspicion always wins.

Overview & Core Principles

Figure: Orthopaedic radiology workstation with neural network style decision support beside an MSK radiograph. Core AI concept: image analysis is performed on radiographs, but the clinician still reviews the imaging context and final output. Credit: OrthoVellum

AI Terminology

Term	Definition	Example
Artificial Intelligence (AI)	Machines performing tasks requiring human intelligence	Any automated image analysis
Machine Learning (ML)	Algorithms that improve through experience	Learning from labelled examples
Deep Learning (DL)	Neural networks with multiple layers	Convolutional neural networks
Convolutional Neural Network (CNN)	Neural network for image analysis	Fracture detection models
Training Data	Labelled examples used to teach the algorithm	Radiographs with/without fractures
Inference	Applying trained model to new data	Analysing a new patient radiograph

How AI Learns to Detect Fractures

A deep learning model is trained on thousands of labelled radiographs (fracture vs no fracture). The CNN automatically learns features that distinguish fractures (cortical disruption, angulation, subtle lucent lines) without explicit programming. The model is then evaluated on a separate test set, which it never saw during training, to estimate how it will perform on new images. More diverse training data generally improves generalisation.
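To make "learns features" concrete, the sketch below applies a single convolution filter to a toy image. It is an illustration only: the 5x5 image, the pixel values, and the hand-written `line_kernel` are assumptions for teaching purposes. In a real CNN the filter weights are learned from labelled radiographs rather than chosen by hand, and many filters are stacked in layers.

```python
def convolve2d(image, kernel):
    """Slide a kernel over an image (valid mode) and sum the products.

    This is the core operation of a convolutional layer.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Toy "radiograph" (hypothetical values): bright bone (9) with a dark
# vertical lucent line (1) running down the middle column.
toy_image = [[9, 9, 1, 9, 9] for _ in range(5)]

# Hand-written filter that responds to a dark line flanked by bright bone.
# A trained CNN would discover weights like these from labelled examples.
line_kernel = [
    [1, -2, 1],
    [1, -2, 1],
    [1, -2, 1],
]

feature_map = convolve2d(toy_image, line_kernel)
for row in feature_map:
    print(row)
```

The response is largest where the filter sits directly over the lucent line and is zero on uniform bone, which is the sense in which a filter "detects" a feature.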

Clinical Imaging Applications

Figure: Emergency MSK radiograph workstation showing fracture-detection highlight boxes across multiple regions. Clinical application visual: AI fracture detection can flag possible abnormalities across common emergency radiographs, functioning as a second-read support tool. Credit: OrthoVellum

Figure: Deep learning fracture-detection workflow and example orthopaedic radiograph outputs. Open-access fracture-detection workflow figure showing radiograph input, deep-learning processing, fracture output, and example MSK radiograph outputs. Credit: Nature Portfolio / npj Digital Medicine (CC BY 4.0)

AI Fracture Detection Performance

Body Region	Typical Sensitivity	Clinical Utility
Wrist/hand	90-95%	Reduces missed scaphoid, metacarpal fractures
Hip	90-98%	Flags occult neck of femur fractures
Chest (ribs)	85-95%	Detects subtle rib fractures
Spine	85-92%	Identifies vertebral compression fractures
Ankle	88-94%	Assists with subtle malleolar fractures
Paediatric elbow	85-92%	Helps with occult fractures

ED Workflow Integration

AI fracture detection tools integrate into the ED workflow by automatically analysing radiographs and flagging potential fractures. This can reduce missed fractures (particularly important for trainee coverage and high-volume departments), prioritise urgent cases, and provide a 'second read'. The clinician reviews all AI suggestions and makes the final determination.
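The prioritisation step can be sketched as a simple re-ordering of the reading list. This is a minimal sketch, not any vendor's implementation: the `Study` record, the accession numbers, and the AI scores are all hypothetical. The point it illustrates is that triage changes the order in which studies are read, while every study still reaches a clinician.

```python
from dataclasses import dataclass


@dataclass
class Study:
    accession: str   # hypothetical study identifier
    ai_score: float  # hypothetical AI fracture probability, 0.0 to 1.0


def prioritise_worklist(studies):
    """Return all studies, highest AI fracture score first.

    Nothing is removed: a low score moves a study down the list,
    it does not excuse it from clinician review.
    """
    return sorted(studies, key=lambda s: s.ai_score, reverse=True)


worklist = [
    Study("ACC-001", 0.12),
    Study("ACC-002", 0.91),
    Study("ACC-003", 0.55),
]

for study in prioritise_worklist(worklist):
    print(f"{study.accession}: AI score {study.ai_score:.2f} - awaiting clinician review")
```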

Performance Metrics

Figure: Radiology workstation with wrist radiograph and abstract AI performance dashboard. Performance metrics visual: model outputs must be interpreted alongside sensitivity, specificity, and error-pattern monitoring rather than treated as a binary clinical answer. Credit: OrthoVellum

Figure: MSK radiograph workstation with abstract curves, bars, and performance grid. AI performance review uses threshold curves, false-positive and false-negative patterns, and local audit data to decide whether a model is clinically useful. Credit: OrthoVellum

Figure: Fracture-detection model performance by anatomical region. Open-access performance figure summarising fracture-detection model sensitivity, specificity, and AUC across anatomical regions. Credit: Nature Portfolio / npj Digital Medicine (CC BY 4.0)

Understanding AI Performance

Metric	Definition	Clinical Interpretation
Sensitivity	True positive rate (detects fractures)	High = few missed fractures
Specificity	True negative rate (correct negatives)	High = few false alarms
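PPV	Positive predictive value	Probability that a positive result is a true fracture (varies with prevalence)
NPV	Negative predictive value	Probability that a negative result is truly fracture-free (varies with prevalence)
AUC-ROC	Area under ROC curve	Overall discriminative ability (0.5 = chance, 1.0 = perfect)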
F1 Score	Harmonic mean of precision/recall	Balanced performance measure
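The metrics in the table above all derive from the same four counts. The sketch below computes them from a confusion matrix; the counts are invented for illustration (a hypothetical audit of 1000 radiographs, 100 of which contain a confirmed fracture) and do not come from any study.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Compute standard diagnostic metrics from confusion-matrix counts.

    tp / fn: fractures the tool flagged / missed.
    tn / fp: normal studies the tool cleared / overcalled.
    Assumes every denominator is non-zero.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)  # precision
    npv = tn / (tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }


# Hypothetical counts: 100 fractures among 1000 radiographs.
audit = diagnostic_metrics(tp=92, fp=90, fn=8, tn=810)
for name, value in audit.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts the tool has 92% sensitivity and 90% specificity, yet the PPV is only about 51% because just 10% of the studies contain a fracture. Predictive values shift with prevalence, which is one reason a vendor's figures may not transfer to a local population.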

Sensitivity vs Specificity Trade-off

In fracture detection, high sensitivity (few missed fractures) is prioritised over specificity. A sensitive AI tool may generate false positives (overcalling fractures), which the reviewing clinician can usually dismiss, although frequent overcalls contribute to alert fatigue. Missing a fracture (false negative) has more serious consequences. Most AI tools are therefore tuned for high sensitivity, accepting some overcalling.
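The trade-off comes from where the decision threshold is set on the model's output score. The sketch below uses ten invented scores and labels (an assumption, not real model output) to show that lowering the threshold catches more fractures at the cost of more overcalls.

```python
def sens_spec_at(threshold, scores, labels):
    """Sensitivity and specificity when scores >= threshold are called 'fracture'.

    labels: 1 = fracture present, 0 = no fracture.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical model scores: first five studies have a fracture, last five do not.
scores = [0.95, 0.80, 0.62, 0.40, 0.30, 0.70, 0.45, 0.20, 0.10, 0.05]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

for threshold in (0.50, 0.25):
    sens, spec = sens_spec_at(threshold, scores, labels)
    print(f"threshold {threshold:.2f}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

On this toy data, moving the threshold from 0.50 to 0.25 raises sensitivity from 0.60 to 1.00 while specificity falls from 0.80 to 0.60.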

Limitations

Figure: Clinician correlating wrist symptoms with radiograph and AI warning symbol. Limitation visual: a reassuring algorithm output cannot override a focused clinical examination when suspicion remains high. Credit: OrthoVellum

AI Limitations in Radiology

Limitation	Explanation	Mitigation
Training bias	Model reflects training data characteristics	Diverse, representative datasets
Out-of-distribution	Poor performance on unusual cases	Clinical oversight, flag uncertainty
Black box	Cannot explain reasoning	Explainability research, heatmaps
Data quality	Garbage in, garbage out	Quality training data curation
Regulatory lag	Approval slower than development	Use only approved tools clinically
Integration challenges	Technical/workflow barriers	PACS integration, user training

Regulatory and Medicolegal

Figure: Healthcare governance meeting reviewing orthopaedic AI radiology validation workflow. Governance visual: AI radiology tools require validation, clinical workflow design, and accountability before clinical deployment. Credit: OrthoVellum

Regulatory Framework

Aspect	Requirement	Notes
Classification	Medical device (software)	SaMD - Software as a Medical Device (IMDRF framework)
FDA clearance (US)	Required for clinical use in USA	510(k) pathway most common for fracture AI
CE/UKCA marking (EU/UK)	Required in EU (MDR) and UK (UKCA)	Most fracture tools are MDR Class IIa/IIb
National regulators	Required in each jurisdiction	TGA (Australia), Health Canada, PMDA (Japan), CDSCO (India)
Clinical validation	Performance data required	Prospective, locally representative studies preferred
Post-market surveillance	Ongoing monitoring + drift detection	Report adverse events; monitor performance over time

Medicolegal Position

The clinician remains legally responsible for the clinical decision, whether or not AI was used. AI is a decision support tool, not a decision-maker. If AI misses a fracture, the clinician is still responsible for the missed diagnosis if they did not exercise appropriate clinical judgement. Documentation should reflect that AI was used as an adjunct, not as the sole basis for the decision.

Guidelines, Registries & Global Practice

Missed fractures are the single most common diagnostic error in musculoskeletal imaging worldwide, and the burden falls hardest where specialist reporting is scarce - this is the global rationale for fracture-detection AI.

Society & Regulatory Positions on AI in Imaging (Side by Side)

Body / Region	Position on imaging AI	Practical implication
FDA (US)	Clears most fracture tools via 510(k); evolving framework for adaptive/locked algorithms	Cleared tools are decision support; predicate-based clearance does not prove outcome benefit
ACR (US)	Endorses AI as an adjunct; runs the ACR AI registry (Assess-AI) and Data Science Institute use cases	Encourages local performance monitoring rather than blind adoption
RCR / NICE / NHS (UK)	RCR cautious endorsement; NICE early value assessment of fracture-detection AI (e.g. ED use)	Permits conditional use with evidence generation; human report still required
EFORT / European radiology bodies	Support AI as augmentation; emphasise CE/MDR compliance and explainability	MDR Class IIa/IIb obligations and post-market surveillance
AO Foundation	Promotes AI for classification, templating and surgical planning education	Focus on consistency of fracture classification and pre-operative planning
WHO / IMDRF	Frameworks for SaMD and ethics of AI in health, relevant to limited-resource scale-up	Stresses equity, validation in local populations, and governance

High-Resource vs Limited-Resource Practice Variation

Dimension	High-resource setting	Limited-resource setting
Primary role of AI	Second-read / worklist triage to support specialist radiologists	Front-line decision support where no radiologist is available
Connectivity	Integrated into PACS/RIS, on-premise or cloud inference	May rely on smartphone capture or intermittent connectivity
Main benefit	Efficiency, reduced miss rate, faster turnaround	Access to expertise that would otherwise be absent (task-shifting)
Main risk	Automation bias, alert fatigue, over-investigation	Deployment of unvalidated tools, distribution shift, no oversight
Governance maturity	Formal validation, audit and surveillance pathways	Often absent - the key barrier to safe scale-up

Registry & Audit Note

Unlike arthroplasty, AI imaging tools have no single international implant-style registry, but post-market monitoring registries are emerging (for example the ACR Assess-AI registry in the US). The exam-relevant principle: any deployed tool needs continuous local audit of sensitivity, specificity and performance drift in the actual patient population, not a one-off validation figure quoted by the vendor.
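One way a local audit could be sketched is to recompute sensitivity for each audit period with a confidence interval, and flag any period whose interval sits entirely below the figure the vendor quoted. Everything specific here is an assumption for illustration: the monthly counts, the claimed sensitivity of 0.92, and the choice of the Wilson score interval as the flagging rule. A real governance programme would define its own thresholds and would audit specificity in the same way.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval (default approx. 95%) for a proportion."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half_width, centre + half_width


def flag_sensitivity_drift(periods, claimed_sensitivity):
    """Return labels of periods whose whole interval lies below the claimed value.

    periods: list of (label, fractures_flagged_by_ai, confirmed_fractures).
    """
    flagged = []
    for label, detected, confirmed in periods:
        _, upper = wilson_interval(detected, confirmed)
        if upper < claimed_sensitivity:
            flagged.append(label)
    return flagged


# Hypothetical monthly audit counts against a hypothetical vendor claim of 0.92.
monthly_audit = [("Jan", 46, 50), ("Feb", 45, 50), ("Mar", 36, 50)]
print(flag_sensitivity_drift(monthly_audit, claimed_sensitivity=0.92))
```

The interval matters because a small monthly sample can dip below the quoted figure by chance; flagging only when the upper bound falls short separates noise from a drop worth investigating.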

Future Directions

Figure: Multimodal orthopaedic imaging workstation with radiograph, MRI, 3D bone model, and AI overlay. Future direction visual: orthopaedic imaging AI is moving from single-radiograph detection toward multimodal decision support and surgical planning. Credit: OrthoVellum

Emerging AI Applications

Area	Application	Potential Impact
Natural language processing	Automated report generation	Efficiency, consistency
Multimodal AI	Combined imaging and clinical data	More holistic assessment
Federated learning	Training without sharing data	Privacy-preserving improvement
Foundation models	Pre-trained, adaptable models	Faster development of new tools
Real-time guidance	Intraoperative AI assistance	Surgical precision
Outcome prediction	Predict treatment success	Personalised medicine

Radiologist-AI Collaboration

The future is likely radiologist-AI collaboration rather than replacement. AI handles routine detection and measurement tasks, freeing radiologists for complex interpretation, clinical correlation, and communication. Studies suggest radiologist + AI outperforms either alone for many tasks.

Systematic Approach: A Negative AI Result Is Not a Differential

The most dangerous error is treating an AI "no fracture" output as a clinical answer. AI flags a pattern; the clinician must still work through the differential of why a region hurts despite a reassuring algorithm. The table below contrasts the entities a fracture-detection model is and is not built to resolve.

Painful Region with a 'Negative' or Equivocal AI Output - Differential

Entity	Why AI may miss or mislabel it	Clinical action that overrides AI
Occult scaphoid fracture	Often invisible on initial radiograph (AI trained on radiographs cannot see what is not yet visible)	Snuffbox tenderness - immobilise, then repeat radiographs at 10-14 days or MRI
Occult hip / femoral neck fracture	Subtle trabecular disruption, frequent false negatives in osteopenic bone	Inability to weight-bear - CT or MRI regardless of AI output
Stress / insufficiency fracture	Radiographically silent for 2-3 weeks; outside most training distributions	History (load change, metabolic risk) - MRI or bone scan
Pathological fracture / bone lesion	Model trained on traumatic fractures may not flag underlying lesion	Atraumatic mechanism, lytic/blastic clues - cross-sectional imaging, oncology referral
Non-accidental injury (paediatric)	AI detects the fracture, not the pattern or social context	Recognise inconsistent history, multiple ages of injury - safeguarding pathway
Soft-tissue / ligamentous injury	Fracture model has no class for ligament or tendon	Examination, stress views, MRI / ultrasound
Out-of-distribution image	Unusual projection, hardware, paediatric physis, rare anatomy degrades performance	Treat AI output as unreliable; rely on clinical reasoning
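Why a negative result cannot close the question can be shown with Bayes' theorem. The sketch below computes the probability that a fracture is still present after a negative test. All three inputs are illustrative assumptions rather than measured values: a pre-test probability of 20%, a sensitivity of 75%, and a specificity of 90%.

```python
def probability_after_negative(pre_test, sensitivity, specificity):
    """Probability that disease is present despite a negative test (Bayes' theorem)."""
    missed = pre_test * (1 - sensitivity)        # diseased, test negative
    cleared = (1 - pre_test) * specificity       # healthy, test negative
    return missed / (missed + cleared)


# Hypothetical inputs chosen only to illustrate the arithmetic.
residual = probability_after_negative(pre_test=0.20, sensitivity=0.75, specificity=0.90)
print(f"Residual fracture probability after a negative result: {residual:.1%}")
```

Under these assumptions roughly 6.5% of patients with a negative result still have a fracture, which is why examination findings and mechanism drive the decision to immobilise and re-image.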

The Automation-Bias Trap

The best-documented hazard of AI assistance is not the false negative itself but automation bias - the tendency to defer to a confident-looking output and stop thinking. Studies show readers can be anchored by an incorrect AI suggestion, occasionally being talked out of a correct call. The safe posture: read the image first, then consult the AI, and let a discordant high clinical suspicion always win.

Controversies & Areas of Uncertainty

Evidence Base

Deep Learning Assistance Closes the Accuracy Gap in Fracture Detection Across Clinician Types

Anderson PG, Baum GL, Keathley N, Sicular S, Venkatesh S, Sharma A, Daluiski A, Potter H, Hotchkiss R, Lindsey RV, Jones RM • Clin Orthop Relat Res (2022)

Key Findings:

Multi-reader multi-case study: 24 clinicians (radiologists, orthopaedic surgeons, PAs, primary care and emergency physicians) read 175 cases across 12 anatomical regions, aided and unaided by an FDA-cleared deep learning tool
Reader accuracy rose with AI aid: AUC 0.90 unaided to 0.94 aided (difference 0.04, 95% CI 0.01 to 0.07)
Sensitivity improved from 82% to 90% and specificity from 89% to 92% with AI assistance
Clinicians with limited MSK imaging training reduced their fracture miss rate from 20% to 9%, matching radiologist performance (10%)

Clinical Implication: AI assistance most benefits the non-specialist reader (ED physician, PA, junior trainee) who interprets a large share of musculoskeletal radiographs - exactly the after-hours setting where fractures are most often missed. It narrows, but does not eliminate, the expertise gap.

Limitation: Level III retrospective multi-reader design; reader behaviour in a study may not reflect real-world workflow, and the tool was developed by the sponsoring company.

Verify on PubMed (PMID 36083847)

Deep Learning Tool to Improve Fracture Detection by Radiologists and Emergency Physicians on Extremity Radiographs

Fu T, Viswanathan V, Attia A, Zerbib-Attal E, Kosaraju V, Barger R, Vidal J, Bittencourt LK, Faraji N • Acad Radiol (2023)

Key Findings:

Standalone deep learning performance on 2626 extremity radiographs: accuracy 0.986, sensitivity 0.987, specificity 0.885, with accuracy over 0.95 across body part, age, sex, view and scanner
Multi-reader study (24 readers): with AI aid, accuracy rose by 0.047 (95% CI 0.034 to 0.061) and sensitivity improved from 0.865 to 0.955
Average interpretation time fell by 7.1 seconds (27%) per examination
Diagnostic gain was largest for emergency physicians and non-MSK radiologists

Clinical Implication: Beyond accuracy, AI can shorten reading time and raise throughput - relevant to high-volume emergency departments and resource-limited settings with few specialist readers.

Limitation: Single-institution test set; retrospective reader study with a one-month washout, not a prospective outcome trial. Industry co-authorship.

Verify on PubMed (PMID 37993303)

Clinical Decision Scenarios

Use these scenarios to practise clinical reasoning and management decisions

Figure: Consultant and trainee reviewing MSK radiograph with AI heatmap overlay. Viva teaching visual: the expected answer is not whether AI is good or bad, but how the clinician validates, supervises, and integrates it safely. Credit: OrthoVellum

CLINICAL SCENARIO: Challenging

CLINICAL PROMPT

"Your hospital is considering implementing an AI tool for fracture detection on emergency department radiographs. What factors would you consider?"

PRACTICAL APPROACH

Key considerations:
(1) Evidence base - Is there published validation data? What is the sensitivity and specificity? Was it validated on a population similar to ours?
(2) Regulatory - Does it carry the appropriate clearance for our jurisdiction (FDA 510(k) in the US, CE/UKCA mark in Europe/UK, or the relevant national regulator such as TGA, Health Canada, PMDA or CDSCO)? This is mandatory for clinical use.
(3) Integration - Can it integrate with our PACS and workflow? Who reviews the AI output?
(4) Clinical governance - Who is responsible for the final decision? How is AI use documented? What happens if AI misses a fracture?
(5) Training - Do clinicians understand how to interpret AI results, including limitations?
(6) Cost-benefit - What is the cost versus expected reduction in missed fractures and potential litigation?
(7) Quality improvement - How will we audit AI performance in our population?
(8) Bias - Does it perform equally across all patient demographics?

The clinician always remains responsible for the final clinical decision.

KEY CLINICAL POINTS

Requires regulatory clearance for the jurisdiction

Published validation data essential

PACS integration and workflow design

Clinician remains legally responsible

Local validation and ongoing audit

COMMON PITFALLS

Assuming AI eliminates missed fractures

Using unapproved tools clinically

Over-reliance without clinical correlation

FURTHER QUESTIONS

"What would you do if the AI says 'no fracture' but you have high clinical suspicion? How would you explain AI use to a patient?"

CLINICAL SCENARIO: Critical

CLINICAL PROMPT

"An ED registrar reviews a wrist X-ray and the AI tool reports 'no fracture detected'. The patient has snuffbox tenderness."

PRACTICAL APPROACH

The registrar should proceed based on clinical judgement, NOT solely on AI output.

Management:
(1) Snuffbox tenderness is a clinical indicator for suspected scaphoid fracture.
(2) AI 'no fracture' does not exclude a fracture - scaphoid fractures are notoriously difficult to detect on initial radiographs (sensitivity approximately 70-80% even for experienced readers).
(3) Apply standard scaphoid protocol: immobilise in scaphoid cast/splint, arrange follow-up X-ray in 10-14 days or MRI if available.
(4) Document clinical findings and management reasoning.

Key principles: AI is decision support, not decision replacement. Clinical correlation is essential. A negative AI result with positive clinical findings should still be managed as a suspected fracture. The clinician is responsible for the final decision. Document that AI was reviewed but clinical judgement guided management.

KEY CLINICAL POINTS

Clinical findings override AI output

Scaphoid fractures often occult initially

Treat clinically suspected fracture appropriately

AI sensitivity is not 100%

Document clinical reasoning

COMMON PITFALLS

Relying solely on AI result

Discharging without appropriate follow-up

Not documenting clinical reasoning

FURTHER QUESTIONS

"What is the sensitivity of initial radiographs for scaphoid fracture? How would you explain to the patient why you're treating despite a 'normal' X-ray?"

CLINICAL SCENARIO: Standard

CLINICAL PROMPT

"You are asked to give a presentation on AI in orthopaedic imaging to your department. What key messages would you convey?"

PRACTICAL APPROACH

Key messages:
(1) Current state - AI is increasingly validated for fracture detection (particularly wrist, hip), automated measurements (Cobb angle, alignment), and surgical templating. Multiple tools have regulatory approval.
(2) Performance - AI achieves 90-95% sensitivity for fracture detection in validated applications, potentially reducing missed diagnoses.
(3) Role - AI is a decision support tool, not a replacement for clinical judgement. The combination of AI + clinician often outperforms either alone.
(4) Limitations - AI cannot examine patients, consider mechanism, or integrate the full clinical picture. Performance depends on training data and may not generalise to all populations.
(5) Responsibility - The clinician remains legally and ethically responsible for decisions, regardless of AI use.
(6) Future - Expect increased integration into workflows, more sophisticated applications (outcome prediction, surgical guidance), and AI-radiologist collaboration models.
(7) Implementation - Requires regulatory approval, clinical governance, training, and ongoing audit.

KEY CLINICAL POINTS

AI is decision support, not replacement

High sensitivity but not perfect

Clinical correlation always required

Clinician remains responsible

AI + clinician often outperforms either alone

COMMON PITFALLS

Over-promising AI capabilities

Underestimating implementation challenges

Ignoring regulatory requirements

FURTHER QUESTIONS

"How might AI change radiology training? What ethical considerations exist with AI in healthcare?"

Figure: Orthopaedic AI radiology study desk with wrist radiograph, blank cards, and bone model. Revision visual: AI radiology questions should be answered from first principles, including training data, model performance, workflow, and clinical responsibility. Credit: OrthoVellum

AI in Orthopaedic Radiology Quick Reference

Clinical summary

Core Concepts

• Deep learning uses CNNs for image analysis
• Trained on labelled examples
• Validated on separate test data
• Regulatory clearance required for clinical use

Current Applications

• Fracture detection (wrist, hip common)
• Automated measurements (Cobb angle)
• Arthroplasty templating
• Worklist prioritisation

Performance

• Sensitivity 90-95% for fracture detection
• High sensitivity prioritised (few missed)
• May have lower specificity (overcalling)
• AI + clinician better than either alone

Key Principles

• Decision support, not replacement
• Clinical correlation essential
• Clinician remains legally responsible
• Negative AI does not exclude pathology

Term

Definition

Example

Artificial Intelligence (AI)

Machines performing tasks requiring human intelligence

Any automated image analysis

Machine Learning (ML)

Algorithms that improve through experience

Learning from labelled examples

Deep Learning (DL)

Neural networks with multiple layers

Convolutional neural networks

Convolutional Neural Network (CNN)

Neural network for image analysis

Fracture detection models

Training Data

Labelled examples used to teach the algorithm

Radiographs with/without fractures

Inference

Applying trained model to new data

Analysing a new patient radiograph

Body Region

Typical Sensitivity

Clinical Utility

Wrist/hand

90-95%

Reduces missed scaphoid, metacarpal fractures

Hip

90-98%

Flags occult neck of femur fractures

Chest (ribs)

85-95%

Detects subtle rib fractures

Spine

85-92%

Identifies vertebral compression fractures

Ankle

88-94%

Assists with subtle malleolar fractures

Paediatric elbow

85-92%

Helps with occult fractures

Metric

Definition

Clinical Interpretation

Sensitivity

True positive rate (detects fractures)

High = few missed fractures

Specificity

True negative rate (correct negatives)

High = few false alarms

PPV

Positive predictive value

Probability positive result is true

NPV

Negative predictive value

Probability negative result is true

AUC-ROC

Area under ROC curve

Overall discriminative ability (0.5-1.0)

F1 Score

Harmonic mean of precision/recall

Balanced performance measure

Limitation

Explanation

Mitigation

Training bias

Model reflects training data characteristics

Diverse, representative datasets

Out-of-distribution

Poor performance on unusual cases

Clinical oversight, flag uncertainty

Black box

Cannot explain reasoning

Explainability research, heatmaps

Data quality

Garbage in, garbage out

Quality training data curation

Regulatory lag

Approval slower than development

Use only approved tools clinically

Integration challenges

Technical/workflow barriers

PACS integration, user training

Aspect

Requirement

Notes

Classification

Medical device (software)

SaMD - Software as Medical Device (IMDRF framework)

FDA clearance (US)

Required for clinical use in USA

510(k) pathway most common for fracture AI

CE/UKCA marking (EU/UK)

Required in EU (MDR) and UK (UKCA)

Most fracture tools are MDR Class IIa/IIb

National regulators

Required in each jurisdiction

TGA (Australia), Health Canada, PMDA (Japan), CDSCO (India)

Clinical validation

Performance data required

Prospective, locally representative studies preferred

Post-market surveillance

Ongoing monitoring + drift detection

Report adverse events; monitor performance over time

Body / Region

Position on imaging AI

Practical implication

FDA (US)

Clears most fracture tools via 510(k); evolving framework for adaptive/locked algorithms

Cleared tools are decision support; predicate-based clearance does not prove outcome benefit

ACR (US)

Endorses AI as an adjunct; runs the ACR AI registry (Assess-AI) and Data Science Institute use cases

Encourages local performance monitoring rather than blind adoption

RCR / NICE / NHS (UK)

RCR cautious endorsement; NICE early value assessment of fracture-detection AI (e.g. ED use)

Permits conditional use with evidence generation; human report still required

EFORT / European radiology bodies

Support AI as augmentation; emphasise CE/MDR compliance and explainability

MDR Class IIa/IIb obligations and post-market surveillance

AO Foundation

Promotes AI for classification, templating and surgical planning education

Focus on consistency of fracture classification and pre-operative planning

WHO / IMDRF

Frameworks for SaMD and ethics of AI in health, relevant to limited-resource scale-up

Stresses equity, validation in local populations, and governance

Dimension

High-resource setting

Limited-resource setting

Primary role of AI

Second-read / worklist triage to support specialist radiologists

Front-line decision support where no radiologist is available

Connectivity

Integrated into PACS/RIS, on-premise or cloud inference

May rely on smartphone capture or intermittent connectivity

Main benefit

Efficiency, reduced miss rate, faster turnaround

Access to expertise that would otherwise be absent (task-shifting)

Main risk

Automation bias, alert fatigue, over-investigation

Deployment of unvalidated tools, distribution shift, no oversight

Governance maturity

Formal validation, audit and surveillance pathways

Often absent - the key barrier to safe scale-up

Area

Application

Potential Impact

Natural language processing

Automated report generation

Efficiency, consistency

Multimodal AI

Combined imaging and clinical data

More holistic assessment

Federated learning

Training without sharing data

Privacy-preserving improvement

Foundation models

Pre-trained, adaptable models

Faster development of new tools

Real-time guidance

Intraoperative AI assistance

Surgical precision

Outcome prediction

Predict treatment success

Personalised medicine

Entity	Why AI may miss or mislabel it	Clinical action that overrides AI
Occult scaphoid fracture	Often invisible on initial radiograph (AI trained on radiographs cannot see what is not yet visible)	Snuffbox tenderness - immobilise, repeat imaging or MRI at 10-14 days
Occult hip / femoral neck fracture	Subtle trabecular disruption, frequent false negatives in osteopenic bone	Inability to weight-bear - CT or MRI regardless of AI output
Stress / insufficiency fracture	Radiographically silent for 2-3 weeks; outside most training distributions	History (load change, metabolic risk) - MRI or bone scan
Pathological fracture / bone lesion	Model trained on traumatic fractures may not flag underlying lesion	Atraumatic mechanism, lytic/blastic clues - cross-sectional imaging, oncology referral
Non-accidental injury (paediatric)	AI detects the fracture, not the pattern or social context	Recognise inconsistent history, multiple ages of injury - safeguarding pathway
Soft-tissue / ligamentous injury	Fracture model has no class for ligament or tendon	Examination, stress views, MRI / ultrasound
Out-of-distribution image	Unusual projection, hardware, paediatric physis, rare anatomy degrades performance	Treat AI output as unreliable; rely on clinical reasoning

Deep Learning Assistance Closes the Accuracy Gap in Fracture Detection Across Clinician Types

Anderson PG, Baum GL, Keathley N, Sicular S, Venkatesh S, Sharma A, Daluiski A, Potter H, Hotchkiss R, Lindsey RV, Jones RM • Clin Orthop Relat Res (2022)

Key Findings:

Multi-reader multi-case study: 24 clinicians (radiologists, orthopaedic surgeons, PAs, primary care and emergency physicians) read 175 cases across 12 anatomical regions, aided and unaided by an FDA-cleared deep learning tool
Reader accuracy rose with AI aid: AUC 0.90 unaided to 0.94 aided (difference 0.04, 95% CI 0.01 to 0.07)
Sensitivity improved from 82% to 90% and specificity from 89% to 92% with AI assistance
Clinicians with limited MSK imaging training reduced their fracture miss rate from 20% to 9%, matching radiologist performance (10%)

Limitation: Level III retrospective multi-reader design; reader behaviour in a study may not reflect real-world workflow, and the tool was developed by the sponsoring company.

Verify on PubMed (PMID 36083847)
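
Worked example: the sensitivity figures above can be restated as miss rates, which is often how the exam question is framed. The short Python sketch below only re-expresses the percentages quoted in the Key Findings (miss rate = 1 - sensitivity); the function names are illustrative and not taken from the study.

```python
def miss_rate(sensitivity: float) -> float:
    """Miss rate (false-negative rate) is the complement of sensitivity."""
    return 1.0 - sensitivity


def relative_reduction(before: float, after: float) -> float:
    """Fraction of the original miss rate that was removed."""
    return (before - after) / before


# Pooled readers: sensitivity 82% unaided, 90% aided (figures quoted above).
unaided_miss = miss_rate(0.82)
aided_miss = miss_rate(0.90)
pooled_rel = relative_reduction(unaided_miss, aided_miss)

# Clinicians with limited MSK imaging training: miss rate 20% to 9% (quoted above).
limited_rel = relative_reduction(0.20, 0.09)

print(f"Pooled miss rate: {unaided_miss:.0%} -> {aided_miss:.0%} "
      f"(relative reduction {pooled_rel:.0%})")
print(f"Limited-training readers: relative reduction {limited_rel:.0%}")
```

On these figures the pooled miss rate falls from 18% to 10% (a relative reduction of roughly 44%), and the limited-training group sees a relative reduction of 55%. The aided miss rate is still not zero, which is the point to make in a viva.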

Deep Learning Tool to Improve Fracture Detection by Radiologists and Emergency Physicians on Extremity Radiographs

Fu T, Viswanathan V, Attia A, Zerbib-Attal E, Kosaraju V, Barger R, Vidal J, Bittencourt LK, Faraji N • Acad Radiol (2023)

Key Findings:

Standalone deep learning performance on 2626 extremity radiographs: accuracy 0.986, sensitivity 0.987, specificity 0.885, with accuracy over 0.95 across body part, age, sex, view and scanner
Multi-reader study (24 readers): with AI aid, accuracy rose by 0.047 (95% CI 0.034 to 0.061) and sensitivity improved from 0.865 to 0.955
Average interpretation time fell by 7.1 seconds (27%) per examination
Diagnostic gain was largest for emergency physicians and non-MSK radiologists

Clinical Implication: Beyond accuracy, AI can shorten reading time and raise throughput - relevant to high-volume emergency departments and resource-limited settings with few specialist readers.

Limitation: Single-institution test set; retrospective reader study with a one-month washout, not a prospective outcome trial. Industry co-authorship.

Verify on PubMed (PMID 37993303)
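
Worked example: sensitivity and specificity describe the tool, but what matters at the bedside is the predictive value, which depends on how common fractures are in the population being imaged. The sketch below applies Bayes' rule to the standalone sensitivity and specificity quoted above. The prevalence values are hypothetical and chosen only for illustration, and test-set performance should not be assumed to transfer to occult or out-of-distribution injuries. The last lines back-calculate the reading time implied by "7.1 seconds (27%)".

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) for a given fracture prevalence using Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


SENSITIVITY = 0.987  # standalone figure quoted above
SPECIFICITY = 0.885  # standalone figure quoted above

# Hypothetical prevalences (assumptions, not from the study).
for prevalence in (0.05, 0.20, 0.50):
    ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")

# Implied reading times: a 7.1 s saving described as a 27% reduction.
time_saved_s = 7.1
implied_unaided_s = time_saved_s / 0.27
implied_aided_s = implied_unaided_s - time_saved_s
print(f"implied unaided read ~{implied_unaided_s:.1f} s, aided ~{implied_aided_s:.1f} s")
```

The pattern to note is that NPV falls as pre-test probability rises: the more clinically suspicious the injury, the less a negative AI output should reassure.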

Measurement	Application	Benefit
Hip-knee-ankle angle	Lower limb alignment	Consistent, time-saving
Cobb angle	Scoliosis assessment	Reduced variability
Acetabular angles	DDH assessment	Standardised measurement
Fracture angulation	Fracture displacement	Objective quantification
Joint space width	Arthritis grading	Reproducible assessment
Bone age	Skeletal maturity	Automated Greulich-Pyle
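
Worked example: automated angle measurement usually reduces to geometry once landmarks have been located. The Python sketch below shows only that final step for the Cobb angle (the angle between the superior endplate of the upper end vertebra and the inferior endplate of the lower end vertebra). The landmark coordinates and function names are hypothetical; in a real tool the landmarks come from a detection model, and this is not any vendor's method.

```python
import math


def line_angle_deg(p1, p2):
    """Orientation of the line through p1 and p2, in degrees."""
    return math.degrees(math.atan2(p2[1] - p1[1], p2[0] - p1[0]))


def cobb_angle_deg(upper_endplate, lower_endplate):
    """Acute angle between two endplate lines, each given as a pair of (x, y) points."""
    diff = abs(line_angle_deg(*upper_endplate) - line_angle_deg(*lower_endplate)) % 180.0
    return min(diff, 180.0 - diff)


# Hypothetical landmark coordinates in pixels (illustrative only).
upper = ((0.0, 0.0), (10.0, 2.0))    # superior endplate, upper end vertebra
lower = ((0.0, 50.0), (10.0, 47.0))  # inferior endplate, lower end vertebra

angle = cobb_angle_deg(upper, lower)
print(f"Cobb angle: {angle:.1f} degrees")
```

Because the geometry is deterministic, the reduced variability credited to AI comes from consistent landmark placement, which is also where errors arise when image quality or anatomy is unusual.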

Application	Function	Status
Arthroplasty templating	Automated component sizing/positioning	Available, regulatory cleared
Spine instrumentation	Pedicle screw planning	Emerging
Deformity correction	Osteotomy simulation	Research/commercial
Fracture reduction	Reduction path planning	Research
Custom implant design	AI-assisted geometry optimisation	Research

AI in Orthopaedic Radiology

AI Application Categories

Critical Must-Knows

Clinical Pearls

Clinical Warning

AI T - Appraising an Imaging AI Tool

AI R - Why a 'Negative' AI Result Never Excludes a Fracture

Overview & Core Principles

AI Terminology

How AI Learns to Detect Fractures

Clinical Imaging Applications

AI Fracture Detection Performance

ED Workflow Integration

AI Measurement Applications

Measurement Reliability

AI Planning Applications

Performance Metrics

Understanding AI Performance

Sensitivity vs Specificity Trade-off

Limitations

AI Limitations in Radiology

Regulatory and Medicolegal

Regulatory Framework

Medicolegal Position

Guidelines, Registries & Global Practice

Society & Regulatory Positions on AI in Imaging (Side by Side)

High-Resource vs Limited-Resource Practice Variation

Registry & Audit Note

Future Directions

Emerging AI Applications

Radiologist-AI Collaboration

Systematic Approach: A Negative AI Result Is Not a Differential

Painful Region with a 'Negative' or Equivocal AI Output - Differential

The Automation-Bias Trap

Controversies & Areas of Uncertainty

Evidence Base

Deep Learning Assistance Closes the Accuracy Gap in Fracture Detection Across Clinician Types

Deep Learning Tool to Improve Fracture Detection by Radiologists and Emergency Physicians on Extremity Radiographs

Artificial Intelligence for X-ray Scaphoid Fracture Detection: Systematic Review and Diagnostic Test Accuracy Meta-analysis

Can a Deep Learning Algorithm Improve Detection of Occult Scaphoid Fractures? A Clinical Validation Study

Use of Deep Learning Methods for Hand Fracture Detection from Plain Hand Radiographs

SPIRIT-AI Extension: Guidelines for Clinical Trial Protocols for Interventions Involving Artificial Intelligence

Clinical Decision Scenarios

AI in Orthopaedic Radiology Quick Reference

Core Concepts

Current Applications

Performance

Key Principles
