Comprehensive Summary
This study evaluated the use of ChatGPT as a clinical assistant in an orthopedic hand clinic. A team of clinicians developed nine clinical summaries encompassing both common and uncommon hand pathologies. Researchers fed these summaries into ChatGPT and asked it to generate a diagnosis, recommend a workup, and provide treatment options. Responses were graded on a 5-point scale for accuracy and clinical utility. ChatGPT correctly diagnosed seven of the nine cases but struggled with more complex or atypical presentations. When intentionally provided with inaccurate information, ChatGPT failed to detect the error, underscoring its limited capacity for clinical reasoning. Overall, the findings suggest that ChatGPT was more effective at suggesting treatment options than at identifying a specific diagnosis.
Outcomes and Implications
Large language models (LLMs) like ChatGPT draw on vast data sources, enabling them to address many common clinical problems. However, fellowship-trained hand surgeons concluded that ChatGPT's usefulness as an assistant in an orthopedic hand clinic is limited. It often cited nonexistent or inaccurate references in its explanations, a shortcoming that poses real risks if the model were used unsupervised in clinical practice. The authors note that LLMs may have a role in future clinical practice, but substantial advances and validation are required before they can be safely implemented.