Comprehensive Summary
This study used two Large Language Models (LLMs), ChatGPT and Microsoft Copilot, to analyze and compare the quality of patient-facing information on shoulder arthroplasty. Thirty frequently asked questions on anatomic total shoulder arthroplasty (aTSA) and reverse total shoulder arthroplasty (rTSA) were categorized into Fact, Policy, and Value questions using the Rothwell criteria. The models' responses were then rated for readability, quality, and reliability using the Flesch-Kincaid Reading Ease Score (FRES) and Flesch-Kincaid Grade Level (FKGL), the DISCERN instrument, and the Journal of the American Medical Association (JAMA) benchmark criteria, respectively. There was a statistically significant difference in performance between the models, with Copilot outperforming ChatGPT in quality, reliability, and readability. Overall, both models were rated as good sources of information. ChatGPT, however, showed specific deficiencies in reliability, often fabricating sources and mismatching evidence.
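For context, the two readability metrics cited above are simple closed-form formulas over sentence length and syllable counts. The sketch below illustrates how they are computed; it uses a naive vowel-group syllable counter for demonstration only, not the validated tooling a formal study would use.

```python
import re

def count_syllables(word):
    # Naive heuristic: one syllable per contiguous vowel group.
    # Real readability tools use pronunciation dictionaries instead.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def readability_scores(text):
    """Return (FRES, FKGL) for a block of text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    asl = len(words) / len(sentences)   # average sentence length
    asw = syllables / len(words)        # average syllables per word
    fres = 206.835 - 1.015 * asl - 84.6 * asw
    fkgl = 0.39 * asl + 11.8 * asw - 15.59
    return fres, fkgl
```

Higher FRES means easier text (60-70 is roughly plain English), while FKGL maps directly to a U.S. school grade level, which is why the two scores move in opposite directions for the same passage.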
Outcomes and Implications
This study is of major importance to patient education research. As AI use grows, more patients rely on these tools as a source of medical information, which necessitates research into the quality and accuracy of the information they provide, both to protect patients and to help surgeons counsel them effectively. Future LLM research can focus on improving the readability and reliability of medical information. With the technology advancing rapidly, AI-generated content has the potential to become a primary source of supplementary information in patient education.