Comprehensive Summary
In this study, Nguyen et al. analyze the performance of proprietary and open-source language models, comparing the proprietary GPT-4-turbo against open-source models such as Mixtral-8x7B and Llama-3-70B. The authors drew their data from the Basic and Clinical Science Course (BCSC) textbook and the OphthoQuestions ophthalmic question bank. When these datasets were fed to the models, a Retrieval-Augmented Generation (RAG) pipeline supplied relevant context that significantly improved answer accuracy; compared against human reference answers, GPT-4 reached an accuracy of 71.91%. Hence, Nguyen et al. demonstrated how RAG can substantially strengthen open-source language models, suggesting an alternative for privacy-preserving scenarios in which sending data to a third-party proprietary model is impractical.
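The core mechanism the study relies on, retrieving relevant passages and prepending them to the model's prompt, can be illustrated with a minimal sketch. The toy corpus, scoring function, and prompt format below are hypothetical illustrations, not the authors' actual pipeline, which used the BCSC and OphthoQuestions datasets.

```python
# Minimal RAG sketch: retrieve the most relevant passages from a small
# in-memory corpus, then build a context-augmented prompt for an LLM.
# All names and the corpus here are illustrative assumptions.
from collections import Counter

def tokenize(text):
    return [w.lower().strip(".,?") for w in text.split()]

def score(query, passage):
    """Simple term-overlap score between a query and a passage."""
    q, p = Counter(tokenize(query)), Counter(tokenize(passage))
    return sum(min(q[t], p[t]) for t in q)

def retrieve(query, corpus, k=2):
    """Return the top-k passages ranked by overlap with the query."""
    return sorted(corpus, key=lambda p: score(query, p), reverse=True)[:k]

def build_prompt(query, corpus):
    """Prepend retrieved context to the question before calling the model."""
    context = "\n".join(retrieve(query, corpus))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

corpus = [
    "Open-angle glaucoma is managed first with topical medications.",
    "Cataract surgery replaces the clouded lens with an intraocular lens.",
    "Diabetic retinopathy screening uses dilated fundus examination.",
]
prompt = build_prompt(
    "What is the first-line treatment for open-angle glaucoma?", corpus
)
print(prompt)
```

In a production pipeline, the term-overlap scorer would typically be replaced by dense embedding similarity, but the control flow (retrieve, then generate with context) is the same.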
Outcomes and Implications
Nguyen et al. present findings that highlight the effectiveness of RAG in improving both accuracy and privacy preservation in knowledge-intensive fields, with the proprietary GPT-4 serving as the benchmark. This is especially relevant to medicine, where diagnostic decisions demand precision, both for the patient's safety and for the physician's treatment choices. The authors detail how their findings support integrating RAG with quantized language models that can be run locally, which addresses obstacles such as cost and privacy because quantized models use resources more efficiently than traditional full-precision LLMs. However, the authors also note that further evaluation in realistic settings will be required before this approach to clinical decision making and diagnosis can be fully implemented.
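The efficiency claim about quantized models comes down to storing each weight in fewer bytes. A hedged sketch of symmetric int8 quantization, using toy weights rather than anything from the paper, shows the 4x memory reduction relative to float32:

```python
# Illustrative symmetric int8 quantization of float32 weights.
# The weights below are toy values, not from the study.
import array

weights = [0.42, -1.7, 0.03, 2.5, -0.88]        # pretend float32 weights
scale = max(abs(w) for w in weights) / 127.0    # map max magnitude to 127

quantized = array.array("b", (round(w / scale) for w in weights))  # int8
dequantized = [q * scale for q in quantized]    # approximate recovery

fp32_bytes = 4 * len(weights)  # 4 bytes per float32 weight
int8_bytes = 1 * len(weights)  # 1 byte per int8 weight
print(fp32_bytes, int8_bytes)  # prints 20 5, i.e. a 4x reduction
```

The small rounding error introduced per weight (at most half a quantization step) is the accuracy cost traded for the memory and bandwidth savings that make local deployment practical.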