Comprehensive Summary
In this study, researchers conducted a rapid review evaluating the use of Multimodal Large Language Models (MLLMs) in healthcare. Major databases were searched for articles published in 2022 and later, and 39 met the final inclusion criteria. Data were extracted into 3 categories: study characteristics, MLLM characteristics, and applications of MLLMs in healthcare, with deeper analysis of prompting strategies, model performance, and model deployment readiness. Most studies came from North America or Asia, and 77% were published in 2024. Overall, 44% of studies developed hybrid MLLMs, 28% evaluated models in clinical settings, 21% compared different MLLMs, and 7% built new frameworks on top of preexisting models. Additionally, 49% of these studies focused on image-to-text tasks, generating text from inputs such as X-rays, MRIs, or CT scans. Also, 49% reported that the use of specialized prompts improved output accuracy. Despite finding common themes, this review highlighted the lack of consistency in evaluation metrics, revealing the need for standard evaluation protocols to assess the accuracy and readiness of MLLMs in the medical field.
Outcomes and Implications
The expanded use of MLLMs in healthcare has the potential to greatly improve efficiency in clinical practice. These models can support physicians in clinical decision-making and diagnosis, and can streamline routine tasks. This could ultimately lead to more accurate diagnostics and possibly personalized treatment strategies. MLLMs may also benefit patients by turning medical images into clear and accessible explanations of their conditions. However, the lack of standard evaluation protocols limits the ability to compare results across studies and to assess real-world readiness. Further research in areas such as safety, protocol alignment, and clinical integration is needed before MLLMs can become a regular tool in healthcare.