Northwestern Polytechnical University team: Potential of Multimodal Large Language Models for Data Mining of Medical Images and Free-text Reports

Published 09 January, 2025

In recent years, advances in multimodal large language models (MLLMs) have increasingly demonstrated their potential for medical data mining. However, the diverse and heterogeneous nature of medical images and radiology reports poses significant challenges to the universality of data mining methods.

To address these challenges, a team led by Dr. Xin Zhang from the Institute of Medical Research, Northwestern Polytechnical University in Xi’an, China, systematically evaluated the performance of Gemini and GPT-series models across various medical tasks.

“Our study encompasses 14 diverse medical datasets, spanning dermatology, radiology, dentistry, ophthalmology and endoscopy image categories, as well as radiology report datasets,” shares Zhang. “The tasks evaluated include disease classification, lesion segmentation, anatomical localization, disease diagnosis and report generation.”

The results reveal that the Gemini series excels in report generation and lesion detection, while the GPT series demonstrates strengths in lesion segmentation and anatomical localization.

“The study highlights the promise of these multimodal models in alleviating the burden on clinicians and fostering the integration of AI into clinical practice, potentially easing healthcare resource constraints,” adds Zhang. “Nonetheless, further optimization and rigorous validation are required before clinical deployment.”

The team published their findings in the KeAi journal Meta-Radiology.

By establishing performance benchmarks for multimodal AI systems, the team’s work provides a foundation for the continued development and application of such technologies, and for future research on integrating medical imaging with text analysis.

Figure: Schematic overview of the evaluation tasks and methods.

Contact author name, affiliation, email address: Xin Zhang, Institute of Medical Research, Northwestern Polytechnical University, xzhang@nwpu.edu.cn

Conflict of interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

See the article: Zhang, Yutong, et al. "Potential of multimodal large language models for data mining of medical images and free-text reports." Meta-Radiology 2.4 (2024): 100103. https://doi.org/10.1016/j.metrad.2024.100103
