Developing Multimodal Large Language Models for Healthcare and Remote Sensing
Ongoing Project
Principal Investigator(s)
Erkut Erdem
Researcher(s)
Funded Student(s)
Project Duration
Jun 2025 - Jun 2028 (3 years)
Abstract
Today, cutting-edge large language models such as GPT-4(o) and Gemini, developed by companies like OpenAI and Google, exhibit reasoning capabilities that sometimes match or even exceed human performance. In particular, multimodal models, which can perform sophisticated analyses of data drawn from diverse sources such as text, images, and audio, have the potential to revolutionize many industrial processes. Healthcare and remote sensing, however, call for domain-specific rather than general-purpose models, since accurate analysis in these fields requires integrating heterogeneous data types. Our project aims to achieve substantial advances in these two critical domains by developing specialized multimodal large language models for settings where multiple imaging modalities are in use.
Healthcare is undergoing a significant transformation driven by advances in artificial intelligence. Our project is dedicated to developing multimodal large language models tailored specifically to the medical sector. These models aim to enhance disease diagnosis, streamline administrative procedures, and support more robust data-driven decision-making. In radiology, we are building novel multimodal models that process a variety of radiological images together with their accompanying texts, thereby improving both diagnostic and therapeutic workflows. These models will address the shortcomings of existing methods by integrating in-context learning and multimodal tool-use capabilities.
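To illustrate the in-context learning setup, the sketch below shows one way an interleaved image-text prompt for few-shot radiology report generation might be assembled. This is a minimal Python sketch, not the project's actual pipeline; the message schema, file names, and report texts are all illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Exemplar:
    """One in-context example pairing a radiograph with its report."""
    image_path: str   # path to a radiological image (hypothetical file)
    report: str       # the reference report text for that image

def build_icl_prompt(exemplars: list[Exemplar], query_image: str) -> list[dict]:
    """Assemble an interleaved image-text prompt for few-shot report generation.

    Returns a generic list of {"type": ..., "content": ...} items; the exact
    message schema expected by the target model is an assumption here.
    """
    prompt: list[dict] = [
        {"type": "text", "content": "You are a radiology assistant. "
                                    "Write a report for each image."}
    ]
    for ex in exemplars:
        prompt.append({"type": "image", "content": ex.image_path})
        prompt.append({"type": "text", "content": f"Report: {ex.report}"})
    # The query image comes last; the model continues with its own report.
    prompt.append({"type": "image", "content": query_image})
    prompt.append({"type": "text", "content": "Report:"})
    return prompt

if __name__ == "__main__":
    shots = [Exemplar("cxr_001.png", "No acute cardiopulmonary abnormality."),
             Exemplar("cxr_002.png", "Right lower lobe consolidation, likely pneumonia.")]
    print(build_icl_prompt(shots, "cxr_query.png"))
```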
In remote sensing, our project aims to develop advanced multimodal large language models that can handle both optical and multispectral satellite imagery. These models are applicable across a wide range of tasks, from monitoring natural disasters and assessing environmental impacts to tracking agricultural and urban development and managing resources efficiently. Designed to analyze data from both orbital satellites and nearby ground-level sensors, they pave the way for the next generation of remote sensing analysis tools. Integrating remote and near sensing will significantly improve decision-making in urban planning and environmental management by enabling more detailed and dynamic analysis of the data.
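As a concrete illustration of handling multispectral inputs, the PyTorch sketch below shows one simple way to tokenize imagery with more than three spectral bands for a transformer backbone. A Sentinel-2-style 13-band input is assumed; the module is an illustrative sketch under those assumptions, not the project's actual architecture.

```python
import torch
import torch.nn as nn

class MultispectralPatchEmbed(nn.Module):
    """Project multispectral image patches into a shared token space.

    RGB vision encoders typically expect 3 channels, so a dedicated patch
    embedding is one simple way to admit extra spectral bands.
    """
    def __init__(self, in_bands: int = 13, embed_dim: int = 768, patch: int = 16):
        super().__init__()
        # One strided convolution turns each (patch x patch) region into a token.
        self.proj = nn.Conv2d(in_bands, embed_dim, kernel_size=patch, stride=patch)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, bands, H, W) -> (batch, num_patches, embed_dim)
        tokens = self.proj(x)                      # (B, D, H/p, W/p)
        return tokens.flatten(2).transpose(1, 2)   # (B, N, D)

if __name__ == "__main__":
    embed = MultispectralPatchEmbed()
    dummy = torch.randn(2, 13, 224, 224)   # two synthetic multispectral tiles
    print(embed(dummy).shape)              # torch.Size([2, 196, 768])
```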
The extensive research activities planned in our project are not without challenges. The foremost is obtaining the high-quality datasets needed to train and evaluate multimodal large language models. To secure access to these crucial data resources, we plan to collaborate with institutions and organizations operating in medicine and remote sensing. Another challenge is training and deploying such complex models at scale; to address it, we will explore cutting-edge, parameter-efficient training techniques.
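As one example of such a technique, the sketch below applies LoRA via the Hugging Face peft library, which trains small low-rank adapter matrices while keeping the base model frozen. The gpt2 checkpoint is only a stand-in for the much larger models the project targets.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# "gpt2" is a placeholder base model; any causal LM on the Hub works similarly.
model = AutoModelForCausalLM.from_pretrained("gpt2")

# LoRA freezes the base weights and trains small low-rank adapters instead,
# cutting the number of trainable parameters by orders of magnitude.
lora_config = LoraConfig(
    r=8,                         # rank of the low-rank update matrices
    lora_alpha=16,               # scaling factor applied to the update
    target_modules=["c_attn"],   # GPT-2's fused attention projection
    lora_dropout=0.05,
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% trainable
```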
To demonstrate the domain-specific capabilities of the developed models more clearly, we will carry out comprehensive analyses, develop new evaluation criteria, and design novel benchmark tests. These tests will identify the aspects in which the large language models need improvement and give researchers a roadmap for advancing model effectiveness in critical areas.
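As a minimal illustration of how such benchmark tests could surface specific weaknesses, the sketch below scores predictions per capability category; the category names, field names, and exact-match criterion are illustrative assumptions, not the project's actual evaluation protocol.

```python
from collections import defaultdict

def per_category_accuracy(examples: list[dict]) -> dict[str, float]:
    """Score predictions per capability category to expose weak spots.

    Each example is assumed to carry a 'category' tag (e.g. 'localization',
    'measurement'), a 'prediction', and a 'reference' answer.
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for ex in examples:
        totals[ex["category"]] += 1
        if ex["prediction"].strip().lower() == ex["reference"].strip().lower():
            hits[ex["category"]] += 1
    return {cat: hits[cat] / totals[cat] for cat in totals}

if __name__ == "__main__":
    demo = [
        {"category": "localization", "prediction": "left lung", "reference": "Left lung"},
        {"category": "measurement", "prediction": "3 cm", "reference": "4 cm"},
    ]
    print(per_category_accuracy(demo))  # {'localization': 1.0, 'measurement': 0.0}
```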
Related Publications
No related publication found for this project.