Multimodal AI (In Development)

Vision-Language Medical Models

CLIP-based Architecture for Medical Imaging and Report Generation

Aydin Ayanzadeh

UMBC

2024 - Present

Overview

This project develops custom CLIP-based architectures for medical imaging, enabling zero-shot classification and automated report generation.

By adapting vision-language models to the medical domain, the system can describe medical images in natural language, assist radiologists in drafting reports, and suggest preliminary diagnoses without task-specific fine-tuning.

Key Features

Zero-Shot Classification

Classify medical conditions by matching images against text descriptions, without training on labeled examples of each disease.
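
A minimal sketch of how CLIP-style zero-shot classification generally works, assuming the image and text encoders have already produced embeddings. The embedding values, the prompt wording, the temperature of 0.07, and the `zero_shot_classify` helper are hypothetical and chosen for illustration; they are not taken from this project.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_embedding, prompt_embeddings, temperature=0.07):
    """Score one image embedding against every text prompt embedding.

    Returns a dict mapping each prompt to a probability. No disease-specific
    classifier is trained; the class set is defined only by the prompts.
    """
    prompts = list(prompt_embeddings)
    logits = [
        cosine_similarity(image_embedding, prompt_embeddings[p]) / temperature
        for p in prompts
    ]
    return dict(zip(prompts, softmax(logits)))


# Toy 3-dimensional embeddings (hypothetical; real encoders emit hundreds of dimensions).
image_embedding = [0.9, 0.1, 0.2]
prompt_embeddings = {
    "a chest X-ray showing pneumonia": [0.8, 0.2, 0.1],
    "a chest X-ray with no acute findings": [0.1, 0.9, 0.3],
}

probabilities = zero_shot_classify(image_embedding, prompt_embeddings)
predicted = max(probabilities, key=probabilities.get)
print(predicted)
```

Adding a new condition in this setup means adding a new prompt, which is what makes the approach attractive when labeled data for a disease is scarce.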

Report Generation

Automatically generate preliminary radiology reports from medical images.

Multimodal Understanding

Joint understanding of medical images and clinical text.
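
Joint image-text understanding in CLIP-style models usually comes from a symmetric contrastive objective that pulls matched image-report pairs together and pushes mismatched pairs apart. The sketch below assumes this project keeps that standard objective, which the description does not state; the similarity matrices, the temperature of 0.07, and the function names are hypothetical.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax over one row of logits."""
    highest = max(row)
    log_total = highest + math.log(sum(math.exp(x - highest) for x in row))
    return [x - log_total for x in row]


def symmetric_contrastive_loss(similarity, temperature=0.07):
    """Average of image-to-text and text-to-image cross-entropy.

    similarity[i][j] is the cosine similarity between image i and text j.
    Matched pairs are assumed to sit on the diagonal.
    """
    n = len(similarity)
    logits = [[s / temperature for s in row] for row in similarity]

    # Image-to-text: each image (row) should select its own report (column i).
    image_to_text = -sum(log_softmax(logits[i])[i] for i in range(n)) / n

    # Text-to-image: transpose so each report (row) selects its own image.
    transposed = [list(col) for col in zip(*logits)]
    text_to_image = -sum(log_softmax(transposed[j])[j] for j in range(n)) / n

    return (image_to_text + text_to_image) / 2


# Toy batches of two image-report pairs (hypothetical similarity values).
aligned = [[0.9, 0.1], [0.2, 0.8]]      # matched pairs score highest
misaligned = [[0.1, 0.9], [0.8, 0.2]]   # mismatched pairs score highest

loss_aligned = symmetric_contrastive_loss(aligned)
loss_misaligned = symmetric_contrastive_loss(misaligned)
print(loss_aligned < loss_misaligned)
```

The loss is lower when each image is most similar to its own report, so minimizing it aligns the two modalities in a shared embedding space.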

Clinical Integration

Designed for integration into clinical workflows.

Technologies Used

PyTorch, Transformers, CLIP, Vision Transformers, BERT, Hugging Face, MIMIC-CXR

Interested in Multimodal Medical AI?

Let's discuss vision-language models and their applications in healthcare.
