🧠 Temporal & Multi-CXR Chest X-ray Report Generation Model by HARI and MVL of Seoul National University Hospital

Welcome to the official repository of the Temporal & Multi-CXR Chest X-ray Report Generation Model developed by the Healthcare AI Research Institute (HARI) and the Medical Vision Lab (MVL) at Seoul National University Hospital (SNUH).

This model generates chest X-ray (CXR) reports and is designed to leverage not only single-image inputs, but also multi-view CXRs (PA/AP/Lateral) and temporal pairs (current + prior). When available, it can additionally incorporate textual clinical context such as prior reports, indication, and time interval.

It is trained with instruction data tailored to different input configurations (current only / current + prior / current + prior + prior report), and applies report style constraints (structure, sentence count, temporal expressions, etc.) to reduce linguistic variation and encourage the model to focus more on clinically meaningful findings and temporal changes.
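As a rough sketch of how such configuration-dependent instruction data can be dispatched (the function and configuration labels below are hypothetical illustrations, not the model's actual training code):

```python
# Hypothetical sketch of the three training-time input configurations
# described above. The labels are illustrative, not the model's real ones.
def select_configuration(current_image, prior_image=None, prior_report=None):
    """Return a label for which instruction template applies."""
    if prior_image is None:
        return "current_only"
    if prior_report is None:
        return "current_plus_prior"
    return "current_plus_prior_plus_report"

print(select_configuration("cxr_now.png"))                 # current_only
print(select_configuration("cxr_now.png", "cxr_old.png"))  # current_plus_prior
```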

🚀 Model Overview

  • Model Name: snuh/mvl-rrg-1.0
  • Architecture: Large Multimodal Model (LMM)
  • Fine-tuning Objective: Radiology report generation
  • Primary Language: English
  • Domain: Chest X-ray
  • Performance: Achieves state-of-the-art results on standard chest X-ray report generation benchmarks (see the benchmark tables below)
  • Key Applications:
    • Multi-view CXR inputs (PA/AP/Lateral)
    • Temporal pairs CXR inputs (current + prior)
    • Style-controlled report generation to reduce linguistic variance

📊 Training Data & Benchmark

This model was fine-tuned on a curated corpus of medical report generation data derived from publicly available, de-identified sources: chest X-ray images and their paired radiology reports from MIMIC-CXR.

  • Training Data Characteristics:
    • Focused on generating radiology reports from chest X-ray images.
    • Utilizes chest X-ray images and corresponding radiology reports from the MIMIC-CXR dataset.
    • Incorporates longitudinal imaging data with two or more time points, enabling the model to understand sequential changes in patient conditions.
    • Designed to reflect realistic radiological interpretation and documentation workflows.
    • The current dataset consists of 80,136 training samples and 665 test samples.
    • Samples in which the radiology report referenced a prior examination but no corresponding prior data could be mapped were excluded from the dataset.
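The exclusion step above can be sketched as a simple filter (the trigger phrases and record fields are assumptions for illustration; the actual curation pipeline is not published here):

```python
# Assumed record shape: {"report": str, "prior_study_id": str | None}.
# Drop samples whose report references a prior exam but for which no
# prior study could be mapped. The trigger phrases are illustrative only.
PRIOR_PHRASES = ("compared to prior", "since the prior", "interval change",
                 "previous exam", "prior study")

def references_prior(report: str) -> bool:
    text = report.lower()
    return any(p in text for p in PRIOR_PHRASES)

def filter_samples(samples):
    return [s for s in samples
            if not (references_prior(s["report"]) and s["prior_study_id"] is None)]

samples = [
    {"report": "Interval change in left effusion.", "prior_study_id": None},  # dropped
    {"report": "Interval change in left effusion.", "prior_study_id": "s1"},  # kept
    {"report": "No acute cardiopulmonary process.", "prior_study_id": None},  # kept
]
print(len(filter_samples(samples)))  # 2
```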

Evaluation Scope and Benchmark Results

The reported benchmark results focus on current-only report generation, where each report is generated using a single, self-contained imaging context without explicit temporal inputs.

In medical imaging, this setting differs fundamentally from temporal (longitudinal) report generation, which requires reasoning over disease progression, treatment response, or follow-up changes. Temporal information can substantially alter clinical interpretation, even when surface-level imaging findings appear similar.

Accordingly, we distinguish between the following evaluation regimes:

  • Current-only evaluation
    Single-image, single-context report generation.
    All reported benchmark results are based on this setting.

    | Model       | ROUGE-L | BLEU-1 | BLEU-4 | RadGraph F1 | RadCliQ (↓) |
    |-------------|---------|--------|--------|-------------|-------------|
    | Libra       | 25.6    | 33.0   | 9.1    | 24.5        | 0.92        |
    | MAIRA-2     | 29.9    | 44.7   | 14.9   | 34.7        | 1.27        |
    | mvl-rrg-1.0 | 34.1    | 44.6   | 18.6   | 34.9        | 1.23        |
  • Temporal evaluation (ongoing)
    Time-aware report generation that incorporates prior imaging studies and longitudinal clinical changes.

    | Model       | Temporal RadGraph F1 |
    |-------------|----------------------|
    | Libra       | 54.8                 |
    | MAIRA-2     | 52.5                 |
    | mvl-rrg-1.0 | 79.9                 |
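For orientation, the lexical and graph metrics in these tables reduce to simple counting. The sketch below shows clipped unigram precision (the core of BLEU-1, without the brevity penalty) and the set-based F1 that RadGraph-style scoring applies to extracted entity/relation tuples; the reported numbers come from the standard toolkits and a trained RadGraph extractor, not this sketch:

```python
from collections import Counter

def bleu1_precision(candidate: str, reference: str) -> float:
    """Clipped unigram precision (BLEU-1 without the brevity penalty)."""
    cand = candidate.lower().split()
    ref_counts = Counter(reference.lower().split())
    clipped = sum(min(c, ref_counts[w]) for w, c in Counter(cand).items())
    return clipped / len(cand)

def set_f1(pred: set, gold: set) -> float:
    """F1 over sets, as applied to entity/relation tuples in RadGraph-style scoring."""
    if not pred or not gold:
        return 0.0
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)

print(bleu1_precision("no acute cardiopulmonary process",
                      "no acute cardiopulmonary abnormality"))  # 0.75
print(round(set_f1({("effusion", "present"), ("cardiomegaly", "present")},
                   {("effusion", "present")}), 3))  # 0.667
```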

⚠️ These benchmarks are provided for research purposes only and do not imply clinical safety or efficacy.

πŸ” Privacy & Ethical Compliance

We strictly adhere to ethical AI development and privacy protection:

  • ✅ The model was trained exclusively on publicly available and de-identified data.
  • 🔒 It does not include any real patient data or personally identifiable information (PII).
  • ⚖️ Designed for safe, responsible, and research-oriented use in healthcare AI.

⚠️ This model is intended for research and educational purposes only and should not be used to make clinical decisions.

πŸ₯ About HARI and MVL of Seoul National University Hospital

HARI – Healthcare AI Research Institute
The Healthcare AI Research Institute (HARI) is a pioneering research group within Seoul National University Hospital, driving innovation in medical AI.

MVL – Medical Vision Lab
The Medical Vision Lab (MVL) is a pioneering research group within Seoul National University Hospital, driving innovation in medical AI.

  • Our goal: to develop AI-based applications that aid doctors in making fast and accurate diagnostic decisions, helping patients live comfortably and ultimately improving their quality of life.

🌍 Vision & Mission

  • Vision: Shaping a sustainable and healthy future through pioneering AI research.
  • Mission:
    • Develop clinically useful, trustworthy AI technologies.
    • Foster cross-disciplinary collaboration in medicine and AI.
    • Lead global healthcare AI commercialization and policy frameworks.
    • Educate the next generation of AI-powered medical professionals.

🤝 Collaborate with Us

We welcome collaboration with researchers, clinicians, and institutions working on medical AI.

🤗 Model Usage Example

from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
import torch
from pathlib import Path
from PIL import Image

# Load processor and model
model_name = "snuh/mvl-rrg-1.0"

model = Qwen3VLForConditionalGeneration.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_name)

# Image paths
current_frontal_image_path = "/**/current_frontal_image.png"
current_lateral_image_path = "/**/current_lateral_image.png"
prior_frontal_image_path = "/**/prior_frontal_image.png"

# Validate image paths exist
if current_frontal_image_path and not Path(current_frontal_image_path).exists():
    raise FileNotFoundError(f"Current frontal image file not found: {current_frontal_image_path}")
if current_lateral_image_path and not Path(current_lateral_image_path).exists():
    raise FileNotFoundError(f"Current lateral image file not found: {current_lateral_image_path}")
if prior_frontal_image_path and not Path(prior_frontal_image_path).exists():
    raise FileNotFoundError(f"Prior frontal image file not found: {prior_frontal_image_path}")

# Clinical context
prior_findings = "N/A"
prior_impression = "Developed pleural effusion, both\nInterval increased nodular opacity at LMLF"
indication = "F with chest pain // ?pna"
technique = "CHEST (PA AND LAT)"
comparison = "__."
time_interval = "1 month"

# Style attributes
findings_structure_type = "narrative_paragraph"
findings_temporal_comparison = "absent"
findings_sentence_count = 6
impression_structure_type = "narrative_paragraph"
impression_temporal_comparison = "absent"
impression_sentence_count = 1

# Build the instruction from the available inputs
inputs_list = ["- Current frontal image: <image>"]
if current_lateral_image_path:
    inputs_list.append("- Current lateral image: <image>")
else:
    inputs_list.append("- Current lateral image: N/A")
if prior_frontal_image_path:
    inputs_list.append("- Prior frontal image: <image>")
else:
    inputs_list.append("- Prior frontal image: N/A")
inputs_list.extend([
    f"- Prior findings: {prior_findings}",
    f"- Prior impression: {prior_impression}"
])
inputs_text = "\n".join(inputs_list)

instruction = f"""You are an expert radiology assistant for chest X-ray (CXR) interpretation.

Inputs:
{inputs_text}

Clinical context:
- INDICATION: {indication}
- TECHNIQUE: {technique}
- COMPARISON: {comparison}
- TIME INTERVAL: {time_interval}
  (Time elapsed between the prior study date and the current study date)

Instructions:
1. Generate a chest X-ray report based on the current study.
2. Write a Findings section describing radiographic observations using standard clinical language.
3. Write an Impression section summarizing the key findings or overall assessment.
4. When applicable, include conditions related to CheXbert classes
   (e.g., cardiomegaly, lung opacity, pleural effusion, pneumothorax, pneumonia,
   support devices, or no acute abnormality).
5. If no significant abnormality is present, clearly state this.
6. Follow the provided style attributes exactly, applying them independently
   to the Findings and Impression sections:
   - Structure type controls the organizational pattern of the text.
   - Temporal comparison controls whether and how prior studies are referenced.
   - Sentence count controls the amount of text (small / medium / large).

Output format:
Return only a single JSON object with the following fields:

{{
  "findings": "<free-text radiology findings>",
  "impression": "<free-text radiology impression>"
}}

Style attributes:
- findings_structure_type: {findings_structure_type}
- findings_temporal_comparison: {findings_temporal_comparison}
- findings_sentence_count: {findings_sentence_count}
- impression_structure_type: {impression_structure_type}
- impression_temporal_comparison: {impression_temporal_comparison}
- impression_sentence_count: {impression_sentence_count}"""

content = []

# Current frontal image (always required)
current_frontal_image = Image.open(current_frontal_image_path)
content.append({
    "type": "image",
    "image": current_frontal_image,
})

# Current lateral image (optional)
if current_lateral_image_path:
    current_lateral_image = Image.open(current_lateral_image_path)
    content.append({
        "type": "image",
        "image": current_lateral_image,
    })

# Prior frontal image (optional)
if prior_frontal_image_path:
    prior_frontal_image = Image.open(prior_frontal_image_path)
    content.append({
        "type": "image",
        "image": prior_frontal_image,
    })

# Instruction
content.append({
    "type": "text",
    "text": instruction,
})

messages = [
    {
        "role": "user",
        "content": content,
    }
]

inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
)

inputs = {k: v.to(model.device) for k, v in inputs.items()}

with torch.no_grad():
    generated_ids = model.generate(
        **inputs,
        max_new_tokens=512
    )

prompt_len = inputs["input_ids"].shape[-1]
generated_ids_trimmed = generated_ids[:, prompt_len:]

response = processor.batch_decode(
    generated_ids_trimmed,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False,
)[0]

# Print the generated report
print(response)
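Since the instruction asks the model to return a single JSON object, the response can be parsed downstream. A minimal sketch (the fallback regex is an assumption, since the model may occasionally emit extra text around the JSON):

```python
import json
import re

def parse_report(response: str) -> dict:
    """Extract the {"findings": ..., "impression": ...} object from model output."""
    match = re.search(r"\{.*\}", response, re.DOTALL)  # tolerate surrounding text
    if match is None:
        raise ValueError("No JSON object found in model response")
    report = json.loads(match.group(0))
    return {"findings": report.get("findings", ""),
            "impression": report.get("impression", "")}

example = ('Here is the report:\n'
           '{"findings": "Lungs are clear.", "impression": "No acute process."}')
print(parse_report(example)["impression"])  # No acute process.
```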

📄 License

Apache 2.0 License – Free for research and commercial use with attribution.

📒 Citation

If you use this model in your work, please cite:

@misc{mvl-rrg-1.0,
    title  = {mvl-rrg-1.0},
    url    = {https://huggingface.co/snuh/mvl-rrg-1.0},
    author = {Healthcare AI Research Institute (HARI) and Medical Vision Lab (MVL) of Seoul National University Hospital (SNUH)},
    month  = {January},
    year   = {2026}
}

🚀 Together, we are shaping the future of AI-driven healthcare.

Acknowledgments

This work was supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (RS-2025-02653113, High-Performance Research AI Computing Infrastructure Support at the 2 PFLOPS Scale).
