R-Super: Scaling AI for Multi-Tumor Early Detection
This model is an implementation of R-Super, presented in the paper Learning Segmentation from Radiology Reports (MICCAI 2025, Best Paper Award Runner-up).
Abstract
Early tumor detection saves lives. Each year, more than 300 million computed tomography (CT) scans are performed worldwide, offering a vast opportunity for effective cancer screening. However, detecting small or early-stage tumors on these CT scans remains challenging, even for experts. Artificial intelligence (AI) models can assist by highlighting suspicious regions, but training such models typically requires extensive tumor masks--detailed, voxel-wise outlines of tumors manually drawn by radiologists. Drawing these masks is costly, requiring years of effort and millions of dollars. In contrast, nearly every CT scan in clinical practice is already accompanied by medical reports describing the tumor's size, number, appearance, and sometimes, pathology results--information that is rich, abundant, and often underutilized for AI training. We introduce R-Super, which trains AI to segment tumors that match their descriptions in medical reports. This approach scales AI training with large collections of readily available medical reports, substantially reducing the need for manually drawn tumor masks. When trained on 101,654 reports, AI models achieved performance comparable to those trained on 723 masks. Combining reports and masks further improved sensitivity by +13% and specificity by +8%, surpassing radiologists in detecting five of the seven tumor types. Notably, R-Super enabled segmentation of tumors in the spleen, gallbladder, prostate, bladder, uterus, and esophagus, for which no public masks or AI models previously existed. This study challenges the long-held belief that large-scale, labor-intensive tumor mask creation is indispensable, establishing a scalable and accessible path toward early detection across diverse tumor types. We plan to release our trained models, code, and dataset at this https URL.
This model, trained for pancreatic and kidney lesion segmentation, implements the Report Supervision (R-Super) training methodology, which learns tumor segmentation directly from radiology reports (through new loss functions). This checkpoint was trained with 5K lesion reports from the UCSF dataset, plus 2K lesion masks from AbdomenAtlas 3.0 Beta. To the best of our knowledge, this is the public segmentation AI trained on the largest number of lesion CT scans (7K).
Training data: 16K CT scans
- 2,229 pancreatic tumor CT-Report pairs (UCSF)
- 2,738 kidney tumor CT-Report pairs (UCSF)
- 1,674 kidney tumor CT-Mask pairs (AbdomenAtlas 3.0)
- 344 pancreatic tumor CT-Mask pairs (AbdomenAtlas 3.0)
- 8,995 controls (CT scans without kidney or pancreas tumors)
Performance improvements are expected for models trained on the released version of AbdomenAtlas 3.0. For the official release of AbdomenAtlas 3.0 (ICCV 2025), please check our GitHub: https://github.com/MrGiovanni/RadGPT. The AI model architecture is MedFormer, and its training methodology is Report Supervision (R-Super).
Training and inference code: https://github.com/MrGiovanni/R-Super
Label order
['adrenal_gland_left',
'adrenal_gland_right',
'aorta',
'bladder',
'celiac_trunk',
'colon',
'common_bile_duct',
'duodenum',
'esophagus',
'femur_left',
'femur_right',
'gall_bladder',
'hepatic_vessel',
'intestine',
'kidney_left',
'kidney_lesion',
'kidney_right',
'liver',
'liver_lesion',
'liver_segment_1',
'liver_segment_2',
'liver_segment_3',
'liver_segment_4',
'liver_segment_5',
'liver_segment_6',
'liver_segment_7',
'liver_segment_8',
'lung_left',
'lung_right',
'pancreas',
'pancreas_body',
'pancreas_head',
'pancreas_tail',
'pancreatic_lesion',
'portal_vein_and_splenic_vein',
'postcava',
'prostate',
'rectum',
'spleen',
'stomach',
'superior_mesenteric_artery',
'veins']
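If you need to map class indices in the model output back to these names, a minimal sketch is below. It assumes the classes follow the order of this list starting from index 0; verify the exact indices (and any background offset) against labels_rsuper_mask.yaml.

# Sketch: build index <-> class-name lookups from the label order above.
# Assumption: classes are indexed in this exact order; check labels_rsuper_mask.yaml.
LABELS = [
    'adrenal_gland_left', 'adrenal_gland_right', 'aorta', 'bladder', 'celiac_trunk',
    'colon', 'common_bile_duct', 'duodenum', 'esophagus', 'femur_left', 'femur_right',
    'gall_bladder', 'hepatic_vessel', 'intestine', 'kidney_left', 'kidney_lesion',
    'kidney_right', 'liver', 'liver_lesion', 'liver_segment_1', 'liver_segment_2',
    'liver_segment_3', 'liver_segment_4', 'liver_segment_5', 'liver_segment_6',
    'liver_segment_7', 'liver_segment_8', 'lung_left', 'lung_right', 'pancreas',
    'pancreas_body', 'pancreas_head', 'pancreas_tail', 'pancreatic_lesion',
    'portal_vein_and_splenic_vein', 'postcava', 'prostate', 'rectum', 'spleen',
    'stomach', 'superior_mesenteric_artery', 'veins',
]
INDEX_TO_LABEL = dict(enumerate(LABELS))
LABEL_TO_INDEX = {name: idx for idx, name in INDEX_TO_LABEL.items()}
print(LABEL_TO_INDEX['kidney_lesion'])      # index of the kidney lesion class
print(LABEL_TO_INDEX['pancreatic_lesion'])  # index of the pancreatic lesion class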
Papers
Learning Segmentation from Radiology Reports
Pedro R. A. S. Bassi, Wenxuan Li, Jieneng Chen, Zheren Zhu, Tianyu Lin, Sergio Decherchi, Andrea Cavalli, Kang Wang, Yang Yang, Alan Yuille, Zongwei Zhou*
Johns Hopkins University
MICCAI 2025
Best Paper Award Runner-up (top 2 of 1,027 papers)
Scaling Artificial Intelligence for Multi-Tumor Early Detection with More Reports, Fewer Masks
Pedro R. A. S. Bassi, Wenxuan Li, Jieneng Chen, Zheren Zhu, Tianyu Lin, Sergio Decherchi, Andrea Cavalli, Kang Wang, Yang Yang, Alan Yuille, Zongwei Zhou*
Johns Hopkins University
RadGPT: Constructing 3D Image-Text Tumor Datasets
Pedro R. A. S. Bassi, Mehmet Yavuz, Kang Wang, Sezgin Er, Ibrahim E. Hamamci, Wenxuan Li, Xiaoxi Chen, Sergio Decherchi, Andrea Cavalli, Yang Yang, Alan Yuille, Zongwei Zhou*
Johns Hopkins University
ICCV, 2025
Inference
0- Download and installation.
[Optional] Install Anaconda on Linux
wget https://repo.anaconda.com/archive/Anaconda3-2024.06-1-Linux-x86_64.sh
bash Anaconda3-2024.06-1-Linux-x86_64.sh -b -p ./anaconda3
./anaconda3/bin/conda init
source ~/.bashrc
git clone https://github.com/MrGiovanni/R-Super
cd R-Super/rsuper_train
conda create -n rsuper python=3.10
conda activate rsuper
pip install -r requirements.txt
pip install -U "huggingface_hub[cli]"
hf download AbdomenAtlas/R-SuperPancreasKidney --local-dir ./R-SuperPancreasKidney
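Alternatively, if you prefer to fetch the checkpoint from Python rather than the CLI, here is a minimal sketch using huggingface_hub, targeting the same repository and local folder as the hf command above:

# Sketch: download the R-SuperPancreasKidney repository (checkpoint + label yaml)
# into the folder used by the inference command below.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="AbdomenAtlas/R-SuperPancreasKidney",
                  local_dir="./R-SuperPancreasKidney")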
1- Pre-processing. Prepare your dataset in the format below. You can use symlinks instead of copying your data.
Dataset format.
/path/to/dataset/
├── BDMAP_0000001
| └── ct.nii.gz
├── BDMAP_0000002
| └── ct.nii.gz
...
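As an illustration, the sketch below arranges an existing flat folder of CT volumes into this layout using symlinks; the source folder, file naming, and BDMAP-style case IDs are assumptions to adapt to your own data.

# Sketch: link a flat folder of CT volumes (assumed /path/to/raw_cts/*.nii.gz)
# into the /path/to/dataset/BDMAP_XXXXXXX/ct.nii.gz layout, without copying data.
import os
from pathlib import Path

raw_dir = Path("/path/to/raw_cts")      # assumed: one .nii.gz per case
dataset_dir = Path("/path/to/dataset")
dataset_dir.mkdir(parents=True, exist_ok=True)

for i, ct in enumerate(sorted(raw_dir.glob("*.nii.gz")), start=1):
    case_dir = dataset_dir / f"BDMAP_{i:07d}"   # e.g. BDMAP_0000001
    case_dir.mkdir(exist_ok=True)
    link = case_dir / "ct.nii.gz"
    if not link.exists():
        os.symlink(ct.resolve(), link)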
2- Inference. The command below runs inference, generating binary segmentation masks. To save probabilities, add the argument --save_probabilities or --save_probabilities_lesions (the latter saves probabilities only for lesions, not for organs). The optional argument --organ_mask_on_lesion uses organ segmentations (produced by the R-Super model itself, not ground truth) to remove tumor predictions that fall outside the corresponding organ.
python predict_abdomenatlas.py --load R-SuperPancreasKidney/R_Super_Kidney_Pancreas_UCSF_Atlas3/fold_0_latest.pth --img_path /path/to/test/dataset/ --class_list R-SuperPancreasKidney/labels_rsuper_mask.yaml --save_path /path/to/inference/output/ --organ_mask_on_lesion
Argument Details
- load: path to the model checkpoint (fold_0_latest.pth)
- img_path: path to dataset
- class_list: a yaml file with the class names of your model
- save_path: path to output, where masks will be saved
- ids: optional argument. By default, the code predicts on all cases in --img_path. If you pass ids, the code will only test on the CT scans listed in ids. You can use this to separate a test set: --ids /path/to/test/set/ids.csv. The csv file must have a 'BDMAP ID' column with the ids of the test cases (see the sketch below).
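For example, a minimal sketch that writes such a csv for a held-out test split (the case IDs and output path are placeholders):

# Sketch: create an ids.csv with the required 'BDMAP ID' column,
# listing only the cases that --ids should restrict inference to.
import pandas as pd

test_ids = ["BDMAP_0000001", "BDMAP_0000002"]  # placeholder case IDs
pd.DataFrame({"BDMAP ID": test_ids}).to_csv("/path/to/test/set/ids.csv", index=False)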
For more details, see https://github.com/MrGiovanni/R-Super/tree/main/rsuper_train#test
Citations
If you use this data, please cite the 3 papers below:
@article{bassi2025learning,
title={Learning Segmentation from Radiology Reports},
author={Bassi, Pedro RAS and Li, Wenxuan and Chen, Jieneng and Zhu, Zheren and Lin, Tianyu and Decherchi, Sergio and Cavalli, Andrea and Wang, Kang and Yang, Yang and Yuille, Alan L and others},
journal={arXiv preprint arXiv:2507.05582},
year={2025}
}
@article{bassi2025radgpt,
title={RadGPT: Constructing 3D Image-Text Tumor Datasets},
author={Bassi, Pedro RAS and Yavuz, Mehmet Can and Wang, Kang and Chen, Xiaoxi and Li, Wenxuan and Decherchi, Sergio and Cavalli, Andrea and Yang, Yang and Yuille, Alan and Zhou, Zongwei},
journal={arXiv preprint arXiv:2501.04678},
year={2025}
}
@misc{bassi2025scaling,
title={Scaling Artificial Intelligence for Multi-Tumor Early Detection with More Reports, Fewer Masks},
author={Pedro R. A. S. Bassi and Xinze Zhou and Wenxuan Li and Szymon Płotka and Jieneng Chen and Qi Chen and Zheren Zhu and Jakub Prządo and Ibrahim E. Hamamci and Sezgin Er and Yuhan Wang and Ashwin Kumar and Bjoern Menze and Jarosław B. Ćwikła and Yuyin Zhou and Akshay S. Chaudhari and Curtis P. Langlotz and Sergio Decherchi and Andrea Cavalli and Kang Wang and Yang Yang and Alan L. Yuille and Zongwei Zhou},
year={2025},
eprint={2510.14803},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2510.14803},
}
Acknowledgement
This work was supported by the Lustgarten Foundation for Pancreatic Cancer Research, the Patrick J. McGovern Foundation Award, and the National Institutes of Health (NIH) under Award Number R01EB037669. We would like to thank the Johns Hopkins Research IT team in IT@JH for their support and for the infrastructure resources on which some of these analyses were conducted, especially the DISCOVERY HPC. Paper content is covered by patents pending.