Can-SAVE: Deploying Low-Cost and Population-Scale Cancer Screening via Survival Analysis Variables and EHR
This repository contains the source code for the feature engineering step of the Can-SAVE method.
Installation
git clone https://huggingface.co/ai-lab/Can-SAVE
cd Can-SAVE
pip install -r requirements.txt
requirements.txt
pandas==1.5.3
numpy==1.23.2
lifelines==0.27.4
scikit-learn==1.1.3
scipy==1.10.0
PyYAML==6.0
openpyxl==3.0.10
Repository Structure
- Can-SAVE/: Core implementation
- EHR/: Simulated sample of EHR data
- survival_models/: Output directory for fitted models (Kaplan-Meier estimators and AFT model)
Can-SAVE/
├── EHR/
│   └── id_26.csv
├── survival_models/
│   ├── kaplan_meier_both.pkl
│   ├── kaplan_meier_males.pkl
│   ├── kaplan_meier_females.pkl
│   └── aft.pkl
├── CanSave.py
├── Example_How_To_Train_Survival_Models.py
├── KaplanMeierEstimator.py
├── CONFIG_CanSave.yaml
├── icd10_groups.xlsx
├── requirements.txt
├── LICENSE
└── README.md
Quick Start
1) How to Train Survival Models
$ python Example_How_To_Train_Survival_Models.py
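For readers unfamiliar with the Kaplan-Meier estimators that this step fits, the following is a minimal, standard-library sketch of the product-limit estimator. It is not the repository's `KaplanMeierEstimator.py`, whose interface is not shown here; the function name `kaplan_meier` and the sample follow-up data are hypothetical and serve only to illustrate the computation.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(t, S(t))] at each distinct event time (product-limit estimator).

    durations: follow-up time of each subject
    events:    1 if the event was observed at that time, 0 if censored
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")

    # Number of observed events at each time point.
    deaths = Counter(t for t, e in zip(durations, events) if e)
    # Every subject (event or censored) leaves the risk set at its own time.
    exits = Counter(durations)

    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # S(t) = S(t-) * (1 - d_t / n_t)
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical follow-up times (in weeks) and event flags.
durations = [5, 8, 8, 12, 15, 20]
events = [1, 1, 0, 1, 0, 1]

curve = kaplan_meier(durations, events)
for t, s in curve:
    print(f"t={t:>2}  S(t)={s:.3f}")
```

Censored subjects contribute to the risk set up to their last follow-up but never lower the curve themselves, which is why the estimate only changes at observed event times.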
2) How to Do Feature Engineering for Can-SAVE
Terminal
$ python CanSave.py
Python
# Required libraries
import numpy as np
import pandas as pd
from CanSave import CanSave

# Entry point
if __name__ == '__main__':
    # Create the feature engineering object
    config_path = './CONFIG_CanSave.yaml'
    cs = CanSave(CONFIG_PATH=config_path)
    help(cs)  # help() prints the documentation itself and returns None

    # Load the patient's EHR
    path_ehr = './EHR/id_26.csv'
    ehr = pd.read_csv(path_ehr, sep=';').set_index('patient_id')
    sex = ehr['sex'].iloc[0]
    birth_date = ehr['birth_date'].iloc[0]

    # Run feature engineering for the risk prediction
    features = cs.feature_engineering(
        sex=sex,                 # sex of the patient
        birth_date=birth_date,   # birth date of the patient
        ehr=ehr,                 # Electronic Health Records of the patient
        date_pred='2022-01-01',  # date of the risk estimation
        deep_weeks=108           # depth of the EHR history (in weeks)
    )
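To make the `date_pred` and `deep_weeks` arguments concrete, the sketch below computes the look-back window they would describe if `deep_weeks` is read as the number of weeks of history ending at `date_pred`. That reading is an assumption based on the parameter comments, not a statement about how `CanSave.feature_engineering` works internally. The helper names, the record fields (`date`, `code`), the sample codes, and the half-open boundary convention are all hypothetical.

```python
from datetime import date, timedelta


def history_window(date_pred, deep_weeks):
    """Return (start, end) of the look-back window ending at date_pred."""
    end = date.fromisoformat(date_pred)
    start = end - timedelta(weeks=deep_weeks)
    return start, end


def select_history(records, date_pred, deep_weeks):
    """Keep records dated within [start, end); the boundary choice is an assumption."""
    start, end = history_window(date_pred, deep_weeks)
    return [r for r in records if start <= date.fromisoformat(r["date"]) < end]


# Hypothetical EHR entries; field names and codes are illustrative only.
records = [
    {"date": "2019-11-30", "code": "I10"},
    {"date": "2020-06-15", "code": "E11"},
    {"date": "2021-12-31", "code": "J06"},
    {"date": "2022-01-01", "code": "K29"},
]

start, end = history_window("2022-01-01", 108)
selected = select_history(records, "2022-01-01", 108)

print(start, end)                      # 2019-12-07 2022-01-01
print([r["code"] for r in selected])   # ['E11', 'J06']
```

With `deep_weeks = 108`, the window spans 756 days, so under this reading only entries from 2019-12-07 up to (but not including) 2022-01-01 would feed the features.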
Citation
If you find this work useful, please cite it:
@misc{philonenko2025,
  title={Can-SAVE: Deploying Low-Cost and Population-Scale Cancer Screening via Survival Analysis Variables and EHR},
  author={Petr Philonenko and Vladimir Kokh and Pavel Blinov},
  year={2025},
  eprint={2309.15039},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2309.15039},
}