Attribution

Original Source:

Hany H. (2020). Chest CT-Scan Images Dataset. Kaggle.
https://www.kaggle.com/datasets/hanyhossam/chest-ctscan-images-dataset

Original License:

Database: Open Data Commons Open Database License (ODbL v1.0)
https://opendatacommons.org/licenses/odbl/1-0/

Derived Dataset Author:

Ashley Blackwell (2025). Chest CT-Scan Images (Cleaned, Derived from Hany et al.). Hugging Face Datasets.
https://huggingface.co/datasets/ashleyblackwell/lung-ct-cleaned-hany

Cleaning & Preprocessing Summary

The original dataset was processed and curated to ensure consistency, quality, and reproducibility for use in deep-learning experiments (e.g., the EfficientNet-B0 Lung CT Classifier).

Steps Performed

Integrity Checks: Removed corrupted or unreadable .jpg and .png files.
Resolution Standardization: Resized all images to 224 × 224 × 3 pixels.
Color Normalization: Converted grayscale scans to RGB format.
Class Organization: Verified folder structure for four diagnostic categories:
- Adenocarcinoma
- Large-Cell Carcinoma
- Squamous-Cell Carcinoma
- Normal
Stratified Splits:
- Train: 70%
- Validation: 20%
- Test: 10%
Metadata File: Generated metadata.csv containing filename, class label, and original resolution for traceability.
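
The stratified 70/20/10 split listed above can be sketched as follows. This is a minimal illustration using only the standard library, not the script that produced this dataset; the function name `stratified_split`, the seed, and the toy labels are assumptions made for the example.

```python
import random
from collections import defaultdict

def stratified_split(labels, fractions=(0.7, 0.2, 0.1), seed=42):
    """Return (train, val, test) lists of indices, split separately per class."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        n_train = round(fractions[0] * len(idxs))
        n_val = round(fractions[1] * len(idxs))
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]  # whatever remains goes to test
    return train, val, test

# Hypothetical toy input: 10 images per class (not the real class counts).
classes = ["Adenocarcinoma", "Large-Cell Carcinoma", "Squamous-Cell Carcinoma", "Normal"]
labels = [c for c in classes for _ in range(10)]
train, val, test = stratified_split(labels)
print(len(train), len(val), len(test))  # 28 8 4
```

Splitting each class on its own keeps the class proportions roughly equal across the three subsets; with very small classes the rounding can leave the test share empty, so the per-class counts are worth checking.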

Dataset Overview

Split	Approx. Images	Notes
Train	~TODO	Stratified by class
Validation	~TODO	For hyperparameter tuning
Test	~TODO	Final evaluation set
Total	~TODO	All cleaned and standardized

Intended Use

Purpose:
Designed for research, coursework, and educational demonstrations in medical image classification, model interpretability (Grad-CAM), and reproducible machine learning pipelines.
Out of Scope:
This dataset must not be used for clinical diagnosis, treatment decisions, or commercial medical software development.

Legal & License Information

License

This dataset is distributed under the Open Data Commons Open Database License (ODbL v1.0).
You are free to:

Share: Copy, distribute, and use the database.
Create: Produce works from the database.
Adapt: Modify, transform, and build upon the database.

Full legal text:
https://opendatacommons.org/licenses/odbl/1-0/

Scope

Intended: Research, UMGC coursework, model-interpretability demos (Grad-CAM), benchmarking.

Out-of-scope: Clinical diagnosis, patient triage, or any safety-critical application.

Model Architecture
Backbone: EfficientNet-B0 (ImageNet-initialized, fine-tuned)
Input size: 224 × 224 × 3
Head: GlobalAveragePooling → Dropout (TODO: rate) → Dense(4, softmax)
Loss: Categorical Cross-Entropy
Optimizer: TODO (e.g., Adam, lr = 1e-4 with decay)
Epochs / Batch size: TODO
Class labels (index): 0: Adenocarcinoma, 1: Large-Cell Carcinoma, 2: Squamous-Cell Carcinoma, 3: Normal
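
The index-to-label mapping above could be stored as a small JSON file. The loading example in "How to Use" looks labels up with `idx_to_label[str(i)]`, which suggests string keys; the exact contents of `class_map.json` are an assumption here, and the probabilities are made up for illustration.

```python
import json

# Class order taken from the "Class labels (index)" line above.
CLASS_NAMES = [
    "Adenocarcinoma",
    "Large-Cell Carcinoma",
    "Squamous-Cell Carcinoma",
    "Normal",
]

# JSON object keys are always strings, so indices are stored as "0".."3".
idx_to_label = {str(i): name for i, name in enumerate(CLASS_NAMES)}
class_map_json = json.dumps(idx_to_label, indent=2)
print(class_map_json)

# Round-trip: parse the JSON and map a hypothetical softmax output to a label.
loaded = json.loads(class_map_json)
probs = [0.05, 0.10, 0.15, 0.70]  # illustrative values, not model output
pred = max(range(len(probs)), key=probs.__getitem__)
print(loaded[str(pred)])  # Normal
```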

Data & Preprocessing

Source: Derived from the Hany Lung Cancer CT Scan dataset (Kaggle). Corrupted and irregular-resolution images were removed, and all remaining images were standardized to 224 × 224.
Split: Train/Val/Test = 70/20/10 (stratified).
Transforms: Resize → RGB conversion → normalize to [0, 1] or use preprocess_input.
Artifacts logged: Confusion matrix, classification report, Grad-CAM overlays.
Attribution: Credit the original dataset per its license when sharing or publishing.

Evaluation

Test set size: TODO:N
Metrics (macro): Accuracy, Precision, Recall, F1

Class	Precision	Recall	F1	Support
Adenocarcinoma	TODO	TODO	TODO	TODO
Large-Cell	TODO	TODO	TODO	TODO
Squamous	TODO	TODO	TODO	TODO
Normal	TODO	TODO	TODO	TODO
Macro Avg	TODO	TODO	TODO	N
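
Once test predictions are available, the per-class and macro-averaged figures in the table can be derived from a confusion matrix. The sketch below uses only the standard library; the function names and the confusion matrix are hypothetical and are not results for this model.

```python
def per_class_metrics(cm):
    """cm[i][j] = number of samples with true class i predicted as class j.

    Returns one (precision, recall, f1, support) tuple per class.
    """
    n = len(cm)
    rows = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, truly another class
        fn = sum(cm[k]) - tp                       # truly k, predicted another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        rows.append((precision, recall, f1, sum(cm[k])))
    return rows

def macro_average(rows):
    """Unweighted mean of precision, recall, and F1 across classes."""
    n = len(rows)
    return tuple(sum(r[i] for r in rows) / n for i in range(3))

# Hypothetical confusion matrix (rows: true class, columns: predicted class),
# in the order Adenocarcinoma, Large-Cell, Squamous, Normal.
cm = [
    [8, 1, 1, 0],
    [1, 7, 2, 0],
    [1, 1, 8, 0],
    [0, 0, 0, 10],
]
rows = per_class_metrics(cm)
macro = macro_average(rows)
accuracy = sum(cm[k][k] for k in range(len(cm))) / sum(map(sum, cm))
print(rows, macro, accuracy)
```

Macro averaging weights every class equally regardless of support, which matters when the classes are imbalanced.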

Suggested Environment

tensorflow==2.15.0
keras==2.15.0
huggingface_hub>=0.23.0
numpy>=1.24

Explainability (Grad-CAM)

Last conv layer: top_conv for EfficientNet-B0.
Tip: Use Grad-CAM to overlay heatmaps and validate that the model focuses on pathologically relevant regions.
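
The core Grad-CAM computation is small enough to sketch in NumPy. The example assumes the activations of `top_conv` and the gradients of the target class score with respect to them have already been extracted (for instance with `tf.GradientTape`); random arrays stand in for both, and the 7 × 7 × 1280 shape is what EfficientNet-B0 is expected to produce for a 224 × 224 input. The function name `grad_cam_heatmap` is hypothetical.

```python
import numpy as np

def grad_cam_heatmap(activations, gradients):
    """Compute a Grad-CAM heatmap scaled to [0, 1].

    activations: (H, W, C) feature maps from the last conv layer.
    gradients:   (H, W, C) gradients of the class score w.r.t. those maps.
    """
    # Channel importance: spatial average of the gradients.
    weights = gradients.mean(axis=(0, 1))                        # shape (C,)
    # Weighted sum of the feature maps over the channel axis.
    cam = np.tensordot(activations, weights, axes=([2], [0]))    # shape (H, W)
    # Keep only regions that push the class score up.
    cam = np.maximum(cam, 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam

# Stand-in data; real values would come from the trained model.
rng = np.random.default_rng(0)
acts = rng.random((7, 7, 1280)).astype(np.float32)
grads = rng.standard_normal((7, 7, 1280)).astype(np.float32)
heatmap = grad_cam_heatmap(acts, grads)
print(heatmap.shape)  # (7, 7)
```

For an overlay, the 7 × 7 heatmap would be upsampled to 224 × 224 and blended with the input slice.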

Limitations, Bias & Ethical Considerations

Domain shift: CT protocols and scanners vary; may affect generalization.

Label noise: Community datasets can contain mislabels.

Generalization: Model is not clinically validated.

Mitigation: Use Grad-CAM audits and external validation before any applied use.

Training & Reproducibility

Hardware: TODO (e.g., NVIDIA T4 / A100 / local GPU)
Training time: TODO
Seed / Determinism: TODO
Reproduction steps: TODO (link to notebook or script if available)

License

Model weights & code: CC BY-NC-SA 4.0 (non-commercial, share-alike, with attribution).
Dataset (derived): Follow the original dataset’s license terms and provide credit to the creator.

Citation

If you use this model, please cite:

Blackwell, A. (2025). EfficientNet-B0 Lung CT Classifier (4-class) [Computer software]. Hugging Face.
https://huggingface.co/TODO

@software{blackwell2025lungct,
  author    = {Blackwell, Ashley},
  title     = {EfficientNet-B0 Lung CT Classifier (4-class)},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/TODO}
}

👩‍🏫 Maintainers

Ashley Blackwell. Questions and feedback are welcome via the Hugging Face Discussions tab.

🗒 Changelog

2025-10-06: Initial public release (.keras weights); added model card, class map, and metric placeholders.

Citation

If you use this dataset, please cite both the original source and the derived version:

Original dataset:

Hany H. (2020). Chest CT-Scan Images Dataset. Kaggle.
https://www.kaggle.com/datasets/hanyhossam/chest-ctscan-images-dataset

Derived version:

Blackwell, A. (2025). Chest CT-Scan Images (Cleaned, Derived from Hany et al.) [Dataset]. Hugging Face.
https://huggingface.co/datasets/ashleyblackwell/lung-ct-cleaned-hany

@dataset{hany2020chestct,
  author    = {Hany, H.},
  title     = {Chest CT-Scan Images Dataset},
  year      = {2020},
  publisher = {Kaggle},
  url       = {https://www.kaggle.com/datasets/hanyhossam/chest-ctscan-images-dataset}
}

@dataset{blackwell2025lungctcleaned,
  author    = {Blackwell, Ashley},
  title     = {Chest CT-Scan Images (Cleaned, Derived from Hany et al.)},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/datasets/ashleyblackwell/lung-ct-cleaned-hany}
}

---

## How to Use (Load & Inference)
**Option A — Download from the Hub**
from huggingface_hub import hf_hub_download
import json, numpy as np, tensorflow as tf
from tensorflow.keras.preprocessing import image

REPO_ID = "TODO:your-username/efficientnetb0-lung-ct-4class"

model_path = hf_hub_download(repo_id=REPO_ID, filename="model.keras")
class_map_path = hf_hub_download(repo_id=REPO_ID, filename="class_map.json")

model = tf.keras.models.load_model(model_path, compile=False)
with open(class_map_path) as f:
    idx_to_label = json.load(f)

def preprocess(img_path):
    img = image.load_img(img_path, target_size=(224, 224))
    x = image.img_to_array(img)
    x = np.expand_dims(x, 0)
    x = x / 255.0  # must match the scaling used in training; Keras EfficientNet's preprocess_input is a pass-through that expects [0, 255]
    return x

x = preprocess("path/to/ct_slice.png")
probs = model.predict(x, verbose=0)[0]
for i, p in enumerate(probs):
    print(f"{idx_to_label[str(i)]}: {p:.3f}")
print("Predicted:", idx_to_label[str(int(np.argmax(probs)))])
**Option B — Snapshot Download (Local Folder)**
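from huggingface_hub import snapshot_download
local_dir = snapshot_download(repo_id="TODO:your-username/efficientnetb0-lung-ct-4class")
# Load model.keras and class_map.json from local_dir (reuses the json and tf imports from Option A)
model = tf.keras.models.load_model(f"{local_dir}/model.keras", compile=False)
with open(f"{local_dir}/class_map.json") as f:
    idx_to_label = json.load(f)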

---
