Attribution

Original Source:

Hany H. (2020). Chest CT-Scan Images Dataset. Kaggle.
https://www.kaggle.com/datasets/hanyhossam/chest-ctscan-images-dataset

Original License:

Database: Open Data Commons Open Database License (ODbL v1.0)
https://opendatacommons.org/licenses/odbl/1-0/

Derived Dataset Author:

Ashley Blackwell (2025). Chest CT-Scan Images (Cleaned, Derived from Hany et al.). Hugging Face Datasets.
https://huggingface.co/datasets/ashleyblackwell/lung-ct-cleaned-hany


Cleaning & Preprocessing Summary

The original dataset was processed and curated to ensure consistency, quality, and reproducibility for use in deep-learning experiments (e.g., the EfficientNet-B0 Lung CT Classifier).

Steps Performed

  1. Integrity Checks: Removed corrupted or unreadable .jpg and .png files.
  2. Resolution Standardization: Resized all images to 224 × 224 × 3 pixels.
  3. Color Normalization: Converted grayscale scans to RGB format.
  4. Class Organization: Verified folder structure for four diagnostic categories:
    • Adenocarcinoma
    • Large-Cell Carcinoma
    • Squamous-Cell Carcinoma
    • Normal
  5. Stratified Splits:
    • Train: 70%
    • Validation: 20%
    • Test: 10%
  6. Metadata File: Generated metadata.csv containing filename, class label, and original resolution for traceability.
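
The stratified 70/20/10 split in step 5 can be sketched as follows. This is a minimal, standard-library illustration, not the script actually used to build the dataset: the function name `stratified_split`, the fixed seed, and the toy class counts are all assumptions.

```python
import random
from collections import defaultdict

def stratified_split(labels, fractions=(0.7, 0.2, 0.1), seed=42):
    """Return (train, val, test) index lists, split class by class so each
    split keeps roughly the same class proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed (an assumption) for repeatable splits
    train, val, test = [], [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        n_train = round(fractions[0] * len(idxs))
        n_val = round(fractions[1] * len(idxs))
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]  # remainder goes to the test split
    return train, val, test

# Toy labels standing in for the four class folders (counts are made up).
labels = (["Adenocarcinoma"] * 40 + ["Large-Cell Carcinoma"] * 20
          + ["Squamous-Cell Carcinoma"] * 30 + ["Normal"] * 10)
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

Splitting within each class, rather than over the whole file list, keeps rare classes represented in the validation and test sets.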

Dataset Overview

| Split | Approx. Images | Notes |
|---|---|---|
| Train | ~TODO | Stratified by class |
| Validation | ~TODO | For hyperparameter tuning |
| Test | ~TODO | Final evaluation set |
| Total | ~TODO | All cleaned and standardized |

Intended Use

  • Purpose:
    Designed for research, coursework, and educational demonstrations in medical image classification, model interpretability (Grad-CAM), and reproducible machine learning pipelines.

  • Out of Scope:
    This dataset must not be used for clinical diagnosis, treatment decisions, or commercial medical software development.


Legal & License Information

License

This dataset is distributed under the Open Data Commons Open Database License (ODbL v1.0).
You are free to:

  • Share: Copy, distribute, and use the database.
  • Create: Produce works from the database.
  • Adapt: Modify, transform, and build upon the database.

Full legal text:
https://opendatacommons.org/licenses/odbl/1-0/


Scope

  • Intended: Research, UMGC coursework, model-interpretability demos (Grad-CAM), benchmarking.
  • Out-of-scope: Clinical diagnosis, patient triage, or any safety-critical application.

Model Architecture

  • Backbone: EfficientNet-B0 (ImageNet-initialized, fine-tuned)
  • Input size: 224 × 224 × 3
  • Head: GlobalAveragePooling → Dropout (TODO: rate) → Dense(4, softmax)
  • Loss: Categorical Cross-Entropy
  • Optimizer: TODO (e.g., Adam, lr = 1e-4 with decay)
  • Epochs / Batch size: TODO
  • Class labels (index): 0: Adenocarcinoma, 1: Large-Cell Carcinoma, 2: Squamous-Cell Carcinoma, 3: Normal
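
To make the softmax head and the categorical cross-entropy loss concrete, here is a small framework-free sketch of the arithmetic for a single image. The logits are made-up numbers and the helper names are illustrative; the real model computes this inside Keras.

```python
import math

CLASS_LABELS = ["Adenocarcinoma", "Large-Cell Carcinoma",
                "Squamous-Cell Carcinoma", "Normal"]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_cross_entropy(probs, true_index, eps=1e-7):
    """Loss for one sample with a one-hot target: -log(p[true class])."""
    return -math.log(max(probs[true_index], eps))

logits = [2.0, 0.5, 0.1, -1.0]  # made-up pre-softmax scores for one image
probs = softmax(logits)
predicted = CLASS_LABELS[probs.index(max(probs))]
loss = categorical_cross_entropy(probs, true_index=0)
print(predicted, round(loss, 4))
```

The loss shrinks toward zero as the probability assigned to the true class approaches 1.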

Data & Preprocessing

  • Source: Derived from the Chest CT-Scan Images Dataset by Hany H. (Kaggle). Corrupted and irregular-resolution images were removed, and all remaining images were standardized to 224 × 224.
  • Split: Train/Val/Test = 70/20/10 (stratified).
  • Transforms: Resize → RGB conversion → normalize to [0, 1] or use preprocess_input.
  • Artifacts logged: Confusion matrix, classification report, Grad-CAM overlays.
  • Attribution: Credit the original dataset per its license when sharing or publishing.


Evaluation

Test set size: TODO:N
Metrics (macro): Accuracy, Precision, Recall, F1

| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| Adenocarcinoma | TODO | TODO | TODO | TODO |
| Large-Cell | TODO | TODO | TODO | TODO |
| Squamous | TODO | TODO | TODO | TODO |
| Normal | TODO | TODO | TODO | TODO |
| Macro Avg | TODO | TODO | TODO | N |
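
As a sketch of how the per-class and macro-averaged figures above could be computed once predictions are available, the following uses plain Python on made-up label lists. The function names are illustrative, and the numbers it produces are not results for this model.

```python
def per_class_metrics(y_true, y_pred, num_classes=4):
    """Return a list of (precision, recall, f1, support), one per class index."""
    rows = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        rows.append((precision, recall, f1, tp + fn))
    return rows

def macro_average(rows):
    """Unweighted mean of precision, recall, and F1 across classes."""
    n = len(rows)
    return tuple(sum(r[i] for r in rows) / n for i in range(3))

# Made-up labels, only to exercise the functions.
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 1, 1, 1, 2, 0, 3, 3]
rows = per_class_metrics(y_true, y_pred)
macro_p, macro_r, macro_f1 = macro_average(rows)
print(rows, macro_p, macro_r, macro_f1)
```

Macro averaging weights every class equally, so a weak minority class lowers the score as much as a weak majority class would.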

Suggested Environment

tensorflow==2.15.0
keras==2.15.0
huggingface_hub>=0.23.0
numpy>=1.24


Explainability (Grad-CAM)

  • Last conv layer: top_conv for EfficientNet-B0.
  • Tip: Use Grad-CAM to overlay heatmaps and validate that the model focuses on pathologically relevant regions.
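
The core Grad-CAM arithmetic can be sketched without any framework. In practice the activations and gradients would come from the `top_conv` layer, for example via `tf.GradientTape`; here they are tiny made-up nested lists, and `grad_cam_heatmap` is an illustrative name rather than part of this repository.

```python
def grad_cam_heatmap(activations, gradients):
    """Combine conv feature maps into a Grad-CAM heatmap.

    activations, gradients: nested lists shaped [channels][height][width],
    holding the last conv layer's output and the gradient of the target
    class score with respect to that output.
    """
    height, width = len(activations[0]), len(activations[0][0])
    # Channel weight = spatial mean of that channel's gradient.
    weights = [sum(map(sum, g)) / (height * width) for g in gradients]
    # Weighted sum over channels, then ReLU to keep positive evidence only.
    heatmap = [[max(0.0, sum(w * a[i][j] for w, a in zip(weights, activations)))
                for j in range(width)] for i in range(height)]
    peak = max(max(row) for row in heatmap)
    if peak > 0:  # scale to [0, 1] for overlaying on the CT slice
        heatmap = [[v / peak for v in row] for row in heatmap]
    return heatmap

# Two made-up 2x2 feature maps and their gradients.
acts = [[[1.0, 0.0], [0.0, 2.0]], [[0.0, 3.0], [1.0, 0.0]]]
grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
print(grad_cam_heatmap(acts, grads))  # [[0.5, 0.0], [0.0, 1.0]]
```

The resulting map has the spatial size of the conv layer and would still need upsampling to 224 × 224 before being overlaid on the input image.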

Limitations, Bias & Ethical Considerations

  • Domain shift: CT protocols and scanners vary; this may affect generalization.
  • Label noise: Community datasets can contain mislabels.
  • Generalization: Model is not clinically validated.
  • Mitigation: Use Grad-CAM audits and external validation before any applied use.


Training & Reproducibility

  • Hardware: TODO (e.g., NVIDIA T4 / A100 / local GPU).
  • Training time: TODO
  • Seed / Determinism: TODO
  • Reproduction steps: TODO (link to notebook or script if available).

License

  • Model weights & code: CC BY-NC-SA 4.0 (non-commercial, share-alike, with attribution).
  • Dataset (derived): Follow the original dataset's license terms and provide credit to the creator.

Citation

If you use this model, please cite:

Blackwell, A. (2025). EfficientNet-B0 Lung CT Classifier (4-class) [Computer software]. Hugging Face.
https://huggingface.co/TODO

@software{blackwell2025lungct,
  author    = {Blackwell, Ashley},
  title     = {EfficientNet-B0 Lung CT Classifier (4-class)},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/TODO}
}

Maintainers

Ashley Blackwell. Questions and feedback are welcome via the Hugging Face Discussions tab.

Changelog

2025-10-06: Initial public release (.keras weights), added model card, class map, and metric placeholders.


Citation

If you use this dataset, please cite both the original source and the derived version:

Original dataset:

Hany H. (2020). Chest CT-Scan Images Dataset. Kaggle.
https://www.kaggle.com/datasets/hanyhossam/chest-ctscan-images-dataset

Derived version:

Blackwell, A. (2025). Chest CT-Scan Images (Cleaned, Derived from Hany et al.) [Dataset]. Hugging Face.
https://huggingface.co/datasets/ashleyblackwell/lung-ct-cleaned-hany

@dataset{hany2020chestct,
  author    = {Hany, H.},
  title     = {Chest CT-Scan Images Dataset},
  year      = {2020},
  publisher = {Kaggle},
  url       = {https://www.kaggle.com/datasets/hanyhossam/chest-ctscan-images-dataset}
}

@dataset{blackwell2025lungctcleaned,
  author    = {Blackwell, Ashley},
  title     = {Chest CT-Scan Images (Cleaned, Derived from Hany et al.)},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/datasets/ashleyblackwell/lung-ct-cleaned-hany}
}

---

## How to Use (Load & Inference)
**Option A: Download from the Hub**
from huggingface_hub import hf_hub_download
import json, numpy as np, tensorflow as tf
from tensorflow.keras.preprocessing import image

REPO_ID = "TODO:your-username/efficientnetb0-lung-ct-4class"

model_path = hf_hub_download(repo_id=REPO_ID, filename="model.keras")
class_map_path = hf_hub_download(repo_id=REPO_ID, filename="class_map.json")

model = tf.keras.models.load_model(model_path, compile=False)
with open(class_map_path) as f:
    idx_to_label = json.load(f)

def preprocess(img_path):
    img = image.load_img(img_path, target_size=(224, 224))
    x = image.img_to_array(img)
    x = np.expand_dims(x, 0)
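    # Use the same scaling that was applied during training. Note that stock Keras
    # EfficientNet backbones rescale internally and expect inputs in [0, 255].
    x = x / 255.0  # or use tf.keras.applications.efficientnet.preprocess_input(x)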
    return x

x = preprocess("path/to/ct_slice.png")
probs = model.predict(x, verbose=0)[0]
for i, p in enumerate(probs):
    print(f"{idx_to_label[str(i)]}: {p:.3f}")
print("Predicted:", idx_to_label[str(int(np.argmax(probs)))])
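**Option B: Snapshot Download (Local Folder)**
import os, tensorflow as tf
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="TODO:your-username/efficientnetb0-lung-ct-4class")
# Load model.keras from local_dir; class_map.json sits in the same folder
model = tf.keras.models.load_model(os.path.join(local_dir, "model.keras"), compile=False)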
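
The inference code looks up labels with `idx_to_label[str(i)]`, which suggests `class_map.json` maps string indices to class names. The sketch below shows one plausible layout and a lookup helper; the exact file contents are an assumption based on the class-label indices listed in this card, and `top_prediction` is a hypothetical helper.

```python
import json

# Assumed contents of class_map.json, mirroring the class-label indices
# listed in this card. JSON object keys are always strings.
class_map = {
    "0": "Adenocarcinoma",
    "1": "Large-Cell Carcinoma",
    "2": "Squamous-Cell Carcinoma",
    "3": "Normal",
}

def top_prediction(probs, idx_to_label):
    """Return (label, probability) for the highest-scoring class."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return idx_to_label[str(best)], probs[best]

# Round-trip through JSON text to mimic loading the file from disk.
idx_to_label = json.loads(json.dumps(class_map))
label, prob = top_prediction([0.05, 0.10, 0.80, 0.05], idx_to_label)
print(label, prob)
```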
