Add comprehensive unified model card for m-hood collection
README.md
ADDED
@@ -0,0 +1,201 @@
---
license: mit
library_name: ultralytics
tags:
- object-detection
- computer-vision
- yolov10
- faster-rcnn
- pytorch
- bdd100k
- pascal-voc
- kitti
- autonomous-driving
- hallucination-mitigation
- out-of-distribution
- BDD 100K
- Pascal-VOC
pipeline_tag: object-detection
datasets:
- bdd100k
- pascal-voc
- kitti
widget:
- src: https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bounding-boxes-sample.png
  example_title: "Sample Image"
model-index:
- name: m-hood
  results:
  - task:
      type: object-detection
    dataset:
      type: multi-dataset
      name: BDD 100K, Pascal VOC, KITTI
    metrics:
    - type: mean_average_precision
      name: mAP
      value: "TBD"
---

# M-Hood: Multi-Dataset Object Detection Model Collection

**M-Hood** is a collection of object detection models covering two architectures, three datasets, and two training strategies. This repository contains **YOLOv10** and **Faster R-CNN** models trained on the **BDD 100K**, **Pascal VOC**, and **KITTI** datasets.

The collection includes **vanilla models** (trained from scratch) and **fine-tuned models** designed to **mitigate hallucination on out-of-distribution data**.

## Key Features

- **Dual Architecture Support**: Both YOLOv10 and Faster R-CNN models
- **Multi-Dataset Training**: BDD 100K, Pascal VOC, and KITTI datasets
- **Hallucination Mitigation**: Fine-tuned models for robust out-of-distribution performance
- **Real-world Applications**: Autonomous driving and general object detection

## Model Performance Overview

### YOLOv10 Models

| Model | Dataset | Training Type | Size | Description | Download |
|-------|---------|---------------|------|-------------|----------|
| **yolov10-bdd-vanilla.pt** | BDD 100K | Vanilla | 62 MB | Real-time detection for autonomous driving | [Download](./yolov10-bdd-vanilla.pt) |
| **yolov10-voc-vanilla.pt** | Pascal VOC | Vanilla | 63 MB | General purpose object detection | [Download](./yolov10-voc-vanilla.pt) |
| **yolov10-kitti-vanilla.pt** | KITTI | Vanilla | 16 MB | Lightweight autonomous driving detection | [Download](./yolov10-kitti-vanilla.pt) |
| **yolov10-bdd-finetune.pt** | BDD 100K | Fine-tuned | 62 MB | OOD-robust autonomous driving detection | [Download](./yolov10-bdd-finetune.pt) |
| **yolov10-voc-finetune.pt** | Pascal VOC | Fine-tuned | 94 MB | OOD-robust general object detection | [Download](./yolov10-voc-finetune.pt) |
| **yolov10-kitti-finetune.pt** | KITTI | Fine-tuned | 52 MB | OOD-robust autonomous driving detection | [Download](./yolov10-kitti-finetune.pt) |

### Faster R-CNN Models

| Model | Dataset | Training Type | Size | Description | Download |
|-------|---------|---------------|------|-------------|----------|
| **faster-rcnn-bdd-vanilla.pth** | BDD 100K | Vanilla | 315 MB | High-accuracy autonomous driving detection | [Download](./faster-rcnn-bdd-vanilla.pth) |
| **faster-rcnn-voc-vanilla.pth** | Pascal VOC | Vanilla | 315 MB | High-accuracy general object detection | [Download](./faster-rcnn-voc-vanilla.pth) |
| **faster-rcnn-kitti-vanilla.pth** | KITTI | Vanilla | 315 MB | High-accuracy autonomous driving detection | [Download](./faster-rcnn-kitti-vanilla.pth) |
| **faster-rcnn-bdd-finetune.pth** | BDD 100K | Fine-tuned | 158 MB | OOD-robust high-accuracy detection | [Download](./faster-rcnn-bdd-finetune.pth) |
| **faster-rcnn-voc-finetune.pth** | Pascal VOC | Fine-tuned | 158 MB | OOD-robust high-accuracy detection | [Download](./faster-rcnn-voc-finetune.pth) |
| **faster-rcnn-kitti-finetune.pth** | KITTI | Fine-tuned | 158 MB | OOD-robust high-accuracy detection | [Download](./faster-rcnn-kitti-finetune.pth) |
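
The file names in the two tables appear to follow one pattern: architecture, dataset, and training type joined by hyphens, with a `.pt` extension for YOLOv10 and `.pth` for Faster R-CNN. The sketch below builds a file name from those three parts. `checkpoint_filename` is a hypothetical helper written for illustration and is not shipped with this repository.

```python
# Hypothetical helper: reproduces the naming pattern seen in the tables.
# It is an illustration, not part of the m-hood repository.
EXTENSIONS = {"yolov10": ".pt", "faster-rcnn": ".pth"}
DATASETS = ("bdd", "voc", "kitti")
VARIANTS = ("vanilla", "finetune")


def checkpoint_filename(arch: str, dataset: str, variant: str) -> str:
    """Return the checkpoint file name for an architecture/dataset/variant triple."""
    if arch not in EXTENSIONS:
        raise ValueError(f"unknown architecture: {arch!r}")
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    if variant not in VARIANTS:
        raise ValueError(f"unknown training type: {variant!r}")
    return f"{arch}-{dataset}-{variant}{EXTENSIONS[arch]}"


print(checkpoint_filename("yolov10", "bdd", "vanilla"))
```

Validating the three parts up front means a typo surfaces as a `ValueError` rather than as a missing-file error at load time.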

## Quick Start

### YOLOv10 Usage

```python
from ultralytics import YOLO

# Load a vanilla YOLOv10 model
model = YOLO('yolov10-bdd-vanilla.pt')

# Run inference
results = model('path/to/image.jpg')

# Process results (one Results object per input image)
for result in results:
    boxes = result.boxes.xyxy    # bounding boxes as (x1, y1, x2, y2)
    scores = result.boxes.conf   # confidence scores
    classes = result.boxes.cls   # class predictions
```

### Faster R-CNN Usage

```python
import torch

# Load a Faster R-CNN model.
# Assumes the .pth file stores the whole pickled model rather than a state_dict.
# PyTorch >= 2.6 defaults to weights_only=True, so loading a full model must
# opt out explicitly; do this only for checkpoints you trust.
model = torch.load('faster-rcnn-bdd-vanilla.pth', map_location='cpu', weights_only=False)
model.eval()

# Run inference.
# image_tensor: a list of float image tensors shaped (C, H, W) with values in
# [0, 1] (the torchvision detection convention, assumed here)
with torch.no_grad():
    predictions = model(image_tensor)

# Process results (one dict per input image)
boxes = predictions[0]['boxes']    # bounding boxes
scores = predictions[0]['scores']  # confidence scores
labels = predictions[0]['labels']  # class labels
```
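
Detectors of this kind typically return every candidate box, including low-confidence ones, so predictions are usually filtered by score before use. The sketch below assumes the `boxes`/`scores`/`labels` layout shown above, converted to plain Python lists (for tensors, via `.tolist()`). The name `filter_by_score` and the 0.5 default threshold are illustrative choices, not values taken from this repository.

```python
def filter_by_score(prediction, threshold=0.5):
    """Keep only detections whose score is at least `threshold`.

    `prediction` is a dict with parallel lists under 'boxes', 'scores'
    and 'labels' (call .tolist() on tensors first).
    """
    keep = [i for i, s in enumerate(prediction["scores"]) if s >= threshold]
    return {
        "boxes": [prediction["boxes"][i] for i in keep],
        "scores": [prediction["scores"][i] for i in keep],
        "labels": [prediction["labels"][i] for i in keep],
    }


# Made-up example values, for illustration only
example = {
    "boxes": [[10, 20, 50, 80], [5, 5, 15, 15]],
    "scores": [0.92, 0.30],
    "labels": [1, 3],
}
print(filter_by_score(example))
```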

## Fine-tuning Objective

The **fine-tuned models** in this collection have been trained specifically to **mitigate hallucination on out-of-distribution (OOD) data**. The intended effects are:

- **Improved Robustness**: Better performance on images that differ from the training distribution
- **Reduced False Positives**: Lower tendency to detect objects that are not present
- **Enhanced Reliability**: More trustworthy predictions in real-world deployment scenarios
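
Hallucination in this sense means confident detections on inputs that contain none of the classes the model was trained on. One way to quantify it, offered here as an assumption and not as the metric used to train or evaluate these models, is to count detections above a confidence threshold on a set of OOD images. `hallucination_stats` and its threshold are hypothetical.

```python
def hallucination_stats(ood_scores, threshold=0.5):
    """Summarise false detections on out-of-distribution images.

    `ood_scores` holds one list of confidence scores per OOD image.
    The images are assumed to contain no in-distribution objects, so every
    detection at or above `threshold` is counted as a hallucination.
    """
    counts = [sum(1 for s in scores if s >= threshold) for scores in ood_scores]
    n_images = len(counts)
    if n_images == 0:
        return {"per_image": 0.0, "image_rate": 0.0}
    return {
        # average number of hallucinated boxes per image
        "per_image": sum(counts) / n_images,
        # fraction of images with at least one hallucinated box
        "image_rate": sum(1 for c in counts if c > 0) / n_images,
    }


# Made-up scores for three OOD images, for illustration only
print(hallucination_stats([[0.9, 0.2], [], [0.6, 0.7]]))
```

Comparing these numbers for a vanilla checkpoint and its fine-tuned counterpart on the same OOD images would be one way to check the intended reduction in false positives.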

## Dataset Information

### BDD 100K (Berkeley DeepDrive)
- **100,000+** driving images with diverse weather and lighting conditions
- **Object Classes**: car, truck, bus, motorcycle, bicycle, person, traffic light, traffic sign, train, rider
- **Application**: Autonomous driving scenarios

### Pascal VOC (Visual Object Classes)
- Standard benchmark dataset for object detection
- **20 Object Classes**: aeroplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow, diningtable, dog, horse, motorbike, person, pottedplant, sheep, sofa, train, tvmonitor
- **Application**: General computer vision applications

### KITTI Object Detection
- Real-world autonomous driving dataset
- **Object Classes**: car, pedestrian, cyclist
- **Application**: Autonomous driving with focus on urban scenarios

## Architecture Comparison

### YOLOv10 (Real-time Detection)
- **Type**: Single-stage detector
- **Speed**: High (real-time inference)
- **Accuracy**: Good
- **Use Case**: Real-time applications, edge deployment

### Faster R-CNN (High-accuracy Detection)
- **Type**: Two-stage detector
- **Speed**: Moderate
- **Accuracy**: High
- **Use Case**: High-accuracy requirements, research applications

## Model Selection Guide

| Use Case | Recommended Model | Reason |
|----------|-------------------|---------|
| **Real-time autonomous driving** | `yolov10-bdd-finetune.pt` | Fast + OOD robust + driving-specific |
| **High-accuracy autonomous driving** | `faster-rcnn-bdd-finetune.pth` | High accuracy + OOD robust + driving-specific |
| **General object detection (fast)** | `yolov10-voc-finetune.pt` | Fast + OOD robust + general purpose |
| **General object detection (accurate)** | `faster-rcnn-voc-finetune.pth` | High accuracy + OOD robust + general purpose |
| **Research/Baseline** | Any vanilla model | Standard training baseline |
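
The first four rows of the guide can be read as a lookup on two axes: the application domain and whether speed or accuracy matters more. A minimal sketch of that lookup follows. The key names (`"driving"`, `"general"`, `"fast"`, `"accurate"`) and the `recommend` function are labels invented for this example; only the file names come from the table.

```python
# File names are transcribed from the selection table; the keys are
# hypothetical labels chosen for this example.
RECOMMENDED = {
    ("driving", "fast"): "yolov10-bdd-finetune.pt",
    ("driving", "accurate"): "faster-rcnn-bdd-finetune.pth",
    ("general", "fast"): "yolov10-voc-finetune.pt",
    ("general", "accurate"): "faster-rcnn-voc-finetune.pth",
}


def recommend(domain: str, priority: str) -> str:
    """Return the recommended checkpoint for a (domain, priority) pair."""
    try:
        return RECOMMENDED[(domain, priority)]
    except KeyError:
        raise ValueError(f"no recommendation for {(domain, priority)!r}") from None


print(recommend("driving", "fast"))
```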

## Research Applications

This model collection is particularly useful for research in:
- **Out-of-distribution detection**
- **Domain adaptation**
- **Robust object detection**
- **Autonomous driving perception**
- **Multi-dataset learning**

## Citations

If you use these models in your research, please cite:

```bibtex
@article{yolov10,
  title={YOLOv10: Real-Time End-to-End Object Detection},
  author={Wang, Ao and Chen, Hui and Liu, Lihao and Chen, Kai and Lin, Zijia and Han, Jungong and Ding, Guiguang},
  journal={arXiv preprint arXiv:2405.14458},
  year={2024}
}

@article{ren2015faster,
  title={Faster {R-CNN}: Towards real-time object detection with region proposal networks},
  author={Ren, Shaoqing and He, Kaiming and Girshick, Ross and Sun, Jian},
  journal={Advances in neural information processing systems},
  volume={28},
  year={2015}
}
```

## License

This model collection is released under the MIT License.

## Keywords

Object Detection, Computer Vision, YOLOv10, Faster R-CNN, BDD 100K, Pascal-VOC, KITTI, Autonomous Driving, Hallucination Mitigation, Out-of-Distribution, Deep Learning, PyTorch