Commit 7699613 by HugoHE · verified · 1 Parent(s): 81295b7

Add comprehensive unified model card for m-hood collection

  1. README.md +201 -0
README.md ADDED
@@ -0,0 +1,201 @@
---
license: mit
library_name: ultralytics
tags:
- object-detection
- computer-vision
- yolov10
- faster-rcnn
- pytorch
- bdd100k
- pascal-voc
- kitti
- autonomous-driving
- hallucination-mitigation
- out-of-distribution
- BDD 100K
- Pascal-VOC
pipeline_tag: object-detection
datasets:
- bdd100k
- pascal-voc
- kitti
widget:
- src: https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bounding-boxes-sample.png
  example_title: "Sample Image"
model-index:
- name: m-hood
  results:
  - task:
      type: object-detection
    dataset:
      type: multi-dataset
      name: BDD 100K, Pascal VOC, KITTI
    metrics:
    - type: mean_average_precision
      name: mAP
      value: "TBD"
---

# M-Hood: Multi-Dataset Object Detection Model Collection

**M-Hood** is a collection of object detection models trained on multiple datasets with two architectures and two training strategies. This unified repository contains **YOLOv10** and **Faster R-CNN** models trained on the **BDD 100K**, **Pascal VOC**, and **KITTI** datasets.

The collection includes **vanilla models** (trained from scratch) and **fine-tuned models** designed to **mitigate hallucination on out-of-distribution data**.

## 🎯 Key Features

- **Dual Architecture Support**: Both YOLOv10 and Faster R-CNN models
- **Multi-Dataset Training**: BDD 100K, Pascal VOC, and KITTI datasets
- **Hallucination Mitigation**: Fine-tuned models for robust out-of-distribution performance
- **Real-world Applications**: Autonomous driving and general object detection

## 📊 Model Performance Overview

### YOLOv10 Models

| Model | Dataset | Training Type | Size | Description | Download |
|-------|---------|---------------|------|-------------|----------|
| **yolov10-bdd-vanilla.pt** | BDD 100K | Vanilla | 62MB | Real-time detection for autonomous driving | [Download](./yolov10-bdd-vanilla.pt) |
| **yolov10-voc-vanilla.pt** | Pascal VOC | Vanilla | 63MB | General purpose object detection | [Download](./yolov10-voc-vanilla.pt) |
| **yolov10-kitti-vanilla.pt** | KITTI | Vanilla | 16MB | Lightweight autonomous driving detection | [Download](./yolov10-kitti-vanilla.pt) |
| **yolov10-bdd-finetune.pt** | BDD 100K | Fine-tuned | 62MB | OOD-robust autonomous driving detection | [Download](./yolov10-bdd-finetune.pt) |
| **yolov10-voc-finetune.pt** | Pascal VOC | Fine-tuned | 94MB | OOD-robust general object detection | [Download](./yolov10-voc-finetune.pt) |
| **yolov10-kitti-finetune.pt** | KITTI | Fine-tuned | 52MB | OOD-robust autonomous driving detection | [Download](./yolov10-kitti-finetune.pt) |

### Faster R-CNN Models

| Model | Dataset | Training Type | Size | Description | Download |
|-------|---------|---------------|------|-------------|----------|
| **faster-rcnn-bdd-vanilla.pth** | BDD 100K | Vanilla | 315MB | High-accuracy autonomous driving detection | [Download](./faster-rcnn-bdd-vanilla.pth) |
| **faster-rcnn-voc-vanilla.pth** | Pascal VOC | Vanilla | 315MB | High-accuracy general object detection | [Download](./faster-rcnn-voc-vanilla.pth) |
| **faster-rcnn-kitti-vanilla.pth** | KITTI | Vanilla | 315MB | High-accuracy autonomous driving detection | [Download](./faster-rcnn-kitti-vanilla.pth) |
| **faster-rcnn-bdd-finetune.pth** | BDD 100K | Fine-tuned | 158MB | OOD-robust high-accuracy detection | [Download](./faster-rcnn-bdd-finetune.pth) |
| **faster-rcnn-voc-finetune.pth** | Pascal VOC | Fine-tuned | 158MB | OOD-robust high-accuracy detection | [Download](./faster-rcnn-voc-finetune.pth) |
| **faster-rcnn-kitti-finetune.pth** | KITTI | Fine-tuned | 158MB | OOD-robust high-accuracy detection | [Download](./faster-rcnn-kitti-finetune.pth) |
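
The file names in both tables follow one pattern, `<architecture>-<dataset>-<variant>.<ext>`, with `.pt` for YOLOv10 and `.pth` for Faster R-CNN. A minimal sketch of a helper that builds a checkpoint name from those parts is shown below. The function `checkpoint_name` and its argument names are illustrative assumptions and are not part of any released tooling.

```python
# Hypothetical helper: derives a checkpoint file name from the naming pattern
# visible in the tables above. Not part of the M-Hood repository.
EXTENSIONS = {"yolov10": "pt", "faster-rcnn": "pth"}
DATASETS = ("bdd", "voc", "kitti")
VARIANTS = ("vanilla", "finetune")


def checkpoint_name(architecture: str, dataset: str, variant: str) -> str:
    """Return e.g. 'yolov10-bdd-finetune.pt' for ('yolov10', 'bdd', 'finetune')."""
    if architecture not in EXTENSIONS:
        raise ValueError(f"unknown architecture: {architecture!r}")
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant: {variant!r}")
    return f"{architecture}-{dataset}-{variant}.{EXTENSIONS[architecture]}"


# Enumerate the twelve checkpoints listed in the two tables.
ALL_CHECKPOINTS = [
    checkpoint_name(a, d, v) for a in EXTENSIONS for v in VARIANTS for d in DATASETS
]
```

Rejecting unknown parts early keeps a typo such as `"fine-tune"` from turning into a request for a file that does not exist.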

## 🚀 Quick Start

### YOLOv10 Usage

```python
from ultralytics import YOLO

# Load a vanilla YOLOv10 model
model = YOLO('yolov10-bdd-vanilla.pt')

# Run inference
results = model('path/to/image.jpg')

# Process results
for result in results:
    boxes = result.boxes.xyxy   # bounding boxes as (x1, y1, x2, y2)
    scores = result.boxes.conf  # confidence scores
    classes = result.boxes.cls  # class predictions
```
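
### Faster R-CNN Usage

```python
import torch

# Load a Faster R-CNN model.
# Assumption: the .pth file stores the whole pickled model (not only a state_dict),
# so weights_only=False is needed on recent PyTorch; only load files you trust.
model = torch.load('faster-rcnn-bdd-vanilla.pth', map_location='cpu', weights_only=False)
model.eval()

# Placeholder input, assuming the torchvision detection convention:
# a list of float tensors shaped [C, H, W] with values in [0, 1]
image_tensor = [torch.rand(3, 720, 1280)]

# Run inference
with torch.no_grad():
    predictions = model(image_tensor)

# Process results
boxes = predictions[0]['boxes']    # bounding boxes
scores = predictions[0]['scores']  # confidence scores
labels = predictions[0]['labels']  # class predictions
```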
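
The outputs above may still include low-confidence detections, so a common post-processing step is to keep only detections above a score threshold. The sketch below assumes the boxes, scores, and labels have been converted to plain Python lists (for example with `.tolist()` on the tensors). The name `filter_detections` and the 0.5 default are illustrative choices, not values prescribed by this collection.

```python
from typing import List, Sequence, Tuple

Box = Sequence[float]  # (x1, y1, x2, y2)


def filter_detections(
    boxes: Sequence[Box],
    scores: Sequence[float],
    labels: Sequence[int],
    min_score: float = 0.5,  # illustrative default, tune per model and dataset
) -> List[Tuple[Box, float, int]]:
    """Keep detections whose confidence is at least min_score."""
    if not (len(boxes) == len(scores) == len(labels)):
        raise ValueError("boxes, scores, and labels must have the same length")
    return [
        (box, score, label)
        for box, score, label in zip(boxes, scores, labels)
        if score >= min_score
    ]


# Made-up example values standing in for model output
kept = filter_detections(
    boxes=[[10, 20, 110, 220], [5, 5, 50, 50], [300, 40, 380, 120]],
    scores=[0.92, 0.31, 0.77],
    labels=[1, 3, 1],
)
```

Here the second detection (score 0.31) is dropped and the other two are kept in their original order.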

## 🎯 Fine-tuning Objective

The **fine-tuned models** in this collection were trained specifically to **mitigate hallucination on out-of-distribution (OOD) data**. The intended effects are:

- **Improved Robustness**: Better performance on images that differ from the training distribution
- **Reduced False Positives**: Lower tendency to detect objects that are not actually present
- **Enhanced Reliability**: More trustworthy predictions in real-world deployment scenarios
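
One way to quantify hallucination, offered here as an assumption rather than the evaluation protocol used for these models, is to run a detector on OOD images that contain none of its training classes, so that every detection above a confidence threshold is a false positive. The sketch below computes the fraction of such images with at least one detection and the mean number of detections per image. The function name, threshold, and scores are all illustrative.

```python
from typing import Dict, Sequence


def hallucination_stats(
    per_image_scores: Sequence[Sequence[float]],
    min_score: float = 0.25,  # illustrative threshold
) -> Dict[str, float]:
    """Summarize false positives on OOD images with no in-distribution objects.

    per_image_scores[i] holds the confidence scores of all detections
    predicted for OOD image i.
    """
    if not per_image_scores:
        raise ValueError("need at least one image")
    counts = [sum(1 for s in scores if s >= min_score) for scores in per_image_scores]
    n_images = len(counts)
    return {
        "images_with_hallucination": sum(1 for c in counts if c > 0) / n_images,
        "mean_hallucinations_per_image": sum(counts) / n_images,
    }


# Made-up scores for four OOD images, one list per model
vanilla = hallucination_stats([[0.9, 0.4], [0.6], [], [0.3, 0.1]])
finetuned = hallucination_stats([[0.2], [], [], [0.3]])
```

Under this metric, a fine-tuned model that mitigates hallucination would be expected to score lower on both numbers than its vanilla counterpart on the same OOD set.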

## 📁 Dataset Information

### BDD 100K (Berkeley DeepDrive)
- **100,000+** driving images with diverse weather and lighting conditions
- **Object Classes**: car, truck, bus, motorcycle, bicycle, person, traffic light, traffic sign, train, rider
- **Application**: Autonomous driving scenarios

### Pascal VOC (Visual Object Classes)
- Standard benchmark dataset for object detection
- **20 Object Classes**: aeroplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow, diningtable, dog, horse, motorbike, person, pottedplant, sheep, sofa, train, tvmonitor
- **Application**: General computer vision applications

### KITTI Object Detection
- Real-world autonomous driving dataset
- **Object Classes**: car, pedestrian, cyclist
- **Application**: Autonomous driving with focus on urban scenarios
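
The three label sets overlap only partly, and some shared concepts carry different names (for example `motorcycle` in BDD 100K versus `motorbike` in Pascal VOC, or `person` versus KITTI's `pedestrian`). The sketch below records the class names exactly as listed above and computes the name-level overlap. It makes no claim about the class index order stored in the checkpoints, and the helper `shared_classes` is a hypothetical name.

```python
# Class names copied from the dataset descriptions above.
BDD100K_CLASSES = {
    "car", "truck", "bus", "motorcycle", "bicycle", "person",
    "traffic light", "traffic sign", "train", "rider",
}
PASCAL_VOC_CLASSES = {
    "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
    "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
}
KITTI_CLASSES = {"car", "pedestrian", "cyclist"}


def shared_classes(a: set, b: set) -> list:
    """Return class names that appear verbatim in both label sets, sorted."""
    return sorted(a & b)


bdd_voc = shared_classes(BDD100K_CLASSES, PASCAL_VOC_CLASSES)
bdd_kitti = shared_classes(BDD100K_CLASSES, KITTI_CLASSES)
voc_kitti = shared_classes(PASCAL_VOC_CLASSES, KITTI_CLASSES)
```

Because the comparison is by exact name, cross-dataset experiments would need an explicit mapping for synonyms such as `person` and `pedestrian`.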

## 🏗️ Architecture Comparison

### YOLOv10 (Real-time Detection)
- **Type**: Single-stage detector
- **Speed**: High (real-time inference)
- **Accuracy**: Good
- **Use Case**: Real-time applications, edge deployment

### Faster R-CNN (High-accuracy Detection)
- **Type**: Two-stage detector
- **Speed**: Moderate
- **Accuracy**: High
- **Use Case**: High-accuracy requirements, research applications

## 📈 Model Selection Guide

| Use Case | Recommended Model | Reason |
|----------|-------------------|--------|
| **Real-time autonomous driving** | `yolov10-bdd-finetune.pt` | Fast + OOD robust + driving-specific |
| **High-accuracy autonomous driving** | `faster-rcnn-bdd-finetune.pth` | High accuracy + OOD robust + driving-specific |
| **General object detection (fast)** | `yolov10-voc-finetune.pt` | Fast + OOD robust + general purpose |
| **General object detection (accurate)** | `faster-rcnn-voc-finetune.pth` | High accuracy + OOD robust + general purpose |
| **Research/Baseline** | Any vanilla model | Standard training baseline |
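
The first four rows of the guide can be expressed as a small lookup keyed by application domain and priority. This is only a sketch of the table's logic. The keys `"driving"`, `"general"`, `"speed"`, and `"accuracy"` and the function `recommend` are hypothetical names introduced for illustration, and the research/baseline row is left out because it does not name a single file.

```python
# Hypothetical encoding of the selection table above.
RECOMMENDED = {
    ("driving", "speed"): "yolov10-bdd-finetune.pt",
    ("driving", "accuracy"): "faster-rcnn-bdd-finetune.pth",
    ("general", "speed"): "yolov10-voc-finetune.pt",
    ("general", "accuracy"): "faster-rcnn-voc-finetune.pth",
}


def recommend(domain: str, priority: str) -> str:
    """Return the checkpoint file suggested for a domain and a priority."""
    try:
        return RECOMMENDED[(domain, priority)]
    except KeyError:
        valid = ", ".join(f"{d}/{p}" for d, p in RECOMMENDED)
        raise ValueError(
            f"no recommendation for {domain}/{priority}; valid options: {valid}"
        ) from None


choice = recommend("driving", "speed")
```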

## 🔬 Research Applications

This model collection is particularly useful for research in:
- **Out-of-distribution detection**
- **Domain adaptation**
- **Robust object detection**
- **Autonomous driving perception**
- **Multi-dataset learning**

## 📄 Citations

If you use these models in your research, please cite:

```bibtex
@article{yolov10,
  title={YOLOv10: Real-Time End-to-End Object Detection},
  author={Wang, Ao and Chen, Hui and Liu, Lihao and Chen, Kai and Lin, Zijia and Han, Jungong and Ding, Guiguang},
  journal={arXiv preprint arXiv:2405.14458},
  year={2024}
}

@article{ren2015faster,
  title={Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks},
  author={Ren, Shaoqing and He, Kaiming and Girshick, Ross and Sun, Jian},
  journal={Advances in Neural Information Processing Systems},
  volume={28},
  year={2015}
}
```

## 📜 License

This model collection is released under the MIT License.

## 🏷️ Keywords

Object Detection, Computer Vision, YOLOv10, Faster R-CNN, BDD 100K, Pascal-VOC, KITTI, Autonomous Driving, Hallucination Mitigation, Out-of-Distribution, Deep Learning, PyTorch