---
library_name: transformers
tags:
- object-detection
- grounding
- vision
- custom-dataset
- groundingdino
license: mit
pipeline_tag: object-detection
---

# Custom GroundingDINO Model

This is a custom-trained GroundingDINO model for object detection and grounding, compatible with the Hugging Face Transformers library.

## Model Details

- **Model Type**: GroundingDINO
- **Number of Classes**: 1180
- **Training Dataset**: Custom dataset with 1180 object classes
- **Architecture**: GroundingDINO with Swin-T backbone
- **Transformers Compatible**: ✅ Yes

## Usage with Transformers

```python
import json

import torch
from PIL import Image
from transformers import AutoConfig, AutoModel

# trust_remote_code=True is required because the repo ships a custom model
# class (modeling_groundingdino.py)
model = AutoModel.from_pretrained("your_username/your_model_name", trust_remote_code=True)
config = AutoConfig.from_pretrained("your_username/your_model_name")

# Load the class label mapping
with open("label_map.json", "r") as f:
    label_map = json.load(f)

# Build a text prompt from the first 100 class names; the text encoder has a
# limited context length, so prompt large vocabularies in batches
text_prompt = ". ".join(list(label_map.values())[:100]) + "."

# Load and preprocess the image
image = Image.open("your_image.jpg").convert("RGB")
# Add your image preprocessing here

# Run inference
with torch.no_grad():
    outputs = model(images=image, text_prompts=[text_prompt])
    logits = outputs.logits
    boxes = outputs.boxes
```
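
The custom class's output format is not documented here, so treat the following as a rough sketch only: it assumes `outputs.logits` scores each query against the prompt tokens and `outputs.boxes` holds normalized center-format `(cx, cy, w, h)` boxes, both of which are assumptions. Continuing from the snippet above:

```python
# Hypothetical post-processing: keep queries whose best token score clears a
# confidence threshold, then convert boxes to pixel corner coordinates.
scores = logits.sigmoid().max(dim=-1).values.squeeze(0)  # (num_queries,)
keep = scores > 0.35

W, H = image.size
for score, box in zip(scores[keep], boxes.squeeze(0)[keep]):
    cx, cy, w, h = box.tolist()
    x0, y0 = (cx - w / 2) * W, (cy - h / 2) * H
    x1, y1 = (cx + w / 2) * W, (cy + h / 2) * H
    print(f"score={score:.2f} box=({x0:.0f}, {y0:.0f}, {x1:.0f}, {y1:.0f})")
```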

## Usage with Original Implementation

```python
from model_loader import ModelLoader, quick_inference

# Quick inference
results = quick_inference('your_image.jpg')

# Or load model manually
model = ModelLoader.load_model(
    checkpoint_path='pytorch_model.bin',
    config_path='original_config.py',
    device='cuda'
)

label_map = ModelLoader.load_label_map('label_map.json')
```
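
The structure of `results` depends on `model_loader`'s implementation. Assuming each detection is a dict with `"box"` (pixel `x0, y0, x1, y1`), `"label"`, and `"score"` keys, a hypothetical format to check against the actual return value, drawing the detections could look like this:

```python
from PIL import Image, ImageDraw

# Sketch only: the "box"/"label"/"score" keys are assumed, not guaranteed.
image = Image.open("your_image.jpg").convert("RGB")
draw = ImageDraw.Draw(image)

for det in results:
    if det["score"] < 0.35:
        continue  # drop low-confidence detections
    x0, y0, x1, y1 = det["box"]
    draw.rectangle([x0, y0, x1, y1], outline="red", width=2)
    draw.text((x0, max(y0 - 12, 0)), f'{det["label"]} {det["score"]:.2f}', fill="red")

image.save("detections.jpg")
```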

## Model Files

- `pytorch_model.bin`: Model weights (transformers format)
- `config.json`: Transformers configuration
- `modeling_groundingdino.py`: Custom model class
- `tokenizer_config.json`: Tokenizer configuration
- `label_map.json`: Class label mapping (1180 classes; see the inspection snippet below)
- `original_config.py`: Original training configuration
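
To see how the classes are indexed, you can inspect `label_map.json` directly. This sketch assumes the file maps class IDs to class-name strings; check the actual structure if it differs:

```python
import json

with open("label_map.json", "r") as f:
    label_map = json.load(f)

print(f"{len(label_map)} classes")  # expected: 1180
for class_id, name in list(label_map.items())[:5]:
    print(class_id, "->", name)
```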

## Classes

This model can detect 1180 unique object classes, including:

- blue and purple polka dot block
- blue and purple polka dot bowl
- blue and purple polka dot container
- blue and purple polka dot cross
- blue and purple polka dot diamond
- blue and purple polka dot flower
- blue and purple polka dot frame
- blue and purple polka dot heart
- blue and purple polka dot hexagon
- blue and purple polka dot l-shaped block
- blue and purple polka dot letter a
- blue and purple polka dot letter e
- blue and purple polka dot letter g
- blue and purple polka dot letter m
- blue and purple polka dot letter r
- blue and purple polka dot letter t
- blue and purple polka dot letter v
- blue and purple polka dot line
- blue and purple polka dot pallet
- blue and purple polka dot pan

... and 1160 more classes.


## Installation

```bash
pip install transformers torch torchvision
```

## Example Classes

The detectable classes vary along several attribute dimensions:
- **Colors**: blue, red, green, yellow, purple, etc.
- **Patterns**: polka dot, stripe, paisley, swirl, checkerboard
- **Shapes**: block, bowl, container, cross, diamond, flower
- **Combinations**: "blue and purple polka dot block", "red stripe heart"
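
Because the class names are compositional, you can filter the label map by keyword to build a focused prompt instead of sending all 1180 names at once. A small sketch (the `"polka dot"` keyword is illustrative):

```python
import json

with open("label_map.json", "r") as f:
    label_map = json.load(f)

# Any substring of the class names works as a filter here.
polka_classes = [name for name in label_map.values() if "polka dot" in name]
text_prompt = ". ".join(polka_classes) + "."
print(f"{len(polka_classes)} polka dot classes in the prompt")
```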

## Performance

- **Model Size**: ~1.1 GB
- **Parameters**: ~172M
- **Training**: 12 epochs on custom dataset
- **Memory Usage**: ~2-4 GB GPU memory during inference
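
If the footprint is too large for your GPU, half-precision loading usually cuts inference memory roughly in half. A sketch, assuming the custom class tolerates fp16 like standard Transformers models (not verified here):

```python
import torch
from transformers import AutoModel

# Load weights in float16 to reduce GPU memory during inference.
model = AutoModel.from_pretrained(
    "your_username/your_model_name",
    torch_dtype=torch.float16,
    trust_remote_code=True,
).to("cuda").eval()
```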

## Citation

If you use this model, please cite the original GroundingDINO paper:

```bibtex
@article{liu2023grounding,
  title={Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection},
  author={Liu, Shilong and Zeng, Zhaoyang and Ren, Tianhe and Li, Feng and Zhang, Hao and Yang, Jie and Li, Chunyuan and Yang, Jianwei and Su, Hang and Zhu, Jun and others},
  journal={arXiv preprint arXiv:2303.05499},
  year={2023}
}
```

## License

This model is released under the MIT License.