---
library_name: transformers
tags:
- object-detection
- grounding
- vision
- custom-dataset
- groundingdino
license: mit
pipeline_tag: object-detection
---
# Custom GroundingDINO Model
This is a custom-trained GroundingDINO model for object detection and phrase grounding, compatible with the Hugging Face Transformers library.
## Model Details
- **Model Type**: GroundingDINO
- **Number of Classes**: 1180
- **Training Dataset**: Custom dataset with 1180 object classes
- **Architecture**: GroundingDINO with Swin-T backbone
- **Transformers Compatible**: ✅ Yes
## Usage with Transformers
```python
from transformers import AutoModel, AutoConfig
import json
import torch
from PIL import Image
# Load model and config (trust_remote_code is needed for the custom model class)
model = AutoModel.from_pretrained("your_username/your_model_name", trust_remote_code=True)
config = AutoConfig.from_pretrained("your_username/your_model_name", trust_remote_code=True)
# Load label map
with open("label_map.json", "r") as f:
    label_map = json.load(f)
# Prepare text prompt
text_prompt = ". ".join(list(label_map.values())[:100]) + "."
# Load and preprocess image
image = Image.open("your_image.jpg").convert("RGB")
# Add your image preprocessing here
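# A minimal preprocessing sketch (assumption: the model follows the usual
# GroundingDINO convention of ImageNet-normalized float tensors; check
# modeling_groundingdino.py for the exact input format the forward expects).
import torchvision.transforms as T
transform = T.Compose([
    T.Resize(800),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
image_tensor = transform(image).unsqueeze(0)  # shape (1, 3, H, W)
# If the forward pass expects tensors rather than PIL images, pass
# `image_tensor` instead of `image` below.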
# Run inference
with torch.no_grad():
    outputs = model(images=image, text_prompts=[text_prompt])
logits = outputs.logits
boxes = outputs.boxes
```
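The raw outputs still need to be filtered and rescaled. Below is a minimal post-processing sketch, assuming the custom model follows the standard GroundingDINO output conventions (per-query logits over text tokens that become confidence scores after a sigmoid, and boxes in normalized `(cx, cy, w, h)` format); verify against `modeling_groundingdino.py` before relying on it:
```python
import torch
# Best confidence score per query (assumes logits shape: (batch, queries, text_tokens)).
scores = logits.sigmoid().max(dim=-1).values[0]
keep = scores > 0.35  # confidence threshold; tune as needed
# Convert normalized (cx, cy, w, h) boxes to pixel (x0, y0, x1, y1).
W, H = image.size
cx, cy, w, h = boxes[0][keep].unbind(-1)
xyxy = torch.stack(
    [(cx - w / 2) * W, (cy - h / 2) * H, (cx + w / 2) * W, (cy + h / 2) * H],
    dim=-1,
)
for box, score in zip(xyxy.tolist(), scores[keep].tolist()):
    print(f"score={score:.2f}  box={box}")
```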
## Usage with Original Implementation
```python
from model_loader import ModelLoader, quick_inference
# Quick inference
results = quick_inference('your_image.jpg')
# Or load model manually
model = ModelLoader.load_model(
    checkpoint_path='pytorch_model.bin',
    config_path='original_config.py',
    device='cuda'
)
label_map = ModelLoader.load_label_map('label_map.json')
```
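If no GPU is available, the same loader can target the CPU instead (a small sketch, assuming `ModelLoader.load_model` accepts any torch device string):
```python
import torch
from model_loader import ModelLoader

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = ModelLoader.load_model(
    checkpoint_path='pytorch_model.bin',
    config_path='original_config.py',
    device=device
)
```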
## Model Files
- `pytorch_model.bin`: Model weights (transformers format)
- `config.json`: Transformers configuration
- `modeling_groundingdino.py`: Custom model class
- `tokenizer_config.json`: Tokenizer configuration
- `label_map.json`: Class label mapping (1180 classes)
- `original_config.py`: Original training configuration
## Classes
This model can detect 1180 unique object classes including:
- blue and purple polka dot block
- blue and purple polka dot bowl
- blue and purple polka dot container
- blue and purple polka dot cross
- blue and purple polka dot diamond
- blue and purple polka dot flower
- blue and purple polka dot frame
- blue and purple polka dot heart
- blue and purple polka dot hexagon
- blue and purple polka dot l-shaped block
- blue and purple polka dot letter a
- blue and purple polka dot letter e
- blue and purple polka dot letter g
- blue and purple polka dot letter m
- blue and purple polka dot letter r
- blue and purple polka dot letter t
- blue and purple polka dot letter v
- blue and purple polka dot line
- blue and purple polka dot pallet
- blue and purple polka dot pan
... and 1160 more classes.
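Note that the Transformers example above joins only the first 100 class names into the text prompt, since the text encoder can only fit a limited number of phrases per forward pass. To query the full vocabulary, one workaround is to split the label map into several prompts and run one inference pass per chunk (a sketch, assuming `label_map.json` maps class ids to phrase strings):
```python
import json

with open("label_map.json", "r") as f:
    label_map = json.load(f)

phrases = list(label_map.values())
chunk_size = 100  # mirrors the 100-class prompt in the usage example above
prompts = [
    ". ".join(phrases[i:i + chunk_size]) + "."
    for i in range(0, len(phrases), chunk_size)
]
# Run inference once per prompt and merge the detections afterwards.
```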
## Installation
```bash
pip install transformers torch torchvision
```
## Example Classes
The model can detect objects described by a wide variety of:
- **Colors**: blue, red, green, yellow, purple, etc.
- **Patterns**: polka dot, stripe, paisley, swirl, checkerboard
- **Shapes**: block, bowl, container, cross, diamond, flower
- **Combinations**: "blue and purple polka dot block", "red stripe heart"
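Because the class names are compositional, a narrower prompt can be built by filtering the label map for a specific attribute (a sketch, assuming `label_map.json` maps class ids to phrase strings):
```python
import json

with open("label_map.json", "r") as f:
    label_map = json.load(f)

# Keep only the polka dot classes, for example.
polka_dot_phrases = [p for p in label_map.values() if "polka dot" in p]
text_prompt = ". ".join(polka_dot_phrases) + "."
```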
## Performance
- **Model Size**: ~1.1 GB
- **Parameters**: ~172M
- **Training**: 12 epochs on a custom dataset
- **Memory Usage**: ~2-4 GB GPU memory during inference
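These figures depend on hardware, dtype, and input size; here is a quick way to sanity-check the parameter count and peak GPU memory on your own setup (assuming `model` was loaded as in the usage example above):
```python
import torch

num_params = sum(p.numel() for p in model.parameters())
print(f"parameters: {num_params / 1e6:.0f}M")

if torch.cuda.is_available():
    torch.cuda.reset_peak_memory_stats()
    # ... run one inference pass here ...
    print(f"peak GPU memory: {torch.cuda.max_memory_allocated() / 1e9:.1f} GB")
```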
## Citation
If you use this model, please cite the original GroundingDINO paper:
```bibtex
@article{liu2023grounding,
  title={Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection},
  author={Liu, Shilong and Zeng, Zhaoyang and Ren, Tianhe and Li, Feng and Zhang, Hao and Yang, Jie and Li, Chunyuan and Yang, Jianwei and Su, Hang and Zhu, Jun and others},
  journal={arXiv preprint arXiv:2303.05499},
  year={2023}
}
```
## License
This model is released under the MIT License.