Simple Open-Vocabulary Object Detection with Vision Transformers
Paper
• 2205.06230 • Published
Note OWL-ViT is a seminal paper on open-vocabulary (open-set) object detection. It takes a pretrained CLIP model, upsamples the model, and adds detection box heads. The whole model is fine-tuned end to end while these box heads are trained.
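To make the open-vocabulary idea concrete, here is a minimal pure-Python sketch of how per-token image embeddings could be scored against text query embeddings. It is a toy illustration, not the paper's implementation: the vectors, the `scale` value, and the function names (`score_tokens`, `cosine`) are all assumptions made for this example, and the box regression head is left out.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def score_tokens(token_embeddings, query_embeddings, scale=10.0):
    """For each image token, return (index of best text query, score).

    `scale` is a hypothetical temperature chosen for this toy example.
    """
    results = []
    for tok in token_embeddings:
        sims = [cosine(tok, q) for q in query_embeddings]
        best = max(range(len(sims)), key=sims.__getitem__)
        results.append((best, sigmoid(scale * sims[best])))
    return results

# Toy data: three image-token embeddings and two text queries (made-up values).
tokens = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.1], [0.5, 0.5, 0.0]]
queries = {"cat": [1.0, 0.1, 0.0], "dog": [0.0, 1.0, 0.0]}
names = list(queries)

detections = score_tokens(tokens, list(queries.values()))
for i, (q, s) in enumerate(detections):
    print(i, names[q], round(s, 3))
```

Because the class names enter only as text embeddings, new categories can be queried at inference time without retraining a fixed classifier layer. In a real detector, each token would also pass through a box head that predicts its bounding box.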