Open-vocabulary object detection (OVD). - a marcusinthesky Collection

Open-vocabulary object detection (OVD).

updated Apr 13, 2024

Simple Open-Vocabulary Object Detection with Vision Transformers

Paper • 2205.06230 • Published May 12, 2022 • 4
google/owlvit-base-patch32

Zero-Shot Object Detection • 0.2B • Updated Dec 12, 2023 • 108k • 145

Note: OWL-ViT is the seminal paper on open-set object detection. It takes a pretrained CLIP model, upsamples the model, and adds detection box heads. The whole model is fine-tuned while these box heads are trained.
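
As a rough illustration of the idea in the note, the toy sketch below scores per-token image embeddings against text query embeddings by cosine similarity and converts a box from center format to corner format. It is not the OWL-ViT implementation: the function names (`classify_tokens`, `cxcywh_to_xyxy`), the 2-D embeddings, and the 0.5 threshold are all hypothetical, chosen only to keep the example self-contained.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def classify_tokens(token_embeds, query_embeds, labels, threshold=0.5):
    """Assign each image token its best-matching text label.

    Tokens whose best similarity falls below `threshold` are dropped.
    Returns a list of (token_index, label, score) tuples.
    """
    results = []
    for idx, tok in enumerate(token_embeds):
        scores = [cosine(tok, q) for q in query_embeds]
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] >= threshold:
            results.append((idx, labels[best], scores[best]))
    return results


def cxcywh_to_xyxy(box):
    """Convert (center_x, center_y, width, height) to (x0, y0, x1, y1)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


# Hypothetical 2-D embeddings standing in for real image-token and text features.
token_embeds = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
query_embeds = [[1.0, 0.0], [0.0, 1.0]]
labels = ["cat", "dog"]

detections = classify_tokens(token_embeds, query_embeds, labels)
corner_box = cxcywh_to_xyxy((0.5, 0.5, 0.2, 0.4))
```

Because the label set is just a list of text embeddings, changing the vocabulary at inference time only means changing `query_embeds` and `labels`; nothing in the scoring step is tied to a fixed set of classes.
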
Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers

Paper • 2305.07011 • Published May 11, 2023 • 6
Multi-Modal Classifiers for Open-Vocabulary Object Detection

Paper • 2306.05493 • Published Jun 8, 2023 • 6
Scaling Open-Vocabulary Object Detection

Paper • 2306.09683 • Published Jun 16, 2023 • 16
Open-Vocabulary Universal Image Segmentation with MaskCLIP

Paper • 2208.08984 • Published Aug 18, 2022
Contrastive Feature Masking Open-Vocabulary Vision Transformer

Paper • 2309.00775 • Published Sep 2, 2023 • 10
RegionCLIP: Region-based Language-Image Pretraining

Paper • 2112.09106 • Published Dec 16, 2021
Side Adapter Network for Open-Vocabulary Semantic Segmentation

Paper • 2302.12242 • Published Feb 23, 2023
Transferable and Principled Efficiency for Open-Vocabulary Segmentation

Paper • 2404.07448 • Published Apr 11, 2024 • 12