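How to use ostris/CLIP-ViT-H-14-448 with Transformers:

```python
# Load the vision encoder directly. Only the vision tower is included, so text-based
# pipelines such as zero-shot-image-classification are not expected to work.
import requests
from PIL import Image
from transformers import CLIPImageProcessor, CLIPVisionModelWithProjection

model = CLIPVisionModelWithProjection.from_pretrained("ostris/CLIP-ViT-H-14-448")
# Resize and center-crop to 448 x 448; CLIPImageProcessor defaults to the standard CLIP mean/std.
processor = CLIPImageProcessor(size={"shortest_edge": 448}, crop_size={"height": 448, "width": 448})

url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/parrots.png"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
inputs = processor(images=image, return_tensors="pt")
outputs = model(**inputs, output_hidden_states=True)
print(outputs.hidden_states[-2].shape)  # penultimate hidden layer, as used by IP+ adapters
```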
You probably do not need this unless you are training your own IP Adapters.
A modified version of the vision encoder of CLIP-ViT-H-14-laion2B-s32B-b79K that handles 448 x 448 inputs instead of the original 224 x 224 inputs. It will probably not work for classification as-is, but it will work with IP+ adapters that use CLIP-ViT-H, though those adapters will need a little more fine-tuning.
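The card does not say how the encoder was modified. A common way to adapt a ViT to a larger input is to keep the patch size (14) and resize the learned positional embeddings from a 16 x 16 grid to a 32 x 32 grid, leaving the class-token embedding unchanged. The NumPy sketch below assumes that approach and uses plain bilinear interpolation; the function name `interpolate_pos_embed` is hypothetical, the random table is a stand-in for real weights, and the actual checkpoint may have been produced differently (for example with bicubic interpolation).

```python
import numpy as np


def interpolate_pos_embed(pos_embed: np.ndarray, new_grid: int) -> np.ndarray:
    """Resize a (1 + g*g, dim) ViT position table to (1 + new_grid**2, dim), bilinearly.

    Hypothetical helper: the class-token row is copied, the patch rows are treated
    as a g x g grid and resampled with corner-aligned bilinear interpolation.
    """
    cls_tok, patches = pos_embed[:1], pos_embed[1:]
    old_grid = int(round(patches.shape[0] ** 0.5))
    if old_grid * old_grid != patches.shape[0]:
        raise ValueError("patch embeddings do not form a square grid")
    dim = patches.shape[1]
    grid = patches.reshape(old_grid, old_grid, dim)

    # Sample positions expressed in old-grid coordinates; endpoints map to the corners.
    coords = np.linspace(0, old_grid - 1, new_grid)
    i0 = np.floor(coords).astype(int)
    i1 = np.minimum(i0 + 1, old_grid - 1)
    w = coords - i0  # fractional distance from i0 toward i1

    # Interpolate along rows, then along columns.
    rows = grid[i0] * (1 - w)[:, None, None] + grid[i1] * w[:, None, None]          # (new, old, dim)
    out = rows[:, i0] * (1 - w)[None, :, None] + rows[:, i1] * w[None, :, None]     # (new, new, dim)
    return np.concatenate([cls_tok, out.reshape(new_grid * new_grid, dim)], axis=0)


# Stand-in for the original 224 x 224 table: 1 class token + 16 x 16 patches, width 1280.
rng = np.random.default_rng(0)
original = rng.standard_normal((257, 1280)).astype(np.float32)
resized = interpolate_pos_embed(original, new_grid=448 // 14)
print(resized.shape)  # (1025, 1280)
```

Interpolated embeddings only approximate what a model trained at 448 x 448 would learn, which is consistent with the note that downstream adapters need some extra fine-tuning.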
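Hidden-layer outputs grow from (257, 1280) to (1025, 1280): with a patch size of 14, a 224 x 224 image yields 16 x 16 = 256 patch tokens plus one class token, while a 448 x 448 image yields 32 x 32 = 1024 patch tokens plus one class token, each of width 1280. The Resampler can consume this longer sequence without modification or weight resizing.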
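The card does not spell out why the Resampler accepts the longer sequence. A likely reason, stated here as an assumption, is that IP+ adapters use a Perceiver-style Resampler in which a fixed set of learned query tokens cross-attends over the image tokens, so its weight shapes depend on the token width (1280) but not on the token count. The NumPy sketch below illustrates that property with a single-head cross-attention layer; `num_tokens`, `learned_query_cross_attention`, and the 16-query setting are hypothetical, and the real Resampler adds multiple heads, normalization, feed-forward layers, and several stacked blocks.

```python
import numpy as np


def num_tokens(image_size: int, patch_size: int = 14) -> int:
    """ViT sequence length: one class token plus (image_size / patch_size)^2 patch tokens."""
    return (image_size // patch_size) ** 2 + 1


def learned_query_cross_attention(image_tokens, queries, w_k, w_v):
    """Single-head cross-attention: fixed learned queries attend over any number of tokens."""
    d = queries.shape[-1]
    k = image_tokens @ w_k                          # (n_tokens, d)
    v = image_tokens @ w_v                          # (n_tokens, d)
    scores = queries @ k.T / np.sqrt(d)             # (n_queries, n_tokens)
    scores -= scores.max(axis=-1, keepdims=True)    # stabilize the softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)        # softmax over image tokens
    return attn @ v                                 # (n_queries, d), independent of n_tokens


rng = np.random.default_rng(0)
width, n_queries = 1280, 16  # 16 queries is an illustrative assumption
# The same weights serve both sequence lengths: their shapes depend only on width.
queries = rng.standard_normal((n_queries, width)) * 0.02
w_k = rng.standard_normal((width, width)) * 0.02
w_v = rng.standard_normal((width, width)) * 0.02

for size in (224, 448):
    tokens = rng.standard_normal((num_tokens(size), width))
    out = learned_query_cross_attention(tokens, queries, w_k, w_v)
    print(size, tokens.shape, "->", out.shape)
# 224 (257, 1280) -> (16, 1280)
# 448 (1025, 1280) -> (16, 1280)
```

Because the output length is fixed by the number of queries, the downstream cross-attention layers in the UNet see the same shape regardless of input resolution; only the content of the image tokens changes.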