# Vision Transformer (ViT) trained using DINOv2 on ImageNet-1K only

Reproduction of the ViT-L/16 results from the DINOv2 repo that were trained only on ImageNet-1K at 224x224 resolution.

The original work pretrains on the much larger LVD-142M dataset and distills a larger model (ViT-g/14) into an L/14 model.

## How to use

```python
import torch

# load the pretrained ViT-L/16 encoder via torch.hub
model = torch.hub.load("BenediktAlkin/torchhub-ssl", "in1k_dinov2_l16")

# forward a random batch of one 3-channel 224x224 image
image = torch.randn(1, 3, 224, 224)
features = model(image)
```
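For real images, the model presumably expects the standard DINOv2 preprocessing: resize and center-crop to 224x224, followed by normalization with ImageNet statistics. A minimal sketch under that assumption (`example.jpg` is a placeholder path):

```python
import torch
from PIL import Image
from torchvision import transforms

# standard ImageNet preprocessing, assumed to match the training setup
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

model = torch.hub.load("BenediktAlkin/torchhub-ssl", "in1k_dinov2_l16")
model.eval()

# "example.jpg" is a placeholder for your own image file
image = preprocess(Image.open("example.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    features = model(image)
print(features.shape)
```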