CoVT Checkpoint (Segmentation, Depth, and DINO Aligned)

Checkpoint for the CoVT (Chain-of-Visual-Thought) paper: https://huggingface.co/papers/2511.19418

Model Description

This CoVT checkpoint is aligned with 8 Segmentation tokens, 4 Depth tokens, and 4 DINO tokens.
These task-specific tokens are integrated into the model’s embedding space to enhance 2D-awareness, 3D-awareness, and patch-level feature representations.
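As a rough illustration of the token budget described above, the sketch below enumerates one placeholder token per slot and totals them. Only the counts (8, 4, 4) come from this card; the token strings such as `<seg_0>` and the helper `build_visual_tokens` are hypothetical and are not the names used by the checkpoint's tokenizer.

```python
# Counts are taken from the model description; everything else here
# (token strings, function name) is a hypothetical illustration.
VISUAL_TOKEN_COUNTS = {"seg": 8, "depth": 4, "dino": 4}


def build_visual_tokens(counts):
    """Return a mapping from task name to a list of placeholder token strings."""
    return {
        task: [f"<{task}_{i}>" for i in range(n)]
        for task, n in counts.items()
    }


visual_tokens = build_visual_tokens(VISUAL_TOKEN_COUNTS)
total_visual_tokens = sum(len(tokens) for tokens in visual_tokens.values())

for task, tokens in visual_tokens.items():
    print(task, len(tokens), tokens[0], "...", tokens[-1])
print("total visual tokens:", total_visual_tokens)
```

Under these counts the checkpoint works with 16 task-specific visual tokens in total. To see the actual token strings, inspect the tokenizer files shipped with the checkpoint.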


Model size: 8B params (Safetensors)

Tensor type: BF16
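The parameter count and tensor type give a back-of-the-envelope estimate of the memory needed just to hold the weights. The sketch below assumes exactly 8e9 parameters (the card only states the rounded figure "8B") and 2 bytes per BF16 value; it ignores activations, the KV cache, and framework overhead, so real usage will be higher.

```python
# Assumption: "8B params" is treated as exactly 8e9; the true count may differ.
NUM_PARAMS = 8e9
BYTES_PER_BF16 = 2  # bfloat16 stores each value in 16 bits


def weight_bytes(num_params, bytes_per_param):
    """Bytes needed to store the weights alone (no activations or cache)."""
    return num_params * bytes_per_param


total_bytes = weight_bytes(NUM_PARAMS, BYTES_PER_BF16)
weights_gb = total_bytes / 1e9        # decimal gigabytes
weights_gib = total_bytes / 2**30     # binary gibibytes

print(f"approx. weights: {weights_gb:.1f} GB ({weights_gib:.1f} GiB)")
```

This comes to roughly 16 GB of weights, which is a useful lower bound when choosing a GPU for inference.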


Collection

Wakals/CoVT-7B-seg_depth_dino is part of the collection "CoVT: Chain-of-Visual-Thought" (7 items): Enrich VLMs’ vision-centric reasoning capabilities via Chain-of-Visual-Thought!