facebook/ijepa_vith14_1k
Image Feature Extraction • 0.6B • Updated • 3.61k • 15
TV2TV: A Unified Framework for Interleaved Language and Video Generation
TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models