Any-to-Any
Transformers
Safetensors
English
xoron
multimodal
Mixture of Experts
text-to-image
image editing
image to video
text-to-video
video editing
text-to-speech
speech-to-text
speech-to-speech
image-to-text
video-to-text
agentic
tool-use
flow-matching
3d-rope
titok
vidtok
dual-stream-attention
zero-shot-voice-cloning
bigvgan
snake-activation
multi-receptive-field-fusion
custom_code
File size: 131 Bytes
version https://git-lfs.github.com/spec/v1
oid sha256:58d97993c85ab0fcc5c2fcba938c763696091bd193d73ff3dadedd6bf71a0f23
size 271268
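The three lines above are a Git LFS pointer, not the stored object itself. The real blob (271,268 bytes) is fetched separately and identified by its SHA-256 `oid`. What follows is a minimal sketch, assuming the pointer follows the Git LFS v1 pointer format: one `key value` pair per line, with `version` on the first line. It shows how such a pointer could be parsed and how a downloaded blob could be checked against it. `parse_lfs_pointer` and `verify_blob` are hypothetical helpers written for illustration. They are not part of git-lfs, `huggingface_hub`, or any other library.

```python
import hashlib

# Hypothetical helpers (not from git-lfs or huggingface_hub): parse a Git LFS
# v1 pointer and check downloaded bytes against its size and SHA-256 oid.


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of an LFS pointer into a dict."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Assumption: per the v1 format, the version line must come first.
    if not lines or not lines[0].startswith("version https://git-lfs.github.com/spec/"):
        raise ValueError("not a Git LFS pointer: first line must be 'version ...'")
    fields = {}
    for ln in lines:
        key, sep, value = ln.partition(" ")
        if not sep:
            raise ValueError(f"malformed pointer line: {ln!r}")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")  # e.g. "sha256", "<64 hex chars>"
    return {
        "version": fields["version"],
        "algo": algo,
        "oid": digest,
        "size": int(fields["size"]),
    }


def verify_blob(pointer: dict, data: bytes) -> bool:
    """Return True if data matches the pointer's byte size and SHA-256 oid."""
    if pointer["algo"] != "sha256":
        raise ValueError(f"unsupported oid algorithm: {pointer['algo']}")
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["oid"]
    )


# The pointer shown above, byte for byte (including the trailing newline).
POINTER = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:58d97993c85ab0fcc5c2fcba938c763696091bd193d73ff3dadedd6bf71a0f23\n"
    "size 271268\n"
)

if __name__ == "__main__":
    p = parse_lfs_pointer(POINTER)
    print(p["algo"], p["size"])  # prints: sha256 271268
```

With the trailing newline, the pointer text is exactly 131 bytes, which matches the file size reported above. Checking both the size and the hash is cheap. The size comparison rejects truncated downloads before any hashing happens.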