bytedance-research/HuMo
Image-to-Video • 147 • 213
UMO based on OmniGen2
Inpaint images using Qwen Image with an inpainting ControlNet
Chat with a powerful language model
Detect objects in images and videos
Transcribe uploaded audio to text with language detection
Generate images from text prompts