---
base_model:
- Wan-AI/Wan2.1-T2V-14B
license: cc-by-nc-sa-4.0
pipeline_tag: text-to-video
library_name: diffusers
tags:
- text-to-video
- diffusion
- merged-model
- video-generation
- wan2.1
widget:
- text: 'Prompt: A gritty close-up of an elven princess kneeling in a rocky ravine,
    calming a wounded, desert dragon. Its scales are cracked, dry, She wears a crimson
    sash over bone-colored armor, her auburn hair half-tied back. The camera dollies
    in rapidly as she reaches for its eye ridge. Lighting comes from golden sunlight
    reflecting off surrounding rock, casting a warm, earthy hue with no artificial
    glow.'
  output:
    url: videos/Video_00063.mp4
- text: 'Prompt: Tight close-up of her smiling lips and sparkling eyes, catching golden
    hour sunlight. She wears a white sundress with floral prints and a wide-brimmed
    straw hat. Camera pulls back in a dolly motion, revealing her twirling under a
    cherry blossom tree. Petals flutter in the air, casting playful shadows. Soft
    lens flares enhance the euphoric, dreamlike vibe. (Before vs After - Left: Wan2.1
    | Right: Merged model Wan14BT2V_MasterModel)'
  output:
    url: videos/AnimateDiff_00001.mp4
- text: 'Prompt: A gritty close-up of a dwarven beastmaster’s face, his grey beard
    braided tightly, brows furrowed as he looks just off-camera. The camera dollies
    out over his shoulder, revealing a perched gryphon watching him from a boulder,
    its feathers rustling slightly in the breeze. The moment holds stillness and mutual
    trust. Lighting is early daylight, clean and sharp with strong environmental clarity.'
  output:
    url: videos/FusionX_00012.mp4
- text: 'Prompt: A gritty close-up of a jungle tracker crouching low, face flushed
    with focus as she watches a perched macaw a few feet ahead. Her cheek twitches
    as she shifts forward, beads of sweat visible on her brow. The camera slowly dollies
    in from below her line of sight, capturing the moment her eyes widen in fascination.
    Lighting is rich and directional from above, creating a warm glow over her face
    with minimal shadows.'
  output:
    url: videos/FusionX_00005.mp4
- text: 'Prompt: A gritty close-up of a battle-worn ranger kneeling in a scorched
    clearing, calming a wounded gryphon whose wing is torn and bloodied. Its feathers
    are dusky bronze with streaks of ash-gray. She wears soot-covered hunter green
    armor, her blonde hair pulled into a loose braid. The camera dollies in as her
    hand brushes the creature''s sharp beak. Lighting comes from late afternoon sun
    filtering through smoke, casting a burnt-orange haze across the frame.'
  output:
    url: videos/Video_00069.mp4
---

# 🌀 Wan2.1_14B_FusionX

This model, Wan2.1_14B_FusionX, incorporates advancements from the research on [Radial Attention: $O(n\log n)$ Sparse Attention with Energy Decay for Long Video Generation](https://huggingface.co/papers/2506.19852).

- Project Page: https://hanlab.mit.edu/projects/radial-attention
- Code: https://github.com/mit-han-lab/radial-attention

**High-Performance Merged Text-to-Video Model**

Built on WAN 2.1 and fused with research-grade components for cinematic motion, detail, and speed. Optimized for ComfyUI and rapid iteration in as few as 6 steps.

Merged models for faster, richer motion and detail, with high performance even at just 8 steps.

> 📌 Important: To match the quality shown here, use the linked workflows or follow the recommended settings outlined below.

---

## 🚀 Overview

A powerful text-to-video model built on top of **WAN 2.1 14B**, merged with several research-grade models to boost:

- Motion quality
- Scene consistency
- Visual detail

Comparable to closed-source solutions, but open and optimized for **ComfyUI** workflows.

---

## 💡 Inside the Fusion

FusionX is built on top of Wan 2.1 14B 720p and merges in the following components (FusionX would not be what it is without these models):

- **CausVid** – [Causal motion modeling for better flow and dynamics](https://github.com/tianweiy/CausVid)
- **AccVideo** – [Better temporal alignment and speed boost](https://github.com/aejion/AccVideo)
- **MoviiGen1.1** – [Cinematic smoothness and lighting](https://huggingface.co/ZuluVision/MoviiGen1.1)
- **MPS Reward LoRA** – [Tuned for motion and detail](https://huggingface.co/alibaba-pai/Wan2.1-Fun-Reward-LoRAs)
- **Custom LoRAs** – For texture, clarity, and small detail enhancements (set at a very low strength)

All merged models are provided for research and non-commercial use only.
Some components are subject to licenses such as CC BY-NC-SA 4.0 and do not fall under permissive licenses like Apache 2.0 or MIT.
Please refer to each model’s original license for full usage terms.

---

## 🚨✨**Hey guys! Just a quick update!**

We finally cooked up **FusionX LoRAs**!! 🧠💥
This is huge – now you can plug FusionX into your favorite workflows as a LoRA on top of the Wan base models and SkyReels models! 🔌💫
You can still stick with the base FusionX model if you already use it, but if you would rather have more control over the "FusionX" strength and get a speed boost, then this might be for you.

Oh, and there’s a **nice speed boost** too! ⚡

**Example:** *(RTX 5090)*

- FusionX as a full base model: **8 steps = 160s** ⏱️
- FusionX as a **LoRA on Wan 2.1 14B fp8 T2V**: **8 steps = 120s** 🚀

**Bonus:** You can bump up the FusionX LoRA strength and lower your steps for a **huge speed boost** while testing/drafting.
Example: strength `2.00` with `3 steps` takes `72 seconds`.
Or lower the strength to experiment with a **less “FusionX” look**. ⚡🔍

We’ve got:

- **T2V (Text to Video)** 🎬 – works perfectly with **VACE** ⚙️
- **I2V (Image to Video)** 🖼️➡️📽️
- A dedicated **Phantom LoRA** 👻

The new LoRAs are [HERE](https://huggingface.co/vrgamedevgirl84/Wan14BT2VFusioniX/tree/main/FusionX_LoRa)

Note: The LoRAs are not meant to be stacked on top of the FusionX main models; use them with the Wan base models instead.

**New workflows** are [HERE](https://civitai.com/models/1681541) 🛠️🚀
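
To put the quoted timings in perspective, here is a small arithmetic sketch. It only uses the three RTX 5090 numbers reported above. The linear cost model (`total = overhead + steps * per_step`) and the helper names are assumptions made for illustration, and the fit treats the two LoRA runs as comparable even though they use different LoRA strengths.

```python
# Timings quoted in the update above (RTX 5090): (steps, seconds per video).
# Nothing is measured here; the script only does arithmetic on reported numbers.
REPORTED = {
    "full_model_8_steps": (8, 160.0),
    "lora_8_steps": (8, 120.0),
    "lora_strength_2_3_steps": (3, 72.0),
}


def time_saved_pct(baseline_s: float, candidate_s: float) -> float:
    """Percentage of wall-clock time saved relative to the baseline run."""
    return 100.0 * (baseline_s - candidate_s) / baseline_s


def fit_linear_cost(run_a, run_b):
    """Fit total = overhead + steps * per_step through two (steps, seconds) points.

    Assumption (not from the model card): per-step cost does not depend on
    LoRA strength, and fixed overhead (model load, text encode, VAE decode)
    is the same for both runs.
    """
    (steps_a, secs_a), (steps_b, secs_b) = run_a, run_b
    per_step = (secs_a - secs_b) / (steps_a - steps_b)
    overhead = secs_a - steps_a * per_step
    return overhead, per_step


full_s = REPORTED["full_model_8_steps"][1]
lora_saving = time_saved_pct(full_s, REPORTED["lora_8_steps"][1])
draft_saving = time_saved_pct(full_s, REPORTED["lora_strength_2_3_steps"][1])
overhead_s, per_step_s = fit_linear_cost(
    REPORTED["lora_8_steps"], REPORTED["lora_strength_2_3_steps"]
)

print(f"LoRA at 8 steps saves {lora_saving:.0f}% vs the full model")
print(f"Draft mode (strength 2.00, 3 steps) saves {draft_saving:.0f}%")
print(f"Rough fit: {overhead_s:.1f}s fixed overhead + {per_step_s:.1f}s per step")
```

Under that rough fit, a sizeable part of each run is fixed overhead rather than sampling, which is why cutting from 8 to 3 steps does not cut the time proportionally. Treat the fitted numbers as a back-of-the-envelope estimate, not a benchmark.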

---

After lots of testing 🧪, the video quality with the LoRA is **just as good** (and sometimes **even better**! 💯)
That’s thanks to it being trained on the **fp16 version** of FusionX 🧬💎

---

### 🌀 Preview Gallery

*These are compressed GIF previews for quick viewing; final video outputs are higher quality.*














---

## 📂 Workflows & Model Downloads

- 💡 **ComfyUI workflows** can be found here:
  👉 [Workflow Collection (WIP)](https://civitai.com/models/1663553)

- 📦 **Model files (T2V, I2V, Phantom, VACE)**:
  👉 [Main Hugging Face Repo](https://huggingface.co/vrgamedevgirl84/Wan14BT2VFusioniX/tree/main)

### 🧠 GGUF Variants:

- 🖼️ [FusionX Image-to-Video (GGUF)](https://huggingface.co/QuantStack/Wan2.1_I2V_14B_FusionX-GGUF/tree/main)
- 🎥 [FusionX Text-to-Video (GGUF)](https://huggingface.co/QuantStack/Wan2.1_T2V_14B_FusionX-GGUF/tree/main)
- 🎞️ [FusionX T2V VACE (for native)](https://huggingface.co/QuantStack/Wan2.1_T2V_14B_FusionX_VACE-GGUF/tree/main)
- 👻 [FusionX Phantom](https://huggingface.co/QuantStack/Phantom_Wan_14B_FusionX-GGUF/tree/main)

---

## 🎬 Example Videos

Want to see what FusionX can do? Check out these real outputs generated using the latest workflows and settings:

- **Text-to-Video**
  👉 [Watch Examples](https://civitai.com/posts/17874424)

- **Image-to-Video**
  👉 [Watch Examples](https://civitai.com/posts/18029174)

- **Phantom Mode**
  👉 [Watch Examples](https://civitai.com/posts/17986906)

- **VACE Integration**
  👉 [Watch Examples](https://civitai.com/posts/18080876)

---

## 🔧 Usage Details

### Text-to-Video

- **CFG**: Must be set to `1`
- **Shift**:
  - `1024x576`: Start at `1`
  - `1080x720`: Start at `2`
  - For realism → lower values
  - For stylized → test `3–9`
- **Scheduler**:
  - Recommended: `uni_pc`
  - Alternative: `flowmatch_causvid` (better for some details)

### Image-to-Video

- **CFG**: `1`
- **Shift**: `2` works best in most cases
- **Scheduler**:
  - Recommended: `dpm++_sde/beta`
- To boost motion and reduce the slow-mo effect:
  - Frame count: `121`
  - FPS: `24`
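
The recommended settings can be captured in a small lookup so they are easy to reuse across runs. This is a minimal sketch, not part of any FusionX or ComfyUI API: `SamplerSettings`, `t2v_settings`, `i2v_settings`, and `clip_seconds` are hypothetical names, the default of 8 steps is an assumption taken from the step counts quoted elsewhere in this card, and the scheduler strings should be checked against the names shown in your own scheduler dropdown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplerSettings:
    """Hypothetical container for the sampler values listed above."""
    cfg: float
    shift: float
    scheduler: str
    steps: int = 8  # assumption: 8 steps, as quoted elsewhere in this card


# Starting shift values for text-to-video, keyed by (width, height) as listed above.
T2V_START_SHIFT = {
    (1024, 576): 1.0,
    (1080, 720): 2.0,
}


def t2v_settings(width: int, height: int, steps: int = 8) -> SamplerSettings:
    """Return the suggested starting point for a listed T2V resolution."""
    if (width, height) not in T2V_START_SHIFT:
        raise ValueError(
            f"No suggested shift for {width}x{height}; start from a listed "
            "resolution and tune by eye (lower for realism, 3-9 for stylized)."
        )
    return SamplerSettings(
        cfg=1.0,
        shift=T2V_START_SHIFT[(width, height)],
        scheduler="uni_pc",
        steps=steps,
    )


def i2v_settings(steps: int = 8) -> SamplerSettings:
    """Return the suggested starting point for image-to-video."""
    return SamplerSettings(cfg=1.0, shift=2.0, scheduler="dpm++_sde/beta", steps=steps)


def clip_seconds(frame_count: int, fps: int) -> float:
    """Playback length of a clip in seconds."""
    return frame_count / fps


print(t2v_settings(1024, 576))
print(i2v_settings())
print(f"121 frames at 24 fps is about {clip_seconds(121, 24):.2f} s of video")
```

The values are starting points only. Shift in particular is meant to be adjusted per prompt, so the lookup raises for unlisted resolutions rather than guessing.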

---

## 🛠 Technical Notes

- Works in as few as **6 steps**
- Best quality at **8–10 steps**
- Drop-in replacement for `Wan2.1-T2V-14B`
- Up to **50% faster rendering**, especially with **SageAttn**
- Works natively and with the **Kijai Wan Wrapper**
  [Wrapper GitHub](https://github.com/kijai/ComfyUI-WanVideoWrapper)
- Do **not** re-add the LoRAs that are already merged in (CausVid, AccVideo, MPS)
- Feel free to add **other LoRAs** for style/variation
- Native WAN workflows are also supported (slightly slower)

---

## 🧪 Performance Tips

- RTX 5090 → ~138 sec/video at 1024x576 / 81 frames
- If VRAM is limited:
  - Enable block swapping
  - Start with `5` blocks and adjust as needed
- Use **SageAttn** for ~30% speedup (wrapper only)
- Do **not** use `teacache`
- "Enhance a video" (tested): Adds vibrance (try values 2–4)
- "SLG" not tested; feel free to explore
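
As a rough guide to what these numbers mean per frame, the sketch below works only from the figures quoted above (138 s for 81 frames, ~30% speedup). The card does not say whether "~30% speedup" means 30% higher throughput or 30% less wall-clock time, so both readings are shown. The helper names are hypothetical.

```python
def seconds_per_frame(total_s: float, frames: int) -> float:
    """Average render cost per output frame."""
    return total_s / frames


def apply_speedup(total_s: float, pct: float, as_throughput_gain: bool = True) -> float:
    """Estimate render time after an 'N% speedup' under one of two readings.

    as_throughput_gain=True : frames per second rise by pct percent.
    as_throughput_gain=False: wall-clock time falls by pct percent.
    """
    if as_throughput_gain:
        return total_s / (1.0 + pct / 100.0)
    return total_s * (1.0 - pct / 100.0)


baseline_s = 138.0  # quoted: RTX 5090, 1024x576, 81 frames
frames = 81

print(f"{seconds_per_frame(baseline_s, frames):.2f} s per frame at baseline")
print(f"30% more throughput: about {apply_speedup(baseline_s, 30):.0f} s per video")
print(f"30% less time:       about {apply_speedup(baseline_s, 30, False):.0f} s per video")
```

Either way the estimate is only indicative; actual gains depend on resolution, frame count, and how much block swapping is in use.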

---

## 🧠 Prompt Help

Want better cinematic prompts? Try the **WAN Cinematic Video Prompt Generator GPT**, which adds visual richness and makes a big difference in quality. [Try it here](https://chatgpt.com/g/g-67c3a6d6d19c81919b3247d2bfd01d0b-wan-cinematic-video-prompt-generator)

---

## 📣 Join The Community

We’re building a friendly space to chat, share outputs, and get help.

- Motion LoRAs coming soon
- Tips, updates, and support from other users

👉 [Join the Discord](https://discord.com/invite/hxPmmXmRW3)

---

## ⚖️ License

Some merged components use permissive licenses (Apache 2.0 / MIT), **but others**, such as those from research models like *CausVid*, may be released under **non-commercial licenses** (e.g., [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/)).

- ✅ You **can** use, modify, and redistribute **under original license terms**
- ❗ You **must** retain and respect the license of each component
- ⚠️ **Commercial use is not permitted** for models or components under non-commercial licenses
- 📌 Outputs are **not automatically licensed**; do your own due diligence

This model is intended for **research, education, and personal use only**.
For commercial use or monetization, please consult a legal advisor and verify all component licenses.

---

## 🙏 Credits

- WAN Team (base model)
- aejion (AccVideo)
- Tianwei Yin (CausVid)
- ZuluVision (MoviiGen)
- Alibaba PAI (MPS LoRA)
- Kijai (ComfyUI Wrapper)

And thanks to the open-source community!

---