How to use BiliSakura/CAFM-diffusers with Diffusers:

```bash
pip install -U diffusers transformers accelerate
```

```python
import torch
from diffusers import DiffusionPipeline

# Use "mps" instead of "cuda" on Apple devices.
pipe = DiffusionPipeline.from_pretrained(
    "BiliSakura/CAFM-diffusers", torch_dtype=torch.bfloat16, device_map="cuda"
)
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt).images[0]
```

Self-contained Continuous Adversarial Flow Models (CAFM) checkpoints for Hugging Face diffusers.
Converted from ByteDance-Seed/Adversarial-Flow-Models using libs/AFM-diffusers/scripts/convert_cafm_to_diffusers.py.
Z-Image weights are bundled self-contained under CAFM-Z-Image-T2I/.
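
The local-loading examples in this card expect a variant folder on disk. One way to fetch a single folder is `snapshot_download` from `huggingface_hub` with an `allow_patterns` filter. The sketch below assumes the folder names listed in this card and that `huggingface_hub` is installed; the helper names `variant_patterns` and `download_variant` are illustrative and not part of this repository.

```python
from pathlib import Path

REPO_ID = "BiliSakura/CAFM-diffusers"
# Folder names as listed in this card.
VARIANTS = ("CAFM-JiT-H-16-256", "CAFM-SiT-XL-2-256", "CAFM-Z-Image-T2I")


def variant_patterns(variant: str) -> list[str]:
    """Return glob patterns that match only the files of one variant folder."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {VARIANTS}")
    return [f"{variant}/*"]


def download_variant(variant: str, local_dir: str = ".") -> Path:
    """Download one variant folder and return its local path (needs network access)."""
    # Imported lazily so the pattern helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    root = snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=variant_patterns(variant),
        local_dir=local_dir,
    )
    return Path(root) / variant
```

Calling `download_variant("CAFM-SiT-XL-2-256")` from the working directory should leave a `./CAFM-SiT-XL-2-256` folder that the local examples can point at.
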
Demo sample (image omitted): CAFM-JiT-H-16-256, class 207 (golden retriever), seed 0, 100 NFE (Heun).
Each variant folder includes demo.png generated with the same prompt settings.
| Model | Space | NFE | FID | Checkpoint |
|---|---|---|---|---|
| CAFM JiT-H/16 | pixel | 100 | 1.80 | CAFM-JiT-H-16-256/ |
| CAFM SiT-XL/2 | latent | 250 | 1.53 | CAFM-SiT-XL-2-256/ |
| CAFM Z-Image | latent T2I | 25 | — | CAFM-Z-Image-T2I/ |
| Variant | Backbone | Steps | Solver |
|---|---|---|---|
| CAFM-JiT-H-16-256/ | JIT | 100 | heun |
| CAFM-SiT-XL-2-256/ | SIT | 250 | heun |
| CAFM-Z-Image-T2I/ | Z-IMAGE | 25 | euler |
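
The solver column names two standard ODE integrators. As a rough illustration of how they differ, the sketch below applies textbook Euler and Heun steps to a toy one-dimensional ODE and counts velocity evaluations. It is not the pipelines' sampler code, and how this repository maps its Steps setting to NFE is not stated in this card; all function names are illustrative.

```python
import math


def euler_step(v, x, t, dt):
    """One explicit Euler step: a single velocity evaluation."""
    return x + dt * v(x, t)


def heun_step(v, x, t, dt):
    """One Heun step: Euler predictor plus trapezoidal corrector (two evaluations)."""
    v0 = v(x, t)
    x_pred = x + dt * v0
    v1 = v(x_pred, t + dt)
    return x + 0.5 * dt * (v0 + v1)


def integrate(step, v, x0, num_steps):
    """Integrate dx/dt = v(x, t) from t=0 to t=1; return (x(1), evaluation count)."""
    calls = 0

    def counted(x, t):
        nonlocal calls
        calls += 1
        return v(x, t)

    x, dt = x0, 1.0 / num_steps
    for i in range(num_steps):
        x = step(counted, x, i * dt, dt)
    return x, calls


def velocity(x, t):
    """Toy velocity field; the exact solution is x(1) = e * x(0)."""
    return x


x_euler, nfe_euler = integrate(euler_step, velocity, 1.0, 10)
x_heun, nfe_heun = integrate(heun_step, velocity, 1.0, 10)
print("evaluations:", nfe_euler, nfe_heun)
print("abs error:", abs(x_euler - math.e), abs(x_heun - math.e))
```

For the same number of steps, the Heun variant in this sketch spends twice as many evaluations and lands closer to the exact value, which is the usual trade-off between the two schemes.
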

```python
from pathlib import Path

import torch
from diffusers import DiffusionPipeline

# Class-conditional sampling with the SiT-XL/2 variant, loaded from a local folder.
model_dir = Path("./CAFM-SiT-XL-2-256")
pipe = DiffusionPipeline.from_pretrained(
    str(model_dir),
    local_files_only=True,
    custom_pipeline=str(model_dir / "pipeline.py"),
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe(
    class_labels="golden retriever",
    num_inference_steps=250,
    sampler="heun",
).images[0]
```

```python
from pathlib import Path

import torch
from diffusers import DiffusionPipeline

# Text-to-image sampling with the Z-Image variant, loaded from a local folder.
model_dir = Path("./CAFM-Z-Image-T2I")
pipe = DiffusionPipeline.from_pretrained(
    str(model_dir),
    local_files_only=True,
    custom_pipeline=str(model_dir / "pipeline.py"),
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # recommended for single-GPU inference

image = pipe(
    prompt="A golden retriever sitting in a sunny park, photo realistic.",
    height=512,
    width=512,
    num_inference_steps=25,
    sampler="euler",
).images[0]
```
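
The two loading examples differ mainly in the folder and the sampling settings, so the per-variant defaults can be kept in one place. The sketch below copies the step counts and solvers from the variant table in this card; `SAMPLING_DEFAULTS` and `sampling_kwargs` are hypothetical names, and it assumes the JiT pipeline accepts the same `num_inference_steps` and `sampler` keywords as the two pipelines shown above.

```python
# Defaults copied from the variant table in this card.
SAMPLING_DEFAULTS = {
    "CAFM-JiT-H-16-256": {"num_inference_steps": 100, "sampler": "heun"},
    "CAFM-SiT-XL-2-256": {"num_inference_steps": 250, "sampler": "heun"},
    "CAFM-Z-Image-T2I": {"num_inference_steps": 25, "sampler": "euler"},
}


def sampling_kwargs(variant: str, **overrides) -> dict:
    """Return the table defaults for a variant, with optional keyword overrides."""
    name = variant.rstrip("/")  # accept folder names written with a trailing slash
    if name not in SAMPLING_DEFAULTS:
        raise KeyError(f"unknown variant {variant!r}")
    # Later keys win, so overrides replace the table defaults.
    return {**SAMPLING_DEFAULTS[name], **overrides}


print(sampling_kwargs("CAFM-Z-Image-T2I", height=512, width=512))
```

With a loaded pipeline, a call could then look like `pipe(class_labels="golden retriever", **sampling_kwargs("CAFM-SiT-XL-2-256"))`.
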